Abstract
Biomedical hypotheses are tested through advanced, usually multi-step analysis of biomedical data. This requires sophisticated analytical methods and data structures that can store the intermediate results needed in subsequent steps. However, biomedical data, especially reference data, often change over time, and new analytical methods appear every year. As a result, iterative analyses must be repeated with new methods and new reference data sets, which in turn causes frequent changes to the underlying data structures. Such instability of data structures can be mitigated by using a data lake instead of a traditional database system.
The aim of this paper is to present a system for researchers working with various types of biomedical data. The system supports data analysis and the testing of different biomedical hypotheses. We treat the problem holistically, giving researchers the freedom to configure their own multi-step analyses. This is made possible by a multiversion dynamic-schema data warehouse, parallel computation in a virtualized computational environment, and data delivery through MapReduce-based ETL processes.
Acknowledgments
This work was supported by The National Centre for Research and Development grant No. PBS3/B3/32/2015. The presented system was developed and installed on the infrastructure of the Ziemowit computer cluster (www.ziemowit.hpc.polsl.pl) in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 project and expanded in the POIG.02.03.01-00-040/13 project.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Psiuk-Maksymowicz, K. et al. (2016). A Holistic Approach to Testing Biomedical Hypotheses and Analysis of Biomedical Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_34
DOI: https://doi.org/10.1007/978-3-319-34099-9_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)