A Holistic Approach to Testing Biomedical Hypotheses and Analysis of Biomedical Data

  • Conference paper
Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery (BDAS 2015, BDAS 2016)

Abstract

Testing biomedical hypotheses relies on advanced, usually multi-step analysis of biomedical data. This requires sophisticated analytical methods and data structures that can store the intermediate results needed in subsequent steps. However, biomedical data, especially reference data, often change over time, and new analytical methods are created every year. As a result, iterative analyses must be repeated with new methods and new reference data sets, which in turn causes frequent changes to the underlying data structures. Such instability of data structures can be mitigated by using the data lake concept instead of traditional database systems.

The aim of this paper is to present a system for researchers dealing with various types of biomedical data. The system provides functionality for data analysis and for testing different biomedical hypotheses. We treat the problem holistically, giving researchers the freedom to configure their own multi-step analyses. This is made possible by using a multiversion dynamic-schema data warehouse, performing parallel calculations in a virtualized computational environment, and delivering data through MapReduce-based ETL processes.
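As an illustration only, the idea of a MapReduce-based ETL step can be sketched in plain Python. This is not the system described in the paper: the record layout, the gene names, the `reference_version` tag, and all function names are hypothetical, and a real deployment would run the map and reduce phases on a distributed framework rather than in a single process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw measurements: (sample_id, gene, expression_value).
RAW_RECORDS = [
    ("s1", "TP53", 2.0),
    ("s2", "TP53", 4.0),
    ("s1", "BRCA1", 1.0),
    ("s2", "BRCA1", None),  # missing measurement, dropped in the map phase
]


def map_phase(record):
    """Extract/transform: emit (gene, value) pairs, skipping missing values."""
    _sample_id, gene, value = record
    if value is not None:
        yield gene, float(value)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(gene, values):
    """Aggregate one key: mean expression per gene."""
    return gene, mean(values)


def run_etl(records, reference_version):
    """Load: build rows tagged with the (assumed) reference-data version."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    rows = []
    for gene in sorted(grouped):
        _, avg = reduce_phase(gene, grouped[gene])
        rows.append(
            {
                "gene": gene,
                "mean_expression": avg,
                "reference_version": reference_version,
            }
        )
    return rows


result = run_etl(RAW_RECORDS, reference_version="v1")
```

Tagging each loaded row with a version label is one simple way to keep results from different reference data sets side by side when an analysis is repeated; the paper's multiversion warehouse may handle this quite differently.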

Acknowledgments

This work was supported by The National Centre for Research and Development grant No. PBS3/B3/32/2015. The presented system was developed and installed on the infrastructure of the Ziemowit computer cluster (www.ziemowit.hpc.polsl.pl) in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 and expanded in the POIG.02.03.01-00-040/13 projects.

Author information

Corresponding author

Correspondence to Krzysztof Psiuk-Maksymowicz.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Psiuk-Maksymowicz, K. et al. (2016). A Holistic Approach to Testing Biomedical Hypotheses and Analysis of Biomedical Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_34

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
