
The Era of Big Data: From Data-Driven Research to Data-Driven Clinical Care

Application of Clinical Bioinformatics

Part of the book series: Translational Bioinformatics (TRBIO, volume 11)

Abstract

When the era of big data arrived in the early 1990s, biomedical research fostered new innovations, procedures and methods that aid clinical care and patient management. This chapter introduces the basic concepts and strategies of data-driven biomedical research and its applications, a field described by terms such as computational biomedicine or clinical/medical bioinformatics. After a brief motivation, it surveys data sources and bioanalytical technologies for high-throughput data generation, presents a selection of experimental study designs and their applications, and describes procedures and recommendations for handling data quality and privacy. This is followed by a discussion of basic data warehouse concepts used for life science data integration, data mining and knowledge discovery. Finally, five application examples are briefly outlined, emphasizing the benefit and power of computational methods and tools in this field. The author hopes that this chapter will encourage readers to handle and interpret the large amounts of data typically generated in research projects and clinical routine, and to exploit the mined bioinformation and medical knowledge for individualized health care.


Notes

  1.

    It comprises so-called dimension tables, which hold data from the different data sources, and fact tables, which connect the various dimension tables.

  2.

    It normalizes the dimension tables to third normal form, which requires less storage space but makes data queries more complex.
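The two notes above read like descriptions of the star schema (note 1) and its normalized snowflake variant (note 2), two standard data warehouse layouts. Assuming that reading, the following Python sketch builds both layouts in an in-memory SQLite database and runs the same aggregate query on each. All table names, columns and values (`dim_patient`, `dim_clinic`, `dim_analyte`, `fact_measurement`, the clinics and concentrations) are hypothetical toy choices, not the chapter's schema; SQLite is used only because it ships with Python.

```python
import sqlite3

# Star layout (hypothetical): denormalized dimension tables plus one fact table.
STAR_SCHEMA = """
-- Dimension tables: descriptive data, e.g. from a clinical system and a lab platform.
CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, sex TEXT,
                          clinic_name TEXT, clinic_country TEXT);
CREATE TABLE dim_analyte (analyte_id INTEGER PRIMARY KEY, name TEXT, platform TEXT);
-- Fact table: one row per measurement, connecting the dimension tables.
CREATE TABLE fact_measurement (
    patient_id INTEGER REFERENCES dim_patient(patient_id),
    analyte_id INTEGER REFERENCES dim_analyte(analyte_id),
    concentration REAL);
"""

# Snowflake layout (hypothetical): clinic attributes are normalized out of
# dim_patient into their own table, so each clinic is stored only once.
SNOWFLAKE_SCHEMA = """
CREATE TABLE dim_clinic (clinic_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, sex TEXT,
                          clinic_id INTEGER REFERENCES dim_clinic(clinic_id));
CREATE TABLE dim_analyte (analyte_id INTEGER PRIMARY KEY, name TEXT, platform TEXT);
CREATE TABLE fact_measurement (
    patient_id INTEGER REFERENCES dim_patient(patient_id),
    analyte_id INTEGER REFERENCES dim_analyte(analyte_id),
    concentration REAL);
"""


def star_mean_by_country(conn, analyte):
    # Fact table joined directly to each dimension table.
    return conn.execute("""
        SELECT p.clinic_country, AVG(f.concentration)
        FROM fact_measurement f
        JOIN dim_patient p ON f.patient_id = p.patient_id
        JOIN dim_analyte a ON f.analyte_id = a.analyte_id
        WHERE a.name = ?
        GROUP BY p.clinic_country ORDER BY p.clinic_country""", (analyte,)).fetchall()


def snowflake_mean_by_country(conn, analyte):
    # Same question, but one extra join (patient -> clinic) is now required.
    return conn.execute("""
        SELECT c.country, AVG(f.concentration)
        FROM fact_measurement f
        JOIN dim_patient p ON f.patient_id = p.patient_id
        JOIN dim_clinic c ON p.clinic_id = c.clinic_id
        JOIN dim_analyte a ON f.analyte_id = a.analyte_id
        WHERE a.name = ?
        GROUP BY c.country ORDER BY c.country""", (analyte,)).fetchall()


def demo():
    # Toy data only: three patients from two clinics, one analyte.
    star = sqlite3.connect(":memory:")
    star.executescript(STAR_SCHEMA)
    star.executemany("INSERT INTO dim_patient VALUES (?, ?, ?, ?)",
                     [(1, "F", "Clinic A", "AT"), (2, "M", "Clinic A", "AT"),
                      (3, "F", "Clinic B", "US")])
    star.execute("INSERT INTO dim_analyte VALUES (10, 'metabolite_1', 'LC-MS/MS')")
    star.executemany("INSERT INTO fact_measurement VALUES (?, ?, ?)",
                     [(1, 10, 2.0), (2, 10, 4.0), (3, 10, 1.5)])

    snow = sqlite3.connect(":memory:")
    snow.executescript(SNOWFLAKE_SCHEMA)
    snow.executemany("INSERT INTO dim_clinic VALUES (?, ?, ?)",
                     [(100, "Clinic A", "AT"), (200, "Clinic B", "US")])
    snow.executemany("INSERT INTO dim_patient VALUES (?, ?, ?)",
                     [(1, "F", 100), (2, "M", 100), (3, "F", 200)])
    snow.execute("INSERT INTO dim_analyte VALUES (10, 'metabolite_1', 'LC-MS/MS')")
    snow.executemany("INSERT INTO fact_measurement VALUES (?, ?, ?)",
                     [(1, 10, 2.0), (2, 10, 4.0), (3, 10, 1.5)])

    return (star_mean_by_country(star, "metabolite_1"),
            snowflake_mean_by_country(snow, "metabolite_1"))


if __name__ == "__main__":
    print(demo())  # both layouts give [('AT', 3.0), ('US', 1.5)]
```

In the snowflake layout each clinic's attributes are stored once in `dim_clinic` instead of being repeated for every patient, which is the space saving note 2 mentions; the cost is the additional join from `dim_patient` to `dim_clinic` that every clinic-level query now needs.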


Author information


Corresponding author

Correspondence to Christian Baumgartner, Ph.D.


Copyright information

© 2016 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Baumgartner, C. (2016). The Era of Big Data: From Data-Driven Research to Data-Driven Clinical Care. In: Wang, X., Baumgartner, C., Shields, D., Deng, HW., Beckmann, J. (eds) Application of Clinical Bioinformatics. Translational Bioinformatics, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7543-4_1
