Abstract
Coupled with the rise of data science and machine learning, the increasing availability of digitized health and wellness data has provided an exciting opportunity for complex analyses of problems throughout the healthcare domain. Whereas many early works focused on a particular aspect of patient care, often drawing on data from a specific clinical or administrative source, it has become clear that such a single-source approach is insufficient to capture the complexity of the human condition. Instead, adequately modeling health and wellness problems requires the ability to draw upon data spanning multiple facets of an individual’s biology, their care, and the social aspects of their life. Although this awareness has greatly expanded the breadth of health and wellness data collected, the diverse array of data sources and intended uses often leaves researchers and practitioners with a scattered and fragmented view of any particular patient. As a result, there exists a clear need to catalogue and organize the range of healthcare data available for analysis. This work represents an effort at developing such an organization, presenting a patient-centric framework termed the Healthcare Data Spectrum (HDS). Comprising six layers, the HDS begins at its innermost with the micro-level omics and macro-level demographic data that directly characterize a patient, and extends at its outermost to aggregate population-level data derived from attributes of care for each individual patient. For each level of the HDS, this manuscript examines the specific types of constituent data, provides examples of how the data aid in a broad set of research problems, and identifies the primary terminology and standards used to describe the data.
Funding
This work is supported in part by the National Science Foundation (NSF) Grant IIS-1447795.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Feldman, K., Johnson, R.A. & Chawla, N.V. The State of Data in Healthcare: Path Towards Standardization. J Healthc Inform Res 2, 248–271 (2018). https://doi.org/10.1007/s41666-018-0019-8
DOI: https://doi.org/10.1007/s41666-018-0019-8