
Integrating and analyzing medical and environmental data using ETL and Business Intelligence tools

  • Original Paper
International Journal of Biometeorology

Abstract

Processing data that originates from different sources (such as environmental and medical data) can be difficult because of the heterogeneity of the variables, storage systems, and file formats involved. Moreover, once the amount of data reaches a certain threshold, conventional mining methods (based on spreadsheets or statistical software) become cumbersome or even impossible to apply. Data Extract, Transform, and Load (ETL) solutions provide a framework to normalize and integrate heterogeneous data into a local data store. Additionally, Online Analytical Processing (OLAP), a set of Business Intelligence (BI) methodologies and practices for multidimensional data analysis, can be an invaluable tool for examining and mining the integrated data. In this article, we describe a solution based on an ETL + OLAP tandem used for the on-the-fly analysis of tens of millions of individual medical, meteorological, and air quality observations from 16 provinces in Spain. The observations were provided by 20 different national and regional entities in a diverse array of file types and formats, and were integrated with the intention of evaluating the effect of several environmental variables on human health in future studies. Our work shows how a sizable amount of data, spread across a wide range of file formats and structures and originating from a number of different sources belonging to various business domains, can be integrated into a single system that researchers can use for global data analysis and mining.
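To make the ETL + OLAP idea concrete, the following is a minimal sketch in Python and is not the system described in the article. It assumes two hypothetical sources (a semicolon-delimited CSV of air quality readings and a JSON list of hospital admissions) with invented sample values, column names, and variable labels. Both are normalized into one SQLite table, which stands in for the local data store, and then aggregated along two dimensions in the manner of an OLAP roll-up.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inputs with invented values: two sources that use
# different formats, column names, and date conventions.
AIR_QUALITY_CSV = """province;fecha;no2_ugm3
Cantabria;01/02/2010;31.5
Madrid;01/02/2010;58.0
"""

ADMISSIONS_JSON = """[
  {"prov": "Cantabria", "date": "2010-02-01", "icd9": "427", "cases": 4},
  {"prov": "Madrid", "date": "2010-02-01", "icd9": "427", "cases": 11},
  {"prov": "Madrid", "date": "2010-02-02", "icd9": "427", "cases": 9}
]"""


def extract_air_quality(text):
    """Extract and transform: semicolon CSV with DD/MM/YYYY dates."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        day, month, year = row["fecha"].split("/")
        # Normalize to (province, ISO date, variable name, value).
        yield (row["province"], f"{year}-{month}-{day}",
               "no2_ugm3", float(row["no2_ugm3"]))


def extract_admissions(text):
    """Extract and transform: JSON records that already use ISO dates."""
    for rec in json.loads(text):
        yield (rec["prov"], rec["date"],
               f"admissions_icd9_{rec['icd9']}", float(rec["cases"]))


def build_store():
    """Load: put every normalized observation into one common table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation "
        "(province TEXT, day TEXT, variable TEXT, value REAL)"
    )
    insert = "INSERT INTO observation VALUES (?, ?, ?, ?)"
    conn.executemany(insert, extract_air_quality(AIR_QUALITY_CSV))
    conn.executemany(insert, extract_admissions(ADMISSIONS_JSON))
    conn.commit()
    return conn


def roll_up(conn, variable):
    """OLAP-style roll-up: aggregate one measure by province and month."""
    query = """
        SELECT province, substr(day, 1, 7) AS month, SUM(value), COUNT(*)
        FROM observation
        WHERE variable = ?
        GROUP BY province, month
        ORDER BY province, month
    """
    return conn.execute(query, (variable,)).fetchall()


if __name__ == "__main__":
    store = build_store()
    for row in roll_up(store, "admissions_icd9_427"):
        print(row)
```

Once every source is reduced to the same (province, day, variable, value) shape, adding a new provider only requires a new extract function, while the analysis queries stay unchanged. A dedicated OLAP engine would add precomputed hierarchies and a multidimensional query language on top of this kind of store.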

Author information

Corresponding author

Correspondence to Ana Santurtún.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Villar, A., Zarrabeitia, M.T., Fdez-Arroyabe, P. et al. Integrating and analyzing medical and environmental data using ETL and Business Intelligence tools. Int J Biometeorol 62, 1085–1095 (2018). https://doi.org/10.1007/s00484-018-1511-9

  • DOI: https://doi.org/10.1007/s00484-018-1511-9
