Using Data Mining for Prediction of Hospital Length of Stay: An Application of the CRISP-DM Methodology

  • Conference paper
Enterprise Information Systems (ICEIS 2014)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 227)

Abstract

Hospitals now collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at extracting useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, related to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then opened using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. This extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
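
To make the evaluation criterion concrete, the following minimal Python sketch computes the coefficient of determination for an Average Prediction baseline and for a least-squares line. It is not the authors' code: the toy age and length-of-stay values are invented for illustration, all function names are hypothetical, and a one-input line stands in for the multiple regression named in the abstract.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_average(y_train):
    """Average Prediction baseline: always predict the training mean."""
    y_bar = mean(y_train)
    return lambda x: y_bar


def fit_simple_regression(x_train, y_train):
    """One-input least-squares line (a stand-in for multiple regression)."""
    x_bar, y_bar = mean(x_train), mean(y_train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train))
    sxx = sum((x - x_bar) ** 2 for x in x_train)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Hypothetical toy data (NOT from the paper): age vs. length of stay in days.
ages = [20, 30, 40, 50, 60, 70, 80]
los_days = [2.0, 3.0, 3.5, 5.0, 6.0, 6.5, 8.0]

baseline = fit_average(los_days)
line = fit_simple_regression(ages, los_days)

# Scores are computed on the training data to keep the sketch short;
# a real study would score on held-out data instead.
r2_baseline = r_squared(los_days, [baseline(a) for a in ages])
r2_line = r_squared(los_days, [line(a) for a in ages])

print(f"Average Prediction R^2: {r2_baseline:.3f}")
print(f"Least-squares line R^2: {r2_line:.3f}")
```

When scored on the data it was fitted to, the Average Prediction baseline has a coefficient of determination of zero by construction, which is why it serves as a floor against which the other learning methods can be judged.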

Acknowledgments

We wish to thank the physicians who participated in this study for their valuable feedback. We would also like to thank the anonymous reviewers for their helpful suggestions. The work of P. Cortez has been supported by FCT – Fundação para a Ciência e Tecnologia within the Project Scope: PEst-OE/EEI/UI0319/2014.

Author information

Corresponding author

Correspondence to Paulo Cortez.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Caetano, N., Cortez, P., Laureano, R.M.S. (2015). Using Data Mining for Prediction of Hospital Length of Stay: An Application of the CRISP-DM Methodology. In: Cordeiro, J., Hammoudi, S., Maciaszek, L., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2014. Lecture Notes in Business Information Processing, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-22348-3_9

  • DOI: https://doi.org/10.1007/978-3-319-22348-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22347-6

  • Online ISBN: 978-3-319-22348-3

  • eBook Packages: Computer Science (R0)
