Contrastive Learning-Based Imputation-Prediction Networks for In-hospital Mortality Risk Modeling Using EHRs

Liu, Yuxi; Zhang, Zhenhao; Qin, Shaowen; Salim, Flora D.; Yepes, Antonio Jimeno

doi:10.1007/978-3-031-43427-3_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14174)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Predicting the risk of in-hospital mortality from electronic health records (EHRs) has received considerable attention. Such predictions give healthcare professionals early warning of a patient’s health condition so that timely interventions can be taken. The prediction task is challenging because EHR data are intrinsically irregular, with not only many missing values but also varying time intervals between medical records. Existing approaches focus on exploiting the variable correlations in patient medical records to impute missing values and on establishing time-decay mechanisms to deal with such irregularity. This paper presents a novel contrastive learning-based imputation-prediction network for predicting in-hospital mortality risks using EHR data. Our approach introduces graph analysis-based patient stratification modeling in the imputation process to group similar patients. This allows only the information of similar patients to be used, in addition to personal contextual information, for missing value imputation. Moreover, our approach integrates contrastive learning into the proposed network architecture to enhance patient representation learning and predictive performance on the classification task. Experiments on two real-world EHR datasets show that our approach outperforms the state-of-the-art approaches in both imputation and prediction tasks.
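
The idea of imputing a missing value from similar patients only can be illustrated with a small sketch. This is not the authors' network: it swaps the graph analysis-based stratification for a plain k-nearest-neighbour lookup, and the function names, the RMS distance over co-observed features, and the column-mean fallback are all assumptions made for illustration.

```python
import numpy as np


def masked_distances(X, observed):
    """Pairwise RMS distance using only features observed in both patients."""
    n = X.shape[0]
    D = np.full((n, n), np.inf)  # inf marks self-pairs and pairs with no shared feature
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                D[i, j] = np.sqrt(np.mean(diff ** 2))
    return D


def impute_from_similar_patients(X, k=2):
    """Fill NaNs in X (patients x features) from the k most similar patients.

    Hypothetical helper: a k-NN stand-in for patient stratification,
    not the method used in the paper.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    D = masked_distances(X, observed)
    col_means = np.nanmean(X, axis=0)  # fallback when no similar patient has the value
    out = X.copy()
    for i in range(X.shape[0]):
        nearest = [n for n in np.argsort(D[i])[:k] if np.isfinite(D[i, n])]
        for j in np.flatnonzero(~observed[i]):
            donors = [n for n in nearest if observed[n, j]]
            out[i, j] = X[donors, j].mean() if donors else col_means[j]
    return out


# Toy data: patient 0 is missing feature 2; patient 3 is an outlier.
X = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [10.0, -5.0, 100.0],
])
imputed = impute_from_similar_patients(X, k=2)
print(imputed[0, 2])
```

In this toy example the missing value is filled from patients 1 and 2, giving approximately 3.1, while the outlying patient 3 is ignored. A plain column mean would instead be pulled toward the outlier, which is the motivation for restricting imputation to similar patients.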

Notes

1. The implementation code is available at https://github.com/liulab1356/CL-ImpPreNet.
2. https://mimic.physionet.org.
3. https://eicu-crd.mit.edu/.

Acknowledgement

This research is partially funded by the ARC Centre of Excellence for Automated Decision-Making and Society (CE200100005) by the Australian Government through the Australian Research Council.

Author information

Authors and Affiliations

College of Science and Engineering, Flinders University, Tonsley, SA, 5042, Australia
Yuxi Liu & Shaowen Qin
College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China
Zhenhao Zhang
School of Computer Science and Engineering, UNSW, Sydney, NSW, 2052, Australia
Flora D. Salim
School of Computing Technologies, RMIT University, Melbourne, VIC, 3001, Australia
Antonio Jimeno Yepes

Corresponding author

Correspondence to Yuxi Liu.

Editor information

Editors and Affiliations

CENTAI, Turin, Italy
Gianmarco De Francisci Morales
NYU and Two Sigma, New York, NY, USA
Claudia Perlich
Netflix, Los Angeles, CA, USA
Natali Ruchansky
Telefonica Research, Barcelona, Spain
Nicolas Kourtellis
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

Ethics declarations

Ethical Statement

The experimental datasets used for this work are obtained from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset and the eICU Collaborative Research dataset. These data were used under license. The authors declare that they have no conflicts of interest. This article does not contain any studies involving human participants performed by any of the authors.

Cite this paper

Liu, Y., Zhang, Z., Qin, S., Salim, F.D., Yepes, A.J. (2023). Contrastive Learning-Based Imputation-Prediction Networks for In-hospital Mortality Risk Modeling Using EHRs. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_26

DOI: https://doi.org/10.1007/978-3-031-43427-3_26
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science; Computer Science (R0)