Abstract
Imputing missing hydro-meteorological values is essential in time series modeling. Traditionally, missing values are imputed only after an observation period has ended, which is too late to support water resources management in a timely manner. It is therefore necessary to handle missing data online. Moreover, traditional imputation methods usually consider only one observed variable and generate a single set of imputations, which cannot describe imputation uncertainty. This paper therefore proposes a multiple-imputation method, MICE-RF, which integrates chained equations and random forest to handle missing hydro-meteorological values. MICE-RF first simulates multiple imputation series and then selects the optimal imputations based on the evaluation results of those series. Traditional linear, mean, spline, and k-nearest-neighbor imputation are used as baselines to assess the robustness, reliability, and accuracy of MICE-RF. The results show that MICE-RF provides the best imputation accuracy among the compared methods and can be easily implemented online.
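The idea of generating several imputation series and keeping the best-scoring one can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `IterativeImputer` as the chained-equations engine with a `RandomForestRegressor` as the per-variable model, and it uses synthetic data whose variable names, the number of imputations `m`, the 10% missing rate, and the RMSE selection criterion are all hypothetical choices made for the example.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import ConvergenceWarning
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer

# Few chained-equation rounds are used below, so silence the early-stopping notice.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

rng = np.random.default_rng(0)

# Hypothetical synthetic data: three correlated daily variables (not the paper's data).
n = 200
t = np.arange(n)
temperature = 15 + 10 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 1, n)
humidity = 70 - 0.8 * temperature + rng.normal(0, 2, n)
runoff = 5 + 0.3 * temperature + 0.05 * humidity + rng.normal(0, 0.5, n)
X_true = np.column_stack([temperature, humidity, runoff])

# Assumption: about 10% of all entries are missing completely at random.
missing = rng.random(X_true.shape) < 0.10
X_obs = X_true.copy()
X_obs[missing] = np.nan


def impute_once(X, seed):
    """One chained-equations pass in which each variable is modeled by a random forest."""
    rf = RandomForestRegressor(n_estimators=20, random_state=seed)
    imputer = IterativeImputer(estimator=rf, max_iter=5, random_state=seed)
    return imputer.fit_transform(X)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Generate m imputed data sets; they differ because each forest uses a different seed.
m = 5
imputations = [impute_once(X_obs, seed) for seed in range(m)]

# Score every imputed series on the masked entries and keep the best one.
scores = [rmse(imp[missing], X_true[missing]) for imp in imputations]
best = int(np.argmin(scores))
best_imputation = imputations[best]

# Spread across the m imputations gives a rough view of imputation uncertainty.
spread = float(np.std(np.stack(imputations), axis=0)[missing].mean())

print("RMSE per imputation:", [round(s, 3) for s in scores])
print("selected imputation:", best)
print("mean std across imputations at missing entries:", round(spread, 3))
```

Because the data here are synthetic, the true values behind the gaps are known and can be used for scoring. With real station records, one would instead hide a subset of the observed values and evaluate the imputations on those held-out entries.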
Data and Code Availability
Data used in the research work have been acknowledged, and data and code are available on request.
Acknowledgements
The authors sincerely appreciate the data provided by the China Meteorological Data Service Center.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 51679186), the Natural Science Basic Research Program of Shaanxi Province (Grant Nos. 2019JLZ-15, 2019JLZ-16, 2017JQ5076), and the Special Scientific Research Program of Shaanxi Provincial Education Department (Grant No. 17JK0558). The authors thank the editor for their comments and suggestions.
Author information
Contributions
Conceptualization: Xin Jing; Methodology: Xin Jing and Zuo GG; Writing-original draft preparation: Xin Jing and Wang JM; Writing-review and editing: Luo JG, Wei N, and Zuo GG; Funding acquisition: Luo JG.
Ethics declarations
Ethical Approval and Consent to Participate
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to Publish
All authors have consented to publish this manuscript.
Competing Interests
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
About this article
Cite this article
Jing, X., Luo, J., Wang, J. et al. A Multi-imputation Method to Deal With Hydro-Meteorological Missing Values by Integrating Chain Equations and Random Forest. Water Resour Manage 36, 1159–1173 (2022). https://doi.org/10.1007/s11269-021-03037-5
DOI: https://doi.org/10.1007/s11269-021-03037-5