Finding optimal strategies for river quality assessment using machine learning and deep learning models

  • Original Article
  • Modeling Earth Systems and Environment

Abstract

Accurate assessment of river quality is essential for human health, ecosystem functionality, economic growth, and future population growth. In most cases, river quality assessment relies on the Water Quality Index (WQI) to rate the river and on multivariate statistics to analyze multiple chemical and physical variables within it. However, the large volume of data collected, the difficulty of handling it, and the complicated and uncertain physical, chemical, and biological influences on water quality parameter values call for a different approach to classifying river quality. This study therefore presents and compares different techniques for finding optimal strategies for river quality assessment using two major families of Artificial Intelligence (AI) algorithms: Machine Learning (ML) and Deep Learning (DL). Before searching for the optimal strategies, the study proposes different preprocessing techniques combined with dimensionality reduction to find an optimal model fit with less feature imbalance. The ML algorithms include both unsupervised and supervised learning. The unsupervised methods are Hierarchical Clustering (HC) and K-Means (KM), whereas the ten supervised methods include K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Lasso Regression (LAR), Ridge Regression (RR), Linear Discriminant Analysis (LDA), Naïve-Bayes (NB), Decision Tree (DT), and K-Means (KM). The study also includes two DL models: Deep Learning Neural Network (DLNN) and Multi-Layer Perceptron (MLP). In addition, the paper offers different tuning procedures to improve the algorithms’ accuracy. Results show that HC is able to divide the polluted area into five different levels of water pollution and that KM suggests the optimal number of clusters, whereas the ten supervised learning methods and the two DL methods together provide accurate and efficient results for the classification of river quality. Thus, the different techniques and models offer an alternative that can handle large datasets and different types of parameters to produce an accurate river quality assessment.
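The abstract states that K-Means is used to suggest the optimal number of clusters after preprocessing. The sketch below illustrates one common way this can be done, the "elbow" heuristic on within-cluster sum of squares, and is not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic, the two features stand in for hypothetical water quality parameters, z-score scaling stands in for the unspecified preprocessing, and the deterministic farthest-first initialization is a simplification chosen so the example is reproducible without external libraries.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column (a stand-in for the paper's unspecified preprocessing)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(rows, k):
    """Deterministic initialization: repeatedly pick the point farthest from chosen centers."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    return centers

def assign(rows, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(r, centers[j])) for r in rows]

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm; returns labels and within-cluster sum of squares (inertia)."""
    centers = farthest_first(rows, k)
    for _ in range(max_iter):
        labels = assign(rows, centers)
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append([mean(c) for c in zip(*members)] if members else centers[j])
        if updated == centers:  # assignments stopped changing
            break
        centers = updated
    labels = assign(rows, centers)
    inertia = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, inertia

# Synthetic data (assumption): three well-separated groups of 30 samples each,
# described by two hypothetical water quality parameters.
rng = random.Random(42)
group_means = [(7.0, 1.0), (5.0, 4.0), (2.0, 9.0)]
raw = [[m1 + rng.uniform(-0.3, 0.3), m2 + rng.uniform(-0.3, 0.3)]
       for m1, m2 in group_means for _ in range(30)]
X = standardize(raw)

# Elbow heuristic: inertia drops sharply until k reaches the true group count,
# then flattens out.
inertias = {}
for k in range(1, 6):
    labels, inertias[k] = kmeans(X, k)
    if k == 3:
        labels3 = labels
    print(k, round(inertias[k], 2))
```

On data like this, the inertia falls steeply up to k = 3 and only marginally afterwards, which is the pattern the elbow heuristic looks for. In practice a library implementation such as scikit-learn (cited in the reference list) would normally replace the hand-written routines.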



Data availability

The datasets generated and/or analysed during the current study are not publicly available because the research data are confidential and belong to the Department of Environment Malaysia, but they are available from the corresponding author on reasonable request.


Acknowledgements

We would like to extend our gratitude to the Department of Environment Malaysia for permission to conduct this study. We offer our sincere appreciation and special thanks to the Department of Environment Malaysia experts for their valuable contributions to this study. The authors would also like to thank Malaysia’s Ministry of Higher Education (MOHE) for supporting this research.

Funding

This study was funded by the Malaysian Ministry of Higher Education (FRGS-RACER: RACER/1/2019/STG06/UNISZA//).

Author information

Corresponding author

Correspondence to Nurnadiah Zamri.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zamri, N., Pairan, M.A., Azman, W.N.A.W. et al. Finding optimal strategies for river quality assessment using machine learning and deep learning models. Model. Earth Syst. Environ. 9, 615–629 (2023). https://doi.org/10.1007/s40808-022-01494-4

  • DOI: https://doi.org/10.1007/s40808-022-01494-4
