Abstract
In recent years, the number of floods around the world has increased. As a result, Flood Susceptibility Maps (FSMs) have become vital for flood prevention, risk mitigation, and decision-making. The purpose of this study is to develop FSMs for Adana province on the Mediterranean coast of Türkiye using tree-based machine learning (ML) classifiers. This study analyzes the predictive performance of Natural Gradient Boosting Machines (NGBoost) for the first time in FSM studies and presents the first comparative study of Light Gradient Boosting Machines (LightGBM) and CatBoost against other techniques, including Random Forest (RF), Gradient Boosting (GB), eXtreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost). These ML approaches were evaluated using fourteen flood conditioning parameters divided into five categories: topographical, meteorological, vegetation, lithological, and anthropogenic. The AdaBoost and LightGBM models achieved the highest test accuracy (0.8978), followed by GB and NGBoost (0.8832), XGBoost (0.8759), RF (0.8613), and CatBoost (0.8102). McNemar's test was used to assess whether the differences between classifier predictions were statistically significant. According to the generated FSMs, a substantial portion of Adana province is moderately to extremely prone to flooding. For feature selection, the majority of previous studies used only the Information Gain (IG) method and multicollinearity analysis, and only a few used global explanatory models to calculate the relevance of their conditioning factors. A locally explained model is required to understand the associations and dependencies between conditioning factors. Therefore, this study locally explains the generated ML-based FSMs with an eXplainable Artificial Intelligence (XAI) approach, namely SHapley Additive exPlanations (SHAP).
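The McNemar's test mentioned above compares two classifiers on the same test samples by counting only the cases where exactly one of them is correct. The following is a minimal Python sketch of the chi-square form with continuity correction. It is not the authors' implementation; the function name `mcnemar_test` and the toy labels are hypothetical, and for small discordant counts an exact binomial version of the test is usually preferred.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's chi-square test with continuity correction.

    Returns (statistic, p_value) for the null hypothesis that the two
    classifiers have the same error rate on the given samples.
    """
    # b: classifier A correct, classifier B wrong
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # No discordant pairs: the predictions never disagree in correctness.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical toy data (1 = flooded, 0 = non-flooded); not from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
pred_b = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"statistic={stat:.4f}, p-value={p_value:.4f}")
```

A p-value below the chosen significance level (0.05 is a common choice) would suggest that the two classifiers differ in their error behaviour on the test set.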
According to the findings, elevation, slope, and distance to rivers are the top three contributing factors in most models. SHAP results show that lower elevations, lower slopes, areas closer to river banks, agricultural areas, and sparsely vegetated areas are more prone to flooding.
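To illustrate what a local SHAP explanation computes, the sketch below evaluates exact Shapley values for a single sample by enumerating all feature subsets. It is a simplified illustration under stated assumptions, not the study's pipeline: the scoring function `toy_score`, its coefficients, the sample, and the baseline are all hypothetical, and absent features are replaced by a single baseline value rather than averaged over a background dataset as the SHAP library does.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in the subset take the sample's values; the rest
        # are held at the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical susceptibility score; higher means more flood-prone."""
    elevation_m, slope_deg, river_dist_m = z
    return 1.0 - 0.001 * elevation_m - 0.02 * slope_deg - 0.0002 * river_dist_m


# Hypothetical low-lying, flat cell close to a river, compared with a
# hypothetical "average" cell used as the baseline.
sample = [20.0, 2.0, 100.0]
baseline = [500.0, 15.0, 2000.0]

names = ["elevation", "slope", "distance_to_river"]
phi = shapley_values(toy_score, sample, baseline)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the sample's score and the baseline score, which is the additivity property that lets SHAP attribute an individual prediction to its conditioning factors. Exact enumeration scales exponentially with the number of features, so practical tools rely on faster algorithms for tree ensembles.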
Acknowledgements
This study is the master's dissertation research of the first author (Department of GIS and Remote Sensing, Mersin University), supervised by the second author. We would like to thank the journal editor and the anonymous reviewers for their constructive comments. Each named author has substantially contributed to conducting the underlying research and drafting this paper. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The named authors have no conflict of interest, financial or otherwise.
About this article
Cite this article
Aydin, H.E., Iban, M.C. Predicting and analyzing flood susceptibility using boosting-based ensemble machine learning algorithms with SHapley Additive exPlanations. Nat Hazards 116, 2957–2991 (2023). https://doi.org/10.1007/s11069-022-05793-y
DOI: https://doi.org/10.1007/s11069-022-05793-y