Explicit the urban waterlogging spatial variation and its driving factors: The stepwise cluster analysis model and hierarchical partitioning analysis approach

https://doi.org/10.1016/j.scitotenv.2020.143041

Highlights

  • The SCAM-HPA is applied to explicate urban waterlogging variation and its driving factors.

  • The SCAM shows high classification accuracy and generalization capability.

  • The impervious surface, vegetation coverage, and precipitation are the dominant drivers.

  • Waterlogging susceptibility varies greatly under different scenarios.

  • Waterlogging prevention strategies should be formulated according to watershed conditions.

Abstract

Urban waterlogging is a hydrological problem that seriously threatens people's lives and property. Characterizing waterlogging variation and explicating its driving factors help prevent the damage caused by such disasters. Because of the high spatial heterogeneity and the complex, non-stationary mechanism of urban waterlogging, conventional methods cannot fully capture its spatial variation or identify waterlogging-susceptible areas, so a more robust method is needed to quantify the variation trend of urban waterlogging. Previous studies have simulated waterlogging variation in relatively small areas, but they often ignore the relationships between variables and therefore cannot comprehensively reveal the dominant drivers of urban waterlogging. We therefore propose a novel approach that combines the stepwise cluster analysis model (SCAM) and hierarchical partitioning analysis (HPA) within a general framework, and we verify its applicability against logistic regression, artificial neural network, and support vector machine models. Based on the dominant driving factors, different simulation scenarios are established to analyze the variation in waterlogging density. The results show that the SCAM provides accurate and detailed simulations both in urban centers, where waterlogging occurs frequently, and on the urban fringe, where waterlogging events are few, demonstrating high classification accuracy and generalization capability. The HPA identified impervious surface abundance (28.07%), vegetation abundance (20.80%), and cumulative precipitation (16.25%) as the dominant drivers of waterlogging, which suggests that priority should be given to controlling these three factors to mitigate waterlogging risk. Notably, urban waterlogging susceptibility varies considerably under different urbanization and rainfall scenarios. The spatial location and characteristics of a watershed are relevant when identifying and assessing waterlogging susceptibility, which implies that urban waterlogging mitigation strategies should be tailored to local conditions and future scenarios.

Introduction

Against the background of global climate change and rapid urbanization, waterlogging has become a frequent disaster in Chinese cities (Yu et al., 2018; Huang et al., 2017; Xue et al., 2016). Urbanization has intensified the interaction between human society and the ecological environment, leading to a series of social-environmental problems (Huang et al., 2018; Wu et al., 2018). In particular, the conversion of natural land cover to impervious surfaces has changed urban hydrological conditions, reducing water storage capacity and increasing surface runoff. Moreover, the surface roughness of impervious surfaces is far lower than that of vegetation cover (e.g., grassland or forest). Surface runoff therefore converges much faster, its concentration time is considerably shortened, and the pressure on the drainage system increases (Shuster et al., 2005; Pijl et al., 2018; Sofia et al., 2017; Chen et al., 2015). The direct environmental consequence of these changes is a higher risk of urban waterlogging disasters.

According to the “Statistical Bulletin of Flood and Drought Disasters in China” of the Ministry of Water Resources, an average of 157 cities experienced urban waterlogging from 2006 to 2017 (http://www.mwr.gov.cn/). On May 7, 2017, a heavy rainstorm (over 250 mm) hit Guangzhou, affecting more than 30,000 people and causing large-scale traffic jams (China Global Television Network). Urban waterlogging prevention has become a prominent weakness in China's national flood control, and waterlogging severely threatens the safety of people's lives and property (Zhang, 2015). Simulating and predicting waterlogging variation can therefore provide useful theoretical and practical references for urban waterlogging prevention, sustainable urban development, and urban planning.

In recent years, the irreversible damage caused by urban waterlogging has highlighted the need for waterlogging mitigation measures and management (Ahammed, 2017; Pijl et al., 2018; Sofia et al., 2014; Shao et al., 2016; Zhang et al., 2020; Zhang and Pan, 2014). In general, characterizing urban waterlogging variation helps reveal waterlogging-prone areas and thereby minimize the negative effects of waterlogging (Wang et al., 2012; Miao et al., 2019; X. Tang et al., 2018). However, as many researchers have pointed out, urban waterlogging is influenced by both the natural environment (precipitation and urban topography) and human activities (land-use change and the drainage network) (Su et al., 2018; Viero et al., 2019; Wu and Zhang, 2017; Y. Zhang et al., 2018). Furthermore, landscape heterogeneity gives urban waterlogging non-stationary and non-linear characteristics, which makes its variation difficult to simulate accurately. Methods for characterizing urban waterlogging variation fall into four categories: (1) multivariate statistical methods, (2) hydrological and hydrodynamic models, (3) qualitative models based on expert knowledge, and (4) machine learning models.

In the first category, multivariate statistical methods (such as the stepwise regression model) are widely used to analyze the impact of various variables on waterlogging (Sofia et al., 2017; Huang et al., 2018; Huang and Shen, 2018). However, because of the tremendous landscape heterogeneity, they struggle to simulate the spatial variation of urban waterlogging accurately, and they are gradually being replaced by more robust and precise methods.

In the second category, hydrological and hydrodynamic models (such as SWMM, MIKE, HEC-RAS, and LISFLOOD-FP) are extensively used to simulate the urban waterlogging process (Youssef et al., 2011; Quan, 2014; Bisht et al., 2016; Cheng et al., 2017; Li et al., 2016). The storm water management model (SWMM), one of the representative hydrological models, can analyze various hydrological processes generated by surface runoff (Kia et al., 2012; Burger et al., 2014; Babaei et al., 2018). However, runoff estimation in these models usually relies on empirical estimates or on the curve number method proposed by the Soil Conservation Service (SCS-CN), which may not adequately describe specific differences in complex urban landscapes (Y. Zhang et al., 2018; Zope and Eldho, 2016). Furthermore, artificial structures (buildings, roads) and trees change the direction of surface runoff and complicate water flow, which limits the application of hydrological models in urban areas to some extent. To overcome these shortcomings, two-dimensional hydrodynamic models based on partial differential equations can better simulate water flow under different terrain conditions and are more suitable for areas with great spatial heterogeneity (Tsanis and Boyle, 2001; Paiva et al., 2011; Felder et al., 2017). However, these models rely heavily on high-precision local data (such as high-resolution DTM/DEM data and drainage network data) and substantial computing resources, so they are suitable only for small study areas. Although they can accurately simulate the physical process of waterlogging in a relatively small catchment, findings from small areas tend to be site-specific and may not transfer to large-scale studies.

In the third category, qualitative models such as the analytic hierarchy process (AHP) and multi-criteria decision analysis (MCDA) depend strongly on expert knowledge (Z. Tang et al., 2018; Samanta et al., 2016; Brito et al., 2019). These models use the AHP to determine factor weights or integrate explanatory factors into a multi-criteria sensitivity map to simulate urban waterlogging events (Zhao et al., 2018; Chowdary et al., 2013). For example, Hong et al. (2018) used the analytic hierarchy process and fuzzy weight of evidence to construct a flood susceptibility map. However, some studies have pointed out that these methods rely on expert knowledge and judgment, which introduces uncertainty into the analysis.

In the fourth category, machine learning models such as the artificial neural network (ANN), support vector machine (SVM), decision tree (DT), and random forest (RF) have become common tools for urban waterlogging simulation, susceptibility modeling, and risk assessment (Pradhan, 2012). These methods act as a black box that maps the relation between the inputs and outputs of training samples, which is an advantage when modeling complex data. For example, Gupta et al. (2017) used an ANN to identify areas sensitive to urban waterlogging and to predict its severity, and found that the method could do so effectively and accurately. In the Johor River Basin, Malaysia, Kia et al. (2012) integrated an ANN with GIS for flood simulation. A study conducted in Beijing also showed that the ANN is suitable for urban waterlogging risk assessment (Lai et al., 2017). For the SVM, Tang et al. (2019) combined particle swarm optimization with an SVM in an integrated approach to evaluate urban waterlogging susceptibility. In Terengganu, Malaysia, SVM models with different kernel types were used to assess urban flood risk (Tehrany et al., 2015). Tehrany et al. (2013) also applied a rule-based decision tree to predict flood-susceptible areas in the Kelantan River basin. These studies confirm to some extent that the ANN and SVM are effective in simulating urban waterlogging. However, these models are sensitive to the quality of the sample data: if the values of the validation samples fall outside the range of the training samples, model accuracy drops sharply. Furthermore, with very large sample sizes, these models require considerable computing time and additional modeling parameters.

Compared with the above methods, urban waterlogging analysis and simulation need a model with a more transparent structure and higher computational efficiency. The stepwise cluster analysis model (SCAM) is a non-parametric statistical method based on multivariate analysis of variance; it requires no statistical assumptions and can process data measured on different scales. This method is a machine learning paradigm based on a cluster tree, which can be used for multivariate modeling (multiple x and y, i.e., supervised learning) and clustering (i.e., unsupervised learning). It is particularly suited to investigating inherently non-linear or discrete relationships between dependent and independent variables (Fan et al., 2015; Wang et al., 2015; Sun et al., 2019; Wang et al., 2013). Compared with methods such as the ANN and SVM, it uses the more transparent structure of a cluster tree to reflect the complex relationship between the dependent and independent variables. It is also advantageous in prediction, because it can make predictions for a given set of independent samples or for an individual case without a prespecified linear or non-linear function. The studies of Fan et al. (2016), Sun et al. (2019), Zhuang et al. (2016), and Sun et al. (2018) are good examples of using the SCAM for climate prediction, streamflow forecasting, and urban ecosystem variation simulation, and they indicate that the SCAM is useful in these respects. However, few studies have applied the SCAM to urban waterlogging variation assessment. Therefore, considering the spatial non-stationarity and complexity of urban waterlogging, we use the SCAM in this study to capture the spatial variation characteristics of such events.

Furthermore, although machine learning models can produce accurate results, they do not reveal the relationship between urban waterlogging and its various influencing factors. Ignoring the relationships between variables makes the structure and performance of a model difficult to understand. The quantitative relationship between natural and anthropogenic factors and urban waterlogging is not fully understood. How much does each input factor affect the model? To what extent do these factors contribute to urban waterlogging? Which factors dominate waterlogging variation? Answering these questions is essential to provide theoretical references for better waterlogging management and future urban planning. Therefore, while using the SCAM to simulate the spatial variation of urban waterlogging, we introduce hierarchical partitioning analysis (HPA) to help understand the mechanism of urban waterlogging. As a complementary analysis, the HPA reveals the relative contribution of each SCAM input variable, which shows which factors are more critical in determining waterlogging. Based on these dominant factors, different simulation scenarios were then established to predict how waterlogging responds to changes in its dominant drivers, which could provide practical suggestions for identifying and controlling urban waterlogging risk.
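To make the idea of a "relative contribution" concrete, the following is a minimal sketch of hierarchical partitioning, not the authors' implementation. It assumes that goodness of fit is measured by the R² of an ordinary least squares fit (this excerpt does not state which measure the study used) and that the predictors are not perfectly collinear. The function names `r_squared` and `hierarchical_partitioning`, the variable names, and the data are all hypothetical and serve only as illustration.

```python
from itertools import combinations


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    y_mean = sum(y) / n
    total_ss = sum((v - y_mean) ** 2 for v in y)
    if not columns:
        return 0.0  # the intercept-only model explains nothing
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    beta = [a[j][p] / a[j][j] for j in range(p)]
    resid_ss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                   for r, yi in zip(rows, y))
    return 1.0 - resid_ss / total_ss


def hierarchical_partitioning(predictors, y):
    """Independent contribution of each predictor: mean R^2 gain over all hierarchies."""
    names = list(predictors)
    k = len(names)
    contrib = {}
    for name in names:
        others = [m for m in names if m != name]
        level_means = []
        for size in range(k):  # size of the model the predictor is added to
            gains = []
            for subset in combinations(others, size):
                base_cols = [predictors[m] for m in subset]
                base = r_squared(base_cols, y)
                full = r_squared(base_cols + [predictors[name]], y)
                gains.append(full - base)
            level_means.append(sum(gains) / len(gains))
        contrib[name] = sum(level_means) / k
    return contrib


# Made-up values for eight watersheds (illustrative only, not data from the study)
predictors = {
    "impervious": [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.6, 0.8],
    "vegetation": [0.8, 0.5, 0.4, 0.2, 0.1, 0.7, 0.3, 0.3],
    "rainfall": [10.0, 30.0, 20.0, 40.0, 25.0, 35.0, 15.0, 45.0],
}
density = [1.2, 3.1, 4.8, 7.2, 8.9, 2.3, 5.9, 8.3]

contrib = hierarchical_partitioning(predictors, density)
total = sum(contrib.values())
for name, value in contrib.items():
    print(f"{name}: {100 * value / total:.2f}% of explained variance")
```

Under these assumptions the independent contributions add up to the R² of the full model, so dividing each one by their sum gives percentage shares of the same kind as those reported in the abstract. The number of sub-models doubles with every added predictor, which is why this kind of analysis is usually limited to a modest number of variables.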

The current research aims to explicate urban waterlogging variation and identify its dominant drivers by combining the SCAM with the statistical method of HPA. The proposed method is applied to Guangzhou, P.R. China, a highly urbanized coastal metropolis where waterlogging events occur frequently. In detail, the specific objectives are to: (1) develop the SCAM and HPA and verify their applicability through accuracy indicators (overall accuracy, mean absolute percentage error, modified relative error, correlation coefficient, and Nash-Sutcliffe model efficiency index) and three comparison models: logistic regression (LG), artificial neural network (ANN), and support vector machine (SVM); (2) reveal the spatial variation characteristics of urban waterlogging and clarify the dominant drivers of waterlogging variation; and (3) establish simulation scenarios to simulate the spatial variation of waterlogging under changes in the dominant factors, and identify waterlogging-susceptible areas under different development scenarios. This study is expected to provide useful information for the further application of the SCAM and HPA in urban waterlogging simulation and assessment, and to inform waterlogging risk prevention.
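Several of the accuracy indicators named in objective (1) have standard textbook definitions, and the sketch below computes four of them in plain Python. It is an illustration under stated assumptions, not the authors' code: the function names and sample values are hypothetical, and the relative error indicator is left out because this excerpt does not give its formula.

```python
from math import sqrt


def overall_accuracy(observed_classes, simulated_classes):
    """Share of samples whose simulated class matches the observed class."""
    hits = sum(o == s for o, s in zip(observed_classes, simulated_classes))
    return hits / len(observed_classes)


def mape(observed, simulated):
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    errors = (abs((o - s) / o) for o, s in zip(observed, simulated))
    return 100.0 / len(observed) * sum(errors)


def pearson_r(observed, simulated):
    """Pearson correlation coefficient between observed and simulated values."""
    n = len(observed)
    mo, ms = sum(observed) / n, sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    so = sqrt(sum((o - mo) ** 2 for o in observed))
    ss = sqrt(sum((s - ms) ** 2 for s in simulated))
    return cov / (so * ss)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mo = sum(observed) / len(observed)
    resid = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mo) ** 2 for o in observed)
    return 1.0 - resid / spread


# Made-up waterlogging densities for five watersheds (illustrative only)
obs = [2.0, 4.0, 6.0, 8.0, 10.0]
sim = [2.2, 3.8, 6.3, 7.6, 10.1]
print(mape(obs, sim), pearson_r(obs, sim), nash_sutcliffe(obs, sim))
```

The correlation coefficient only measures how well the simulated values track the observed pattern, whereas the Nash-Sutcliffe index also penalizes systematic bias, which is one reason to report both.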

Section snippets

Study area

We selected the central urban districts of Guangzhou (Liwan, Yuexiu, Tianhe, Haizhu, Baiyun, and Huangpu districts) as the study area, with a total land area of 1166.37 km². Guangzhou (112°57′–114°3′E and 22°26′–23°56′N) is the central city of the Guangdong-Hong Kong-Macao Greater Bay Metropolitan Region and one of China's three largest cities, with a population of 14.90 million in 2019. It is also one of the most economically vibrant cities in mainland China.

Data sources

The data in this study included urban waterlogging records, topography (i.e., elevation, slope), precipitation, land cover composition, spatial configuration, gross domestic product (GDP), and drainage facilities. The waterlogging records were used to construct a waterlogging inventory map, while the remaining datasets were utilized as natural and anthropogenic factors of urban waterlogging. We first mapped the spatial location of waterlogging events through Google Earth according to the

SCAM simulation

In this study, the SCAM was trained on the calibration and validation data sets at eight significance levels to determine the most appropriate cluster tree for urban waterlogging density simulation (Table 5). At a significance level of 0.01 (α = 0.01), the SCAM performed 160 cutting actions and 25 merging actions, and the cluster tree contained 346 nodes in total, including 136 leaf nodes; at α = 0.05, the numbers of cutting and merging actions increased significantly, and
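The cutting action mentioned in this snippet can be illustrated with a small sketch. It assumes the usual stepwise cluster formulation, in which a cluster is cut when a Wilks' lambda test, converted to an F statistic, finds the two sub-clusters significantly different at level α. For simplicity it handles a single response variable and a single predictor, and the critical value is passed in by hand. The function names and data are hypothetical, and the study's actual implementation may differ.

```python
def wilks_lambda_1d(left, right):
    """Wilks' lambda for one response: within-group sum of squares / total sum of squares."""
    both = left + right
    grand = sum(both) / len(both)
    total_ss = sum((v - grand) ** 2 for v in both)
    within_ss = 0.0
    for group in (left, right):
        mean = sum(group) / len(group)
        within_ss += sum((v - mean) ** 2 for v in group)
    return within_ss / total_ss if total_ss > 0 else 1.0


def best_cut(x, y):
    """Order samples by x, try every split, and return the one with the smallest lambda."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ys = [y[i] for i in order]
    best_k, best_lam = None, None
    for k in range(1, len(ys)):
        lam = wilks_lambda_1d(ys[:k], ys[k:])
        if best_lam is None or lam < best_lam:
            best_k, best_lam = k, lam
    n = len(ys)
    # Two groups and one response: F = (1 - lambda) / lambda * (n - 2), df = (1, n - 2)
    f_stat = float("inf") if best_lam == 0 else (1 - best_lam) / best_lam * (n - 2)
    return best_k, best_lam, f_stat


# Made-up predictor values and waterlogging densities for ten samples (illustrative only)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 5.1, 4.9, 5.0, 5.05]

F_CRIT = 5.32  # approximate tabulated critical value of F(1, 8) at alpha = 0.05
k, lam, f_stat = best_cut(x, y)
print(k, lam, f_stat, "cut" if f_stat >= F_CRIT else "do not cut")
```

In this formulation a smaller α raises the critical value, so fewer candidate cuts pass the test; that is consistent with the snippet's remark that cutting and merging actions increase when α moves from 0.01 to 0.05. Merging applies the same test in reverse: two clusters are merged when their difference is not significant.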

Discussion

Our results show that watersheds with a high density of urban waterlogging are mainly clustered in the Liwan, Yuexiu, and Haizhu districts, indicating that these regions are highly susceptible to urban waterlogging events (Fig. 7a). Local authorities can therefore pay more attention to risk warning in these regions. At the same time, with the application of spatio-temporal big data (distribution of population, public institutions), we can predict and assess the urban waterlogging vulnerability to

Conclusion

This study proposed a novel approach to explicate the spatial variation of urban waterlogging and determine its dominant drivers by implementing the SCAM and HPA. Specifically, the SCAM was applied to simulate urban waterlogging variation and identify waterlogging-susceptible areas under different scenarios. The study draws three conclusions: (1) the proposed SCAM can successfully capture the non-stationary and non-linear interaction between urban waterlogging and explanatory

CRediT authorship contribution statement

Qifei Zhang: Conceptualization, Methodology, Formal analysis, Data curation, Writing - original draft. Zhifeng Wu: Resources, Writing - review & editing, Supervision. Guanhua Guo: Validation, Writing - review & editing. Hui Zhang: Methodology, Validation. Paolo Tarolli: Conceptualization, Methodology, Writing - review & editing, Overall supervision, Project administration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The study was partly funded by the University of Padova research projects (DOR1948955/19; DOR2079232/20), the NSFC-Guangdong Joint Foundation Key Project (U1901219), and the Team Project of Guangdong Provincial Natural Science Foundation (grant number 2018B030312004).

References (75)

  • M. Su et al.

    The influence of landscape pattern on the risk of urban water-logging and flood disaster

    Ecol. Indic.

    (2018)
  • J. Sun et al.

    Analyzing urban ecosystem variation in the city of Dongguan: a stepwise cluster modeling approach

    Environ. Res.

    (2018)
  • X. Tang et al.

    A spatial assessment of urban waterlogging risk based on a weighted Naïve Bayes classifier

    Sci. Total Environ.

    (2018)
  • X. Tang et al.

    Urban waterlogging susceptibility assessment based on a PSO-SVM method using a novel repeatedly random sampling idea to select negative samples

    J. Hydrol.

    (2019)
  • M.S. Tehrany et al.

    Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    J. Hydrol.

    (2013)
  • M.S. Tehrany et al.

    Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS

    J. Hydrol.

    (2014)
  • M.S. Tehrany et al.

    Flood susceptibility assessment using gis-based support vector machine model with different kernel types

    Catena

    (2015)
  • M.S. Tehrany et al.

    Identifying the essential flood conditioning factors for flood prone area mapping using machine learning techniques

    Catena

    (2019)
  • I.K. Tsanis et al.

    A 2D hydrodynamic/pollutant transport GIS model

    Adv. Eng. Softw.

    (2001)
  • D.P. Viero et al.

    Floods, landscape modifications and population dynamics in anthropogenic coastal lowlands: the Polesine (northern Italy) case study

    Sci. Total Environ.

    (2019)
  • X. Wang et al.

    A stepwise cluster analysis approach for downscaled climate projection - A Canadian case study

    Environmental Modelling and Software

    (2013)
  • M. Werner et al.

    Identifiability of distributed floodplain roughness values in flood extent estimation

    J. Hydrol.

    (2005)
  • S. Zhang et al.

    An urban storm-inundation simulation method based on GIS

    J. Hydrol.

    (2014)
  • Y. Zhang et al.

    Simulation and assessment of urbanization impacts on runoff metrics: insights from landuse changes

    J. Hydrol.

    (2018)
  • Q. Zhang et al.

    Identifying dominant factors of waterlogging events in metropolitan coastal cities: the case study of Guangzhou, China

    J. Environ. Manag.

    (2020)
  • G. Zhao et al.

    Mapping flood susceptibility in mountainous areas on a national scale in China

    Sci. Total Environ.

    (2018)
  • G. Zhao et al.

    Assessment of urban flood susceptibility using semi-supervised machine learning model

    Sci. Total Environ.

    (2019)
  • Zope et al.

    Impacts of land use-land cover change and urbanization on flooding: a case study of Oshiwara River basin in Mumbai, India

    Catena

    (2016)
  • F. Ahammed

    A review of water-sensitive urban design technologies and practices for sustainable stormwater management

    Sustain. Water Resour. Manag.

    (2017)
  • V. Barros et al.

    Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change

    J. Clin. Endocrinol. Metab.

    (2012)
  • D.S. Bisht et al.

    Modeling urban floods and drainage using SWMM and MIKE URBAN: a case study

    Nat. Hazards

    (2016)
  • M.M. Brito et al.

    Spatially-explicit sensitivity and uncertainty analysis in a MCDA-based flood vulnerability model

    Int. J. Geogr. Inf. Sci.

    (2019)
  • T. Cheng et al.

    Flood risk zoning by using 2D hydrodynamic modeling: a case study in Jinan City

    Math. Probl. Eng.

    (2017)
  • China Global Television Network
  • V.M. Chowdary et al.

    Multi-criteria decision making approach for watershed prioritization using analytic hierarchy process technique and GIS

    Water Resour. Manag.

    (2013)
  • Y.R. Fan et al.

    A stepwise-cluster forecasting approach for monthly streamflows based on climate teleconnections

    Stoch. Env. Res. Risk A.

    (2015)
  • Y.R. Fan et al.

    Probabilistic prediction for monthly streamflow through coupling stepwise cluster analysis and quantile regression methods

    Water Resour. Manag.

    (2016)