1 Introduction

Floods in different parts of the world seem to be a major problem as storms and floods killed more than a million people between 1980 and 2012 [1, 2]. Between June and July 2019, several parts of the world are experiencing record-breaking floods, making it a global catastrophe. A major portion of the world’s populations resides in the flood plains and they are indirectly or directly dependent on the flood plains, thereby human encroachment, modification of river and degradation of the ecosystem has become unavoidable results [2]. Along with this, floods are possibly the most frequent, devastating and widespread event, responsible for the huge loss of life and economy [3]. Hwang et al. [4] have found that due to global warming, over the past 25 or 30 years of twentieth-century witnessing numerous unprecedented floods globally which have increases the damage rate in a spectacular way [5]. The Centre for Research on the Epidemiology of Disasters reports that 2156 number of floods have occurred in the previous 30 years, which has resulted in the deaths of 206, 303 lives, loss of 386 billion US dollars and has affected nearly 2.6 billion people [6]. In the case of tropical rivers of South Asia, floods are a frequent event [3, 7]. In the case of India, there is no exemption in these circumstances [8]. The Central Water Commission of India reported that every year 32 million population are suffered from floods due to nearly 7.21 million hectares of land inundation [9]. Along with this, the devastating floods in Mumbai (2005), Uttarakhand (June 2013), Jammu and Kashmir (September 2014) are one of the few examples of devastating floods in India. West Bengal, which is primarily an agrarian state with high population density in the low-lying alluvial plain, is facing a catastrophic flood problem. The Irrigation and Waterways Department of Govt. of West Bengal (IWD) in their several reports stated that 42 per cent of the state is susceptible to flood. Whereas, Several studies have argued that approximately twenty million 55.8% of the state is susceptible to floods [10,11,12]. Chapman and Rudra [13] and Kadam and Sen [14] have concluded in the year of 2000, 20 million people were affected that during that flood. Therefore, flood risk assessment and flood management are very much essential to understand the flood-prone areas and to take suitable measurements [15].

Several models like support vector machine [16], weights-of-evidence [16], machine learning [17, 18], analytical hierarchy processes [19], bivariate and multivariate statistical models [20,21,22] were used to address the future flood susceptibility of an area. Furthermore, satellite images, such as Japanese Earth Resources Satellite 1 (JERS-1), Landsat, Environmental Satellite (ENVISAT), European Remote Sensing Satellite 2 (ERS-2), and Sentinel data are generally used to estimate river discharge [23] and flood-affected area [24, 25]. Along with this, the occurrence of extreme weather events rises with global climate change [26, 27] which is also a possible reason behind extreme hydrological events like droughts and floods [27, 28]. There are numerous studies which have applied the GCM data to estimates future flood susceptibility [17, 29,30,31,32]. But, most of them have focused mainly on the spatial patterns of flood susceptibility, where they have ignored the flood frequency analysis.

On the other hand, flood frequency analysis is very much important for reducing the impact of the flood by taking appropriate policies [33]. However, it is quite hard to forecast the flood using inadequate past observations on gauge height, river discharge, unpredictable rainfall, etc. [34, 35]. Human encroachment in the flood plain system made this situation very difficult [36]. Bai [27] stated that the flood frequency analysis (FFA) is generally applied to predict the flood magnitude and frequency where long historical records are available. Nevertheless, FFA through quantitative manner is still accepted as the benchmark technique for prediction of a flood [37]. Several studies have been done on this aspect based on diverse kinds of data, e.g., past data and paleo-flood accounts from all over the globe [5]. Among the several methods for the FFA, the probabilistic method is usually used in flood hydrology [38]. Studies on the FFA is mainly based on certain popular probability frequency distribution functions, such as Extreme Value distribution (EV), Gumbel’s Extreme Value distribution (GEV), Log-Pearson Type III, log-normal (LN), etc.[3]. Careful investigation on these methods has shown that suitability of frequency distribution functions varies with variation in a geographical area, such as GEV in Britain [39], LN distribution in China [40] and LPT-3 distribution in the USA [41, 42]. Beside, these analyses can provide only some statistical values, where spatial aspect of flood is absent.

The main objective of this study is to evaluate historical and future flood frequency analysis and flood prone area mapping based log-normal, Log-Pearson Type III (LPT-3), Gumbel’s extreme value distribution (EV-I) and extreme value distribution-III (EV-III) models and HEC RAS software. In this study, a monsoon-dominated lower reaches of the Dwarkeswar River at Arambag Station, Hooghly in West Bengal, India, has been selected to fulfil our objectives, where flooding is a common phenomenon in this lower part of the river due to poor drainage conditions [43,44,45]. Historical accounts by the IWD and studies from Mandal and Das et al. [35, 46] shown that the study area has been suffering from frequent floods. In addition, the HEC RAS software was used to estimate river discharge based on GCM-based daily rainfall data, elevation data, land use and land cover data and compares the selected flood models with it to create flood risk maps and it will be very helpful for policymakers in spatial assessment for future risks. As physical modelling is one of the most reliable and accepted methods for studying flood hazard [47]. Several hydrological simulating software and models, such as SWAT, HEC-HMS, and HEC-RAS, were used to understand the flood probability [27, 47, 48]. Minh et al. [49] have stated that such kinds of hydrological models are very useful to predict the flood frequency where long term records of hydrological data are not available. As well as Sivapalan, Sivapalan et al., Du et al., Zhang et al. and Huang et al. [50,51,52,53,54] have argued that these models are also very important to estimate flood with changing land use and climate. Flood models are mainly two types such as 1D and 2D. As the 1D model has a major limitation regarding lateral flow therefore 2D are frequently used for flood routing [55, 56].

2 Study area

Dwarkeswar River, which is also known as Dhalkishore [57], which is one of the major rivers in the western region of West Bengal. The drainage area of this river is bounded by 23°32′00ʺN to 23°40′25ʺN latitude and 86°31′08ʺE to 87°47′58ʺE longitude (Fig. 1), covering an area of 4356.6 km2. The Dwarkeswar River originates from the Tilboni hill of Chhotanagpur Plateau in Puruliya district. Several minor tributaries like Arkasha, Berai, Shankari, Beko Nala, DangraNala, Kumari Nala, Futuari Nala, Dudhbhaiya Nala joined with this main Dwarkeswar River and ultimately the Dwarkeswar River meet with Shilabati near Ghatal, Paschim Medinipur district. The lower part of the Dwarkeswar River is associated with Holocene Sediment [57, 58].

Fig. 1
figure 1

Location map of the study area

The shape of the drainage basin is elongated, and it is sixth-order drainage network and the average bifurcation ratio is 3 [57]. The highest width of the drainage basin is 40.80 km and the maximum length of the basin is 159.84 km. The entire length of the mainstream is 228.65 km. The lower part of the river basin is suffering from frequent flooding. In this study area, two gauge stations, such as Arambag and Shakepur, can be found in the lower part of the river. Among which Shakepur is suffering from severe data gaps. Therefore, the Arambag station (11.43 m from MSL) has been taken into our study, where according to Irrigation and Waterways Department Government of West Bengal, India [59], 17.23 and 17.43 m are representing the danger level and the extreme danger level. The catchment of the Dwarkeswar is associated with the monsoon type of climate. The mean yearly precipitation ranges from 1400 to 1500 mm [60] and most of the rainfall occurred during the peak monsoon time (Fig. 2a). The population density of this area varies from 1500 to 2000 person per km2 [61]. This area is mainly associated with agricultural activity (Fig. 1). Records from 1978 to 2018 shows that in the last 30 years river gauge height of Dwarkeswar river has crossed the extreme danger level for seven times near the Arambag station [62] (Fig. 2b). Therefore, approximately in every 6th year, gauge height of this river crosses the extreme danger level.

Fig. 2
figure 2

a Average monthly rainfall; b distribution of gauge height of Dwarkeswar River near Arambag (1978–2018)

3 Database and methodology

In this study to fulfil our objectives, first of all we have collected daily rainfall data, elevation data, land use and land cover data and daily river flow data for the period of 10 August 2016 to 27 August 2016 and incorporated in HEC-RAS 5.0.7. A rating curve was estimated with the help of observed data and HEC-RAS 5.0.7. After that observed historical gauge height data of the Dwarkeswar River near Arambag Town, Hooghly District, West Bengal, was collected for the period of 1978 to 2018 and annual peak discharge for the same period was computed with the help of rating curve. Besides, MIROC5 GCM daily rainfall data for the period of 1978 to 2018 (historical data) and 2020 to 2095 (to simulate future scenario) were collected from GCM (Table 1) and simulated in the HEC-RAS 5.0.7 to estimate Annual peak discharge (APD) for the respective periods. Afterward, statistical bias correction of the simulated annual peak discharge data from the MIROC5 GCM daily rainfall has been done using computed annual peak discharge data from the observed gauge height data for the same period. At last, flood frequency analysis of historical and future discharge and gauge height and flood-prone areas concerning different recurrence interval have been determined (Fig. 3) to determine present and future flood susceptibility of the lower part of the Dwarkeswar River.

Table 1 Data and maps used for this study
Fig. 3
figure 3

Methodological flow chart of the study

3.1 Rain-on-grid model

Rain-on-grid model of the HEC-RAS-v5.07 has been applied in this work to forecast the river discharge and gauge height data for Arambag station using daily rainfall data, elevation data and LULC data. In this HEC-RAS-v5.07 version, Rain-on-grid model is able to predict the river discharge, gauge height and flood affected area [63].

3.1.1 Rating curve development

A rating curve can be defined as “a relationship between two stream or river variables, usually its discharge (m3 s−1) and a related variable such as water stage (depth of water above a local datum)” [64]. It is generally used to forecast a variable which is hard to determine constantly or in extreme events. Stage data of the river is much more significant than the discharge for flood forecasting and planning evacuation in flood-prone areas, [65]. In this study, a rating curve was estimated using daily rainfall data from 10 rain gauge stations (Arambag, Bankura, Champadanga, Durgapur, Ghatal, Indus, Panchet, Simulia, Sonamukhi and Tusuma) for the August 2016 flood (10th August 2016 to 27th August 2016) was collected from IWD (Table 1 and Fig. 4a). After that average daily rainfall of the basin was determined using thesian polygon in Arc GIS 10.3 and weighted daily rainfall was estimated. Beside, land use and land cover data and geology of the study area (Fig. 4b and c) were collected from NRSC Bhuban [66] to estimate Manning roughness value following Chow [67] (Table 2). ALSO PALSAR digital elevation model and fifteen channel cross-sections by Dumpy-level survey were used to modify the terrain model. In this study, 12.5 m staggered grid with rectangular computational cell (Mesh) was used to make similar to the modified DEM. After incorporating all the required data, one hour time interval was applied to run the model. The Couranr–Friedrichs–Lewy condition time step was determined and applied to stabilize the model. In this version, it can resolve the 2D diffusive wave equations or the full 2D Saint Venant equations were followed considering depth of the water, specific flow, surface elevation, gravity acceleration, Manning roughness coefficients, density of the water, effective shear stress and Coriolis force [63]. It has been initially found that Saint Venant equations and 2D diffusive wave provided a similar result, but 2D diffusive wave was much quicker. Therefore, 2D diffusive wave option was applied for the analysis.

Fig. 4
figure 4

a Rainfall stations and their spatial coverage developed through the application of thiessen in Arc GIS software; b Land use and Land Cover and c geology of the study area

Table 2 Land use and land cover of the study area and its respective Manning ‘n’ values

3.1.2 Model validation

Evaluation of the model is one of the significant aspects of any kind of study associated with the application of the model. Gauge height data was evaluated through statistical measures and evaluation of inundation area for the 2016 flood event to make it more reliable.

3.1.2.1 Nash–Sutcliffe coefficient

Statistical accuracy of the rating curve or the stage information was assessed with the help of Nash–Sutcliffe coefficient (NS) [68]. Here, simulated gauge heights data were compared with observed gauge heights to develop NS value. NS value of 1 is recognizes as ideal effectiveness of the model, while NS value below 0 represents that the average value of the measured data would have been suitable forecaster than the applied model [65].

3.1.2.2 Receiver operating characteristic curve

Beside this NS test, the simulation of the 2016 flood area by the model was evaluated with the help of receiver operating characteristic curve (ROC) with the area under curve (AUC) value using ISRO flood database extracted from Geo-visualization for the respective period and (Table 1). The ROC curve was used to compare the flood-affected area by this particular event. The ROC is a significant and extensively applied diagnostic methods in spatial modelling and the Geosciences [69, 70]. It is one of the benchmark methods to establish the performance of the model or models [70]. It is represented by plotting the values of sensitivity and 1-specificity on the abscissa and ordinate correspondingly [69,70,71,72]. The prediction of the event’s non-occurrence and occurrence is being quantified quantitatively applying the area under curve (AUC) [70]. In this way performances of the suggested model have been highlighted, where the value ranges from 0.1 to 1. AUC value was used to categorize the precision of the event predictive models as follows: poor accuracy (0.5–0.6); moderate accuracy (0.6–0.7); good accuracy (0.7–0.8); very good accuracy (0.8–0.9) and outstanding accuracy (0.9–1) [73].

3.2 Flood frequency analysis

Flood frequency analysis (FFA) deals with the runoff data measured at a particular station or site across a river [74]. FFA method is used to fit a probability distribution function to the recorded maximum discharge to estimate the future flood events in respect to its return period and probability [75]. There are several probability distribution functions available to determine the chance of extreme floods [76]. In this study annual maximum series of peak discharge/height have been used to understand the flood probability analysis. The Institution of Engineers Australia (IEA) [77] stated that the annual maximum series are independent, easily extracted and it conforms the theoretical frequency distribution. Extreme value distribution and its probability functions are a significant aspect of hydrological studies. According to Chow (1959 and 2010) Extreme Value Type I distributions are generally modelled for storm rainfall events, and this model can be used for river flow also. In this study, flood FFA have been done following four methods of extreme value distribution; such as Gumbel’s method (Extreme Value-I) and Weibull’s method (Extreme Value-III), log-normal and Log-Pearson-Type-3 were employed for yearly peak discharge (Qmax) and stage data to assess the competence of selected techniques for flood analysis. EasyFit and MS Excel software have been used to get several parameters, probability density functions f(x) and cumulative distribution functions F(X).

3.2.1 Gumbel’s method of the extreme value function

Gumbel’s method of the extreme value distribution function, was introduced by Gumbel in 1941[78], is also known as extreme value type I distribution. Gumbel realized that the annual peak flooded data are nothing but the extreme values in different year’s observations [74]. It is based on the argument that distribution of an extreme event is unlimited and hence the most suitable distribution for fitting to the extreme value data is of the double exponential type [74], and it is used to replica the allocation of the minimum or maximum number of samples of different distributions [79]. According to S.K. Pal it is widely used in estimation of natural phenomena like storm rainfall, peak discharge, low flows and other similar events [74]. Gumbel’s method is the GEV type I (EVI) distribution and is useful for smaller data sizes. Though, if the sample size increases above 50, it will demonstrate a superior performance [80]. Furthermore, Cunnane in 2010 represents that distributions with two parameters (EV1) is associated with lesser standard error, but higher bias than more than two parameter distributions particularly in a little data set [80].

3.2.2 Weibull’s method of extreme value distribution

Weibull’s method of extreme value distribution is widely used method for flood frequency analysis [81,82,83]. Weibull in 1939 introduced this equation for the analysis of flood magnitude for the corresponding return periods [81]. The technique in plotting the distribution is to rank the data [74]. This method probability distribution is associated with shape and scale parameters and other types of probability distributions; especially, it imposed within the Rayleigh distribution and the exponential distribution [84]. The appearance of the density function of this method adjusts significantly with the changes in the value of shape parameter [84]. Here, the probability of an event (P) is the rank of the flood discharge for a particular distribution and number of observation (N) and the frequency of occurrence (f) which is known as the percentage of probability distribution.

3.2.3 Log-Normal distribution of extreme value

Log-normal distribution is very significant in the explanation of natural phenomena [85]. It is an uninterrupted probability distribution model of a random distribution and its logarithm value is normally distributed [85, 86]. The typical uses of this distribution model are observed in representation of failure rates, fatigue failure, and other phenomena involving a large range of data. [87].

3.2.4 Extreme value distribution using Log-Pearson-Type-3

Log-Pearson-Type-3 (LTP-III) is considered as the standard method for hydrological frequency analysis [74]. LPT-III distribution of extreme value was developed by Pearson [88]. In this case, the estimated flow (Xt) of a corresponding stage can be determined by the logarithm of the calculated flood. LPT-III, is nothing but a Pearson Type 3 distribution, and this type of distribution is also known as the Gamma distribution [89, 90], is complex having probability density function with scale, shape and location parameters (for further details [90]). It is a skew distribution so that the distribution has a limited range in the left in which direction, the probability curve gets truncated and below a certain value of the variate the probability is zero [74].

3.2.5 Selection of best fit model

To determine reliable and particular techniques and thereby forecast of flood incident for a particular distribution, it is important to determine appropriate function through test statistics. A single test cannot give a decisive result [83], therefore in this study, we have used Kolmogorov–Smirnov (K–S), Anderson–Darling (A–D), and Chi-square (X2) tests based on cumulative distribution functions F(X) or probability density functions (X) to determine the best fit model function at 5% level of significance. Details of the test statistics can be found in the work of Solaiman [91]. Along with this D-index test was conducted to validate the best result [92]. The lowest value of D-index represents the excellent function distribution for estimation of annual maximum flow corresponding to its return period.

3.3 CMIP5 model and climate change scenarios

The CMIP5 data of MIROC5 model have been used in this study for the two specific periods, e.g., historical and future period which was downloaded from the Earth System Grid Federation (https://esgf-node.llnl.gov/search/cmip5/). It is based on IPCC’s latest phase of Representative Concentration Pathways (RCP) scenarios for climate change projections [93] under different levels of radiative forcing that is RCP 2.0, 4.5, 6.0 and 8.5. RCP 8.5 representing the extremely high amount of climate-changing force was considered. RCP 6.0, 4.5 and 2.0 represent a decreasing amount of climate change forcing [94]. Sharmila et al., Singh et al. and Sanap et al. [95,96,97] have found that MIROC5 GCM can be used in the Indian scenario to simulate rainfall. Thereby, daily rainfall data from CMIP5 (GCM-MIROC5) has been used in this study for two specific periods such as historical (1978–2018) and future (different RCPs 2020–2100). R Studio (Version 3.1.3) has been used for statistical downscaling of the daily rainfall data into text format through the spatial location of the desired weather station [98]. First of all, historical daily rainfall data run into HEC-RAS model and annual maximum discharge and stage height of Arambag has been extracted for the two specific periods. After that historical data has been compared with the estimated discharge data from gauge height to reduce the biases in the GCM data (Fig. 3).

3.4 Estimation of the flood-affected area and its mapping

The flood-prone area has been determined from the ALOSPALSAR digital elevation model (12.5 m) with geographic projection. It has been found that ALOSPALSAR DEM is more reliable than other types of freely available DEM [71]. In this study, DEM has been processed following standard procedure and after that, it was used for flood-prone area estimation. The calculation for demarcating flood-affected area was done following Bandyopadhyay et al. [3]. Extraction of the flood-affected area has been used for 25, 50 and 75 year return period for all models as well as for all the RCPs. Flood affected area was validated using 2016 flood. Gauge height and flood-affected area for the 2016 flood using Gumbel Max, Log-Pearson 3, Lognormal and Weibull were estimated (Fig. 3) and validated with the actual flood cover area using ROC.

4 Result and analysis

4.1 Flow simulation and rating curve

The rating curve was developed using the rain-on-grid model in HEC-RAS for August 2016. The difference between the observed gauge height and simulated gauge height is quite low (Fig. 5a). There are some deviations among the observed and simulated gauge heights especially for lower peaks. This may be possibly because of presence of several anthropogenic activities like extraction of river water for irrigation. As we are more interested on the peak gauge height, in this case this has shown similar results as observed gauge heights. Apart from this extracted rating curve has been associated with a loop formation, which is very much significant outcome by this model. Simulated rating curve also shown that during the August 2016 flood, maximum 1260 m3 s−1 discharge has been predicted (Fig. 5b and Table 3). An average error in the predicted gauge height of the river at Arambag station showed that minimum error of 0.002 m, observed on 8/12/2016 and maximum error was 1.872 m found on 8/23/16. Average error in predicting the actual gauge height of the river was 0.673 m. The difference between the observed peak gauge height and simulated peak gauge height was very low (0.05 m). Pearson correlation among the observed and simulated gauge height is significantly high (0.903, Table 4) and explained 81% of the variable. Standard error between observed and simulated gauge heights was 0.828 m so the standard uncertainty is quite low. Beside, NS value of 0.71 also support that HEC-RAS 5.0.7 rain-on-grid model can be able to replicate the gauge height successfully (Fig. 5a).

Fig. 5
figure 5

a Observed and simulated gauge height for the Dwarkeswar River during August 2016 flood; b Estimated rating curve of Dwarkeswar River near Arambag

Table 3 Statistical parameters of observed and simulated gauge height
Table 4 Relationships among observed and simulated data of Arambag station during flood 2016 August

Along with this above statistical evaluation of simulated and observed gauge height, inundation caused by the same flood (Fig. 6a, b and c) was also validated (Fig. 3). In this study simulated flooded area by the model was evaluated with the ISRO flood database extracted from Geo-visualization for the respective period (Table 1) and Receiver Operating Characteristic curve (ROC) with the area under curve (AUC) value with 0.798 (Fig. 6c) indicates the good quality of the model (Table 5). So, the rating curve of this river for the Arambag Station can be considered for the estimation of river discharge during floods.

Fig. 6
figure 6

a Flood affected area of lower part of the Dwarkeswar River in August 2016 (NRSC BHUBAN); b Simulated flood affected area of the lower Dwarkeswar River extracted from HEC RAS; c Receiver Operating Characteristic curve developed from the actual flooded area and estimated flooded area

Table 5 Receiver Operating Characteristic curve and its significance level

4.2 Statistical character of APD of the study area

Arambag station of Dwarkeswar River records only the depth of the river or the flood height. Therefore, based on the rating curve (Fig. 5b), we have estimated the annual peak discharge of the Arambag station (Fig. 7a) with the help of gauge height data available for the period of 1978 to 2019 (Fig. 7b). It has shown that APD data of this study are independent (Fig. 7a). The statistical attributes have been described in Table 6. Results are showing that the highest amount of APD was observed in 1978 with 1440.73 m3 s−1. Minimum discharge of 5.59 m3 s−1 was found in the year of 2014. It was also found that the river is characterized by a high standard deviation of 431.64 m3 s−1 due to dependency on the monsoon rainfall and the uncertainty of the monsoon. The ranked estimated discharge data shows that maximum numbers of APD are below the average discharge of 462.35 m3 s−1. Apart from this, the variation of APD above the mean discharge is greater than those below the average (Fig. 7c). As well as Fig. 7b has also shows that the gauge height of the Dwarkeswer River near Arambag has crossed the EDL seven times from 1978 to 2018. The value of mean deviation varies from − 1.058 to 2.26 m. Trend analysis of the APD is indicating a negative trend in the annual peak estimated discharge (Fig. 7a) probably due to the constructions of several minor dams, particularly for irrigation purposes [99]. The ratios between annual peak discharge and long term average discharge suggest that the peak discharges at Arambag station is unpredictable and variable in nature. The peak discharge (Qmax) for the 1978 event was 3.12 times greater than the mean annual peak discharge (Qm) at Arambag station (Fig. 7d).

Fig. 7
figure 7

Hydrological characteristics of the lower Dwarkeswar River system. a Annual peak discharge of Dwarkeswar River and its trend. b Distribution of annual peak gauge height of Dwarkeswar River. Danger level (DL) is indicated by the dotted line, which is the maximum limit of safe level for the lower part of the Dwarkeswar River (LDR). The vulnerable for LDR is represented by the upper continuous line indicates the extreme danger level (EDL). c Departure of yearly maximum discharge from average discharge for 41 years (1978–2018). d Temporal variation of Qmax/Qm ratio, where Qmax = APD and Qm = long-term average peak discharge. (Arambag Station)

Table 6 Descriptive statistics of APD of Dwarkeswar River near Arambag station (1978 to 2018) (

4.3 Flood frequency analysis of the study area

4.3.1 Historical flood frequency analysis of the study area (1978–2018)

FFA by four types of probability distribution methods have been applied on the APD data of the lower part of Dwarkeswar River at Arambag station from 1978 to 2018. The four functions of yearly peak discharge taking random variable are expressed with its cumulative distribution function and probability density function [100] (Fig. 8 a, b and c).

Fig. 8
figure 8

a Probability density function for yearly maximum discharge (APD) of the Dwarkeswar River applying Gumbel, Weibull, Lognormal and Log-Pearson 3 distributions; b Cumulative distribution functions of APD using the same models; c Distribution of probability difference of APD of Dwarkeswar River near Arambag station

Maximum likelihood estimation was considered in the study to estimate the parameters (Table 7) in the subsequent models as it presents population parameters with minimum mean error [100]. In this analysis, EsyFit software (available at http://www.mathwave.com) was applied to calculate the cumulative distribution functions, probability density functions and parameters.

Table 7 Estimated statistical parameters for the four distributions

On the semi-log graph, estimated rank APD data was plotted against the return period and it is a skewed curve toward the end of the graph (Fig. 9a). After Weibull’s method, recurrence interval (T) of the lowest yearly maximum discharge for 2014 (5.53 m3 s−1) is 1.02 years with a probability of 97.62% and the highest APD of the year 1978 (1440.73 m3 s−1) is 42 years with a probability of 2.38%. Figure 9a and b also shows that the ranked discharged data in Gumbel’s method is near to a straight line than the Weibull’s method. Figure 9c indicates that the recurrence interval of extreme APD in Weibull’s method is greater than the Gumbel’s method which is also found in the comparative analysis of per cent probability for its discharge (Fig. 9d). Along with this, it is also clear that Gumbel’s probability distribution (EV-I) method is more appropriate than EV-III because of the linear relationship in EV-I.

Fig. 9
figure 9

a Weibull’s return period concerning APD (1978–2018) for the Dwarkeswar River. b Gumbels’s probability plot demonstrating selected return period and yearly maximum discharge (1978–2018) for the Dwarkeswar River. c Comparison between Weibull’s and Gumbels’s frequency distribution of APD lower part of Dwarkeswar River. d Comparison between Weibull’s and Gumbels’s per cent probability distribution of APD lower part of Dwarkeswar River

Frequency factor (K) has been used to APD data observed at Arambag on other types of approaches including Gumbel, log-normal and Log-Pearson-Type-III. From Gumbel’s probability distribution (EV-I), it was observed that a flood with the discharge of 949.04 m3 s−1, which is representing a major flood has a recurrence interval of 6.77 years with a probability of 14.77%, whereas the probability of the same discharge by Log-Pearson 3, Weibull and Lognormal methods (Figs. 8a and b, 9d) are much higher than Gumbel’s probability distribution such as 18.60, 18.94 and 18.3%, respectively. Probability distribution curves of Log-Pearson 3 and Weibull shows higher degree of similarity throughout the distribution (Figs. 8a and b, 9d). Beside, Lognormal probability distribution curve is showing lower probability than the others in discharge less than 500 m3 s−1 and higher probability in respect higher discharge (more than 1000 m3 s−1) (Fig. 8a and b). The probability curve of the Gumbel’s method is representing opposite to the log-normal probability distribution (Figs. 8a and b, 9d). Along with this probability distribution by the Gumbel’s method is showing the highest probability than other distributions in lower discharge ranges from 100 to 400 m3 s−1 and lowest probability in respect higher discharge above 650 m3 s−1 (Figs. 8a and b, 9d). All the probability distribution curves are showing similar probability in APD below 50 m3s−1 and 450–650 m3s−1 (Figs. 8a and b, 9d). The floods concerning different return period’s 1.5, 5, 10, 25, 50 and 75 have been estimated using Gumbel, log-normal and Log-Pearson-Type-III to understand the predicted discharges as well as the vulnerability of the LDR in respect to Arambag hydrologic station (Fig. 10). The estimated flood magnitude of 5 and 75-year return period is computed as 823.1 and 1884.0 m3 s−1, respectively, by Gumbel’s method, whereas Log Pearson Type 3 shows the APD of 844.9 and 2684.3 m3 s−1 for 5 and 75 year return period with probability of 13.59 and 5.15%, respectively, which is much higher than the other methods (Fig. 10). So, from the different methods of FFA, this became ambiguous, which requires some assessment on uncertainty, so that proper inundation map can be prepared. Therefore, the GOF test for the four methods has been computed (Table 8). The result (Table 8) indicates that Log Pearson Type 3 distribution appears to be the best fit among the four methods.

Fig. 10
figure 10

Comparative representation of APD and its return periods by the Log-Pearson 3, Weibull, Lognormal, Gumbel

Table 8 Model validation and its summaries for the selected models fitted to Dwarkeswar River discharge data (Qmax)

4.3.2 Climate change and future FFA of the study area (2020–2100)

Apart from the statistical models, we have used MIROC5 GCM data for FFA of this region in the upcoming years. The bias-corrected APD for the different future scenarios (RCP 2.6, 4.5, 6.0 and 8.5) have been calculated (Fig. 11). It is obvious from the Fig. 11 that there are a few small dissimilarities observed between the quantities of discharge among the observed discharge and corrected GCM discharge for different recurrence interval (Fig. 11). After that, the biases in future discharge with different RCP scenarios, such as RCP 2.6, RCP 4.5, RCP 6.0 and RCP 8.5 has been eliminated.

Fig. 11
figure 11

a Estimated observed annual peak discharge (APD), uncorrected APD from MIROC5 and b bias-corrected APD from the same GCM model

Annual peak discharge data predicted for the different future scenario of RCP 2.6, 4.5, 6.0 and 8.5 for the period of 2020 to 2100 (Fig. 12a) indicates that highest daily discharge reach to 1651.2, 1725.24, 1760.06 and 1798.28 m3 s−1 at Arambag station. Minimum discharges of 67.76, 60.01, 79.60 and 78.94 m3 s−1 were estimated for RCP 2.6, 4.5, 6.0 and 8.5, respectively (Table 9). Average APD for the respective RCPs were 501.3, 553.3, 631.6 and 638.2 m3 s−1, respectively. Standard deviations of predicted discharge for RCP 2.6, 4.5, 6.0 and 8.5 were also very high and they are 463.20, 503.50, 493.99 and 514.66 m3 s−1, respectively (Table 9).

Table 9 Descriptive statistics of predicted river discharge of the Dwarkeswar River near Arambag Station for the period of 2020 to 2100

The distribution of APD (Fig. 12a) shows that maximum discharges were 3.6, 3.7, 3.8 and 3.9 times higher than current average discharge of 462.35 m3 s−1 for RCP 2.6, 4.5, 6.0 and 8.5, respectively (Table 6 and Fig. 12b).

Fig. 12
figure 12

Distribution of APD (a) and Temporal variation of Qmax/Qm ratio (b) of the Dwarkeswar River at Arambag Station for the period of 2020 to 2100 in different RCPs such as, 2.6, 4.5, 6.0 and 8.5

Distribution of APD of the Dwarkeswar River at Arambag station for the period of 2020 to 2100 with different RCPs indicate that APD increases with degree of climate change forcing factor increases (Table 9 and Figs. 12a, b, 13 and 14a). Frequency distribution of APD with different RCPs showed that number of APD of below 300 m3 s−1 decrease with increasing climate forcing (from RCP 2.6 to RCP 8.5). On the other hand, number of APD above 300 m3 s−1 increased with increasing climate forcing (Table 9 and Fig. 13), especially APD between 600 and 900 m3 s−1 (Table 9 and Fig. 13).

Fig. 13
figure 13

Frequency distribution of APD of the Dwarkeswar River at Arambag Station for the period of 2020 to 2100 with RCP 2.6, 4.5, 6.0 and 8.5 and observed discharge (1978–2009)

Fig. 14
figure 14

a Frequency distribution of APD in different scenarios (RCP 2.6, 4.5, 6.0 and 8.5) and b is compound representations of APD and its return periods by the Log-Pearson 3, Weibull, Lognormal, Gumbel and by future RCP 2.6, 4.5, 6.0 and 8.5 (MIROC5)

4.4 Comparative assessment of the historical and future flood frequency

The results of compound APD are increasing in an approximately straight line on the semi-log graph paper (Fig. 14a and b). In this Fig. 14b, it is also clear that Log Pearson Type-III is increasingly rapidly with increasing return period in comparison to other. Compound results from Table 10 and Fig. 14a and b showed that probability distribution models have predicted higher amount of APD in lower recurrence interval. APD for the recurrence interval of 1.5 years increased from 101.9 to 184.8 m3 s−1 (in RCP 8.5), which is nearly 181.4% of the estimated APD by Log Pearson Type 3 (Table 10). Although, estimated APD for the return period of 1.5 years by Gumbel and Weibull probability distributions methods were very high compared to Log Pearson Type 3 and different RCP based future discharges. Similarly, APD with 5 and 10 years of return period increased from 844.9 m3 s−1 (Log Pearson Type 3) to 1184.3 m3 s−1 (RCP 8.5) and 1322.5–1487.2 m3 s−1, respectively (Table 10 and Fig. 14a and b). Increase of APD for 5 and 10 years of return period compare to Log Pearson Type 3 were 140.2 and 112.5%, respectively. On the other hand, APD of 25, 50 75 years of recurrence interval were showing reduction of APD in respect to Log Pearson Type 3. This may be due to prediction of higher amount of APD by the Log Pearson Type 3 probability distribution method. RCP scenarios show the higher amount of APD in shorter recurrence interval, which is indicating the impact of climate change. Figure 14 a and b shows that RCP 8.5 is showing the higher amount of flood compared to other types of scenarios by the MIROC5 GCM model, whereas, RCP 2.6 showing the lower value (Table 10).

Table 10 Return period and its respective estimated discharge and estimated gauge height in respect to different return period

4.5 Flood-affected area of the study area

Flood affected area was determined based on gauge height at the Arambag station for 2016 flood. The area under curve value for Gumbel, Weibull, Lognormal and Log-Pearson-Type-3 were 0.884, 0.885, 0.88 and 0.884, respectively (Table 11 and Fig. 15a and b). The result shows that Weabull is the best-fitted model in the flood-affected area estimation (Fig. 15a and b). Although, Gumbel Max, Log-Pearson 3 are quite similar in respect to flood affected area estimation (Table 11).

Table 11 ROC curve test values (
Fig. 15
figure 15

a Flood affected area by Gumbel Max, Log-Pearson 3, Lognormal and Weibull in respect to 2016 flood and b ROC curve for flood affected area by Gumbel Max, Log-Pearson 3, Lognormal and Weibull in respect to 2016 flood

It was found that most of the frequently inundated area belongs to the low-lying areas. In this case, Log Pearson Type 3 is showing the highest amount of flood height (Fig. 16). The higher amount of flood height was found for the Log Pearson Type 3, Weibull, Gumbel and log-normal distribution in the higher return period (> 10 years) and thereby inundating much greater area in the longer return period (Table 10 and Fig. 16). In the case of, the future scenario, the higher gauge height was found for the lower return period (less than 10 years) (Table 9 and Fig. 16). Thereby indicating the higher amount of area will be inundation in the future period. These low-lying depression areas of the lower part of the Dwarkeswar River are usually suffered from the floods (Fig. 17) particularly after meeting with its tributary like Amodar though embankment breaching (locally known as Bali Hana). Apart from this Irrigation and Waterways Department [59] indicated that these areas are characterized by poor drainage conditions which are causing frequent flooding in these areas. Bandyopadhyay et al. [99] argued that flood in the east-flowing rivers of West Bengal are mainly due to degradation of channel capacity, human encroachment and tidal surge are the primary reason behind the occurrences of the flood. Biswas et al. [101] argued that human encroachments, artificial levee construction along with tidal and fluvial forces are responsible for frequently occurrences of the flood, whereas Das and Bandyopadhya [102] found that a huge amount of monsoon rainfall, catchment size and its shape, as well as land use, are one of the basic reasons behind the frequent flood in Dwarkeswar River. Das et al. [35, 103] also stated that huge amounts of rainfall within a small period, as well as human encroachment on a natural flood plain, are the reason behind this frequent flood. During the field survey with local aged people stated that improper design of bridge near Bandor obstructing the river flow as well as elevating the height of embankment resulted in the increases of inundation period in this area. Along with this increasing population number and its density causing increases of flood risk in this area.

Fig. 16
figure 16

Flood affected the area of the lower part of Dwarkeswar River based on Gumbel, Weabull, log-normal, Log-Pearson Type-III and different RCPs of MIROC5 GCM data

Fig. 17
figure 17

Flood field photograph showing flood conditions of the Dwarkeswar River around Arambag Town

4.6 Risk analysis of the study area

It was found that lower part of the Dwarkeswar River basin is associated with frequent flooding (Fig. 17), where 187.91 km2 of the land area is associated with the very high-frequency flood with 10 years of return period and here the major portion of the inundated area belongs to the agricultural land, covering 166.83 km2 (Figs. 16, 17 and Table 12). A total of 93 villages with 23,996 households, 106,984 population and 13,272 cultivators come into this very high flood susceptibility category. In case of the flood with 25 years of the return period, it can inundate 301 villages covering 382.90 km2 areas where agricultural land and settlement areas are 344, 22.33 km2, respectively. These types of floods can create serious threats to the 1,217,345 households, 6,227,447 populations and 579,161 main and marginal cultivators (Fig. 16 and Table 12). The floods with 50 years of return periods may inundate 716.45 km2 area characterized by 643.74 km2 agricultural land, 51.04 km2 settled area, 1.24 km2 vegetative lands, 20.13 km2 water bodies and 0.31 km2 fallow land (Fig. 16 and Table 12). Such type of moderate floods may affect 585 villages, 1,889,409 households, 9,624,973 population and 895,367 main and marginal cultivators (Fig. 16 and Table 12). Low flood susceptibility category associated with 75 years of return periods can affect 980 km2 areas. Although the flood susceptibility is low but it occurs due to very high flood conditions in the river. Cumulatively it may inundate 881.96 km2 agricultural land, 75.52 km2 settled area, 21.08 km2 water bodies and 1.43 km2 vegetations (Fig. 16) which may affect 3,111,321 households, 15,876,395 population including 1,470,976 main and marginal cultivators (Table 12). Rest of the area is associated with very low flood susceptibility with more than 75 years of the return period, caused by extremely high flood conditions and may inundate 1439.4 km2 area characterized by 1287.46 km2 agricultural land, 126.68 km2 settled area, 1.60 km2 vegetation, 0.31 km2 fallow land and 23.35 km2 of water bodies (Table 12). These kinds of floods may affect 897 villages, 3,683,643 households, 18,826,856 populations and 1,741,989 main and marginal cultivators (Table 12).

Table 12 Return period wise land use and land cover area under flood susceptibility and its respective number of villages, population, cultivator and number of households susceptible to floods in respect to different return periods (LPT-3) (

In this situation, this study can provide the people and policymaker to take preventive measures. Das et al. [103] stated that living with a flood may be a suitable way to cope up with the flood. However, community participation, development of early flood warning system, evacuation of population nearer to the river and good coordination among different government agencies are very much crucial to mitigate the flood. Along with Das and Bandyopadhya [102] have suggested some possible measures by which the impact of a flood can be minimized over a long period to attain sustainable development in this region.

5 Conclusion

Flood monitoring and assessment are very crucial for attaining sustainable development. The FFA was conducted in the lower Dwarkeswar River basin at Arambag station. This study analyzed four probability distributions such as Gumbel’s method (Extreme Value-I) and Weibull’s method (Extreme Value-III), log-normal and Log-Pearson-Type-3 based on 1978 to 2018, as well as future flood-related possibilities concerning RCP 2.6, 4.5, 6.0 and 8.5 for the period of 2020 to 2100. The study concludes that statistically, Log-Pearson-Type-III is very helpful in dealing with FFA of the study area, whereas Weibull’s method is very helpful for assessing the vulnerable areas. Log Pearson Type 3 depicts that flood values are very high such as 844.9, 1322.5, 1946.2 m3 s−1 and 2387.9 and 2684.3 m3 s−1 for 5, 10, 25, 50 and 75 years of return periods, respectively. As well as gauge height may also increase to 17.35, 18.33, 19.31, 19.88 and 20.22 m for the respective years. MIROC5 GCM model also concludes that APD in case of RCP 8.5 scenario with recurrence interval of 1.5, 5 and 10 years increases to 184.8 m3 s−1 (181.4%), 1184.3 m3 s−1 (140.2%) and 1487.2 m3 s−1 (112.5%), respectively, in respect to Log Pearson Type 3. All models show that there is an increase in the occurrences of floods for short and long term recurrence interval. Flood affected area may increase from 187.91 km2 in 10 years of return period to 1439.4 km2 area with 75 years of return period which is more than 7.5 times, where 897 number of villages, 3,683,643 number of households, 18,826,856 population and 1,741,989 cultivators may affected. Agricultural lands are the most vulnerable area and 1287.46 km2 may face inundation problem by floods with 75 years of return period. Along with this, it must be remembered that flooding is an inseparable part of fluvial processes, but, it became a hazard when human encroachment took place in the playing zone of the river. Instead of emphasis on short term measures like construction of new embankment and elevating embankment long term measures like relocation of embankment away from the active channel considering the predicted channel capacity and floodplain connectivity; renovation of palaeo-channels; flood frequency analysis based flood-prone mapping considering climate change; crops, animal and human lives insurance and real-time monitoring of rainfall, river gauge height and simulated flow analysis to forecast the inundation area and its duration can be considered to assist the planner and local people from flood hazards. Therefore, we need to shift our mindset as well as flood management policy for the long term sustainable development of this area.