Abstract
Accurate estimation of evaporation is of great significance for understanding regional drought and for managing and applying limited water resources in dryland. However, the application of traditional estimation approaches is limited by the lack of required meteorological parameters or experimental conditions. In this study, a novel hybrid model was proposed to estimate the monthly pan evaporation (Ep) in dryland by integrating long short-term memory (LSTM) with the grey wolf optimizer (GWO) algorithm and the Kendall-τ correlation coefficient, where the GWO algorithm was employed to find the optimal hyper-parameters of LSTM, and the Kendall-τ correlation coefficient was used to determine the input combination of meteorological variables. The model performance was compared with that of other methods based on evaluation metrics including the root mean squared error (RMSE), the normalized mean squared error (NMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the Nash–Sutcliffe coefficient of efficiency (NSCE). The results indicated that the optimal input meteorological parameters of the hybrid Kendall-τ-GWO-LSTM model are the monthly average temperature, the minimum air temperature, and the maximum air temperature; the minimum values of RMSE, NMSE, MAE, and MAPE are 38.28, 0.20, 26.62, and 19.96%, respectively, and the maximum NSCE is 0.89, suggesting that the hybrid Kendall-τ-GWO-LSTM model performs better than the other hybrid models. Thus, the hybrid Kendall-τ-GWO-LSTM model is highly recommended for estimating pan Ep with limited meteorological information in dryland. The present investigation provides a novel method to estimate the monthly pan Ep with limited meteorological variables in dryland by coupling a deep learning model with meta-heuristic algorithms and data preprocessing techniques.
Introduction
Evaporation (Ep) is a highly non-linear physical process that is profoundly affected by meteorological parameters, including temperature, wind speed, precipitation, solar radiation, etc.1,2. As a main component of the water balance, it plays an extremely important role in the global hydrological cycle3,4,5. Accurate estimation of evaporation is a significant issue in ecological management6,7,8,9,10,11, especially in arid sand land, where the stability and sustainability of the artificially re-vegetated belts depend on the effective utilization of the limited available water resources12,13.
In general, direct measurement methods (e.g., Class A pan, lysimeter group) are largely restricted by the limited experimental conditions in dryland14,15,16. Physically-based methods (e.g., the Dalton model, the FAO-56 Penman–Monteith method, etc.) have the drawback that the estimated results are very sensitive to parameter errors17,18, and the key meteorological factors (e.g., relative humidity, latent heat of evaporation, radiation) are sometimes difficult to measure in arid sand land19,20. Therefore, it is necessary to construct data-driven models that estimate Ep with less meteorological information.
Recently, various data-driven shallow machine learning (ML) models, e.g. artificial neural networks (ANN)11,20, radial basis function neural networks (RBFNN)21, multilayer artificial neural networks (MLNN)2,22, extreme learning machine (ELM)2,15, random forest (RF)7, support vector machine (SVM)5,12,13,23, etc., have been widely used to simulate Ep with incomplete meteorological variables. Those models have the excellent capability of simulating the non-linear relationships between the Ep and meteorological variables24,25. As the hyper-parameters of the ML models determine the estimated results and accuracy, meta-heuristic algorithms, including genetic algorithm (GA)6,26,27, particle swarm optimization algorithm (PSO)1,28, whale optimization algorithm(WOA)2,12,29, flower pollination algorithm (FPA)2, grey wolf optimizer algorithm (GWO) 12,13, etc., were employed to obtain the optimal hyper-parameters of ML models. In addition, the data preprocessing techniques, including Kendall-τ correlation coefficient29,30, and entropy weight31 were used to find the effective input combination of ML models. Literature review shows that shallow ML models hybridized with meta-heuristic algorithms and data preprocessing techniques, namely, hybrid model, have higher estimation accuracy than shallow ML models or physically-based methods2,7,32. Such models are recommended as the best choice for estimating Ep with limited meteorological information in different climate zones8,12,13,33,34,35.
Although shallow ML models hybridized with appropriate meta-heuristic algorithms and data preprocessing techniques have proven potentially capable of estimating Ep in different regions2,6,7,32,33,34,35, the outputs of those hybrid models still contain large errors, since the structure of shallow ML models cannot fully simulate the non-linear relationships between the meteorological parameters and Ep11,13,19,36,37. To improve the estimation accuracy, deep learning models (e.g., recurrent neural network (RNN)36, deep neural network (DNN)37, temporal convolution neural network (TCNN)37, long short-term memory (LSTM)12,38, etc.) were employed to estimate Ep. The literature shows that deep learning models, especially LSTM, perform better than shallow ML models, with LSTM also outperforming the other deep learning models, and they have been demonstrated to be an effective method for estimating Ep in different regions12,36,37,38. However, the hyper-parameters of LSTM are usually set subjectively or based on experience, which inevitably leads to a large estimation error. The hyper-parameters of LSTM, including the number of hidden layers (NHL), the number of hidden units (NHU), epochs (E), the mini-batch size (MBS), and the learning rate (LR), directly determine the estimated results, yet few studies have used meta-heuristic algorithms to optimize the hyper-parameters of LSTM for more precise estimation of Ep.
In this paper, two typical ML models, i.e. LSTM and SVM, were selected as the main estimating modules, two new meta-heuristic algorithms, GWO and WOA, were employed to obtain the optimal hyper-parameters of the ML models, and the Kendall-τ correlation coefficient was employed to determine the input combinations of the ML models. The proposed hybrid models, including Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, Kendall-τ-GWO-LSTM, and Kendall-τ-WOA-LSTM, were employed to estimate the monthly pan Ep with limited meteorological information, and the superiority of the proposed models was tested by using the standard evaluation metrics. The aims of this study were (1) to provide a novel approach for monthly pan Ep estimation with limited meteorological variables; (2) to obtain more robust and precise estimating results by coupling LSTM with heuristic algorithms and a data preprocessing technique; (3) to find the optimal and minimum set of meteorological parameters to be observed in the study area. Compared to previous studies14,15,16,17,18,19,20,21,22,36,37,38, the proposed models simultaneously account for data preprocessing and hyper-parameter optimization of deep learning models, and can be recommended as an effective method to estimate Ep with limited meteorological information in dryland.
Materials and methods
Case study
This study was conducted in Shapotou (37°32′ N, 105°02′ E), Ningxia Hui Autonomous Region, China. Figure 1 shows the location map of the study area. This area is characterized by densely distributed trellis dunes, and it has a typical arid climate with scarce precipitation and huge evaporation, where the annual average precipitation is 180 mm and the annual average evaporation is 2520.4 mm3. To prevent the damage of sand erosion and promote regional ecological restoration, the artificial sand-binding vegetation belts were established in 1956a, and over subsequent years (1964a, 1981a and 1987a)4,12,13. Although revegetation has proved to be an effective approach for rehabilitation in arid sandy land39, ensuring the sustainability of artificial sand-binding vegetation under scarce precipitation and huge Ep is challenging for ecologists and land managers. Therefore, accurate estimation of Ep is of great theoretical and practical significance for understanding regional drought, managing and applying limited water resources, and determining the composition, structure, spatial distribution, and scale of artificial sand-binding vegetation.
Data collection and analysis
The monthly meteorological variables needed to accomplish this study, including the monthly average temperature (T), the minimum air temperature (Tmin), the maximum air temperature (Tmax), the monthly precipitation (P), and the monthly average wind speed (WS), were compiled from the Shapotou Desert Research and Experiment Station from 1991 to 2018a. The data during 1991a–2010a was utilized as the training set, and the data during 2011a–2018a was used as the validation data set. Table 1 shows the minimum, maximum, mean, variance, skewness, and kurtosis of those measured meteorological parameters. As shown in Table 1, the average annual temperature in Shapotou during 1991–2018 was 10.8 ℃, with low-temperature and high-temperature extremes of − 26.2 ℃ and 40 ℃. The average monthly precipitation is 15.1 mm and the maximum precipitation is 117.3 mm. The average monthly Ep is 210 mm and the average monthly wind speed is 2.8 m/s. The probability distribution of all meteorological parameters is skewed.
Kendall-τ correlation coefficient
The Kendall-τ correlation coefficient is generally used to measure the correlation between two random variables without any assumption of population distribution. The definition of the Kendall-τ correlation coefficient is
with the sign function
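As a minimal sketch of this coefficient, the code below assumes the standard pairwise (tau-a) form, in which the signs of all pairwise differences of the two variables are multiplied, summed over i < j, and scaled by 2/(n(n − 1)). The function names `sign` and `kendall_tau` are illustrative, and tie handling may differ from the variant used in the study.

```python
def sign(v):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall tau-a between two equally long sequences (assumed standard form)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Sum of sign products over all pairs i < j: +1 concordant, -1 discordant.
    s = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 2.0 * s / (n * (n - 1))
```

A perfectly monotone increasing relationship gives 1, a perfectly decreasing one gives −1, and no distributional assumption is involved because only the ordering of the values is used.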
Machine learning models
Long short-term memory (LSTM)
LSTM was designed to solve the vanishing gradient problem in RNN40. The significant difference between LSTM and RNN is that LSTM addresses the long-term dependency problem by adding repeating modules (cells) that store the information of the previous nodes41. Thus, LSTM was employed to estimate the evaporation in the study area. Figure 2 shows the internal structure of the LSTM cell. Each memory cell consists of a forget gate \(F_{t}\), an input gate \(I_{t}\), and an output gate \(O_{t}\), which are updated in the iterative process with
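The gate updates can be illustrated with a scalar sketch of one LSTM time step. It assumes the standard LSTM formulation (sigmoid gates, a tanh candidate state, and an additive cell update); the parameter names in the dictionary `p` are hypothetical, and the weights in a trained network are matrices rather than scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step (assumed standard form; names in p are illustrative)."""
    f_t = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["bf"])  # forget gate F_t
    i_t = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["bi"])  # input gate I_t
    o_t = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])  # output gate O_t
    c_hat = math.tanh(p["wc"] * x_t + p["uc"] * h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat  # keep part of the old memory, add new content
    h_t = o_t * math.tanh(c_t)        # hidden state passed to the next step
    return h_t, c_t
```

The additive form of the cell-state update is what lets information, and gradients, persist over many time steps.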
Support vector machine (SVM)
SVM is a typical shallow ML model that exhibited better model performance than other ML models to solve the nonlinear fitting problems by using kernel trick and Vapnik–Chervonenkis theory23,42. Thus, SVM was widely used to estimate Ep with limited meteorological variables in the field of hydrology5,12,13,23,29.
The regression coefficients are determined by solving the following problem
The regression function \(R(x)\) can be obtained by using Karush–Kuhn–Tucker’s method, which is
where \(C > 0\) denotes the penalty coefficient, \(\xi_{i}\) and \(\eta_{i}\) are the slack variables, and \(\alpha_{i}\) and \(\alpha_{i}^{*}\) are the Lagrange multipliers. The kernel function
where \(G = 0.5\sigma^{ - 2}\) denotes the radius of \(k(x,x_{i} )\).
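A minimal sketch of the kernel follows, assuming the usual Gaussian radial basis form \(k(x,x_{i}) = \exp( - G\left\| {x - x_{i} } \right\|^{2} )\) with \(G = 0.5\sigma^{ - 2}\) as stated above. The function name `rbf_kernel` is illustrative.

```python
import math


def rbf_kernel(x, x_i, sigma):
    """Gaussian RBF kernel exp(-G * ||x - x_i||^2) with G = 0.5 * sigma**-2 (assumed form)."""
    G = 0.5 * sigma ** -2
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-G * squared_distance)
```

A larger G (smaller σ) makes the kernel more local, which is why C and G are the two SVM hyper-parameters tuned by the meta-heuristic algorithms.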
Meta-heuristic algorithms
Grey wolf optimizer (GWO) algorithm
The GWO algorithm is a new meta-heuristic algorithm whose search process is inspired by the population hierarchy and predation behavior of grey wolves43. Figure 3 shows the population hierarchy of grey wolves and the position updating process of GWO, where \(\alpha ,\beta ,\delta\) and \(\omega\) represent the grey wolves at the different levels of the hierarchy, with dominance decreasing in that order. In the simulation process, the distance and position vectors of the different hierarchies are updated as
where the coefficient vectors are \(\overrightarrow {{\mathbf{A}}} = \overrightarrow {{{\varvec{\upalpha}}}} (2\overrightarrow {{{\mathbf{r}}_{1} }} - 1)\) and \(\overrightarrow {{\mathbf{C}}} = 2\overrightarrow {{{\mathbf{r}}_{2} }}\), the random vectors satisfy \(\overrightarrow {{{\mathbf{r}}_{1} }} ,\overrightarrow {{{\mathbf{r}}_{2} }} \in \left[ {0,1} \right]\), and the attenuation factor \(\overrightarrow {{{\varvec{\upalpha}}}}\) decreases from 2 to 0. For a more detailed description of GWO, we refer to Mirjalili et al.43 (see Fig. 3).
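The search loop can be sketched as below. The coefficients A and C follow the definitions above; the distance and position updates (each wolf moves to the average of three positions guided by the α, β and δ wolves) are assumed from the standard GWO formulation, and the population size, iteration count and linear decay of the attenuation factor are illustrative choices rather than the settings used in this study.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimal grey wolf optimizer sketch (assumed standard update rules)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=objective)
        # Copy the three leaders, since the pack is updated in place below.
        alpha, beta, delta = (list(w) for w in ranked[:3])
        if objective(alpha) < best_val:
            best_pos, best_val = alpha, objective(alpha)
        a = 2.0 * (1.0 - t / n_iter)  # attenuation factor, decays from 2 towards 0
        for w in wolves:
            for d, (lo, hi) in enumerate(bounds):
                guided = []
                for leader in (alpha, beta, delta):
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - w[d])
                    guided.append(leader[d] - A * distance)
                # New coordinate: mean of the three guided positions, clipped to bounds.
                w[d] = min(max(sum(guided) / 3.0, lo), hi)
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val
```

For hyper-parameter tuning, `objective` would train a model with the candidate hyper-parameters and return its validation error; here any cheap function, such as a sum of squares, can stand in to check the behaviour.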
Whale optimization algorithm (WOA)
The WOA is inspired by the bubble-net feeding behavior of the humpback whale44. In the iterative process, the location vector of the prey \(\overrightarrow {{{\mathbf{X}}^{ * } (t)}}\) is regarded as the current best candidate solution, and the humpback updates its position vector X(t) along a spiral-shaped path, namely
the attenuation coefficient vectors
and
where b is a constant, r is a random variable in [0,1], the vector \(\overrightarrow {{\mathbf{\alpha }}}\) decreases from 2 to 0, l varies from 0 to 1, and the random number p is used to decide whether the search enters the bubble-net attack stage or performs the global search44.
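The two position updates can be sketched as follows, assuming the standard WOA formulation: a spiral move \(D^{\prime} e^{bl} \cos (2\pi l) + X^{*}\) with \(D^{\prime} = |X^{*} - X|\), and an encircling move \(X^{*} - A\,|C X^{*} - X|\). The function names are illustrative, and the rule for choosing between the moves with the random number p is not shown because its threshold is not stated here.

```python
import math


def spiral_update(x, best, b=1.0, l=0.0):
    """Spiral (bubble-net) move towards the best solution (assumed standard form)."""
    return [
        abs(bs - xi) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bs
        for xi, bs in zip(x, best)
    ]


def encircle_update(x, best, A, C):
    """Shrinking-encircling move around the best solution (assumed standard form)."""
    return [bs - A * abs(C * bs - xi) for xi, bs in zip(x, best)]
```

As the attenuation vector shrinks, |A| shrinks with it, so the encircling move contracts the population around the current best candidate.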
Hybrid models
In this study, the hybrid models, including Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, Kendall-τ-GWO-LSTM, and Kendall-τ-WOA-LSTM, were proposed and employed to estimate the monthly pan Ep in the study area with incomplete meteorological information. Here, Kendall-τ-WOA-LSTM denotes the LSTM coupled with the WOA algorithm and the Kendall-τ correlation coefficient; the names Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, and Kendall-τ-GWO-LSTM are formed in the same way. Figure 4 schematically illustrates the estimating process in this study. As shown in Fig. 4, the estimating process includes three modules: the data pre-processing module, the parameter optimization module, and the model evaluation module. The main steps are as follows:
Step 1. The Kendall-τ correlation coefficient was employed to recognize the effective input variables of each ML model, and the training and testing data were normalized by using the min-max normalization method.
Step 2. SVM and LSTM were selected as the main estimating modules to accurately estimate the evaporation in the study area.
Step 3. WOA and GWO were used to find the best penalty coefficient (C) and radius (G) of the SVM, and determine the optimal hyper-parameters of LSTM, including NHL, NHU, E, MBS, and LR, respectively.
Step 4. The root mean squared error (RMSE) was used to choose the best hybrid models with optimal hyper-parameters from 5 replications for each fixed meteorological parameter, and the optimal input meteorological parameters were determined according to the model performance.
Step 5. The estimated performance of the proposed models was compared by using the standard statistics metrics.
Step 6. The optimal estimating model was determined based on the evaluation results.
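The min-max normalization in Step 1 can be sketched as below. The sketch assumes that the minimum and maximum are taken from the training set and then reused for the testing set, which is a common convention but is not stated in the text; the function names are illustrative.

```python
def fit_min_max(train_values):
    """Return the (min, max) pair of the training data."""
    return min(train_values), max(train_values)


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] using a previously fitted (lo, hi) pair."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original units, e.g. mm of Ep."""
    return [s * (hi - lo) + lo for s in scaled]
```

Reusing the training-set range keeps the testing data out of the preprocessing step; scaled testing values can then fall slightly outside [0, 1].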
Evaluation metrics
In this paper, the evaluation metrics, including RMSE22, the normalized mean squared error (NMSE)12, the mean absolute error (MAE)9,13,22, the mean absolute percentage error (MAPE)14,22, and Nash–Sutcliffe coefficient of efficiency (NSCE)12,13 were employed to assess the model performance. The definition of those evaluation indexes are as follows:
where \(Ep_{i}\) and \(\widehat{Ep}_{i}\) denote the desired and actual outputs, respectively. RMSE, NMSE, MAE, and MAPE describe the error of the estimated results; values approaching 0 indicate that the outputs of the proposed models are close to the desired results. Thus, RMSE, NMSE, MAE, and MAPE are regarded as negative statistical metrics12,13. NSCE describes the model efficiency and measures the goodness of fit; a value close to 1 indicates a good fit, so NSCE is regarded as a positive evaluation metric12,13. The list of abbreviations used in this manuscript is shown in Table 2.
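Four of these metrics can be sketched with their commonly used definitions, as below. NMSE is left out on purpose, because several normalizations are in use and the one adopted here cannot be recovered from the text. The function and argument names are illustrative, with `obs` standing for the desired outputs and `est` for the actual outputs.

```python
import math


def rmse(obs, est):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def mape(obs, est):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    return 100.0 * sum(abs((o - e) / o) for o, e in zip(obs, est)) / len(obs)


def nsce(obs, est):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance about the mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - e) ** 2 for o, e in zip(obs, est))
    total = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / total
```

RMSE and MAE carry the units of Ep, whereas MAPE and NSCE are dimensionless, which makes the latter two easier to compare across sites.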
Results
As mentioned above, SVM and LSTM were used as the main modules to compute the monthly evaporation. To determine the input combination of the ML models, the Kendall correlation coefficients between Ep and the meteorological variables T, Tmax, Tmin, P, and WS were calculated and are shown in Table 3.
Table 3 shows that T, Tmax, and Tmin have the highest correlation with evaporation, and WS and P have the next highest correlation, the Kendall correlation coefficients are 0.731, 0.725, 0.636, 0.418, and 0.386, respectively. With Kendall correlation coefficient greater than 0.5 as the threshold, T, Tmax, and Tmin were selected as the fixed input variables of all ML models, thus, the input meteorological variables combinations are C1 (T, Tmax, Tmin, P, WS), C2 (T, Tmax, Tmin, WS), C3 (T, Tmax, Tmin, P), and C4 (T, Tmax, Tmin). The input meteorological variables combinations, including C1, C2, C3, and C4, were input into the SVM and LSTM to estimate the monthly Ep, respectively. The input dimension of each ML model was the number of input variables.
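The construction of the four input combinations can be sketched as follows, using the coefficients quoted above and the 0.5 threshold. The function name and dictionary layout are illustrative; variables above the threshold are always kept, and every subset of the remaining variables is appended in turn.

```python
from itertools import combinations

# Kendall coefficients with Ep as quoted in the text (Table 3).
TAU_WITH_EP = {"T": 0.731, "Tmax": 0.725, "Tmin": 0.636, "WS": 0.418, "P": 0.386}


def build_input_combinations(tau, threshold=0.5):
    """Fixed inputs exceed the threshold; optional inputs are added subset by subset."""
    fixed = [name for name, value in tau.items() if value > threshold]
    optional = [name for name, value in tau.items() if value <= threshold]
    combos = []
    for size in range(len(optional), -1, -1):  # largest combination first
        for extra in combinations(optional, size):
            combos.append(fixed + list(extra))
    return combos
```

With these coefficients the function returns four lists that correspond to C1 to C4, although the order of variables inside a list may differ from the order printed in the text.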
GWO and WOA are new, efficient meta-heuristic optimization techniques inspired by the predation behavior of grey wolves and humpback whales43,44, respectively. At present, these two algorithms have been widely used to optimize the hyper-parameters of shallow ML models, and show better ergodicity and global optimization capacity than other heuristic algorithms12,13,29. However, few studies have used GWO or WOA to optimize deep learning models, especially to find the optimal hyper-parameters of LSTM in the hydrological field. In this study, to overcome the sensitivity of ML models to parameter selection, the heuristic algorithms (GWO and WOA) were employed to find the optimal hyper-parameters of SVM and LSTM, respectively. Table 4 shows the parameter settings of the proposed models.
Owing to the randomness of some parameters in the heuristic algorithms, the output of the hybrid models varied between runs. Thus, the relevant hyper-parameters and estimation accuracy of each hybrid model were recorded from five replications. Tables 5, 6, 7, 8 show the optimal parameters of each proposed model obtained by the heuristic algorithm in the training stage, and the evaluation indexes are also listed (the estimating results of the proposed models with different input combinations and optimal hyper-parameters are shown in the Supplementary File). The optimal hyper-parameters and evaluation metrics of the hybrid models with different input combinations are marked in bold. For example, Table 5 shows that the optimal hyper-parameters of the hybrid Kendall-τ-GWO-SVM model in the training stage with different combinations are: C1 (C = 214.76, G = 0.001), C2 (C = 700.49, G = 0.014), C3 (C = 339.44, G = 0.013), and C4 (C = 434.08, G = 0.063); the minimum MAPE values with the input combinations C1, C2, C3, and C4 in the testing stage are 30.71%, 30.34%, 26.97%, and 32.32%, and the maximum NSCE values are 0.74, 0.72, 0.77, and 0.76, respectively (the results of other evaluation metrics are omitted). Table 7 shows that the optimal hyper-parameters of the hybrid Kendall-τ-GWO-LSTM model with input combinations C1-C4 in the training stage are: C1 (NHL = 6, NHU = 15, E = 96, MBS = 24, LR = 0.003), C2 (NHL = 10, NHU = 51, E = 39, MBS = 44, LR = 0.008), C3 (NHL = 52, NHU = 89, E = 68, MBS = 16, LR = 0.007), and C4 (NHL = 47, NHU = 93, E = 57, MBS = 20, LR = 0.005); the minimum MAPE values with the input combinations C1, C2, C3, and C4 in the testing stage are 26.17%, 27.97%, 23.03%, and 19.96%, and the maximum NSCE values are 0.81, 0.80, 0.86, and 0.89, respectively. The results in Tables 6 and 8 are read in the same way as those in Tables 5 and 7.
The scatter plots of the desired and actual outputs of each model with optimal hyper-parameters and input combinations are shown in Fig. 5. As shown in Fig. 5, the hybrid Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, Kendall-τ-GWO-LSTM, and Kendall-τ-WOA-LSTM models can be used to compute the monthly Ep and achieve high computing accuracy with limited meteorological information. The coefficients of the regression lines are all greater than 1 except for that of the hybrid Kendall-τ-GWO-LSTM model, suggesting that the hybrid Kendall-τ-WOA-SVM, Kendall-τ-GWO-SVM, and Kendall-τ-WOA-LSTM models overestimated the monthly Ep, and the hybrid Kendall-τ-GWO-LSTM model underestimated the monthly Ep to a certain extent. To further compare the performance of the hybrid Kendall-τ-WOA-SVM, Kendall-τ-GWO-SVM, Kendall-τ-WOA-LSTM, and Kendall-τ-GWO-LSTM models, the Taylor diagram is illustrated in Fig. 6. The Taylor diagram shows the standard deviation, RMSE, and Pearson correlation coefficient on a two-dimensional chart, which provides an intuitive way to compare the model performance and reflects the simulation capability of the proposed models10,11,18,35. On the whole, Fig. 6 shows that the hybrid Kendall-τ-GWO-LSTM model has a higher Pearson correlation coefficient and a lower standard deviation and RMSE than the hybrid Kendall-τ-WOA-SVM, Kendall-τ-GWO-SVM, and Kendall-τ-WOA-LSTM models, indicating that the hybrid Kendall-τ-GWO-LSTM model performs better than the other hybrid models.
Discussion
The accuracy of the proposed models depends on the input combination of meteorological variables, so finding the optimal input combination of the ML models can effectively improve the estimating accuracy. As shown in Tables 5, 6, 7, 8, the computing accuracies present different trends with different input meteorological variables. Taking the hybrid Kendall-τ-GWO-SVM model as an example, when the input meteorological variables are T, Tmax, Tmin, P, and WS, the ranges of MAE, MAPE, RMSE, NMSE, and NSCE are [43.27, 45.70], [30.71%, 33.90%], [55.18, 55.68], [0.17, 0.19], and [0.73, 0.74], respectively; when the input meteorological variables are T, Tmax, Tmin, and P, the ranges of MAE, MAPE, RMSE, and NMSE are [39.79, 39.84], [26.97%, 27.11%], [51.84, 51.86], and [0.14, 0.14], respectively, and the maximum NSCE is 0.77. Thus, the computing accuracies of Kendall-τ-GWO-SVM were significantly improved when the input meteorological variables were optimized.
The statistical metrics in Tables 5, 6, 7, 8 show that RMSE, NMSE, MAE, and MAPE are not necessarily consistent with each other, which will lead to confusion if a different evaluation index is selected as a main benchmark to evaluate the model performance or find the optimal parameters of proposed models. Since MAPE and NSCE are two dimensionless quantities, the results of these two metrics are relatively more stable than the other evaluation indexes12,13. Thus, MAPE and NSCE were employed to determine the optimal input combination in this study (The discussion of other evaluation metrics is similar). As shown in Tables 5 and 6, the optimal and minimum input meteorological parameters of the hybrid Kendall-τ-GWO-SVM and Kendall-τ-WOA-SVM models are T, Tmax, Tmin, and P, the minimum MAPE is 26.97%, and the maximum NSCE is 0.77 from five replications. Tables 7 and 8 show that the optimal input meteorological parameters of the hybrid Kendall-τ-GWO-LSTM and Kendall-τ-WOA-LSTM models are T, Tmax, and Tmin, the minimum MAPE and the maximum NSCE of the hybrid Kendall-τ-GWO-LSTM model are 19.96% and 0.89; As for the hybrid Kendall-τ-WOA-LSTM model, the minimum MAPE and the maximum NSCE are 21.30% and 0.88. On the whole, the hybrid Kendall-τ-GWO-LSTM and Kendall-τ-WOA-LSTM models have outperformed the hybrid Kendall-τ-GWO-SVM and Kendall-τ-WOA-SVM models, and need fewer meteorological parameters to be observed.
To test whether there is a significant difference in the estimation accuracy of the proposed models under the same input combination, the Kruskal–Wallis (K–W) test was performed on MAE, MAPE, NMSE, RMSE, and NSCE in the validation stage. The K–W test is a non-parametric test that does not require the tested variables to follow a normal distribution45. Its null hypothesis is that there is no significant difference between the tested variables, and the significance level is \(\alpha = 0.05\). The results of the K–W test are shown in Table 9.
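For a pairwise comparison of two models, the test can be sketched in plain Python as below. The sketch computes the K–W statistic H from pooled ranks without the tie correction that statistical packages usually apply, and obtains the p-value from the chi-square distribution with one degree of freedom, whose survival function is \({\text{erfc}}(\sqrt {H/2} )\). The function names are illustrative, and any inputs passed to them in examples are placeholders rather than values from this study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """K-W statistic H and p-value for two groups (no tie correction)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[: len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b))
    h -= 3.0 * (n + 1)
    p_value = math.erfc(math.sqrt(max(h, 0.0) / 2.0))  # chi-square, 1 degree of freedom
    return h, p_value
```

A p-value below 0.05 rejects the hypothesis that the two sets of metric values come from the same distribution.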
Table 9 shows that the p-values of the K–W test between the hybrid Kendall-τ-GWO-SVM and Kendall-τ-WOA-SVM models are all greater than 0.05, which means that there is no significant difference in the estimation accuracy of these two models with the same input combination; The p-values of the K-W test between shallow ML models and deep learning models are all less than 0.05, suggesting that there is a significant difference in the estimation accuracy under the same input combination; As for the hybrid Kendall-τ-GWO-LSTM and Kendall-τ-WOA-LSTM models, the p-values of K–W test are all greater than 0.05, suggesting that the model performance of these two models have little difference in the estimation of Ep with limited meteorological parameters.
To compare the performance of the hybrid Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, Kendall-τ-GWO-LSTM, and Kendall-τ-WOA-LSTM models, the averages of the performance indexes in the testing stage were calculated and are shown in Table 10. The minimum averages of MAE, RMSE, and MAPE and the maximum average of NSCE are marked in bold. Table 10 shows that the minimum average of MAPE is 28.10% and the maximum average of NSCE is 0.77 when the input meteorological parameters of the hybrid Kendall-τ-WOA-SVM model are T, Tmax, Tmin, and P. With the same input combination, the hybrid Kendall-τ-GWO-SVM model performed slightly better than the hybrid Kendall-τ-WOA-SVM model: the minimum average of MAPE decreased from 28.10 to 27.03%, and the maximum average of NSCE is 0.77.
Although both the hybrid Kendall-τ-WOA-SVM and Kendall-τ-GWO-SVM models can be used to simulate Ep with limited meteorological parameters, the estimation accuracy of these two models needs to be further improved, since shallow ML models cannot fully extract the nonlinear and dynamic features linking the meteorological parameters and Ep. As shown in Table 10, the minimum average MAPE values of the hybrid Kendall-τ-GWO-LSTM and Kendall-τ-WOA-LSTM models are 21.91% and 23.51%, and the maximum averages of NSCE are 0.87 and 0.84, implying that the estimating accuracy is significantly improved. Compared with Kendall-τ-WOA-SVM, the minimum average of MAPE decreased from 28.10 to 21.91%, and the maximum average of NSCE increased from 0.77 to 0.88, which means that the deep learning models significantly improved the estimating accuracy. In addition, the optimal and minimum input meteorological parameters of the hybrid Kendall-τ-GWO-LSTM and Kendall-τ-WOA-LSTM models are T, Tmax, and Tmin, suggesting that deep learning models need fewer observed meteorological parameters than shallow ML models.
Figure 7 shows the averages of the performance indexes of the proposed models with different input combinations. As shown in Fig. 7, the statistical metrics of the hybrid Kendall-τ-GWO-LSTM and Kendall-τ-WOA-LSTM models were similar to each other in the testing stage, suggesting that both models can be employed to estimate Ep in dryland. However, the negative evaluation indexes of Kendall-τ-GWO-LSTM are all smaller than those of Kendall-τ-WOA-LSTM, and NSCE showed the opposite trend, which means that the hybrid Kendall-τ-GWO-LSTM model performed better than the hybrid Kendall-τ-WOA-LSTM model, and that GWO can obtain the optimal hyper-parameters of LSTM more effectively than WOA. Therefore, the hybrid Kendall-τ-GWO-LSTM model is strongly recommended to estimate Ep with limited meteorological parameters in dryland.
Conclusion
In this study, four novel data-driven models, including the hybrid Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, Kendall-τ-GWO-LSTM, and Kendall-τ-WOA-LSTM models, were proposed to estimate the monthly Ep with limited meteorological parameters. The proposed models simultaneously optimize the input meteorological variables and the hyper-parameters. The results illustrate that the optimal input meteorological parameters of the hybrid Kendall-τ-GWO-SVM (with C = 145.35 and G = 0.013) and Kendall-τ-WOA-SVM (with C = 339.44 and G = 0.013) models are T, Tmax, Tmin, and P; the minimum MAPE for both models is 26.97%, and the maximum NSCE is 0.77. The optimal input meteorological parameters of the hybrid Kendall-τ-GWO-LSTM (with NHL = 47, NHU = 93, E = 57, MBS = 20, and LR = 0.005) and Kendall-τ-WOA-LSTM (with NHL = 63, NHU = 76, E = 46, MBS = 29, and LR = 0.005) models are T, Tmax, and Tmin; the minimum MAPE values are 19.96% and 21.30%, and the maximum NSCE values are 0.89 and 0.88, respectively, suggesting that Kendall-τ-GWO-LSTM outperforms the Kendall-τ-GWO-SVM, Kendall-τ-WOA-SVM, and Kendall-τ-WOA-LSTM models, and needs fewer meteorological parameters to be observed. Therefore, the hybrid Kendall-τ-GWO-LSTM model can be highly recommended to estimate Ep without adequate meteorological parameters in dryland.
Although the deep learning models coupled with heuristic algorithms and data preprocessing techniques show higher computing performance than the shallow ML models, the transferability of the proposed models to other locations needs to be further tested. In addition, the main estimation modules are limited to one or two ML models, and the estimation results inevitably show systematic overestimation or underestimation, which introduces a risk in model selection. Further work will focus on constructing a combination model that integrates multiple ML models to obtain more robust estimating results in different bioclimatic zones.
Data availability
All data analyzed or generated during this study are included in the Supplementary Information.
References
Moazenzadeh, R. et al. Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Eng. Appl. Comp. Fluid. 12, 584–597 (2018).
Wu, L. F. et al. Hybrid extreme learning machine with meta-heuristic algorithms for monthly pan evaporation prediction. Comput. Electron. Agr. 168, 105–115 (2020).
Li, X. R. et al. Fundamental Ecohydrology of Ecological Restoration and Recovery in Sand Desert Regions of China (Science Press, 2016).
Li, X. R. et al. Hydrological response of biological soil crusts to global warming: A ten year simulative study. Glob Change Biol. 24(10), 4960–4971 (2018).
Wen, X. et al. Support vector machine based models for modeling daily reference evapotranspiration with limited climatic data in extreme arid regions. Water Resour. Manag. 29, 3195–3209 (2015).
Feng, Y. et al. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. 536, 376–383 (2016).
Feng, Y. et al. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agr. Water Manag. 193, 163–173 (2017).
Malik, A., Kumar, A. & Kisi, O. Daily pan evaporation estimation using heuristic methods with gamma test. J. Irrig. Drain. Eng. ASCE. 144, 04018023 (2018).
Rezaie-Balf, M., Kisi, O. & Chua, L. H. Application of ensemble empirical mode decomposition based on machine learning methodologies in forecasting monthly pan evaporation. Hydrol. Res. 50(2), 498–516 (2019).
Elbeltagi, A. et al. Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration. Appl. Water Sci. 12, 152 (2022).
Elbeltagi, A. et al. Modelling daily reference evapotranspiration based on stacking hybridization of ANN with meta-heuristic algorithms under diverse agro-climatic conditions. Stoch. Environ. Res. Risk. Assess. 36, 3311–3334 (2022).
Fu, T. L. & Li, X. R. Hybrid the long short-term memory with whale optimization algorithm and variational mode decomposition for monthly evapotranspiration estimation. Sci. Rep. 12, 20717 (2022).
Fu, T. L. et al. A novel integrated method based on a machine learning model for estimating evapotranspiration in dryland. J. Hydrol. 603, 126881 (2021).
Kushwaha, N. L. et al. Evaluation of data-driven hybrid machine learning algorithms for modelling daily reference evapotranspiration. Atmos. Ocean 60(5), 519–540 (2022).
Fan, J. L. et al. Light gradient boosting machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data. Agric. Water Manag. 225, 105758 (2019).
Kushwaha, N. L. et al. Data intelligence model and meta-heuristic algorithms-based pan evaporation modelling in two different agro-climatic zones: A case study from Northern India. Atmosphere 12(12), 1654 (2021).
Elbeltagi, A. et al. Forecasting monthly pan evaporation using hybrid additive regression and data-driven models in a semi-arid environment. Appl. Water Sci. 13, 42 (2023).
Pande, C. B. et al. Forecasting of SPI and meteorological drought based on the artificial neural network and M5P model tree. Land 11, 2040 (2022).
Gocić, M. et al. Soft computing approaches for forecasting reference evapotranspiration. Comput. Electron. Agric. 113, 164–173 (2015).
Jain, S. K., Nayak, P. C. & Sudheer, K. P. Models for estimating evapotranspiration using artificial neural networks, and their physical interpretation. Hydrol. Process. 22, 2225–2234 (2008).
Petković, D. et al. Particle swarm optimization-based radial basis function network for estimation of reference evapotranspiration. Theor. Appl. Climatol. 125, 555–563 (2016).
Anurag, M. et al. Deep learning versus gradient boosting machine for pan evaporation prediction. Eng. Appl. Comp. Fluid. 16(1), 570–587 (2022).
Shrestha, N. K. & Shukla, S. Support vector machine based modeling of evapotranspiration using hydro-climatic variables in a sub-tropical environment. Agric. Forest Meteorol. 200, 172–184 (2015).
Fan, J. L. et al. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. Forest Meteorol. 263, 225–241 (2018).
Rezaie-Balf, M. et al. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting. J. Hydrol. 553, 356–373 (2017).
Aghajanloo, M. B., Sabziparvar, A. A. & Hosseinzadeh, T. P. Artificial neural network-genetic algorithm for estimation of crop evapotranspiration in a semiarid region of Iran. Neural Comput. Appl. 23, 1387–1393 (2013).
Kim, S. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. 351, 299–317 (2008).
Zhu, B. et al. Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data. Comput. Electron. Agric. 173, 105430 (2020).
Mohammadi, B. & Mehdizadeh, S. Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agric. Water Manag. 237, 106145 (2020).
Farshad, A. et al. Application of an artificial intelligence technique enhanced with intelligent water drops for monthly reference evapotranspiration estimation. Agric. Water Manag. 244, 106622 (2021).
Saray, M. H. et al. Regionalization of potential evapotranspiration using a modified region of influence. Theor. Appl. Climatol. 140(1), 115–127 (2020).
Kisi, O. & Alizamir, M. Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: wavelet extreme learning machine vs wavelet neural networks. Agric. Forest Meteorol. 263, 41–48 (2018).
Abdullah, S. S. et al. Extreme learning machines: a new approach for prediction of reference evapotranspiration. J. Hydrol. 527, 184–195 (2015).
Karbasi, M. Forecasting of multi-step ahead reference evapotranspiration using wavelet-Gaussian process regression model. Water Resour. Manag. 32, 1035–1052 (2018).
Dinesh, K. V. et al. Methods to estimate evapotranspiration in humid and subtropical climate conditions. Agric. Water Manag. 261, 107378 (2022).
Granata, F. & Di Nunno, F. Forecasting evapotranspiration in different climates using ensembles of recurrent neural networks. Agric. Water Manag. 255, 107040 (2021).
Chen, Z. J. et al. Estimating daily reference evapotranspiration based on limited meteorological data using deep learning and classical machine learning methods. J. Hydrol. 591, 125286 (2020).
Majhi, B. et al. Improved prediction of daily pan evaporation using deep-LSTM model. Neural Comput. Appl. 32, 7823–7838 (2019).
Solé, R. & Levin, S. Ecological complexity and the biosphere: The next 30 years. Philos. Trans. R. Soc. B. 377, 20210376 (2022).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Zuo, G. et al. Decomposition ensemble model based on variational mode decomposition and long short-term memory for streamflow forecasting. J. Hydrol. 585, 124776 (2020).
Vapnik, V. Statistical Learning Theory (Wiley, 1998).
Mirjalili, S., Mirjalili, S. M. & Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014).
Mirjalili, S. & Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016).
McDonald, J. H. Handbook of Biological Statistics 3rd edn. (Sparky House Publishing, 2014).
Acknowledgements
The research was supported by the Creative Research Groups of China under Grant No. 41621001.
Author information
Contributions
T.L.F. wrote the main manuscript text, and X.R.L. reviewed and checked the manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fu, T., Li, X. Estimating the monthly pan evaporation with limited climatic data in dryland based on the extended long short-term memory model enhanced with meta-heuristic algorithms. Sci Rep 13, 5960 (2023). https://doi.org/10.1038/s41598-023-32838-4
DOI: https://doi.org/10.1038/s41598-023-32838-4