Abstract

Reference evapotranspiration ($ET_0$) is one of the most significant factors in the hydrological cycle since it has a great influence on water resource planning and management, agriculture and irrigation management, and other processes in the hydrological sector. In this study, an efficient local predictive model was established to forecast the monthly mean $ET_0$ over Turkey based on data collected from 35 locations. For this purpose, twenty input combinations including hydrological and geographical parameters were introduced to three different approaches: multiple linear regression (MLR), random forest (RF), and extreme learning machine (ELM). A large investigation was carried out, involving the establishment of 60 models and their assessment using ten statistical measures. The outcome of this study revealed that the ELM approach achieved highly accurate estimation in accordance with the Penman–Monteith formula as compared to the MLR and RF models. Moreover, among the 10 statistical measures, the uncertainty at 95% ($U_{95}$) indicator showed an excellent ability to select the best and most efficient forecast model. The superiority of ELM over the RF and MLR approaches in the prediction of mean monthly $ET_0$ is illustrated by the reduction of the $U_{95}$ parameter by 49.02% and 34.07% relative to the RF and MLR models, respectively. Furthermore, it is possible to develop a local computational model that estimates $ET_0$ with acceptable accuracy using the simplest and cheapest meteorological and geographical variables.

1. Introduction

1.1. Background

Global warming has become a major concern for researchers and world leaders. It is well known that the Earth's surface temperature has increased significantly over the last decades [1, 2]. Water storage, hydrological cycles, and, consequently, water availability are directly affected by global warming [3–5]. Thus, one of the most essential indicators of climate change is the reference evapotranspiration ($ET_0$), which is considered the most complicated element in the hydrological cycle [6–8].

$ET_0$ mainly occurs due to two complicated processes. The first is when water evaporates from the surface of the soil, lakes, rivers, etc.; this process is called physical evaporation. The second is transpiration in crops and plants, which is called biological transpiration [9, 10]. The evaporation process requires energy to convert water from the liquid phase to the vapor phase. Therefore, the main parameters that affect the process are solar radiation, wind speed and direction, air temperature, and humidity [9–11]. $ET_0$ also represents the link between surface energy and the carbon cycle [12, 13]. Based on the stated literature, precise measurement and prediction of $ET_0$ are essential for quantifying surface energy and water reserves worldwide [14–16]. Thus, providing accurate $ET_0$ models for weather and climate change diagnosis is crucial [14, 17–21].

1.2. Literature Review

Due to the significant effect of $ET_0$ on climate change, Earth temperature, crops and plants, water management, and runoff quantity, many researchers have studied $ET_0$ prediction over the last decades [9, 22–24]. The Penman–Monteith (PM) equation is the most widely used model, and it is considered a physical model as it is an approximate linearized solution of the equations governing energy balance, thermodynamic state, and vertical heat and water-vapor diffusion [9, 23]. However, the PM equation requires many meteorological data, which can be considered a drawback [25, 26]. Many other models are applied to estimate $ET_0$ around the world. Examples are the constant heat method, including the heat pulse technique [27, 28], and the Shuttleworth–Wallace (S-W) method for estimating transpiration from plants [29–31]. It is worth mentioning that the number of empirical equations for modeling evaporation has exceeded 100, owing to the importance of $ET_0$ measurements and the variety of meteorological data around the world. Therefore, it is impossible to compare these models decisively [32–34].

Recently, the development of artificial intelligence (AI) has received significant attention from communities in the hydrological and environmental sciences, including water treatment [35, 36], hydrology [37–41], water reservoir optimization [42–44], remote sensing applications [45, 46], etc. Owing to the highly nonlinear characteristics of $ET_0$ data, AI technology offers a suitable modeling approach that resolves many issues with the empirical equations used before [47]. Kumar et al. utilized the artificial neural network (ANN) in 2002 for predicting $ET_0$, implementing different ANN architectures for evaporation simulation. The radial neural network yielded the best results, with the number of layers and neurons determined by trial and error [48]. Many researchers have followed in their footsteps in predicting $ET_0$ [49–52]. In addition, the adaptive network-based fuzzy inference system (ANFIS) has been used to predict $ET_0$ [53, 54]. When ANFIS, ANN, and empirical equations were compared in the evapotranspiration field, the ANFIS and ANN methods were found to be much better than the empirical equations [55, 56]. However, it is well established that ANN models easily get stuck in local minima, and, therefore, recent studies have adopted other AI techniques for modeling $ET_0$ [47]. Many approaches have been utilized for this purpose, including the support vector machine (SVM) [53, 57, 58]. The SVM is well known to have a basic form, but one of its drawbacks is that its parameters are unknown and must be determined [59]. Another approach in the field of data simulation is genetic programming (GP). This approach builds programs capable of taking input data and producing a nonlinear interaction between the data to determine the outputs [55, 60–62].

In order to enhance the ability of AI models, many algorithms have been combined with different AI methodologies, including the wavelet transform regression model [63], wavelet coupled with ANN [64, 65], and the wavelet-enhanced extreme learning machine [66]. Others utilize the random forest (RF) algorithm. Due to its success over a variety of datasets, high-precision estimation, small number of user-defined parameters, ability to estimate the relative importance of the variables, and ability to limit overfitting, the RF approach has become extremely popular in recent years [26, 34, 67, 68]. Recently, owing to its higher efficiency and much quicker calculation speed, a newly proposed machine learning technique called the extreme learning machine (ELM) has proved to be a promising estimation tool [69]. Abdullah et al. (2015) were the first to use ELM to forecast $ET_0$, at three Iraqi stations, and concluded that the ELM model is highly efficient, with fast computation and high generalization capability [70, 71]. Since then, ELM has been used for $ET_0$ prediction by many studies in different climate environments [72–74]. To the best of the authors' knowledge, all models presented in the literature were established to simulate evapotranspiration using a single model for each location or case study. Furthermore, some researchers employed modern techniques and used different case studies, but they could not produce a general model that accounts for more than one case study. In this study, a robust modeling methodology is applied to a variety of locations along the southern coast of Turkey to create a comprehensive general model to forecast $ET_0$. The other objective of the study is to predict mean monthly $ET_0$ from limited data that are easily available.

1.3. Motivation of the Study

Due to the significance of $ET_0$, numerous studies have been conducted to estimate it using AI techniques and empirical models. In general, these approaches achieved satisfactory success. However, creating one robust model for local $ET_0$ prediction throughout a specific country, based on data collected from different sites, is still a challenging issue that needs to be addressed. Moreover, identifying the statistical measures that effectively assess the feasibility of a given model is also very significant in the selection of the best predictive model. Thus, in this study, a broad investigation is performed using three different approaches, multiple linear regression (MLR), random forest (RF), and extreme learning machine (ELM), based on twenty combinations of meteorological and geographical indicators, constituting 60 predictive models. Ten efficient statistical measures are employed to assess the accuracy of each model separately in accordance with the Penman–Monteith equation. Although the Penman–Monteith formula is well known in the prediction of $ET_0$, it poses some issues regarding the measurement of factors that may not be available at every site, such as solar radiation and sensible heat flux into the soil. Therefore, a robust local model is established in this study, which can efficiently predict the local mean monthly $ET_0$ over the southern coast of Turkey using conveniently and inexpensively measurable parameters. Furthermore, the outcomes of the models are assessed and validated against the actual $ET_0$ values, which are calculated by the Penman–Monteith equation.

2. Case Study and Data Collection

In this study, data collected from 35 meteorological stations in Turkey are used. These stations cover a large area located between latitudes 36° and 38°. Figure 1 shows the location of each meteorological station. It can be seen that the majority of these stations are located in the south of Turkey on the coast of the Mediterranean Sea. The data, collected from the General Directorate of the Turkish State Meteorological Service, include several long-term monthly meteorological variables such as temperature, humidity, wind speed, and reference evapotranspiration. The dataset comprises long-term mean monthly variables covering the period from 1975 to 2010. The highest temperature, 46.7°C, was recorded at the Mut station, while the lowest temperature, −33.5°C, was recorded at the Goksun station. Figure 2 shows the long-term monthly mean $ET_0$ over Turkey. It can be observed that the highest value of $ET_0$ occurred in July, followed by June, August, and May.

It is worth mentioning that Turkey has a complex climate due to its location and topography. A Mediterranean climate predominates in southern Turkey, with warm and dry summers and wet and moderate to cold winters. Continental weather predominates in central Turkey, with warm and dry summers and cold winters. The oceanic climate of northern Turkey is characterized by warm and rainy summers and cold and wet winters. In this study, the stations chosen for $ET_0$ are spread nearly uniformly across southern Turkey in order to capture spatial differences in mean monthly $ET_0$ values and their temporal characteristics.

Due to the lack of measured data, the FAO adopted the PM equation as a standard methodology for calculating $ET_0$. The FAO56-PM equation can be used on hourly or daily scales to supply the $ET_0$ data needed for machine learning approaches. In its daily form, the equation is expressed as follows [7, 75]:

$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \tag{1}$$

where $ET_0$ is the reference evapotranspiration (mm/day), $\Delta$ represents the slope of the saturation vapor pressure function at air temperature (kPa/°C), $R_n$ is the net solar radiation (MJ m$^{-2}$ day$^{-1}$), $G$ is the soil heat flux density (MJ m$^{-2}$ day$^{-1}$), $\gamma$ is the psychrometric constant (kPa/°C), $T$ is the mean air temperature (°C), $u_2$ is the average 24-hour wind speed at 2 m height above the ground surface (m/s), $e_s$ is the saturation vapor pressure (kPa), and $e_a$ is the actual vapor pressure (kPa). Finally, it is important to mention that the dataset is normalized (between 0 and 1) for all input variables and their corresponding targets. This process is very important for boosting the performance of the predictive models. The data are then simulated using three modeling approaches, namely, RF, ELM, and MLR.
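As a minimal sketch of how the daily FAO56-PM calculation can be coded, the function below evaluates the standard daily form of the equation from the quantities defined above. The function name and the input values are illustrative assumptions, not data from the stations used in this study.

```python
def fao56_pm_daily(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily FAO-56 Penman-Monteith reference evapotranspiration (mm/day).

    delta  : slope of the saturation vapor pressure curve (kPa/degC)
    rn, g  : net radiation and soil heat flux density (MJ m-2 day-1)
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean air temperature (degC)
    u2     : mean wind speed at 2 m height (m/s)
    es, ea : saturation and actual vapor pressure (kPa)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs only (assumed values, not taken from the study dataset).
et0 = fao56_pm_daily(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                     t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
print(round(et0, 2))
```

The radiation and aerodynamic terms are kept separate so that the contribution of each group of inputs can be inspected when some of them (for example, solar radiation) are unavailable at a site.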

3. Methodology

3.1. Random Forest

Random forest (RF) is an algorithm that handles high-dimensional regression problems. The method is tree-based: each tree is grown with random variable selection, and the forest is built by combining many regression trees [34, 76]. Each tree uses a random subset of variables to determine its prediction. In the random forest learning process, two significant parameters must be set: the number of trees and the number of variables considered at each split. After each single tree is fitted into the ensemble (the bagging procedure), the final decision is made by averaging the tree outputs. The bias of the bagged trees is equal to that of a single tree, while the variance is reduced as the correlation between the trees is reduced [77].

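For regression-based RF, the tree predictor $h(x, \Theta)$ takes numerical values, and the process starts by growing trees on the basis of a random vector $\Theta$. For any given numerical predictor $h(x)$, the mean squared generalization error can be expressed as follows [78]:

$$E_{X,Y}\left(Y - h(X)\right)^2 \tag{2}$$

The RF predictor is generated by taking the average over $k$ of the single trees $h(x, \Theta_k)$. In this respect, the following theorems hold:

Theorem 1. As the number of trees in the forest goes to infinity, the error converges almost surely as follows:

$$E_{X,Y}\left(Y - \mathrm{av}_k\, h(X, \Theta_k)\right)^2 \rightarrow E_{X,Y}\left(Y - E_{\Theta}\, h(X, \Theta)\right)^2 \tag{3}$$

The right-hand side of this equation is the generalization error of the forest, $PE^*(\mathrm{forest})$. Similarly, the average generalization error of a tree is given by the following equation:

$$PE^*(\mathrm{tree}) = E_{\Theta}\, E_{X,Y}\left(Y - h(X, \Theta)\right)^2 \tag{4}$$

Theorem 2. If we assume $E\,Y = E_X\, h(X, \Theta)$ for every $\Theta$, then

$$PE^*(\mathrm{forest}) \le \bar{\rho}\, PE^*(\mathrm{tree}) \tag{5}$$

where $\bar{\rho}$ represents the weighted correlation between the residuals $Y - h(X, \Theta)$ and $Y - h(X, \Theta')$, with $\Theta$ and $\Theta'$ independent [76].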
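The bagging-and-averaging idea can be sketched in a few lines. The example below is a simplified illustration, not the configuration used in this study: it assumes NumPy is available, uses depth-one trees (stumps) instead of fully grown trees, and runs on synthetic data. All function names are hypothetical. It also checks numerically that the error of the averaged predictor does not exceed the average error of the individual trees, which is the direction of the inequality in Theorem 2.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree, searching only the candidate features."""
    best = None
    for j in feature_ids:
        for thr in np.unique(X[:, j])[:-1]:  # keep both sides non-empty
            left = X[:, j] <= thr
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, y[left].mean(), y[~left].mean())
    if best is None:  # no valid split: fall back to predicting the mean
        return (0, np.inf, y.mean(), y.mean())
    return best[1:]


def predict_stump(stump, X):
    j, thr, left_val, right_val = stump
    return np.where(X[:, j] <= thr, left_val, right_val)


def fit_forest(X, y, n_trees=50, n_try=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                  # bootstrap sample
        feats = rng.choice(p, size=n_try, replace=False)   # random variables
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """Average the outputs of all trees."""
    return np.mean([predict_stump(s, X) for s in forest], axis=0)


# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.1, size=200)

forest = fit_forest(X, y, n_trees=50, n_try=2)
forest_mse = np.mean((y - predict_forest(forest, X)) ** 2)
mean_tree_mse = np.mean([np.mean((y - predict_stump(s, X)) ** 2) for s in forest])
print(forest_mse <= mean_tree_mse)
```

The two arguments `n_trees` and `n_try` play the role of the two tuning parameters mentioned in the text: the number of trees and the number of variables tried at each split.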

3.2. Extreme Learning Machine (ELM)

The ELM is an advanced learning algorithm built on the single-layer feedforward neural network (SLFN) [79]. ELM's primary strength is that the input weights are assigned randomly, while the output weights are determined analytically using the Moore–Penrose generalized inverse (Huang et al., 2006). The SLFN function combines additive hidden nodes and an activation function, and it can be represented mathematically as follows:

$$f_L(x) = \sum_{i=1}^{L} \beta_i\, G(a_i, b_i, x) \tag{6}$$

where $f_L(x)$ represents the ELM model output function, $x$ represents the input variable, $a_i$ and $b_i$ represent the learning parameters of the hidden nodes, and $L$ stands for the number of hidden nodes. $\beta_i$ is the weight connecting the $i$th hidden node to the output nodes, and $G(a_i, b_i, x)$ is the output of the $i$th hidden node. For an additive hidden node with sigmoid activation, this output is

$$G(a_i, b_i, x) = g(a_i \cdot x + b_i) \tag{7}$$

where $a_i$ and $b_i$ represent the weight and bias values of the $i$th node in the hidden layer. In the ELM algorithm, the input-layer weights and bias values are randomly generated. Figure 3 shows the basic structure of the ELM.

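For the input and output variables, the $N$ arbitrary distinct samples are denoted as $(x_j, t_j)$, $j = 1, \ldots, N$. Based on the above, equation (6) can be rewritten as follows:

$$H\beta = T \tag{8}$$

where the hidden-layer output matrix $H$ is represented as

$$H = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{bmatrix}_{N \times L} \tag{9}$$

In addition,

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}, \qquad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix} \tag{10}$$

$g(\cdot)$ represents the activation function. In the current study, the log-sigmoid transfer function is used, which is expressed as

$$g(z) = \frac{1}{1 + e^{-z}} \tag{11}$$

where $z$ is the weighted input of the hidden node, as in the SLFN formulation of equation (6).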

An ELM trained as an SLFN on distinct samples can achieve zero learning error [80]. In addition, even with fewer hidden neurons ($L$) than the number of distinct samples ($N$), the ELM can still assign random parameters to the hidden nodes. The output weights are then calculated by means of the pseudoinverse of $H$, that is, $\beta = H^{\dagger} T$, which keeps the training error within a small bound $\varepsilon$. The hidden node parameters (weights and biases) are given random values during the training phase [81].
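The training procedure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: NumPy is assumed, the random weights are drawn uniformly from [-1, 1], the number of hidden nodes is arbitrary, and the data are synthetic. The function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, T, n_hidden=20, seed=0):
    """Train a single-hidden-layer ELM.

    Input weights and biases are random and never updated; only the
    output weights are solved for, via the Moore-Penrose pseudoinverse.
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = sigmoid(X @ A + b)                # hidden-layer output matrix (N x L)
    beta = np.linalg.pinv(H) @ T          # output weights: beta = pinv(H) T
    return A, b, beta


def elm_predict(model, X):
    A, b, beta = model
    return sigmoid(X @ A + b) @ beta


# Synthetic, already-normalized inputs for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 2))
T = 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2

model = elm_fit(X, T, n_hidden=20)
rmse = float(np.sqrt(np.mean((T - elm_predict(model, X)) ** 2)))
print(rmse < 0.5 * np.std(T))
```

Because no iterative weight updates are involved, training reduces to one matrix product and one pseudoinverse, which is the source of the calculation speed attributed to ELM.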

3.3. Multiple Linear Regression

Using local climatic parameters such as minimum and maximum temperature, relative humidity, wind speed, and sunshine hours, the data can be modeled with multiple linear regression (MLR). Herein, $ET_0$ is calculated by MLR. MLR is a multivariate statistical tool that describes the relationship between dependent and independent variables through the following equation:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$

where the response variable is represented by $Y$, which is the predicted mean monthly $ET_0$. The independent variables (the predictors) are represented by $X_1, \ldots, X_n$, and the regression coefficients are represented by $\beta_0, \ldots, \beta_n$, which can be acquired by least squares, that is, by minimizing the sum of squared errors [82]:

$$\min_{\beta} \sum_{i} \varepsilon_i^2, \qquad \varepsilon_i = Y_i - \hat{Y}_i$$

where $\varepsilon_i$ is the error, and $\hat{Y}_i$ and $Y_i$ are the estimated and real values of $ET_0$, respectively.
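A minimal sketch of fitting such a regression is given below. It assumes NumPy and an ordinary least-squares fit, which is the usual way of obtaining MLR coefficients but is an assumption about the exact procedure used here. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def mlr_fit(X, y):
    """Ordinary least-squares fit; returns [intercept, coef_1, ..., coef_n]."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def mlr_predict(beta, X):
    return beta[0] + X @ beta[1:]


# Synthetic predictors with a known linear relationship (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

beta = mlr_fit(X, y)
print(np.round(beta, 6))
```

Because the model is linear in the predictors, it cannot represent nonlinear relationships between the inputs and $ET_0$ unless transformed inputs are supplied.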

3.4. Model Evaluation

The selection of the best predictive model is of great importance for achieving highly accurate predictions. Therefore, in this study, ten statistical parameters have been used to assess the performance of each predictive approach [83, 84]. Throughout, $n$ is the total number of observations, and $ET_{a,i}$ and $ET_{p,i}$ are the actual and estimated mean monthly $ET_0$, respectively. The quantitative parameters are as follows:

(i) Mean absolute error (MAE). It is the sum of the absolute errors divided by the total number of observations. This indicator is widely used in the water resources and hydrological sectors to assess predictive models because it provides significant information on how closely the simulated data points match the actual ones. The mathematical expression of MAE is shown in the following equations [85, 86]:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} |e_i|, \qquad e_i = ET_{p,i} - ET_{a,i}$$

where $e_i$ is the forecast error.

(ii) Root mean square error (RMSE). It is a statistical parameter often used to compare the forecasting errors of several models. A lower value usually indicates better predictions. The RMSE can be derived using the following equation [87, 88]:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$$

(iii) Mean absolute relative error (MARE). This indicator expresses the absolute error (the difference between actual and forecasted points) relative to the actual value; it can also be reported as a percentage by multiplying by 100. It is expressed by the following equation:

$$MARE = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{e_i}{ET_{a,i}}\right|$$

(iv) Root mean square relative error (RMSRE) is expressed as

$$RMSRE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(\frac{e_i}{ET_{a,i}}\right)^2}$$

(v) Relative root mean square error (RRMSE). It is calculated by dividing the RMSE by the mean of the actual data points. This parameter is vital in assessing the accuracy of a model. In accordance with [89], the model is considered excellent if $RRMSE < 10\%$, good if $10\% < RRMSE < 20\%$, fair if $20\% < RRMSE < 30\%$, and poor if $RRMSE > 30\%$. The mathematical expression of the RRMSE is as follows:

$$RRMSE = \frac{RMSE}{\frac{1}{n}\sum_{i=1}^{n} ET_{a,i}} \times 100$$

(vi) Mean bias error (MBE). It discloses the tendency of a model, that is, whether it overestimates or underestimates the data, and is expressed by the following equation:

$$MBE = \frac{1}{n}\sum_{i=1}^{n} \left(ET_{p,i} - ET_{a,i}\right)$$

(vii) Correlation coefficient (CC). The CC is a significant factor that can be used to efficiently assess the strength of the relationship between actual and simulated data points. The mathematical expression of CC is shown in the following equation [82, 90]:

$$CC = \frac{\sum_{i=1}^{n} \left(ET_{a,i} - \overline{ET}_a\right)\left(ET_{p,i} - \overline{ET}_p\right)}{\sqrt{\sum_{i=1}^{n} \left(ET_{a,i} - \overline{ET}_a\right)^2 \sum_{i=1}^{n} \left(ET_{p,i} - \overline{ET}_p\right)^2}}$$

(viii) Maximum absolute relative error (MaxARE) is expressed as

$$MaxARE = \max_{i} \left|\frac{e_i}{ET_{a,i}}\right|$$

(ix) t-statistic ($t_{stat}$). This statistical test is beneficial for validating and testing the broadband models [91]. The closer the indicator is to zero, the better the model. It is expressed by the following equation:

$$t_{stat} = \sqrt{\frac{(n-1)\,MBE^2}{RMSE^2 - MBE^2}}$$

(x) Uncertainty at 95% ($U_{95}$). This quantitative criterion is very effective for selecting an efficient predictive model among several models. The indicator provides very useful information on the deviation of a given model [91]. The $U_{95}$ can be calculated by the following equation:

$$U_{95} = 1.96\sqrt{SD^2 + RMSE^2}$$

where SD represents the standard deviation of the difference between true and simulated data points. The value 1.96 is a coverage factor corresponding to the 95% confidence level.

Last, for visual evaluation, boxplots and scatter plots are presented in the Results and Discussion section, together with the coefficient of determination ($R^2$).
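Several of the measures listed above can be computed with a short routine such as the following sketch. It uses only the Python standard library. The sign convention (estimated minus actual) and the use of the population standard deviation are assumptions, and the input values are made up for illustration.

```python
import math


def evaluation_metrics(actual, predicted):
    """Return MAE, RMSE, MBE, RRMSE (%), t-statistic, and U95 for paired values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # estimated minus actual
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mbe = sum(errors) / n
    rrmse = 100.0 * rmse / (sum(actual) / n)
    # Population standard deviation of the errors (ddof = 0 is an assumption).
    sd = math.sqrt(sum((e - mbe) ** 2 for e in errors) / n)
    u95 = 1.96 * math.sqrt(sd ** 2 + rmse ** 2)
    # The t-statistic is undefined when all errors are identical.
    spread = rmse ** 2 - mbe ** 2
    t_stat = math.sqrt((n - 1) * mbe ** 2 / spread) if spread > 0 else float("inf")
    return {"MAE": mae, "RMSE": rmse, "MBE": mbe,
            "RRMSE": rrmse, "t": t_stat, "U95": u95}


# Made-up values for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = evaluation_metrics(actual, predicted)
print(metrics)
```

In this toy case the errors alternate in sign, so the bias (MBE) and the t-statistic are zero while MAE and RMSE are both 0.5, which illustrates why bias-type and magnitude-type measures need to be read together.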

3.5. Model Development

To accurately predict the monthly $ET_0$, three different modeling approaches were used: MLR, RF, and ELM. Because the dataset was collected from different locations and sites with different characteristics, creating a reliable predictive model is a significant and difficult task. Therefore, the current work establishes 20 different input groups for each predictive approach, as shown in Table 1. The dataset is randomly divided into two sets: a training set and a testing set. The training set is used for calibration and model construction, while the testing set is used for examining the accuracy of each candidate model. Table 2 shows the statistical description of the variables used in this study. In addition, Table 3 gives the Pearson correlation coefficient between all variables and $ET_0$ in the suggested case study. As shown in the table, the maximum temperature has the highest correlation with $ET_0$ for both the training and testing datasets. The longitude variable has the lowest correlation coefficient, with an R of 0.09, while the maximum temperature is strongly correlated with $ET_0$, with an R of 0.841.
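For reference, a Pearson correlation screening of candidate inputs can be sketched as below. The code uses only the Python standard library, the function name is hypothetical, and the values are made up; they are not the entries of Table 3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly values for illustration only.
et0 = [1.0, 2.0, 3.0, 4.0]
candidate_inputs = {
    "input_a": [10.0, 20.0, 30.0, 40.0],  # perfectly increasing with et0
    "input_b": [4.0, 3.0, 2.0, 1.0],      # perfectly decreasing with et0
}
ranking = sorted(candidate_inputs,
                 key=lambda name: abs(pearson_r(candidate_inputs[name], et0)),
                 reverse=True)
print({name: round(pearson_r(v, et0), 3) for name, v in candidate_inputs.items()})
```

Ranking by the absolute value of R treats strongly negative relationships as informative too. A low R only indicates a weak linear association, so a variable such as longitude may still carry useful nonlinear or site-specific information.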

Selecting the most accurate model is relatively difficult with ten statistical metrics and three different approaches, each including several input combinations. Thus, the assessment is carried out in two stages. First, on the training set, the best three models are selected from each approach (nine models in total). Second, the performance of the models selected in the training phase is monitored during the testing phase, and the three most efficient models for each approach are selected. This process provides considerable information about each adopted approach, in addition to a clear and realistic impression of the performance of each predictive model.
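The two-stage procedure can be sketched as follows. The sketch assumes that each model is summarized by a single score for which lower is better (for example, $U_{95}$). The function name is hypothetical and all scores below are placeholders, not results from this study.

```python
def top_k_per_approach(scores, k=3):
    """scores maps (approach, combination) -> score, lower is better.

    Returns, for each approach, its k best combinations in ranked order.
    """
    grouped = {}
    for (approach, combo), score in scores.items():
        grouped.setdefault(approach, []).append((score, combo))
    return {a: [combo for _, combo in sorted(items)[:k]]
            for a, items in grouped.items()}


# Placeholder training scores (hypothetical values).
train_scores = {
    ("RF", "C1"): 2.0, ("RF", "C2"): 1.5, ("RF", "C3"): 3.0, ("RF", "C4"): 1.0,
    ("ELM", "C1"): 4.0, ("ELM", "C2"): 5.0, ("ELM", "C3"): 4.5, ("ELM", "C4"): 6.0,
}
# Stage 1: shortlist the best three combinations per approach on training data.
shortlist = top_k_per_approach(train_scores, k=3)

# Placeholder testing scores for the shortlisted models only (hypothetical).
test_scores = {
    ("RF", "C4"): 20.0, ("RF", "C2"): 25.0, ("RF", "C1"): 22.0,
    ("ELM", "C1"): 10.0, ("ELM", "C3"): 12.0, ("ELM", "C2"): 11.0,
}
# Stage 2: re-rank the shortlisted models on unseen data.
best_overall = min(test_scores, key=test_scores.get)
print(shortlist, best_overall)
```

Restricting the second stage to the shortlisted models keeps the testing data from being used to search over all candidate models.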

Furthermore, a robust and effective performance measure is used to assess the capability of each model. Among the statistical metrics, the uncertainty at 95% ($U_{95}$) has been used to assess the accuracy of each model and subsequently identify the best predictive model. Figure 4 describes the prediction process of $ET_0$ along the southern coast of Turkey. Last, it is important to emphasize that all input variables and their corresponding targets are normalized between 0 and 1. This process is very important for enhancing the effectiveness of the predictive models [92, 93]. All models are developed using MATLAB 2017a.
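A minimal sketch of the 0 to 1 (min-max) normalization is shown below, using only the Python standard library. Taking the scaling bounds from the training data and reusing them for the testing data is an assumption made here for illustration; the function names are hypothetical. The example rescales the two temperature extremes reported for the stations.

```python
def min_max_fit(values):
    """Return the (minimum, maximum) bounds used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant variable cannot be min-max scaled")
    return lo, hi


def min_max_apply(values, lo, hi):
    """Scale values to [0, 1] with previously fitted bounds."""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Temperature extremes quoted in the text, plus 0 degC as a middle value.
temperatures = [-33.5, 0.0, 46.7]
lo, hi = min_max_fit(temperatures)
scaled = min_max_apply(temperatures, lo, hi)
print([round(s, 4) for s in scaled])
```

The inverse transform is needed when predictions made in the normalized space are to be reported in physical units.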

4. Results and Discussion

This section presents the forecast results obtained for mean monthly $ET_0$ over Turkey using three different predictive approaches, namely, MLR, RF, and ELM. Twenty scenarios with different input variables are introduced to these approaches. The resulting 60 predictive models are assessed and validated against the Penman–Monteith equation using ten efficient statistical indicators and graphical presentations. A qualified model is one that meets the requirements of most of the mentioned statistical parameters. The dataset collected from 35 stations is divided randomly into two sets: the training set (75%) is used to calibrate the models, and the rest of the data is used for validation purposes.
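The random 75/25 division can be sketched as below using the Python standard library. The record count of 420 (35 stations times 12 long-term monthly means) is an assumption about the dataset layout, and the seed and function name are arbitrary.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=42):
    """Randomly split records into training and testing lists."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)     # reproducible shuffle
    cut = int(round(train_fraction * len(records)))
    train = [records[i] for i in indices[:cut]]
    test = [records[i] for i in indices[cut:]]
    return train, test


# Assumed layout: one record per station and month (35 x 12 = 420).
records = list(range(35 * 12))
train, test = train_test_split(records)
print(len(train), len(test))
```

Fixing the seed makes the partition reproducible, so every candidate model is calibrated and tested on exactly the same records.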

In terms of quantitative assessment, Table 4 reports the performance of the 60 proposed models, based on different input variables, during the training phase. The three predictive modeling approaches achieved different accuracies according to the statistical measures. Although it is difficult to rank the models by the achieved accuracies, the RF approach showed relatively the most accurate predictions. To conduct a fair comparative analysis, the best three models with different input combinations were selected from each modeling technique for further comparison.

Thus, among the 60 predictive models, only the best nine are selected for the detailed quantitative analysis. Reducing the number of models has many advantages. For instance, it allows a focused comparison and, thereby, a sound choice of the most informative statistical metrics. Table 5 shows the performance accuracy of the three approaches based on several input variables.

In general, it can be noted from Table 5 that the most frequent combinations are C1, C4, and C7. This means that these combinations include the parameters that have an effective impact on mean monthly $ET_0$. At a glance, the RF models provided more accurate estimations of mean monthly $ET_0$ than the MLR and ELM approaches. Here, it is essential to mention that the uncertainty at 95% ($U_{95}$) indicator is the most efficient factor and plays a major role in identifying the best model. The RF-C7 model produced the lowest $U_{95}$ value (1.441) among the predictive models, and it also performed well on the rest of the statistical measures. In contrast, the accuracy of MLR was unsatisfactory, with a high $U_{95}$ of 6.077, because the nonlinear relationship between the predictors and their corresponding targets was not considered. Finally, the performance of the ELM models was satisfactory according to the $U_{95}$ indicator; the best was ELM-C1, which recorded a $U_{95}$ of 4.126.

It should be taken into consideration that these promising estimations were obtained during the training phase, which is primarily employed to calibrate the models based on known input variables and targets. The testing step, however, is vital in assessing the performance of a model since it examines the model's accuracy on unseen target values, something the training set cannot provide. Therefore, a reliable model should have a stable and balanced performance in both the training and testing phases.

After demonstrating the performance of the models during the calibration (training) phase, it is very important to examine the accuracy of the adopted models during the testing phase. Table 6 shows the performance of each predictive model using different input parameters.

It is also important to carefully follow up the performance of the nine efficient models that were chosen in the training set (MLR-C1, MLR-C4, MLR-C7, RF-C7, RF-C4, RF-C2, ELM-C1, ELM-C4, and ELM-C14). Besides these, three additional models (RF-C5, RF-C6, and ELM-C2) provided satisfactory estimations. The heat-map diagram presented in Figure 5 provides significant information about the best modeling performance based on the ten statistical parameters.

Although the RF models generated acceptable precision during training, they exhibited the worst accuracies compared with the ELM and MLR techniques in the testing set. Moreover, these models produced high uncertainty, with $U_{95}$ values of 19.59, 27.1, 22.74, 22.35, and 26.72% for RF-C7, RF-C4, RF-C5, RF-C6, and RF-C2, respectively. The other statistical parameters, such as RMSE and RRMSE, gave further evidence of the weaknesses of the RF models. This behavior indicates that the RF approach suffers from overfitting. On the other hand, the MLR models performed much better than the RF models. Finally, the ELM models achieved high precision in the prediction of mean monthly $ET_0$ in accordance with the P-M equation. ELM-C1 is considered the best predictive model: it recorded the highest CC (0.957) and the lowest RMSE (1.155), MAE (0.946 mm/month), a further error measure (10.37), RRMSE (16.54%), and $U_{95}$ (9.989%). Among the ten statistical parameters, the most efficient ones, which can easily identify the best predictive model, are $U_{95}$, RMSE, and RRMSE. The supremacy of the ELM approach was evaluated according to its ability to reduce these three measures ($U_{95}$, RMSE, and RRMSE) throughout the testing phase. The results shown in Figure 6 illustrate the superiority of ELM-C1 over the other predictive models in reducing the values of the three metrics. Compared with MLR-C1, ELM-C1 reduced the RMSE and $U_{95}$ by 10.05% and 34.07%, respectively. The improvement was larger still when ELM-C1 is compared with the RF-C7 model, where the reductions in RMSE and $U_{95}$ reached 16.36% and 49.02%, respectively.
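As a worked check of how such percentage reductions are obtained, the sketch below applies the relative-reduction formula to the testing $U_{95}$ values quoted above for RF-C7 (19.59%) and, assuming the 9.989% figure is the testing $U_{95}$ of ELM-C1, for ELM-C1. The function name is hypothetical.

```python
def percent_reduction(reference, improved):
    """Relative reduction of a metric, in percent of the reference value."""
    return 100.0 * (reference - improved) / reference


u95_rf_c7 = 19.59    # testing U95 (%) quoted for RF-C7
u95_elm_c1 = 9.989   # testing U95 (%) assumed to belong to ELM-C1

reduction = percent_reduction(u95_rf_c7, u95_elm_c1)
print(round(reduction, 2))
```

The result, about 49.0%, agrees with the reported 49.02% to within the rounding of the quoted values.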

The box plot diagram in Figure 7 presents the best candidate models employed to predict the average monthly $ET_0$. It can be noted that the RF models did not perform well compared with the MLR and ELM approaches. The MLR models, in turn, showed modest performance and lower accuracy than the ELM approach. It can be said that the best-performing approach in predicting average monthly $ET_0$ is ELM, followed by MLR and then RF. ELM-C1 achieved the best estimation accuracy, with a median and interquartile range (IQR) very close to the actual median and IQR.
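The two box-plot summaries used in this comparison, the median and the IQR, can be computed as in the following sketch. It uses the Python standard library (version 3.8 or later) with the "inclusive" quartile method, which is one of several conventions and is an assumption here. The values are made up for illustration.

```python
import statistics


def median_and_iqr(values):
    """Return (median, interquartile range) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1


# Made-up values for illustration only.
actual = [1, 2, 3, 4, 5, 6, 7, 8, 9]
predicted = [1.5, 2.0, 3.5, 4.0, 5.0, 6.0, 6.5, 8.0, 8.5]

actual_median, actual_iqr = median_and_iqr(actual)
pred_median, pred_iqr = median_and_iqr(predicted)
print(actual_median, actual_iqr, pred_median, pred_iqr)
```

A model whose predicted median and IQR are close to the actual ones reproduces the center and spread of the data, although this alone does not guarantee good agreement point by point.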

A line graph and a scatter plot of predicted and actual average monthly $ET_0$ during the testing phase are provided in Figures 8(a) and 8(b). The scatter plot visualizes the deviation between observed and predicted values, and the coefficient of determination ($R^2$) quantifies the agreement between them. Based on Figures 8(a) and 8(b), the proposed ELM-C1 has better prediction performance than the other models, with a higher value of $R^2$.

Considering the best ELM models, it can be noted that this approach often requires a relatively higher number of input parameters than the RF and MLR approaches. The logical explanation is that the dataset covers 35 locations with different characteristics. Besides, geographical factors are very important when developing local and robust models based on a dataset collected from several locations.

For further assessment, it is vital to examine the suggested model (ELM-C1) against several predictive models proposed in previous studies to estimate $ET_0$ at different locations around the world. In this respect, the findings obtained during the testing step with the ELM-C1 model are compared with predictive models reported in the literature. Khoob [94] conducted a study using ANN to predict $ET_0$ at Safiabad station, located in southern Iran. The outcome revealed that the suggested ANN model performed very well when compared with the actual values of $ET_0$, with a high reported score of 0.9135. Another study used the ANN technique to predict $ET_0$ in the Reynolds Creek Experimental Watershed in southwestern Idaho, USA [95]. The results showed that the ANN model estimated $ET_0$ with high accuracy and low forecast error. Additionally, Wu and Fan [96] introduced a comprehensive study to predict $ET_0$ at 14 stations located in China using different AI models. The researchers employed eight techniques: ANFIS, MLP, GRNN, KNEA, SVR, XGBoost, M5Tree, and MARS. These models were developed based on collected data that include precipitation and temperature variables. The outcomes illustrated that machine learning techniques could provide satisfactory prediction of $ET_0$ using only one variable (temperature). In addition, the estimated values were close to the actual ones, and the reported average score was 0.829. Furthermore, Izadifar and Elshorbagy [97] employed three data-driven models, MLR, GP, and ANN, to estimate $ET_0$ over northern Alberta, Canada. Statistical analyses revealed that the most influential variables for $ET_0$ were net radiation and temperature. Besides, the ANN approach trained with the Bayesian regularization algorithm produced more accurate results than the other models and recorded the highest prediction accuracy.

Significant efforts have been made to build increasingly powerful models that accurately estimate evapotranspiration from a few easily measured parameters, such as temperature. Zhu et al. [69] developed a hybrid model called PSO-ELM to predict evapotranspiration at five meteorological stations located in Northwest China. The proposed model produced more accurate estimates than the other AI models and the empirical equations considered. In general, the PSO-ELM model, which was built on temperature only, produced smaller forecast errors and a higher correlation with the actual values (mean of 0.8905).

Overall, the published studies illustrated above estimated evapotranspiration for several case studies. Even though the researchers developed powerful approaches that attained high accuracy, these approaches were not used to build a general model; instead, each case study was simulated separately. In the current study, however, a single model (ELM-C1) successfully simulated reference evapotranspiration based on data collected from dozens of stations. Besides simulating 35 stations with one single model, the other notable feature is that the proposed model achieved high prediction accuracy.

5. Conclusion

Evapotranspiration is considered one of the most significant factors in the hydrological cycle. Although the well-known Penman–Monteith equation for computing evapotranspiration exists, some of its parameters, such as solar radiation and sensible heat flux into the soil, are difficult to calculate accurately. Therefore, in this study, three different approaches, namely, ELM, RF, and MLR, were employed with geographical and meteorological parameters to predict the mean monthly evapotranspiration over the southern Mediterranean coast of Turkey. Twenty different input combinations were established from data collected at 35 meteorological stations, and ten statistical measures were used to assess the resulting 60 predictive models. The outcomes of this study revealed that the ELM approach outperformed the MLR and RF models. In addition, the proposed ELM produced smaller errors on the testing set. It is worth mentioning that the uncertainty at 95% measure played a vital role in selecting the most accurate of the 60 models established in this study. Another essential observation is that the RF approach provided near-perfect accuracy during the training phase, whereas its predictions during the testing phase were very poor, which indicates overfitting. Finally, this study suggests the use of ELM in building local models based on different stations and weather conditions. In addition, a study that covers the whole area of the Mediterranean Sea with one robust model is needed. For further investigation, feature selection approaches could be integrated before the predictive learning process to extract the essential input variables for the prediction matrix [98, 99].
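
As an illustration of how the uncertainty at 95% measure can be used to rank models, the sketch below computes it alongside RMSE and mean bias error. It assumes the commonly used definition U95 = 1.96 * sqrt(SD^2 + RMSE^2), where SD is the standard deviation of the prediction errors; the exact formula adopted in this study is not reproduced in this section, so this definition, the function name, and the returned keys are assumptions.

```python
import numpy as np


def error_summary(observed, predicted):
    """Return RMSE, MBE, SD of the errors, and U95 (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    rmse = np.sqrt(np.mean(err ** 2))
    mbe = np.mean(err)                 # positive values mean overestimation
    sd = np.std(err)                   # population standard deviation of errors
    u95 = 1.96 * np.sqrt(sd ** 2 + rmse ** 2)
    return {"rmse": rmse, "mbe": mbe, "sd": sd, "u95": u95}


# Hypothetical values for illustration only (not data from the study).
observed = np.array([2.1, 3.4, 5.0, 6.2])
predicted = np.array([2.3, 3.1, 5.4, 6.0])
print(error_summary(observed, predicted))
```

Computing the same summary on both the training and the testing predictions of a model makes a large train/test gap, such as the one reported for RF, easy to spot; a lower testing U95 points to the preferable model.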

Abbreviations

:Reference evapotranspiration
:Maximum monthly temperature
:Minimum monthly temperature
:Average monthly temperature
:Standard deviation
:Wind speed
:Maximum relative humidity (%)
:Minimum relative humidity (%)
:Mean absolute error
:Estimated error between actual and predicted reference evapotranspiration
:Root mean square error
:Mean absolute relative error
:Root mean square relative error
:Relative root mean square error
:Mean bias error
:Correlation coefficient
:Coefficient of determination
:Maximum absolute relative error
:t-statistic
:Uncertainty at 95%
ELM: Extreme learning machine
RF: Random forest
MLR: Multiple linear regression
PM: Penman–Monteith.

Data Availability

All the data are available upon request.

Conflicts of Interest

The authors have no conflicts of interest.

Acknowledgments

The authors would like to thank AlMaaref University College (AUC) for funding this research.