Introduction

Nowadays, the world is almost impossible to envisage without its interrelationship and dependence on electricity1. This electricity is mainly produced with fossil fuels and based on statistics, the global primary energy demand will increase by over 59% between 2002 and 2030 2. However, the evidential environmental impact of the current (fossil fuels) energy resources, as well as the need to reduce its climate change effect, led to the development of renewable energy sources (RES) 3. These RES have experienced significant growth in recent decades and they are projected to have as much as 39% share in global electricity generation by 2050 4. Solar energy is a sustainable, clean, and extremely abundant RES 5 that poses a very low risk to its immediate environment and the world at large. The critical investigation into the accessibility and availability of renewable energy (RE) resources has witnessed a continuous evolvement, especially in developing countries. There is a rapid and consistent escalation in electricity demand in many developing countries as they strive toward advanced technological implementation and globalization 6. Therefore, it is imperative to initiate and encourage RES development in these regions.

Solar radiation influences agricultural production, atmospheric circulation, hydrological processes, public health as well as ecological services, and the comprehensive knowledge of this parameter at any location is important to its environmental sustainability and economic potential 7. Moreover, solar radiation is a crucial and decisive parameter for solar energy management and generation. Information about global solar radiation is also significant in many applications including; RE-usage, hydrology, and meteorology 8. The recent efforts and push for the replacement of fossil fuels with RES have made solar radiation a more important meteorological variable used to simulate and measure RE potential in any location. Unlike other meteorological parameters like relative humidity, temperature, and sunshine duration, the observation stations for solar radiation measurement are not globally available. This is due to the complicated measurement techniques and relatively high cost. Therefore, developing an accurate method or model to predict solar radiation is very important 9.

Typically, the models for solar radiation prediction or estimation can be classified into empirical, statistical, physical, and machine learning models 9. While physical models such as sky-image-based models explore the mechanism between solar radiation and other meteorological parameters 10, empirical models are aimed at developing a linear or non-linear regression equation for solar radiation estimation 11. Statistical models such as the autoregressive moving-average model (ARIMA), are developed based on statistical correlation 12. In recent years, artificial intelligence (AI) models have been used for better solar radiation prediction. The ability of these models to simulate nonlinear and complex relationship mapping as well as the capability to learn and extract meaning features from the input data via backpropagation and parameter update make it more desirable for this task 13.

The adoption of AI (machine learning and deep learning) models for the prediction or estimation of solar radiation have proven in literature to have a wider application and higher accuracy in comparison to other models. These models can accurately moderate the long-term, medium-term, and short-term prediction of solar radiation 14. Gurel et al. 15 presented the assessment of time series (Holt-Winters), machine learning (feed-forward neural network), empirical models (3 Angstrom-type models), and response surface methodology (RSM) for global solar radiation. Meteorological data obtained between 2008 and 2018 for four provinces in Turkey were used to train, validate, and test the models. Based on the performance evaluation of their models, the R2 varied between 0.952 and 0.993 while the artificial neural network was concluded to present the best results 15. Furthermore, a review of some of the most recent literatures on solar radiation prediction with different models and methods is summarized in Table 1. This table highlights the type of model, case study, the aim of the study, and the performance summary of the models in different works of literature. Based on the articles reviewed in this table, the use of both unsupervised (machine) learning and supervised learning algorithms has been proposed for the forecast of solar irradiance. Therefore, the comparison of these models is one of the aims of this present study. Also, none of the proposed models were able to give a 100% accurate prediction/forecast of solar radiation in all the various locations. Hence the consistent recommendation stated in most of these research articles that future studies are required in this research domain to develop more accurate models for solar radiation forecasting.

Table 1 Summary of recent literature on solar radiation forecast/prediction.

The expansion of solar energy-based technologies and applications will continue 40. Therefore, the reliable estimation of solar radiation including its hourly, daily average, monthly average, annual, 41 and seasonal variability is of paramount importance for the estimation of solar energy capacity and potential 42. As mentioned earlier, the high cost and technological complexity attached to the measurement of solar radiation makes it a more difficult task in many meteorological stations. For example, there are 1798 meteorological stations in Turkey in the year 2020 and only 129 of the stations are capable of measuring solar radiation 43. Also, out of the 756 meteorological stations in China, only 122 of them have the capability to measure solar radiation 44. These further stresses the importance of solar radiation estimation. In most existing works of literature on solar radiation prediction, the prediction was done with different models. However, these models were compared based on the similarity of the class. Also, most models are used to predict a particular type of data type with a specific timestep. This has raised research questions about the adoption of different models for the various dataset, timesteps, and locations. Furthermore, developing countries (especially Africa) have enormous solar energy potential, however, the development of solar-based technologies has been very slow due to many reasons. One of which is inadequacies in the measurements of solar radiation.

Therefore, in this paper, we seek to further the knowledge of literature in this field by comparing different artificial intelligence (AI) models for solar radiation estimations. Eight different AI models namely; convolutional neural network (CNN), artificial neural network (ANN), long short-term memory recurrent model (LSTM), eXtreme gradient boost algorithm (XG Boost), multiple linear regression (MLR), polynomial regression (PLR), decision tree regression (DTR), and random forest regression (RFR) are compared for solar irradiance forecast. Additionally, two hybrid deep neural network models are developed in this study for this task. These models are a combination of two or more deep neural network models namely; ANN-CNN and CNN-LSTM-ANN. In comparison to existing techniques where a specific timestep is adopted, in this study, the models developed will be used to estimate the hourly, every minute, and daily average solar radiation. Also, different datasets such as typical meteorological year (TMY), surface radiation data set for heliostats (SARAH), and The World Bank solar radiation measurement data (WB-ESMAP) dataset are used to test the models developed in this paper. In comparison to literature where a specific solar irradiance data set is used, the research further contributes to literature by considering different measured solar irradiance datasets. These datasets include; global beam direct solar irradiance (GSR), diffused solar irradiance (DSR), daily average solar radiation flux at the surface normal to the direction of the sun (DNI), global horizontal irradiance measured from silicon pyranometer (GHISil), diffused horizontal irradiance from rotating shadowband irradiometer (DHIRSI), and global horizontal irradiance measured from thermopile pyranometer (GHIpyr). These are useful for solar photovoltaics, solar thermal, solar heliostat, solar rooftop, and other solar technology applications.

This study seeks to determine the AI model that has a consistent accurate predictive performance for solar irradiance measured with various methods in different locations. Therefore, the datasets used in this study have been collected from 13 specific locations across six African countries. The viability of different AI models, when used for solar radiation prediction in different locations and considering various datasets as well as timesteps, is analysed in this study. One of the research questions that this study seeks to address is the possible sovereignty of an AI model for solar radiation estimation tasks considering differences in location, timestep, and dataset. While developing (African) countries has been used as the case study for the implementation of the AI algorithms developed in this study, the applicability of this models is not limited to developing countries only. They can be use in developed countries also however, some of the training parameters may require adjustments for the supervised AI algorithm. The rest of the article is organized as follows; a brief introduction to all the models considered in this study as well as the model development are explained in “Machine learning and deep learning algorithms” and “Data acquisition and preparation” sections. The performances of the models are presented in “Results” section and a brief discussion of these performances is stated in “Brief summary and discussion” section. The entire article is concluded in “Conclusions” section.

Machine learning and deep learning algorithms

Recent research have focused on forecasting renewable energy resources 45,46,47, because of the growth in global RES and the integration of such sources into the electrical grid throughout the world. Recently, the projection of renewable energy production, notably wind and solar energy, has received considerable attention due to its considerable influence on operating and managing power management choices. Precise forecasts for the production of renewable energy-based systems are essential to ensure the continued dependability of the grid and to decrease energy market and energy systems risks/costs. Due to nature, the energy generated by solar and wind energies will always be unstable. Hence, the need to adopt sophisticated methodologies for the forecast of energy systems’ production. The methods adopted and compared in this study for solar energy resources forecast may be divided into 4 categories: physical methods, statistical models, techniques, and hybrid ways of artificial intelligence 48. These are introduced in the following subsection.

Random forest regression

One of the most common machine learning methods is a random forest (RF) algorithm 49. This is a controlled approach that employs a regression method for learning. The learning approach integrates various machine learning algorithms in order to generate predictions that are more accurate than a single model. In the course of training and determining the mean class of the classes, a random forest operates by building many decision trees as a forecast for all the trees 50,51. Creating several trees for different subsets of the data points balances the prevalent overfitting problem, minimizes variance, and ensures improved accuracy. The RF algorithm is shown in Algorithm 1 while a sample of the RF tree is illustrated in Fig. 1.

Figure 1
figure 1

Sample of a random forest tree.

Algorithm 1 Random Forest Algorithm

Start

      Select from the training set a random k data point

      Construct a decision tree for the k data points

      Select N of the trees you would want to construct

      Repeat

                  Steps 1 and 2

      Make a prediction of the values of y for the data point for each of your N-trees and assign the new data point to the average across the whole number of y-values anticipated

End

The RF algorithm's predictive value is provided by the mathematical equation 52;

$$ \hat{Y} = \frac{1}{N}\mathop \sum \limits_{n = 1}^{N} T_{n} \left( X \right) $$
(1)

where \(Y\)'s mean values are from \(n, N, and T_{n} \left( x \right).\) Input parameters in \(X\) indicate the number of random forest decision trees in N. The equation specifies the average number of \(T_{n} , n = 1,2,...,N\) decision trees given the input \(X\) in order to provide a solid forecast.

With the RF-Method, forecasts can be obtained and forecasting parameters identified (which are related to the response) via RF's integrated measurement of variable importance. This may also be taken into consideration and enhanced prognostics can be produced. Specifically, RF is adopted in this study for solar radiation forecast due to its use in existing works of literatures 53. For instance, in three distinct sites with varied API conditions in China, Sun et al. 54 utilize the random forest to estimate solar radiation given a single, accessible meteorological variable and air pollution index.

Polynomial regression

Polynomial regression is a specialized linear regression in which the data (having a curvilinear connection between the goal and the independent variables) are multinomially equated. Polynomial ensures a proper approximation of dependent and independent variables across a wide range of curvatures. The value of the target variable does not vary uniformly with regard to the predictor in a curvilinear relationship (s). The linear regression equation (Eq. (2)) with one predictor is transformed to polynomial equation of degree n in polynomial regression as Eq. (3).

$$ Y = \theta_{0} + \theta_{1} x $$
(2)

where \(Y\) is the goal, \(x\) is the predictor, \(\theta_{0}\) is the bias, and \(\theta_{1}\) is the weight of the equation of regression.

$$ Y = \theta_{0} + \theta_{1} x + \theta_{2} x^{2} + \theta_{3} x^{3} + \ldots + \theta_{n} x^{n} $$
(3)

Here \(\theta_{0}\) is the bias, \(\theta_{0} , \theta_{1} , \ldots . \theta_{n}\) are the weight of the polynomial regression equation and \(n\) is the polynomial degree. Since hourly solar radiation profile follows a polynomial path, this AI algorithm is modelled in this study for the forecast of solar irradiance in accordance with the literature 55.

Multi-linear regression

This AI algorithm employs numerous explanatory factors to predict the result of the response variable. The objective of multiple linear regression (MLR) model is to describe the linear connection between the (independent) explanatory and the (dependent) responsive variables. The connection of many independent variables \(\left( {x_{1} ,x_{2} ,x_{3} ,x_{4} } \right)\) and a dependent variable \(\left( {\hat{y}} \right)\) is explored and the first order of regression function employed in this investigation is presumed to be;

$$ \hat{y} = b_{0} + b_{1} x_{1} + b_{2} x_{2} + b_{3} x_{3} + b_{4} x_{4} $$
(4)

where \(b_{0}\) is the y-axis cut-off point for the adjusted regression curve, \(b_{1}\) is the first variable of guess \(x_{1}\), and \(b_{2}\) is the first variable of guessing \(x_{2}\). The independent variables; wind speed, temperature, humidity, and pressure (\(x_{1}\), \(x_{2}\), \(x_{3} \) and \(x_{4}\)) and dependency variable (\(\hat{y}\)) solar radiation are correspondingly used as a in this study.

Decision tree regression

Decision trees are hierarchical non-parametric structures, which build both regression and classification models in a tree shape. A decision tree operates recursively and splits the original input space constantly into sub-sets to accumulate instances in smaller areas 56. The decision-making tree is gradually created during the breaking process, and a final decision-making tree with leaf nodes is generated. A blade node shows a choice on a discreet or ongoing objective. The ID3 and C4.5 decision tree algorithms, invented by Ross Quinlan, are frequently utilized in literature 57. A novel application of decision tree classifier in solar irradiance prediction was presented by Singh et al. 58. In this work, the technique of the C4.5 decision tree regression is used because of the continuous nature of the sun irradiance values 59. In the form of a model regression tree, a predictor space is divided into j regions \(\left( {R_{1} , R_{2} ,R_{3} \ldots ..R_{J} } \right)\) is depicted as Fig. 2. For all instances in the same region, the same prediction is made by the means of answers (for all training examples in the region). The basic goal throughout the construction of a decision tree regression model is to locate regions \(\left( {R_{1} , \ldots ..R_{J} } \right)\) which minimize the remaining square sum.

Figure 2
figure 2

Schematic representation of regression tree.

XG BOOST

eXtreme Gradient Boosting (XG-Boost or XGB) is one of the most recent machine learning algorithms that is very good for 1D dataset. In terms of precision and speed, it has the best performance for most tasks 60. It runs in parallel and distributed computing, thereby achieving a higher learning rate in comparison with other set algorithms. XG-boost is a modified algorithm for generalized gradient boosting and it creates a distinct type of tree from the boost algorithm for gradients. The split may be found using a similarity score and gain in XG-boost. The regulating parameter is used to prevent the split from overfitting. When the parameter regularization is nil it falls into the standard technique for gradient boosting. Two more approaches avoid overfitting together with regularization. One is the retraction scales that change the weight by a factor η at each step. Its goal is to decrease an individual tree's effect on the model. The second method is to employ subsampling of columns, which similarly improves training time. Another essential step is that an approximation method is used to identify the optimum division 61.

Long short-term memory (LSTM)

For the resolution of the disappearing and exploding gradient problem, LSTM offers memory blocks instead of traditional recurrent neural network (RNN) units 62. It then adds a cell state to stored long-term states (Fig. 3) which is the main difference between LSTM and the vanilla RNN. An LSTM network can recall and link prior data to current data 63. Three gates are integrated, including the input gate, "forgetful" gate, and output gate where \(x_{t}\) references the current input; new and predecessor cell states are referred by \(C_{t}\) and \(C_{t - 1}\) , respectively; and \(h_{t}\) and \(h_{t - 1}\) respectively the current and preceding cell outputs. The LSTM input gate principle is expressed in the following forms:

$$ i_{t} = \sigma \left( {W_{i} *\left[ {h_{i - 1} , x_{t} } \right] + b_{i} } \right) $$
(5)
$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{t} = tanh\left( {W_{i} *\left[ {h_{i - 1} , x_{t} } \right] + b_{i} } \right) $$
(6)
$$ C_{t} = f_{t} C_{t - 1} + i_{t} \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{t} $$
(7)

where Eq. (5) is utilized to employ a Sigmoid layer to pass \(h_{i - 1}\) and \(x_{t}\) to determine the required information. Then \(h_{i - 1}\) and \(x_{t}\) passing through the tanh layer in Eq. (6) is used to obtain fresh information. In Eq. (7) \(W_{i}\) refers to a sigmoid output and \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{t}\) = a tanh output, the present moment information (\(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{t - 1}\)) and the LSTM Information (\(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{t}\)) is merged into \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{C}_{t}\) . Here, \(W_{i}\) indicates weight matrices and \(b_{i}\) is the LSTM gate bias.

Figure 3
figure 3

The internal structure of long short-term memory.

The forgetful gate of the LSTM then permits selective information transmission through a sigmoid layer and a dot product. The choice of forgetting the associated information of an earlier cell with some likelihood, with \(W_{f}\) referring to the weight matrix, bf the offset and σ is the sigmoid function, is done using Eq. (8).

$$ f_{i} = \sigma \left( {W_{f} *\left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right) $$
(8)

The output gate of the LSTM determines the state of the following inputs: \(h_{t - 1}\) and \(x_{t}\) in Eq. (9) and Eq. (13) respectively. The final result is acquired and multiplied through the vectors for state decisions which transmit through the tanh layer new information, Ct,

$$ O_{t} = \sigma \left( {W_{0} *\left[ {h_{t - 1} ,x_{t} } \right] + b_{0} } \right) $$
(9)
$$ h_{t} = O_{t} \tanh \left( {C_{t} } \right) $$
(10)

where \(W_{0}\) and \(b_{0}\) are the weighted matrices of the output gate and LSTM bias respectively.

Artificial neural network (ANN)

The ANN is an information processing model that imitates biological neural network activities and structures found in human brains 64. This AI model is used to solve linear and nonlinear regression tasks. Figure 4 illustrates a basic neural network, with 2 input neurons, X and Y, 3 neurons, and 1 neuron. For the desired offset, the threshold component is utilized. The weights \(w_{i,j}\) where the indexes of the neurons are \(i and j\) are \(a_{i} and b_{i}\). To compute the weighted amount, first X and Y are multiplied by their weights. The result is then added to a partial function and supplied into an activation. Every neuron computed in the hidden layer, \(h_{j}\), is calculated with \(h_{j} = s\left( {\mathop \sum \limits_{i} w_{i.j} *h_{i} } \right)\), where \(S\) is the activation function. The ReLU Rectified Linear Unit (ReLU) function, \(S\left( x \right) = {\text{max}}\left( {0, x} \right)\) is used for hidden layer activation and nonlinear activation while the Sigmoid function \(S\left( x \right) = \frac{1}{1} + e^{ - x}\) is applied on the output layer to model the network’s probability distribution. ANN is one of the most predominant supervised learning AI algorithm for solar radiation forecast in literature 6567, hence, its adaptation to the dataset in this study.

Figure 4
figure 4

Artificial neural network architecture.

Convolutional neural network (CNN)

This model is a special kind of multilayer perceptron, however, unlike other deep learning architecture, the basic neural network is unable to learn complicated characteristics. In several applications 68, CNN algorithms have shown great performance in the categorization of images, object recognition, and analysis of medical images. However, it has also been used for solar irradiance prediction tasks in the existing works of literature 69,70. The basic principle behind a CNN is that local features are obtained from high layer entrances and transferred for more complicated features to lower layers (as shown in Fig. 5). CNN converts the input data from the input layer into a collection of class scores for the output layer across all linked layers. A CNN includes the full connecting layers, the pooling, and the convolutional layers.

Figure 5
figure 5

Convolutional neural network architecture.

A collection of kernels 71 is used to determine the feature mappings tensor in the convolutional layer. These kernels converge a whole input with 'stride(s)' to make a volume in its dimensions 72. After the convolutional layer is employed for the processing, the dimensions of an input volume shrink. Therefore, zero-padding 73 is necessary for padding input volumes with zeros and maintaining low-level dimensions of an input volume. The functioning of the convolutional layer is:

$$ F\left( {i, j} \right) = \left( {I*K} \right)\left( {i, y} \right) = \mathop \sum \nolimits \mathop \sum \nolimits I\left( {i + m,j + n} \right)K\left( {m,n} \right) $$
(11)

\(I \) refers to an input matrix, \(K\) is a 2D filter of size \(m to n,\) and \(F\) is a 2D feature map output. \(I*K\) indicates the functioning of the convolutionary layer. The rectified linear unit (ReLU) layer is used to increase nonlinearity on feature maps 74. By maintaining the threshold input at zero, ReLU calculates the activation. The following is expressed mathematically:

$$ f\left( x \right) = {\text{max}}\left( {0,x} \right) $$
(12)

Downsampling of a particular dimension is performed by the pooling layer 75, in order to minimize parameters. The most frequent way of max-pooling in the input region generates the maximum value. The FC layer 76 is utilized as a classifier that decides on the characteristics derived from the convolutions and pooling layers. A CNN aims to learn more about data by use of convolutions. For CNN predictive models it is necessary to collect data from convolutional layers while regression work is carried out in the last fully connected layer 77. In this study, the Convolution-1D (Conv1D) which is most suitable for text input data is implemented to convolve the input data points over temporal or single spatial dimensional tensors.

Hybrid CNN-ANN architecture

The network CNN-ANN combines both networks with the extraction of functionalities. CNN uses kernel technology to upgrade filter weights to understand how the training data are represented. The model contains a single CNN layer with 5 * 2 * 2-stride filters that complement the input data. The model of CNN contains hidden neuronal layers depending on the model for a specific dataset. The output of the CNN layer is flattened so that the complimentary ANN model may be supplied. The ANN network also consists of hidden layers of neurons and a one-node output layer. Both models are formed to compute the relevant derivatives as a single end-to-end network with a loss function as a cross-entropy. Adam optimizer, a learning rate of 0.001, and a training lot size of 512 were used for different epochs. Figure 6 illustrates the architecture of the model. The neurons in this hybrid system can be summed up as a result of the secret layers.

Figure 6
figure 6

Hybrid CNN-ANN Architecture.

Every layer in a 1-D convolutional neural network mathematically extracts patterns in \(G_{i}\), as it pertains to other input variables using Eq. (13) 21.

$$ h_{{\ddot{y}}}^{k} = f\left( {\left( {W^{k} *x} \right)_{ij} + b_{k} } \right) $$
(13)

Wk is the kernel weight associated with the kth feature map, f represents the activation feature, and * is the operator. Equation (13), where c is the output \(h_{{\ddot{y}}}^{k}\)., can be rewritten under Eq. (14).

$$ q = f\left( {\left( {W^{k} *x} \right)_{ij} + b_{k} } \right) $$
(14)

A flattened layer is utilized in the hybrid model to transform the matrix into a unique vector (Eq. (15)), so that the matrix may be adapted to the ANN model input.

$$ Z = f\left( q \right) $$
(15)

ANN model is used as input for the output of the flattened layer (Z) (Eq. (16)).

$$ y\left( x \right) = L\left( {\mathop \sum \limits_{j = 1}^{N} w_{j} \left( p \right).\,Z_{j} \left( p \right) + c} \right) $$
(16)

where \(y\left( x \right)\) has been predicted \(G_{i}\) is the weight which links neurons to the input layer \(w_{j} \left( p \right)\), the variable \(Z_{j} \left( p \right)\) is the discrete input variable \(t\) and the neuronal bias \(c\), of the input variable, \(L\left( . \right)\) is the hidden transfer function.

Hybrid CNN-LSTM-ANN architecture

The threefold hybrid model has been created to compare the effectiveness of the model in extracting the data by complementing each other in order to understand short and long-term relationships. As shown in Fig. 7, a recurrent neural network is added for this hybrid model which is running in cycles and is extremely proficient in sequence analysis. The combined LSTM helps to maintain the required data from earlier concealed countries compared to the CNN-ANN model. The input data are supplied with neurons to the hidden layer(s) 1D CNN, and then sent to the LSTM network in hidden states and ultimately the densely linked network that generates the overall model forecast. For this hybrid, the ANN model consists of different layers of neurons depending on the data set. The architecture of CNN and ANN is similar to the hybrid CNN-ANN concept mentioned above. This model contains fundamental computations integrated with the synthesis of neurons in the hidden layers of a hybrid model. This is described in four different phases 78.

Figure 7
figure 7

Schematics of the Hybrid CNN-LSTM-ANN Architecture.

Phase One: The LSTM model determines the information that is thrown away from the \(f_{t}\) forgotten gate in Eq. (16), according to the hidden state ht-1, and the new input qt is modeled with Eq. (20).

$$ f_{t} = \sigma \left( {W_{f} \times \left[ {h_{t - 1} ,q_{t} } \right] + b_{f} } \right) $$
(17)

where Wf is the matrix weight, the logistic sigmoid function is \(\sigma \left( \ldots \right)\) and the bias function is \(b_{f}\).

Phase Two: The information stored in the cell state is chosen in this step. There is also a new cell candidate (\(\tilde{C}_{t}\)) created by the 'input gate' it is likewise scaled.

$$ \tilde{C}_{t} = tanh\left( {W_{C} \times \left[ {h_{t - 1} ,q_{t} } \right] + b_{C} } \right) $$
(18)
$$ i_{t} = \sigma \left( {W_{i} \times \left[ {h_{t - 1} ,q_{t} } \right] + b_{i} } \right) $$
(19)

The hyperbolic tangent function in Eq. (18) is Tanh (…).

Phase Three: A combination of the earlier cell state Ct-1 and \(\tilde{C}_{t}\). will update the new cell Ct. \(f_{t}\) is affected and is also scalable by it. in the previous cell.

$$ C_{t} = f_{t} *C_{t - 1} + i_{t} *\tilde{C}_{t} $$
(20)

Phase Four: The final step is to divide the output into two stages and define the resulting cell state by creating an \(o_{t}\)"output gate." The tanh function triggered Ct is filtered by \(o_{t}\). The outcome is the desired output \(h_{t}\)

$$ o_{t} = \sigma \left( {W_{o} \times \left[ {h_{t - 1} ,q_{t} } \right] + b_{o} } \right) $$
(21)
$$ h_{t} = o_{t} \times tanh\left( {C_{t} } \right) $$
(22)

The flattening layer transforms the matrix (Eq. (22)) into a single vector for this hybrid model.

$$ Z = f\left( {h_{t} } \right) $$
(23)

ANN model is used as input for the output of the flattened layer (Z) (Eq. (16)).

Data Acquisition and Preparation

The solar radiation dataset for this research is collected from three different databases namely; TMY79, SARAH 80, and WB-ESMAP81. These datasets have been measured for different and nine various specific locations within these countries. The specifics (including longitude, elevation, and latitude) of the locations from which these datasets were measured are summarized in Table 2. Since various solar irradiance types are considered in this study, the data timestep for the datasets also varies.

Table 2 Location details of research dataset.

Training and testing of the models

The proposed and compared artificial intelligence (AI) models can be trained using different data sizes. While the hourly solar radiation prediction based on TMY considers 12 years of hourly data, 34 years of data is used for daily solar irradiance prediction. For the WB-ESMAP data which considers the prediction of solar irradiance with the timestep being minutes, 2 years of data were used for training/testing and the dataset summary is presented in Table 3. Also, for all the case studies, 90% of the data are used for training while the remaining 10% are the test dataset. The countries considered for the GSR task include Algeria, the Central African Republic (CAR), South Africa (SA), and Egypt. While Nigeria is considered for the daily average DNI task and hourly DSR task, Senegal is the only country considered for DHIRSI, GHISil, and GHIpyr tasks (Table 2).

Table 3 Data training and test set summary.

Since the dataset varies based on the database it was extracted from, the input layers of the dataset also differ. For the datasets from all the databases, three input nodes namely year, month, and day are constant. All the AI models designed for the TMY dataset use an input layer of 7 nodes and these nodes represent the input parameters. In addition to the 3 constant nodes for all the datasets, the other TMY input nodes are hour, ambient temperature, wind speed, and sun elevation. Also, the input layer of the models designed for solar irradiance prediction with the SARAH dataset has (1 node in addition to the aforementioned 3 nodes) a total of 4 nodes. The additional node is the daily sunshine duration. Furthermore, the AI models based on the WB-ESMAP dataset consider an input layer with 10 nodes. These nodes (input parameters) are wind speed, wind direction, precipitation, wind speed, air temperature, relative humidity, barometric pressure, and the other constant 3 nodes (Table 3).

Model implementation and evaluation metrics

Since these AI models are designed for African (developing) countries, the selection of the number of hidden layers and their corresponding neurons were strategically optimized to ensure fast computation, and optimal convergence, and to avoid model over-fitting. All the AI regression models have been built using the Tensorflow and Keras Application Programming Interface (API) and the mean square error (MSE) in Eq. (24) has been adopted as the loss function while (ReLU) is used as the (nonlinear) activation function. For the deep learning models, the feedforward computation is completed, resulting in the model’s predicted value. This value is compared to the ground truth value or label and the loss is computed. Backpropagation is employed to find the derivative of the model parameters and the cost function is minimized using the “Adam” optimizer. All the AI models were implemented in a Python environment (via Jupyter notebook) which runs with a Core i7, 2.20 GHz system with 16 GB RAM, and GTX1060 6 GB Graphics card.

To have the same basis for comparison, the three most common evaluation metrics for numerical AI tasks are adopted in this study to evaluate the performance of all the models. These include root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (r). These metrics were chosen based on their adoption in (solar radiation prediction) existing works of literature (in developing countries) 6,30,82. The mathematical models of the following metrics are:

$$ MSE = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {G_{i}^{m} - G_{i}^{p} } \right)^{2} $$
(24)
$$ r = \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {\left( {G_{i}^{m} - \left\langle {G_{i}^{m} } \right\rangle } \right)\left( {G_{i}^{p} - \left\langle {G_{i}^{p} } \right\rangle } \right)} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {G_{i}^{m} - \left\langle {G_{i}^{m} } \right\rangle } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {G_{i}^{p} - \left\langle {G_{i}^{p} } \right\rangle } \right)^{2} } }}} \right)^{2} $$
(25)
$$ MAE = \frac{{\mathop \sum \nolimits_{N}^{t = 1} \left| {G_{i}^{m} - G_{i}^{p} } \right|}}{N} $$
(26)
$$ RMSE = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {G_{i}^{m} - G_{i}^{p} } \right)^{2} } $$
(27)

where \(G_{i}^{m}\) is the measured value and \(G_{i}^{p}\) represents the predicted value, and \( <{G_{i}^{m} } >\)/\(< {G_{i}^{p} }>\) are the average values of \(G_{i}^{m}\) and \(G_{i}^{p}\) respectively.

Results

In this study, the performance of 10 different artificial intelligence models has been compared for various solar irradiance prediction tasks in some selected developing (African countries). While most studies in existing literature have only focused on the hourly forecast of various solar radiation parameters, this study furthers the knowledge in literature by considering different timesteps namely minutes, hourly, and daily. Various solar irradiance parameters (from different measurement techniques) were also considered to highlight the intrinsic attention to detail of the AI models. Considering the technological developmental status of these countries, the models were built to be as simple as possible. In this section performance of all the AI models is discussed. The discussion is presented in three subsections following the timesteps of the solar irradiance parameters.

Daily average direct normal irradiance prediction

The average daily solar irradiance prediction task considers two locations (namely Akure and Abuja) in Nigeria. Also, the specific solar parameter considered is direct normal solar irradiance (DNI) and this is integral to the performance/ development of many solar-based technologies. The number of hidden layers (as well as the number of neurons in each hidden layer) in each AI model is summarized in Table 4. Also, the optimal number of training epochs and training batch size for each of the models are presented in the same table. This highlights the simplicity of these models and their adaptability to the targeted developing countries.

Table 4 Optimal AI training parameters for daily DNI task.

Furthermore, the performance of all the models based on the three evaluation metrics used in this study is tabulated in Table 5. Specifically, for Abuja_DNI prediction, two models (DTR and MLR) were found unsuitable for this AI task. This is due to the high RMSE and MAE as well as the low r-value (Table 5). In this study, the models were tasked to forecast the daily average DNI for 3.4 years and the forecasted results in comparison to the real data are compared in Fig. 8a. However, a more detailed pictorial representation (in Fig. 8b) of the forecasted result showed the inadequacies of MLR and DTR. While the performances of ANN, CNN-ANN, and LSTM are quite similar, the most suitable AI models for the Abuja_DNI prediction tasks are CNN-LSTM-ANN and XGB. However, XGB is preferable due to its unsupervised learning characteristics and its fast computational time when compared with CNN-LSTM-ANN.

Table 5 Daily DNI task evaluation metric summary.
Figure 8
figure 8

(a) 3-year ahead AI models’ predictive plot of Nigeria_Abuja_Daily DNI task. (b) Nigeria_Abuja_Daily_DNI task day-ahead AI models’ predictive plot for 100.

It is also noteworthy that XGB has the least MAE and RMSE (40.78282 W/m2 and 53.73310 W/m2 respectively) as well as the least r-value (0.800087) as highlighted in Table 5. The new hybrid deep learning CNN-LSTM-ANN model presented in this study is a viable alternative to XGB as the performance of this model differs slightly. While the CNN-LSTM-ANN r-value is 0.79643, the RMSE and MAE are 41.48851 W/m2 and 24.68782 W/m2 respectively. The close proximity of this model results (forecasted DNIs) to that of the real data in Fig. 8b further highlights its potency.

The AI models’ performance for the same task considering another location (Akure_DNI) has a similar pattern to its corresponding Abuja_DNI AI models. Although the only AI model that seems unsuitable for this task is DTR, its performance based on the evaluation metrics is still higher when compared to the Abuja_DNI task (Table 5). The difference in model performance between Abuja_DNI and Akure_DNI prediction tasks can be attributed to the solar distribution in these locations. Akure as a location has a more distributed daily average DNI when compared with Abuja (as seen in Fig. 9a as compared to Fig. 8a), hence the high predictive performance by all the AI models.

Figure 9
figure 9

(a) 3-year ahead AI models’ predictive plot of Nigeria_Akure_Daily DNI task. (b) Nigeria_Akure_Daily_DNI task day-ahead AI models’ predictive plot for 100.

While all the models (with the exception of DTR) recorded a good performance for the Akure_DNI prediction task, the best models for this particular task are ANN and XGB. The r-value, RMSE and MAE for these models respectively are 0.948073, 25.14591 W/m2, and 19.10983 W/m2 for ANN; 0.949997, 24.68782 W/m2, and 18.52771 W/m2 for XGB. The supervised learning feature of ANN creates room for further improvement of the model (especially when applied in other locations), however, the ANN model overfitting problem should be avoided. As seen in Fig. 9b, the forecasted Akure_DNI with XGB has the closest proximity to the real data. Therefore, it can be inferred that XGB models are most suitable for DNI daily average DNI forecasting.

Hourly solar radiation forecast

The hourly solar radiation prediction task in this study considers both diffused solar radiation (DSR) and global solar radiation (GSR). The AI models developed for this prediction task are adapted to five locations across Algeria, Nigeria, CAR, Egypt, and South Africa (Table 2). Due to the variation in location, the training parameters for the deep (supervised) learning AI models are optimized to achieve the best predictive performance in each location. Hence, the optimal batch size, number of epochs, number of hidden layers as well as the number of neurons in each hidden layer for all the deep learning models used are highlighted in Table 6.

Table 6 Optimal AI training parameters for hourly SR task.

Out of all the 10 AI models presented in this study, six models have a very good predictive performance on the evaluation metrics results (Table 7). These models are ANN, CNN-ANN, CNN-LSTM-ANN, CNN, PLR, and XGB. The predictive output data (results) in comparison to the real data for all the models over the total test period (for all the location that considers hourly solar radiation forecast) is illustrated (in Fig. A) in the appendix section of this study. From the results of this study, it can also be deduced that the MLR model is not suitable for this specific task (Fig. 10a).

Table 7 Hourly SR task evaluation metric summary.
Figure 10
figure 10figure 10

(a) Algeria GSR hourly prediction performance plot for three days. (b) Nigeria_Borno DSR hourly prediction performance plot for three days. (c) CAR GSR hourly prediction performance plot for three days. (d) Egypt GSR hourly prediction performance plot for three days. (e) SA GSR hourly prediction performance plot for three days.

The hybrid CNN-LSTM-ANN AI model proposed in this study recorded the best predictive performance for the Algeria_GSR task with an r-value, RMSE, and MAE of 0.977527, 81.101 W/m2, and 30.8785 W/m2. However, the close proximity of ANN, XGB, and CNN-ANN are evident in their predictive performance over a period of 72 h (Fig. 10a). The performance of the models presented in this study further strengthens existing works of literature in this field as the accuracies are higher than some of the reported results in literature.

Unlike Algeria, the hourly solar radiation prediction task for the location in Nigeria considers diffused solar radiation (DSR). While the r-values of the AI models developed for this task are comparatively smaller than that of the GSR task for other countries, the RMSE and MAE are also smaller. This is due to the statistical and meteorological distribution (as seen in Fig. 10b) of DSR when compared with GSR.

It is also noteworthy that most of the existing works of literature in the domain of solar radiation prediction worked on GSR hourly prediction. Therefore, this study further contributes to the literature as these AI models have been optimized for DSR prediction. While six AI models had high predictive performance when used for the Nigeria_DSR task, XGB is the most superior of all the models. As highlighted in Table 7, the RMSE, MAE, and r-value for the XGB model, when used for the Nigeria_DSR task, are 49.1553 W/m2, 17.0214 W/m2, and 0.904992. The predicted data for all the AI models are compared with the real data over a period of 72 h and highlighted in Fig. 10b.

The other three countries considered for the solar radiation task in this study are CAR, Egypt, and South Africa. The AI models were developed for GSR hourly prediction tasks in this study and the performance of each of these models is highlighted in Table 7. The models that are suitable for the CAR_GSR task are ANN, CNN-ANN, XGB, and PLR. Considering the evaluation metrics (r = 0.965303, MAE = 45.5573 W/m2, RMSE = 95.9444 W/m2 in Table 7) and the predictive output data plotted in Fig. 10c, ANN is the most suitable AI model for CAR_GSR forecast task.

It is noteworthy that the high MAE and RMSE values reported in this study for hourly solar radiation are due to the GSR unit. While the unit of GSR in this study is W/m2, in most literatures, kW/m2 is the unit adopted for GSR, hence the lower MAE and RMSE reported in these studies.

The performance of the AI models for the Egypt_GSR prediction task is the best in this entire study and this is due to the high solar intensity and good solar radiation distribution in the location chosen for this country. As seen in Fig. 10d. and Table 7, the most accurate model for GSR prediction in this location is the proposed CNN-LSTM-ANN model in this study. The r-value, RMSE, and MAE of the model are 0.987936, 60.49804 W/m2, and 22.31752 W/m2 respectively and these are the best evaluation metrics considering all the AI models for this particular location. Although the performance of XGB is quite similar to the CNN-LSTM-ANN model, the supervised learning nature of the model resulted in a better performance when compared to the XGB model. It is also worth noting that all the deep (supervised) learning models in this study have the capacity to give an accurate prediction of hourly solar radiation.

The last location considered for the GSR prediction (in a developing country context) is in South Africa. The performance (considering the r-value) of all the models (except DTR) is very similar for this location. However, as illustrated in Fig. 10e, the GSR forecast using the XGB model is the closest to the real data. This model had the least RMSE and MAE (91.15934 W/m2 and 32.59973 W/m2 respectively) as well as the highest r-value (0.968881) as highlighted in Table 7. The locations selected for the hourly solar radiation tasks in this study have been chosen considering data availability and good solar radiation potential. The fast computation speed for all the AI models in this study based on the models’ parameters further showcases their potency in application.

Solar irradiance prediction based on minutes timestep

One of the outstanding contributions of this present study is the development of AI models to forecast solar irradiance based on minutes timestep. Existing works of literature have majorly focused on the hourly solar irradiance prediction, however, the knowledge of solar irradiance minute by minute will further enhance the estimation of energy production from solar-based technology. Two locations in Senegal have been considered and three different measurement techniques for each location. The optimized training parameters for the deep learning models applied for each task are summarized in Table 8.

Table 8 Optimal AI training parameters for minute-ahead solar irradiance task.

One of the things noticed for the preliminary training of all the datasets in this category with the AI models is that the PLR cannot perform this prediction task. Therefore, nine AI models are considered in this section for the solar irradiance prediction task. Generally, the predictive performance of the models (based on the evaluation metrics) shows that it is more difficult for the AI models to accurately forecast solar irradiance minute-by-minute when compared with its corresponding hourly or daily AI models. The nine AI models were tested by using it to forecast the diffused and global horizontal irradiance (DHIRSI, GHIpyr, and GHISil) for 39 days in the two locations in Senegal. The forecasted results for Senegal_Toubal are plotted against the actual data and illustrated (in Fig. B) in the Appendix section. However, a day-ahead forecast is also conducted for Senegal_Toubal with the AI models and the results are illustrated in Fig. 11a and b.

Figure 11
figure 11

(a) AI models’ performance for Sengal_Touba_DHIRSI. (b) AI models’ performance for Sengal_Touba_GHIpyr. (c) AI models’ performance for Sengal_Touba_ GHISil.

Unlike other solar parameters prediction tasks or scenarios in this study (where various models are most suitable for different locations/solar parameters), the training/testing of the solar irradiance in this section showed that the XGB model is the most suitable in all the locations. As seen in Table 8, the AI models have a better performance for DHIRSI and GHIpyr in Senegal_Touba when compared to Senegal_Fatick. While the XGB model performance for DHIRSI forecast task in Sengeal_Touba are r = 0.778685, RMSE = 104.911 W/m2, and MAE = 69.41538 W/m2, the corresponding best model (XGB) for Senegal_Fatick location are r = 0.727731, RMSE = 118.5533 W/m2, and MAE = 82.44148 W/m2 (Table 9). As seen in Fig. 11a, while the CNN-LSTM-ANN, LSTM, and ANN models can learn the data part, the proximity of the forecasted data based on the XGB model is better for most of the minutes in the day-ahead task. The plotted results in Fig. 11b and c further confirm the superiority of the XGB model as it follows the real data pattern.

Table 9 Minutes timestep SR task evaluation metric summary.

Brief summary and discussion

Ten AI models have been used as the basis for developing specific algorithms to forecast solar irradiance parameters in this study. Considering the under-development and economic status of many developing countries, the AI models in this study have been adapted for this solar radiation forecast task in six developing (African) countries. It is worth noting that the applicability and the usefulness of the models are beyond developing countries. While two locations in Nigeria were considered for the daily average DNI task, another location in the same country is considered for the hourly average DSR estimation task. Similarly, two locations in Senegal were considered for the estimation of solar irradiance (DHIRSI, GHIpyr, and GHISil) estimation task based on minutes timestep. Also, four locations in different countries have been used for GSR estimation. In summary, a total of 13 solar irradiance estimation tasks were carried out in this study considering 10 AI models for each task.

With the aim to check if there is a universal model for solar parameter estimation in developing countries, the results of this study show that various AI models are suitable for different solar irradiance estimations. However, the deep learning models (ANN, LSTM, and CNN), the hybrid deep learning models (CNN-ANN, and CNN-LSTM-ANN) as well as the XGB model has better predictive performance when compared to other models in most location. The results for the prediction of solar irradiance in minutes showed that XGB is the best model for this task in all the locations considered. Also, despite the change in solar measurement parameters in minutes timestep, the performance of the XGB model was relatively suitable for the task. It is, however, noteworthy that the AI models had the least predictive accuracy when considering the minutes' timesteps.

Similarly, the XGB model is the most suitable model for daily average DNI estimation. While PLR and CNN-LSTM-ANN models had a comparatively good performance for this task, the prediction errors recorded by the XGB models are significantly lower. The daily average DNI estimation further shows the novelty of this study as the performance of the models for the Nigeria_Akure_DNI task is better in comparison to existing works of literature. The evaluation metrics for this specific task are r = 0.949997, RMSE = 24.68782, and MAE = 18.52771.

Deep learning models and XGB models are most suited for the hourly solar radiation task. While the innovative hybrid deep learning model (CNN-LSTM-ANN) proposed in this study is most suitable for GSR prediction in Northern African countries, the XGB model reported the best performance for Nigeria and South Africa. Also, the hourly solar radiation estimation accuracy is very high, hence it dominant in existing solar radiation research.

From this study, it can also be deduced that some AI models are not applicable for some specific solar irradiance tasks. PLR model could not learn any of the minute timestep tasks while DTR models also had a bad predictive performance for daily average DNI task. Therefore, these models can be excluded from these specific tasks in the future as they are machine (unsupervised) learning algorithms.

Conclusions

Based on the results of this study, all the models presented in this study showed their suitability for various solar irradiance prediction tasks. However, the XGB model can be concluded as the best model for solar irradiance prediction tasks out of all the developed AI algorithms considered that was considered within the scope of this research. This is due to its consistently high performance in all the tasks in the study. Despite the change in location and solar parameters, the XGB model had a relatively high performance/accuracy for all the tasks. While the results of the models in the study are better than some existing works of literature, the accuracy of the forecasted solar irradiance shows that more researches on the use of other AI models (such as reinforcement learning models and the developments of new hybrid AI models) are required.

In the future, more research will focus on the accurate prediction of solar irradiance considering the minutes' timestep. While this is the first study to present this (to the best knowledge of the authors), the estimation of solar irradiance in minutes will further help in forecasting solar technology’s production accurately. Thereby, improving the overall development of the solar energy sector.