Abstract

Extreme learning machine (ELM), a simple learning algorithm for single-hidden-layer feedforward neural networks, has been used extensively in practical applications because of its good generalization performance and fast learning speed. However, because the hidden layer parameters are assigned randomly, the standard ELM requires many hidden nodes in applications, which in turn causes disadvantages such as poor hidden layer sparsity, weak adjustment ability, and a complex network structure. In this paper, we propose a hybrid ELM algorithm based on the bat and cuckoo search algorithms to optimize the input weights and thresholds of the ELM algorithm. We evaluate its numerical performance on function approximation and classification problems over several benchmark datasets; simulation results show that the proposed algorithm obtains significantly better prediction accuracy than similar algorithms.

1. Introduction

In recent years, artificial intelligence algorithms have drawn extensive attention from the research community. As an important part of artificial intelligence, machine learning has been widely used in data mining [1], speech recognition [2], feature selection [3, 4], incentive strategy learning [5], natural language processing [6], and nonlinear function approximation and benchmark problems [7]. As a branch of machine learning, neural networks have been successfully applied to many tasks of learning from data. However, most traditional neural networks use gradient-based algorithms for network training, which leads to problems such as low training efficiency, slow convergence, and a tendency to fall into local optima.

Extreme Learning Machine (ELM) is a method of training artificial neural networks, specifically single hidden layer feedforward networks (SLFN), put forward by Huang et al. [8–10]. Huang et al. [11] argue that existing neural networks have defects in learning speed; the main reason for the low learning rate is that all network parameters are determined iteratively by the training method. In the ELM learning algorithm, the input weights and thresholds are generated randomly; the hidden layer output matrix is then used to calculate the final output weights, which are obtained using the Moore–Penrose (MP) generalized inverse. Compared with other neural networks based on gradient learning algorithms, the ELM learning algorithm has great advantages in learning speed, produces good generalization performance, and greatly reduces the computational complexity of complex application problems [12, 13]. These good properties have been widely exploited in various practical application fields such as biomedicine [14–16], fault diagnosis [17, 18], and indoor positioning systems [19, 20]. However, since the input parameters are generated randomly and ELM requires a large number of hidden neurons, the amplitude of the output weights will be large when the hidden layer output matrix is ill-conditioned, which causes the trained model to fall into local minima and exhibit overfitting [21]. In [22, 23], ELM variants based on different regularizations were proposed to effectively overcome the overfitting phenomenon. The accuracy and effectiveness of the ELM algorithm largely depend on the internal parameters of the model. To choose suitable model parameters, many researchers use bio-inspired optimization algorithms to optimize the input weights and thresholds.

In the literature [24], an improved ELM algorithm was proposed, which used a differential evolution algorithm to choose the input weights and then used the MP generalized inverse to determine the output weights. This improvement enables it to obtain better generalization performance with a compact network. In the literature [25], coral reefs optimization was used to evolve ELM weights to enhance the performance of these machines. Particle swarm optimization (PSO) was introduced to optimize the input weights and hidden biases of ELM [26, 27] so that the network has better generalization performance on benchmark classification experiments and is more suitable for some prediction problems. A real-coded genetic algorithm was proposed [28] to select the number of hidden neurons and the input weights such that the generalization performance of the classifier is maximized, but it required many parameters of the genetic operators to be adjusted manually. The cuckoo search algorithm was proposed [29–33] to pretrain ELM, ensuring optimal solutions and further improving the accuracy and stability of ELM. References [34, 35] proposed models that combine the improved cuckoo search algorithm with ELM. Both models select the input weights and biases before calculating the output weights, and they ensure the full column rank of the hidden layer output matrix.

Bat algorithm (BA) [36, 37] and cuckoo search algorithm (CS) [38, 39] are two recent heuristic swarm intelligence optimization algorithms. The bat algorithm has the advantages of a simple model, a fast convergence rate, and strong global optimization ability, and it has been widely used in engineering optimization, model identification, and other problems. The cuckoo search algorithm is simple, efficient, easy to implement, has few parameters and an excellent random search path, and has been successfully applied to medical image optimization [40], multiobjective optimization [41], image processing [42], and other practical problems. The literature [43] shows that the bat algorithm and the cuckoo search algorithm have great advantages over the genetic algorithm and particle swarm optimization among recent metaheuristics. In this paper, we combine a bat cuckoo hybrid algorithm (BACS) with traditional ELM and propose a BACS-based optimization algorithm for ELM. The basic idea is to use the BACS algorithm to train the randomly generated input weights and thresholds of ELM to find the optimal parameters and then determine the output weights using the MP generalized inverse, so as to improve the convergence speed and stability of the network model. The main contributions are as follows:

(1) Based on the idea of swarm intelligence optimization, this paper introduces how to train ELM with the BACS hybrid algorithm. With this method, the input weights and thresholds of the ELM network can be reasonably optimized to solve the randomness problem of the hidden layer parameters so that the network parameters reach the optimum.

(2) By improving the traditional ELM network with the BACS hybrid algorithm, local and global optimization are effectively balanced and the generalization performance of the network is improved.

(3) Experiments on nonlinear function fitting and classification problems show that the algorithm achieves a better approximation effect and generalization performance than other algorithms.

The rest of the paper is organized as follows: Section 2 introduces the traditional ELM network model and algorithm. Section 3 introduces the principles and implementation steps of the bat algorithm and the cuckoo search algorithm. The hybrid Extreme Learning Machine algorithm based on the bat cuckoo algorithm is described in Section 4. Numerical experiments are discussed in Section 5. Section 6 offers some conclusions.

2. The Preliminary of ELM

In this section, we introduce the standard ELM. The network model of ELM is shown in Figure 1; it consists of three layers: the input layer, the hidden layer, and the output layer. This material provides the theoretical foundation for the new method proposed later. Let $\{(x_j, t_j)\}_{j=1}^{P}$ denote $P$ arbitrary distinct samples, where $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]^T \in \mathbb{R}^n$ and $t_j = [t_{j1}, t_{j2}, \ldots, t_{jm}]^T \in \mathbb{R}^m$. The traditional SLFN with $L$ hidden nodes can be mathematically modeled as

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, P,$$

where $g(\cdot)$ is an activation function, which can take various forms, such as the sigmoid function

$$g(x) = \frac{1}{1 + e^{-x}}$$

or the Gaussian function

$$g(x) = e^{-x^2}.$$

The above SLFN can approximate the samples through gradual iterative training. When the learning error is reduced to zero, that is, $\sum_{j=1}^{P} \| o_j - t_j \| = 0$, the learning capacity of the ELM is optimal, and there then exist $\beta_i$, $w_i$, and $b_i$ such that

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, P,$$

where $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ is the input weight vector linking the input layer to the $i$-th hidden node as presented in Figure 1, $b_i$ is the threshold of the $i$-th hidden node and is generated randomly, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the output weight of the $i$-th hidden node, and $o_j$ represents the actual output of the network for input $x_j$.

The above equations can be rewritten in the following matrix form:

$$H\beta = T,$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_P + b_1) & \cdots & g(w_L \cdot x_P + b_L) \end{bmatrix}_{P \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_P^T \end{bmatrix}_{P \times m}.$$

Here $H$ is called the output matrix of the hidden layer and $T$ represents the target output matrix. The basic principle of ELM is to obtain the output weight $\beta$ from the equation $H\beta = T$.

In practical training, the number of hidden nodes $L$ is usually much smaller than the number of training samples $P$. Therefore, on the premise that the activation function is differentiable, the input weights and thresholds randomly selected before training remain unchanged during training. The output weight of the network can then be obtained as the least-squares solution of the following linear system:

$$\min_{\beta} \| H\beta - T \|,$$

and the explicit solution is

$$\hat{\beta} = H^{\dagger} T,$$

where $H^{\dagger}$ represents the MP generalized inverse of the matrix $H$ [44]. Therefore, ELM can be described as follows (Algorithm 1).

Input: a training set $\{(x_j, t_j) \mid x_j \in \mathbb{R}^n, t_j \in \mathbb{R}^m, j = 1, \ldots, P\}$, an activation function $g(x)$, and the number of hidden nodes $L$.
Output: the output weight $\beta$.
Step 1: randomly set the learning parameters $w_i$ and $b_i$ of the hidden nodes, $i = 1, \ldots, L$.
Step 2: calculate the hidden layer output matrix $H$.
Step 3: calculate the output weight $\beta = H^{\dagger} T$.
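To make the procedure concrete, the following is a minimal sketch of ELM training in Python, using NumPy's `pinv` for the MP generalized inverse; the function and variable names are illustrative, not from the original paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(X, T, L, seed=None):
    """Train a basic ELM. X: (P, n) inputs, T: (P, m) targets, L: hidden nodes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, L))   # Step 1: random input weights
    b = rng.uniform(-1.0, 1.0, size=L)        # Step 1: random hidden thresholds
    H = sigmoid(X @ W + b)                    # Step 2: hidden layer output matrix
    beta = np.linalg.pinv(H) @ T              # Step 3: MP generalized-inverse solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Compute network outputs for new inputs."""
    return sigmoid(X @ W + b) @ beta
```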

3. Algorithm Description

3.1. Bat Algorithm

Bat algorithm (BA) is a swarm intelligence optimization algorithm that simulates the predation behavior of bats. Because of its simple model, fast convergence speed, and strong global search ability, it has been widely used in data mining, wireless sensors, and power systems. However, it also has some problems in practical applications, such as premature convergence and low optimization accuracy.

The bat algorithm determines the optimal bat in the current search space by adjusting frequency, wavelength, and loudness and then obtains the optimal solution to the optimization problem. To simulate the predation behavior, the algorithm makes the following assumptions:

(1) All bats can use echolocation to perceive distance and distinguish between targets and obstacles in a special way.

(2) Each bat flies randomly at position $x_i$ with velocity $v_i$, searches for the target with frequency $f_i$, variable wavelength $\lambda$, and loudness $A$, and automatically adjusts the wavelength (or frequency) and the pulse emission rate $r$ according to its distance from the target.

(3) The loudness is assumed to change from a maximum value $A_0$ to a minimum value $A_{\min}$.

Assume a search space of dimension $d$ and let $t$ denote the iteration number. The update formulas for the frequency, velocity, and position of bat $i$ in the $t$-th generation are as follows:

$$f_i = f_{\min} + (f_{\max} - f_{\min})\beta,$$

$$v_i^t = v_i^{t-1} + (x_i^{t-1} - x_*) f_i,$$

$$x_i^t = x_i^{t-1} + v_i^t,$$

where $f_i$ represents the frequency of the $i$-th bat with adjustment range $[f_{\min}, f_{\max}]$, $\beta$ is a random number uniformly distributed in [0, 1], and $x_*$ represents the current optimal solution.

For the current local search, a random number $rand_1$ is generated. If $rand_1 > r_i$, a new solution is generated by a random perturbation of the current optimal solution. The update formula is as follows:

$$x_{new} = x_{old} + \varepsilon A^t,$$

where $\varepsilon$ is a random number in [−1, 1] and $A^t$ represents the average loudness of the bat population.

As a bat approaches its target, its loudness drops toward a fixed value while its pulse emission rate $r_i$ continues to increase. Randomly generate a number $rand_2$; if $rand_2 < A_i$ and the new fitness value satisfies $f(x_{new}) < f(x_*)$, the new solution generated by the local search formula above is accepted, that is, $x_i = x_{new}$. The update formulas for the loudness and pulse rate of the $i$-th bat are as follows:

$$A_i^{t+1} = \alpha A_i^t,$$

$$r_i^{t+1} = r_i^0 \left[ 1 - e^{-\gamma t} \right],$$

where $\alpha$ represents the loudness attenuation coefficient with $0 < \alpha < 1$, and $\gamma$ represents the pulse rate enhancement coefficient with $\gamma > 0$.
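The following is a minimal sketch of one bat algorithm iteration implementing the update rules above; the parameter defaults (`f_min`, `f_max`, `alpha`, `gamma`) are illustrative assumptions rather than values prescribed by the paper.

```python
import numpy as np

def bat_step(pos, vel, loud, rate, rate0, best, fitness, t,
             f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, rng=None):
    """One iteration of the bat algorithm.

    pos, vel: (N, d) arrays; loud, rate, rate0: (N,) arrays;
    best: (d,) current best position; fitness: callable on a (d,) vector.
    """
    rng = rng or np.random.default_rng()
    N, d = pos.shape
    for i in range(N):
        f_i = f_min + (f_max - f_min) * rng.random()       # frequency update
        vel[i] += (pos[i] - best) * f_i                    # velocity update
        cand = pos[i] + vel[i]                             # position update
        if rng.random() > rate[i]:                         # local search near the best
            cand = best + rng.uniform(-1, 1, d) * loud.mean()
        if rng.random() < loud[i] and fitness(cand) < fitness(pos[i]):
            pos[i] = cand                                  # accept the new solution
            loud[i] *= alpha                               # loudness attenuation
            rate[i] = rate0[i] * (1 - np.exp(-gamma * t))  # pulse rate enhancement
        if fitness(pos[i]) < fitness(best):
            best = pos[i].copy()                           # update the global best
    return pos, vel, loud, rate, best
```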

3.2. Cuckoo Search Algorithm

The cuckoo search algorithm (CS) is a simplification and simulation of the cuckoo's nest-searching and egg-laying behavior. The special habit of cuckoos is brood parasitism; that is, other host birds hatch and brood their eggs on their behalf. To make the parasitism difficult to detect, during the breeding period the cuckoo first looks for a host whose eggs have characteristics similar to its own; if the host bird recognizes the foreign egg, the egg is removed or the host abandons the nest and builds a new one. To simulate this reproductive behavior, the algorithm makes the following assumptions:

(1) Each cuckoo lays only one egg at a time and deposits it in a randomly selected nest.

(2) The best nest is carried over to the next generation.

(3) The number of available nests remains fixed, and the host bird discovers a foreign egg with probability $p_a$ ($p_a \in [0, 1]$).

The cuckoo search algorithm randomly initializes $N$ nest positions in the $d$-dimensional search space and carries the best position over to the next generation. New positions are generated by Levy flight. The nest search path and position update formula are as follows:

$$x_i^{t+1} = x_i^t + \alpha \oplus \mathrm{Levy}(\lambda),$$

where $x_i^t$ represents the position of the $i$-th nest in the $t$-th generation, $\alpha$ represents the step-length control factor with $\alpha > 0$, $\oplus$ is point-to-point multiplication, and $\mathrm{Levy}(\lambda)$ is the random search path with $1 < \lambda \leq 3$.

After the position is updated, a random number $r \in [0, 1]$ is compared with the discovery probability $p_a$; if $r > p_a$, the position is changed by a random walk so as to retain a set of better values, and the current optimal nest position and optimal solution are obtained through iteration. The update formula is as follows:

$$x_i^{t+1} = x_i^t + r \left( x_j^t - x_k^t \right),$$

where $r$ represents a uniformly distributed scaling factor within [0, 1] and $x_j^t$ and $x_k^t$ represent two random solutions in the $t$-th generation.
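Below is a minimal sketch of one cuckoo search iteration, with the Levy step drawn via the Mantegna algorithm; the step-size factor and the default $\lambda$ are illustrative assumptions.

```python
import numpy as np
from math import gamma, sin, pi

def levy(d, lam=1.5, rng=None):
    """Levy-distributed step of dimension d via the Mantegna algorithm."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, d)
    v = rng.normal(0.0, 1.0, d)
    return u / np.abs(v) ** (1 / lam)

def cuckoo_step(nests, fitness, best, alpha=0.01, pa=0.25, rng=None):
    """One iteration of cuckoo search over an (N, d) population of nests."""
    rng = rng or np.random.default_rng()
    N, d = nests.shape
    for i in range(N):                     # Levy flight around each nest
        cand = nests[i] + alpha * levy(d, rng=rng) * (nests[i] - best)
        if fitness(cand) < fitness(nests[i]):
            nests[i] = cand
    for i in range(N):                     # abandon discovered nests (probability pa)
        if rng.random() < pa:
            j, k = rng.integers(0, N, size=2)
            nests[i] = nests[i] + rng.random() * (nests[j] - nests[k])
    if min(map(fitness, nests)) < fitness(best):
        best = min(nests, key=fitness).copy()
    return nests, best
```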

3.3. Bat Cuckoo Hybrid Algorithm

Although the bat algorithm has low convergence accuracy, its global search ability is strong. In order to improve the quality of the cuckoo population, the bat algorithm is integrated into the cuckoo search algorithm, yielding the bat cuckoo hybrid algorithm (BACS). In this algorithm, the nest positions obtained by the cuckoo search update are not used directly as the next initial positions; instead, after each position update, the bat algorithm continues to refine the current optimum, which greatly strengthens the global search ability of the algorithm. The integration of the two algorithms therefore effectively balances local and global optimization. The specific steps of the bat cuckoo hybrid algorithm are shown in Table 1.
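Assuming the `cuckoo_step` and `bat_step` sketches from Sections 3.1 and 3.2 above, the coupling could look like the following sketch; Table 1 gives the authoritative step-by-step procedure.

```python
import numpy as np

def bacs_optimize(fitness, d, n_pop=30, max_iter=100, seed=None):
    """BACS sketch: each cuckoo search update is refined by a bat algorithm step."""
    rng = np.random.default_rng(seed)
    nests = rng.uniform(-1.0, 1.0, (n_pop, d))   # initial population
    vel = np.zeros((n_pop, d))                   # bat velocities
    loud = np.ones(n_pop)                        # bat loudness values
    rate0 = 0.5 * np.ones(n_pop)                 # initial pulse rates
    rate = rate0.copy()
    best = min(nests, key=fitness).copy()
    for t in range(1, max_iter + 1):
        nests, best = cuckoo_step(nests, fitness, best, rng=rng)   # CS update
        nests, vel, loud, rate, best = bat_step(                   # BA refinement
            nests, vel, loud, rate, rate0, best, fitness, t, rng=rng)
    return best
```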

4. Hybrid Algorithm of Extreme Learning Machine Based on Bat Cuckoo Algorithm

Extreme Learning Machine (ELM) selects its hidden layer parameters randomly and does not update them iteratively during training, and the output weights can be determined by a least-squares solution, which greatly accelerates the learning process. Although ELM overcomes the shortcomings of traditional gradient descent algorithms, the number of hidden nodes still needs to be set in advance, which may lead to many redundant nodes. Therefore, in some applications ELM requires more random hidden nodes than traditional neural network algorithms. This reduces the sparsity and adjustment ability of the hidden layer, complicates the network structure, lengthens the training time, and ultimately degrades the generalization ability and robustness of the network.

The BACS algorithm has strong search accuracy, fast convergence speed, and resistance to local optima, and it effectively balances local and global search. This optimization ability can be used to select the hidden layer parameters of ELM appropriately, solving the problem that these parameters need optimization because of their randomness. Therefore, this paper uses the BACS algorithm to optimize ELM and proposes a hybrid Extreme Learning Machine algorithm based on the bat cuckoo algorithm (BACS-ELM). We first use the BACS algorithm to train the input weights and thresholds randomly generated by ELM. The population is taken as the initial hidden layer parameters of ELM, and the fitness function of the BACS algorithm is used for iterative optimization. The positions of the individuals of the population are constantly adjusted to find the optimal hidden layer parameters until the maximum number of iterations or the target search accuracy is reached. At the end of the iteration, the optimal individual position is obtained, and the optimized results are used as the input weights and thresholds of ELM to train the network, so as to improve the convergence speed and stability of the network model. To prevent output saturation caused by excessively large input values, we use the following formula to normalize the data:

$$\bar{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x$ is the original data and $x_{\max}$ and $x_{\min}$ represent the maximum and minimum values of the original data, respectively.
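A one-function sketch of this min-max normalization (column-wise, assuming a NumPy array of samples):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature of X to [0, 1] via (x - min) / (max - min)."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)
```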

Next, the input weights and thresholds of ELM are represented by cuckoo individuals using real-valued coding. Following Section 2, the numbers of neurons in the input layer and the hidden layer are fixed as $n$ and $L$, respectively. Therefore, the coding length $S$ of a cuckoo individual is

$$S = n \times L + L.$$

The position of a cuckoo individual can then be expressed as

$$\theta = [w_{11}, w_{12}, \ldots, w_{nL}, b_1, b_2, \ldots, b_L].$$

The input weights and thresholds of ELM are mapped to the individual positions of the cuckoos: the population is randomly initialized, and the resulting random individuals are assigned one by one to the input weights and thresholds of ELM and placed in the ELM network. Here, the first $n \times L$ components of an individual $\theta$ are assigned to the input weight matrix, and the last $L$ components are assigned to the thresholds.

During training on the ELM samples, in order to evaluate prediction performance more objectively, we use the root mean square error (RMSE) as the evaluation index of model prediction, so the fitness function is designed as

$$f = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left( y_j - t_j \right)^2},$$

where $N$ is the total number of samples, $y_j$ represents the actual output value of sample $j$, and $t_j$ represents the expected output value of sample $j$. Table 2 shows the specific implementation steps of the algorithm.
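A minimal sketch of the decoding and fitness evaluation is given below; the `decode` helper and its reshape convention are illustrative assumptions consistent with the coding length $S = nL + L$.

```python
import numpy as np

def decode(theta, n, L):
    """Split an individual into ELM input weights (n, L) and thresholds (L,)."""
    W = theta[: n * L].reshape(n, L)
    b = theta[n * L:]
    return W, b

def rmse_fitness(theta, X, T, n, L):
    """Train the output weights for the encoded (W, b) and return the RMSE."""
    W, b = decode(theta, n, L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T             # MP generalized-inverse solution
    Y = H @ beta                             # actual network outputs
    return np.sqrt(np.mean((Y - T) ** 2))    # root mean square error
```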

5. Experimental Results

In order to verify the performance of the proposed algorithm, a function fitting problem and several classification problems are tested in this section, and the validity of BACS-ELM is assessed by comparing it with the ELM, PSO-ELM, and CS-ELM algorithms.

5.1. Function Fitting

In order to demonstrate the performance of the proposed algorithm more intuitively and effectively, we use ELM, PSO-ELM, CS-ELM, and BACS-ELM to approximate a benchmark function and then compare their approximation capabilities. The expression for the function is defined as follows:

Training and test sets of 5000 samples each were selected, and the input variable obeys a uniform distribution over the function's domain. In order to increase realism and improve the generalization performance of the algorithm, random noise was added to the training samples, whereas the testing data remained noise-free. For the different optimization methods, the initial parameter settings, including the maximum number of iterations, are presented in Table 3. The activation function is the RBF function, and the fitness function is the RMSE. In order to compare the results of the algorithms more objectively, each experiment was run 20 times and the mean value taken.

The choice of the number of hidden nodes has a direct influence on model performance. Therefore, an experiment on BACS-ELM was carried out by varying the number of hidden nodes, and the test results obtained are shown in Table 4. The results show that the function is fitted best when the number of hidden nodes is 12, and the mean square errors of training and testing stabilize as the number of nodes increases. To ensure the performance of the algorithm and reduce the complexity of the model, the architecture of the optimized ELM network is determined as 1-12-1.

Then, with the above parameter values, simulation experiments were carried out for the ELM, PSO-ELM, CS-ELM, and BACS-ELM algorithms. It can be seen from Figure 2 that the approximation effect of the BACS-ELM algorithm is better than that of the other algorithms. Moreover, the performance comparison of the algorithms is shown in Table 5. According to the displayed results, the test RMSE value of the BACS-ELM algorithm is the smallest, which means that the algorithm has higher accuracy and better stability. As can be seen from the training times in the table, owing to the randomness of its hidden layer parameters, ELM has a very fast learning speed, but its fitting effect is not ideal.

The results also show that all three optimization methods are effective. However, there is little difference in training and testing time among the PSO-ELM, CS-ELM, and BACS-ELM algorithms, so no advantage in learning efficiency is evident. Nevertheless, the ELM model based on the BACS algorithm greatly improves the convergence accuracy of function fitting, so its computational cost remains within an acceptable range.

5.2. Classification Problems

In this section, in order to appraise the effectiveness of the proposed algorithm more accurately, the algorithms are compared on multiple classification problems. The relevant information on the datasets is given in Table 6. The initial parameter settings of each group were consistent with those above, the maximum number of iterations was unchanged, and the activation function was the sigmoid function. Each group of experiments was run 20 times and the average value taken.

Figure 3 compares the classification accuracy of the algorithms on different datasets as the number of nodes changes. Figure 3(a) shows the trend on the breast cancer dataset; ELM needs the most nodes to achieve relatively high accuracy, while the other algorithms all achieve their highest accuracy at 20 nodes, with BACS-ELM slightly better. Figure 3(b) shows the trend on the heart failure dataset; the four algorithms show similar curves as the number of hidden nodes increases and all achieve their best accuracy at 20 nodes, where BACS-ELM attains the highest value of 84.23%. Figure 3(c) shows the trend on the Iris dataset; BACS-ELM achieves its best accuracy at 10 nodes, 5 fewer than the other algorithms need to reach their maxima. Figure 3(d) shows the trend on the vertebral column dataset; BACS-ELM needs the fewest nodes to obtain the best results, and its accuracy fluctuates little, which indicates that the algorithm achieves better stability.

Next, in order to better explain the accuracy of the proposed algorithm in the classification experiments, Figure 4 presents the fitness curves of the PSO-ELM, CS-ELM, and BACS-ELM algorithms on the four classification problems. To maintain a consistent experimental environment, the numbers of hidden nodes for the four problems were set to 20, 20, 15, and 30, respectively, while other parameters were unchanged. As can be seen from Figures 4(a)–4(d), on each dataset the BACS-ELM algorithm obtains the best fitness value in the fewest iterations compared with the PSO-ELM and CS-ELM algorithms. This is because, when the BACS algorithm optimizes the input weights and thresholds of ELM, it has strong local optimization ability in the initial search stage while making full use of the global optimization ability of the BA algorithm; the combination of the two greatly improves convergence accuracy.

Based on the above analysis, the experiment also reports the performance of the four algorithms in terms of the number of hidden nodes, training time, and training and test accuracy. Table 7 clearly shows that the BACS-ELM algorithm achieves the best test accuracy with the fewest hidden nodes on all four datasets, which indicates that the BACS algorithm can effectively optimize the hidden layer parameters of the ELM model and thus obtain a more appropriate and simplified network structure, along with the best generalization performance and classification ability. In terms of computing time and efficiency, the hidden layer parameters of ELM do not need to be iteratively tuned, so its learning speed is very fast, but its classification success rate is low. Test times are not listed in Table 7 because, across the four algorithms and all datasets, the values are very small and of similar magnitude; that is to say, they cannot serve as an evaluation item for the overall experimental results. Compared with the other two optimization methods, although the BACS-ELM algorithm is slightly worse in learning efficiency, it shows great advantages in classification accuracy.

6. Conclusions

In this paper, we propose a hybrid Extreme Learning Machine algorithm based on the bat and cuckoo search algorithms to optimize the input weights and thresholds of the traditional ELM algorithm, thereby remedying the disadvantages of traditional ELM such as poor hidden layer sparsity, weak adjustment ability, and a complex network structure. The BACS algorithm has strong search accuracy, fast convergence speed, and resistance to local optima, and it effectively balances local and global optimization. Therefore, the proposed BACS-ELM algorithm can effectively solve the optimization problem caused by the randomness of the hidden layer parameters and improve the generalization performance of the network.

Experimental results show that the BACS-ELM algorithm is superior to the other algorithms in function fitting and classification. In the future, we will consider extending the BACS-ELM algorithm to practical application problems and to a wider class of even tougher optimization problems.

Data Availability

All data included in this study are available from the corresponding author upon request.

Disclosure

This manuscript is the authors’ original work and has not been published nor has it been submitted simultaneously elsewhere.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Special Science Research Plan of the Education Bureau of Shaanxi Province of China (no. 18JK0344) and the Natural Science Basic Research Plan in Shaanxi Province of China (no. 2021JM-446).