Abstract

To improve the prediction performance of the existing nonlinear grey Bernoulli model and extend its range of application, an improved nonlinear grey Bernoulli model is presented using grey modeling techniques and optimization methods. First, the traditional whitening equation of the nonlinear grey Bernoulli model is transformed into a linear form. Second, improved structural parameters are derived to eliminate the inherent error caused by the jump from the differential equation to the difference equation, which yields the improved nonlinear grey Bernoulli model. Finally, the power index of the model is determined by the whale optimization algorithm. Numerical results from several examples show that the proposed model achieves higher prediction accuracy than the existing models and is more suitable for these practical cases.

1. Introduction

Professor Deng [1] originally proposed grey system theory to address uncertain systems with partially known and partially unknown information. Grey prediction models, a crucial branch of grey system theory, have been widely used to address numerous real-world problems owing to their effectiveness, such as electricity prediction [2–4], energy prediction [5, 6], and tourism prediction [7]. A common characteristic of these models is that they require only a small number of observations (at least four). They have attracted considerable interest from researchers because it is difficult, or even impossible, to collect enough data to build traditional models, including linear [8] or nonlinear regression models [9], the autoregressive integrated moving average model [10] and its extended versions [11], support vector machines [12], and artificial neural networks [13].

Generally speaking, the development of a discipline also benefits from practical applications. In the past three decades, various grey models have emerged rapidly in response to practical applications. For example, Xie and Liu [14] investigated the discrete grey model and analyzed its connection with the traditional grey model. Wu et al. [15] investigated the grey model with fractional-order accumulation, which made the grey model more flexible. To account for the effects of related factors on the behavioral system, Tien [16] initially proposed a novel grey model called GM (1, n), in which “n” accounts for the driving variables. More recently, Wang et al. [17] presented a grey modeling method based on a data-grouping approach to predict quarterly hydropower production in China. Subsequently, they proposed a seasonal grey model based on accumulation operators for forecasting China’s seasonal electricity consumption [18]. Zeng et al. [19] predicted sequences of ternary interval numbers using a novel multivariable grey model. Ma et al. [20] proposed a conformable fractional grey system model and also investigated a novel fractional time-delayed grey model with the grey wolf optimizer [21]. Related studies continue to emerge. Zeng et al. [22] presented a new-structure grey Verhulst model for predicting China’s tight gas production, in which they deduced the time-response function and an initial value optimization method. In the same year, they proposed another new-structure grey Verhulst model by introducing a new nonhomogeneous exponential function [23]; this model solved the problems of displacement substitution of parameters and optimization of initial values.

Metaheuristic algorithms are strategies for finding optimal or satisfactory solutions to complex optimization problems, and they are inspired by the behavior of biological and/or physical systems in nature [24]. Common metaheuristic algorithms include the simulated annealing algorithm [25], genetic algorithm [26], particle swarm optimization algorithm [27], and ant colony optimization algorithm [28]. For example, the simulated annealing algorithm was first used by Kirkpatrick et al. [25] for combinatorial optimization problems; it overcomes the shortcoming of the hill-climbing (HC) method, which easily falls into local optima. Yıldız et al. [29] studied metaheuristic methods and showed that the Henry gas solubility optimization algorithm can be used to solve shape optimization problems. The main development directions of metaheuristic algorithms can be divided into three classes. The first is to combine an algorithm with others to form a new hybrid algorithm that makes full use of their respective strengths. Yıldız et al. [30] presented a hybrid optimization algorithm combining the Nelder–Mead local search algorithm with the Harris hawks optimization algorithm for solving a milling manufacturing optimization problem. Similarly, a hybrid optimization algorithm based on the Nelder–Mead local search algorithm and the whale optimization algorithm was proposed to accelerate the global convergence of the whale algorithm [31], and it was used to optimize the processing parameters in manufacturing processes. The second is to derive new metaheuristic algorithms for complex real-world optimization problems from the mechanisms of biological evolution. For example, Wang et al. [32] proposed the monarch butterfly optimization algorithm. The third is to improve existing algorithms by introducing new mechanisms or strategies. Hammou et al. [33] improved the particle swarm optimization algorithm with a strategy based on cooperation and hierarchization concepts for updating the best personal positions of particles. In recent years, metaheuristic algorithms have been used in grey models to find optimal parameter values. Zhang et al. [34] optimized the background value weighting coefficients of the grey model using the genetic algorithm. In [35], a multiobjective grey wolf optimizer was used to optimize the kernel-based nonlinear extension of the Arps decline model to ensure both prediction stability and accuracy. Wu et al. [36] used the particle swarm optimization algorithm to search for the optimal system parameters of the nonlinear grey Bernoulli model.

This study focuses on improving the nonlinear grey Bernoulli model, which was initially proposed by Chen [37] and is abbreviated as NGBM (1, 1). NGBM (1, 1) has been widely used in many problems with nonlinear characteristics and has been extended to general versions [38]. However, its accuracy can still be improved. The root cause of the information loss in the conversion of the grey differential equation to the grey difference equation was identified in [39]. Following the idea of Ma et al. [7], the model parameters of NGBM (1, 1) are optimized so that these two equations match better, which reduces the prediction error. The main contributions of this paper are as follows: (1) the grey differential equation is transformed into a linear form rather than keeping the same form as in the traditional NGBM (1, 1) model; (2) optimized parameters are constructed, and the whale optimization algorithm (WOA) is used to search for the optimal power index; (3) three cases are employed to verify the effectiveness of the resulting improved model, INGBM (1, 1).

The rest of this paper is organized as follows: Section 2 briefly describes the NGBM (1, 1) model and derives the “linear” solution to it. Section 3 deduces the NGBM (1, 1) model with improved parameters in detail. Section 4 provides two real-world examples to validate the effectiveness of the proposed model. Section 5 applies INGBM (1, 1) to predict the number of R&D institutions of higher education in China to demonstrate its forecasting ability, and Section 6 presents the main conclusions.

2. Description of the Nonlinear Grey Bernoulli Model

The nonlinear grey Bernoulli model (NGBM (1, 1)), originally proposed by Chen [37], has wide applications, especially to nonlinear problems. However, the model still has some drawbacks that impair its prediction accuracy. This section analyzes the root cause and proposes a novel method to reduce the modeling bias. First, a brief description of NGBM (1, 1) is given. Then, a “linear” solution to the whitening equation of NGBM (1, 1) is derived to simplify parameter optimization.

2.1. The Traditional Solution to the Nonlinear Grey Bernoulli Model

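Assume that

$$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right) \tag{1}$$

is a nonnegative series. Its first-order accumulative generating operator (1-AGO) series is

$$x^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right), \tag{2}$$

where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 1, 2, \ldots, n$. The equation

$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = b\left(x^{(1)}\right)^{\gamma} \tag{3}$$

is called the whitening equation of the nonlinear grey Bernoulli model, where $\gamma$, regarded as the power index, cannot be equal to one. With the two-point trapezoidal formula, the discrete difference equation can be written as

$$x^{(0)}(k) + a\,z^{(1)}(k) = b\left(z^{(1)}(k)\right)^{\gamma}, \quad k = 2, 3, \ldots, n, \tag{4}$$

where $z^{(1)}(k)$ represents the background value and is obtained as

$$z^{(1)}(k) = \frac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right). \tag{5}$$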

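Let

$$B = \begin{pmatrix} -z^{(1)}(2) & \left(z^{(1)}(2)\right)^{\gamma} \\ -z^{(1)}(3) & \left(z^{(1)}(3)\right)^{\gamma} \\ \vdots & \vdots \\ -z^{(1)}(n) & \left(z^{(1)}(n)\right)^{\gamma} \end{pmatrix}, \qquad Y = \begin{pmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{pmatrix}.$$

The model parameters can then be estimated by the least-squares method as

$$[a, b]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y. \tag{6}$$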

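Therefore, the solution to equation (3) with the initial condition $\hat{x}^{(1)}(1) = x^{(0)}(1)$ is

$$\hat{x}^{(1)}(k+1) = \left[\left(\left(x^{(0)}(1)\right)^{1-\gamma} - \frac{b}{a}\right)e^{-a(1-\gamma)k} + \frac{b}{a}\right]^{\frac{1}{1-\gamma}}, \quad k = 1, 2, \ldots \tag{7}$$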

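Using the first-order inverse accumulative generating operator (1-IAGO), the simulated values $\hat{x}^{(0)}(k)$ of $x^{(0)}(k)$, $k = 2, 3, \ldots$, are

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1). \tag{8}$$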
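The traditional NGBM (1, 1) pipeline of equations (1) to (8) can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the authors’ code: the function name `ngbm_fit_predict` and its `horizon` argument are hypothetical, and `np.linalg.lstsq` is used in place of the explicit normal-equation form of (6), which gives the same solution when $B$ has full column rank.

```python
import numpy as np


def ngbm_fit_predict(x0, gamma, horizon=0):
    """Sketch of the traditional NGBM(1,1), equations (1)-(8).

    Returns the estimated (a, b) and the simulated series x0_hat of length
    len(x0) + horizon; the last `horizon` values are out-of-sample forecasts.
    """
    x0 = np.asarray(x0, dtype=float)
    if gamma == 1:
        raise ValueError("the power index gamma must not equal 1")
    n = len(x0)
    x1 = np.cumsum(x0)                         # 1-AGO series, eq. (2)
    z1 = 0.5 * (x1[1:] + x1[:-1])              # background values z(2..n), eq. (5)
    B = np.column_stack((-z1, z1 ** gamma))    # rows [-z(k), z(k)^gamma]
    Y = x0[1:]                                 # x0(2..n)
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]  # least squares, eq. (6)

    k = np.arange(n + horizon)                 # k = 0 gives x1_hat(1) = x0(1)
    base = x0[0] ** (1 - gamma) - b / a
    # Time response, eq. (7). A negative bracket raised to a fractional power
    # yields NaN, so callers should check the result for finiteness.
    x1_hat = (base * np.exp(-a * (1 - gamma) * k) + b / a) ** (1 / (1 - gamma))

    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)               # 1-IAGO, eq. (8)
    return a, b, x0_hat
```

With $\gamma = 0$ the model reduces to GM (1, 1); for instance, `ngbm_fit_predict([10 * 1.1 ** i for i in range(7)], 0.0, horizon=2)` fits a geometric series and appends two forecasts.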

2.2. The “Linear” Solution to Nonlinear Grey Bernoulli Model

This section transforms the whitening equation of the nonlinear grey Bernoulli model (NGBM (1, 1)) into the linear formulation, rather than directly solving the whitening equation. That is, it does not share the same pattern as the traditional grey model. The detailed computational process can be depicted as follows.

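Analogously to Section 2.1, multiplying both sides of the whitening equation (3) by $\left(x^{(1)}\right)^{-\gamma}$ gives

$$\left(x^{(1)}\right)^{-\gamma}\frac{dx^{(1)}}{dt} + a\left(x^{(1)}\right)^{1-\gamma} = b. \tag{9}$$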

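Set $y^{(1)} = \left(x^{(1)}\right)^{1-\gamma}$, so that $\frac{dy^{(1)}}{dt} = (1-\gamma)\left(x^{(1)}\right)^{-\gamma}\frac{dx^{(1)}}{dt}$; furthermore,

$$\frac{1}{1-\gamma}\frac{dy^{(1)}}{dt} + a\,y^{(1)} = b. \tag{10}$$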

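Thereby, equation (10) can be written as

$$\frac{dy^{(1)}}{dt} + (1-\gamma)a\,y^{(1)} = (1-\gamma)b, \tag{11}$$

which is called the linearization of the NGBM (1, 1) model. Applying the two-point trapezoidal formula on each interval $[k-1, k]$ readily yields its discrete form:

$$y^{(1)}(k) - y^{(1)}(k-1) + (1-\gamma)a\,z_{y}^{(1)}(k) = (1-\gamma)b, \quad k = 2, 3, \ldots, n, \tag{12}$$

where $y^{(1)}(k) = \left(x^{(1)}(k)\right)^{1-\gamma}$ and $z_{y}^{(1)}(k) = \frac{1}{2}\left(y^{(1)}(k) + y^{(1)}(k-1)\right)$. If

$$B = \begin{pmatrix} -(1-\gamma)z_{y}^{(1)}(2) & 1-\gamma \\ \vdots & \vdots \\ -(1-\gamma)z_{y}^{(1)}(n) & 1-\gamma \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} y^{(1)}(2) - y^{(1)}(1) \\ \vdots \\ y^{(1)}(n) - y^{(1)}(n-1) \end{pmatrix},$$

the parameters can be estimated by the least-squares method as

$$[a, b]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y. \tag{13}$$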
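The least-squares step on the linearized equation can be sketched as follows, assuming the trapezoidal discretization in which $y^{(1)} = (x^{(1)})^{1-\gamma}$ and the background value of $y^{(1)}$ is the mean of consecutive terms. The function name `linearized_ngbm_params` is hypothetical, and `np.linalg.lstsq` stands in for the explicit normal equations.

```python
import numpy as np


def linearized_ngbm_params(x0, gamma):
    """Estimate (a, b) from the linearized difference equation (a sketch).

    y1(k) = x1(k)**(1 - gamma) turns the Bernoulli whitening equation into a
    linear one, whose trapezoidal discretization is
        y1(k) - y1(k-1) + (1 - gamma) * a * zy(k) = (1 - gamma) * b.
    """
    x0 = np.asarray(x0, dtype=float)
    if gamma == 1:
        raise ValueError("the power index gamma must not equal 1")
    c = 1.0 - gamma
    x1 = np.cumsum(x0)                         # 1-AGO
    y1 = x1 ** c                               # y1 = (x1)^(1 - gamma)
    dy = np.diff(y1)                           # y1(k) - y1(k-1), k = 2..n
    zy = 0.5 * (y1[1:] + y1[:-1])              # trapezoidal background value of y1
    B = np.column_stack((-c * zy, np.full_like(zy, c)))
    a, b = np.linalg.lstsq(B, dy, rcond=None)[0]
    return a, b
```

Because the regression is linear in $y^{(1)}$, a series generated exactly from the discrete equation is recovered exactly, and with $\gamma = 0$ the estimator coincides with that of GM (1, 1).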

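After the model parameters are estimated, the whitening equation (11) is solved. Multiplying both sides of equation (11) by the integrating factor $e^{(1-\gamma)at}$ gives

$$\frac{d}{dt}\left(e^{(1-\gamma)at}\,y^{(1)}(t)\right) = (1-\gamma)b\,e^{(1-\gamma)at}. \tag{14}$$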

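Integrating both sides of equation (14) over the interval $[1, t]$ gives

$$e^{(1-\gamma)at}\,y^{(1)}(t) - e^{(1-\gamma)a}\,y^{(1)}(1) = \frac{b}{a}\left(e^{(1-\gamma)at} - e^{(1-\gamma)a}\right), \tag{15}$$

which, evaluated at $t = k$, is equivalent to

$$\hat{y}^{(1)}(k) = \left(y^{(1)}(1) - \frac{b}{a}\right)e^{-(1-\gamma)a(k-1)} + \frac{b}{a}. \tag{16}$$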

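According to the 1-AGO and $y^{(1)} = \left(x^{(1)}\right)^{1-\gamma}$, with $y^{(1)}(1) = \left(x^{(0)}(1)\right)^{1-\gamma}$,

$$\hat{x}^{(1)}(k) = \left[\left(\left(x^{(0)}(1)\right)^{1-\gamma} - \frac{b}{a}\right)e^{-(1-\gamma)a(k-1)} + \frac{b}{a}\right]^{\frac{1}{1-\gamma}}, \qquad \hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1), \quad k = 2, 3, \ldots \tag{17}$$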
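A quick numerical check of this derivation is sketched below: it evaluates the time response in continuous time $t$ and confirms, with a central-difference derivative, that the back-transformed curve satisfies the original Bernoulli whitening equation (3). The function names and the step size `h` are illustrative choices, and the check assumes the bracket stays positive so that the fractional power is real.

```python
import numpy as np


def ingbm_x1_hat(x0_first, a, b, gamma, t):
    """Continuous-time time response: x1_hat(t) with x1_hat(1) = x0(1)."""
    c = 1.0 - gamma
    t = np.asarray(t, dtype=float)
    # Linear solution for y = x1^(1 - gamma), then back-transform to x1.
    y = (x0_first ** c - b / a) * np.exp(-c * a * (t - 1.0)) + b / a
    return y ** (1.0 / c)


def bernoulli_residual(x0_first, a, b, gamma, t, h=1e-5):
    """Residual dx/dt + a*x - b*x**gamma of the whitening equation (3),
    with dx/dt approximated by a central difference of step h."""
    t = np.asarray(t, dtype=float)
    x = ingbm_x1_hat(x0_first, a, b, gamma, t)
    dxdt = (ingbm_x1_hat(x0_first, a, b, gamma, t + h)
            - ingbm_x1_hat(x0_first, a, b, gamma, t - h)) / (2 * h)
    return dxdt + a * x - b * x ** gamma
```

For example, with $a = -0.2$, $b = 1$, $\gamma = 0.5$, and $x^{(0)}(1) = 2.5$, the residual on $t = 1, \ldots, 8$ stays at the level of finite-difference round-off.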

Whether it is obtained in linearized or nonlinear form, the solution of the NGBM (1, 1) model is essentially approximate, because the conversion between equations (11) and (12) relies on the two-point trapezoidal formula, which is itself an approximation. Consequently, the “misplaced replacement” of the model parameters, that is, estimating them from the difference equation (12) but then using them in the solution of the differential equation (11), causes the following problems: (i) the grey difference equation does not match the grey differential equation, because the model parameters have different meanings in the two equations; (ii) the prediction accuracy is unsatisfactory in many situations. This indicates that the performance of the NGBM (1, 1) model can be improved. In other words, the model parameters should be optimized to better match equations (11) and (12) and to increase the forecasting ability of the NGBM (1, 1) model.
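To make the mismatch concrete, the sketch below plugs the exact solution of the linearized differential equation, written with $\alpha = (1-\gamma)a$ and $\beta = (1-\gamma)b$, into the trapezoidal difference equation. The residual is not zero: it equals $C e^{-\alpha(k-2)}\left[(e^{-\alpha}-1) + \alpha(e^{-\alpha}+1)/2\right]$ with $C = y^{(1)}(1) - \beta/\alpha$, and expanding the bracket shows it is approximately $C e^{-\alpha(k-2)}\alpha^{3}/12$ for small $\alpha$, so the mismatch grows quickly with $|\alpha|$. The helper names are hypothetical.

```python
import math


def trapezoid_mismatch(alpha, beta, y_first, k):
    """Residual of the trapezoidal difference equation when the exact solution
    of dy/dt + alpha*y = beta (with y(1) = y_first) is substituted.

    alpha = (1 - gamma) * a and beta = (1 - gamma) * b; a nonzero result means
    the same parameters cannot satisfy both equations.
    """
    def y(t):
        return (y_first - beta / alpha) * math.exp(-alpha * (t - 1)) + beta / alpha

    return (y(k) - y(k - 1)) + alpha * 0.5 * (y(k) + y(k - 1)) - beta


def mismatch_closed_form(alpha, beta, y_first, k):
    """The same residual in closed form: C * exp(-alpha*(k-2)) * bracket."""
    C = y_first - beta / alpha
    bracket = (math.exp(-alpha) - 1) + alpha * (math.exp(-alpha) + 1) / 2
    return C * math.exp(-alpha * (k - 2)) * bracket
```

For example, with $\alpha = -0.5$, $\beta = 0.5$, and $y^{(1)}(1) = 1$, the residual at $k = 2$ is about -0.027.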

3. Parameter Optimization of Nonlinear Grey Bernoulli Model

The whitening equation parameters $a$ and $b$ and the power index $\gamma$ are the key parameters of the nonlinear grey Bernoulli model. This section describes how they are calculated.

3.1. Whitening Equation Parameter Calculation

The optimized counterparts of the parameters $a$ and $b$ are substituted into the time-response function, which gives the following equation:

Substituting equation (18) into the left-hand side of equation (4) gives

According to equation (4), the left-hand side should equal the right-hand side. Therefore,

It is easy to see that Parts 1 and 2 in equation (20) must both equal zero; hence,

In this way, the optimized parameters can be estimated. The optimized parameters are expected to match the differential equation and the difference equation more closely and thus to reduce the prediction error. For simplicity, NGBM (1, 1) with the improved parameters is abbreviated as INGBM (1, 1) in this study.
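One way to carry out the coefficient matching of this section is sketched below. It is a reconstruction under stated assumptions, not necessarily the paper’s equations (18) to (21): it assumes the time-response function has the exponential form of equation (16), that the discrete equation is the trapezoidal form (12) with $\alpha = (1-\gamma)a$ and $\beta = (1-\gamma)b$, and that Part 1 and Part 2 are the coefficient of the exponential term and the constant term, respectively. Setting Part 1 to zero gives $e^{-\lambda} = (2-\alpha)/(2+\alpha)$, and setting Part 2 to zero gives the constant $D = \beta/\alpha$. The mapping `improved_params` back to $a$ and $b$ is hypothetical.

```python
import math


def matched_rate(alpha):
    """Decay rate lam with exp(-lam) = (2 - alpha) / (2 + alpha).

    With this rate the exponential sequence satisfies the trapezoidal
    difference equation exactly; it requires |alpha| < 2.
    """
    if abs(alpha) >= 2:
        raise ValueError("|alpha| must be less than 2")
    return math.log((2 + alpha) / (2 - alpha))


def matched_residual(alpha, beta, y_first, k):
    """Residual of the trapezoidal equation for
    y(k) = (y_first - D) * exp(-lam * (k - 1)) + D, with D = beta / alpha."""
    lam, D = matched_rate(alpha), beta / alpha

    def y(t):
        return (y_first - D) * math.exp(-lam * (t - 1)) + D

    return (y(k) - y(k - 1)) + alpha * 0.5 * (y(k) + y(k - 1)) - beta


def improved_params(a, b, gamma):
    """Hypothetical mapping: (1 - gamma) * a_hat = lam and b_hat / a_hat = b / a,
    so the time response keeps the form of eq. (16). Requires a != 0."""
    lam = matched_rate((1 - gamma) * a)
    a_hat = lam / (1 - gamma)
    return a_hat, a_hat * b / a
```

For small $\alpha$, $\lambda = \ln\frac{2+\alpha}{2-\alpha} \approx \alpha + \alpha^{3}/12$, so the adjusted rate differs from the original one by roughly the same third-order term that causes the mismatch between the two equations.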

3.2. Power Index Estimation Based on the Whale Optimization Algorithm

In the above descriptions, the power index $\gamma$ is assumed to be known. However, the appropriate power index varies from one situation to another and must be adjusted flexibly for each dataset. To solve this problem, an intelligent algorithm, the whale optimization algorithm (WOA), is employed to determine the power index automatically.

Based on the hunting behavior of humpback whales, which recognize the location of prey and encircle it, Mirjalili and Lewis designed the WOA [40]. In this optimizer, the current best candidate solution (search agent) is taken to be the target prey or to be near the optimum. Once the best search agent is defined, the other search agents update their positions towards it:

(i) Encircling prey. The search agents update their positions by

$$\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A}\cdot\vec{D}, \tag{22}$$

$$\vec{D} = \left|\vec{C}\cdot\vec{X}^{*}(t) - \vec{X}(t)\right|, \tag{23}$$

where $t$ represents the current iteration, $\vec{X}^{*}$ is the current best agent, and $\vec{D}$ denotes the distance between the individual whale and the current best search agent. The coefficient vectors $\vec{A}$ and $\vec{C}$ are defined as

$$\vec{A} = 2\vec{a}\cdot\vec{r} - \vec{a}, \tag{24}$$

$$\vec{C} = 2\vec{r}, \tag{25}$$

where $\vec{r}$ is a random vector in $[0, 1]$ and $\vec{a}$, called the convergence factor, decreases linearly from 2 to 0 over the iterations; that is,

$$\vec{a} = 2 - \frac{2t}{T_{\max}}, \tag{26}$$

with $T_{\max}$ the maximum number of iterations.

(ii) Spiral updating. A spiral equation between the positions of the whale and the prey mimics the helix-shaped movement of humpback whales:

$$\vec{X}(t+1) = \vec{D}'\cdot e^{bl}\cdot\cos(2\pi l) + \vec{X}^{*}(t), \tag{27}$$

where $\vec{D}' = \left|\vec{X}^{*}(t) - \vec{X}(t)\right|$ is the distance of the whale to the prey, $b$ is a constant defining the shape of the logarithmic spiral (not the model parameter $b$), and $l$ is a random number in $[-1, 1]$.

(iii) Searching for prey. Humpback whales also search for prey randomly according to each other’s positions:

$$\vec{D} = \left|\vec{C}\cdot\vec{X}_{\mathrm{rand}} - \vec{X}(t)\right|, \tag{28}$$

$$\vec{X}(t+1) = \vec{X}_{\mathrm{rand}} - \vec{A}\cdot\vec{D}, \tag{29}$$

where $\vec{X}_{\mathrm{rand}}$ is a random position (whale) chosen from the current population.

For clarity, the steps of the WOA-based search for the optimal $\gamma$ are as follows:

Step 1: set the algorithm parameters.
Step 2: initialize the whale population $X_i$ ($i = 1, 2, \ldots, N$).
Step 3: calculate the fitness of each search agent.
Step 4: update $\vec{a}$, $\vec{A}$, and $\vec{C}$ according to equations (24)–(26).
Step 5: generate a random number $p$ in $[0, 1]$. If $p \ge 0.5$, update the position of the current search agent by equation (27). If $p < 0.5$ and $|\vec{A}| \ge 1$, update it by equation (29). If $p < 0.5$ and $|\vec{A}| < 1$, update it by equation (22).
Step 6: return to Step 3 until the optimal value is found.
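A compact Python sketch of the WOA loop (Steps 1 to 6) for a single scalar parameter, such as the power index, follows. Several details are assumptions not stated in the text: the search interval `[lb, ub]`, clipping agents to that interval, the spiral constant `b_spiral = 1`, and the fixed random seed are illustrative choices, and scalar versions of $\vec{A}$ and $\vec{C}$ are used because only one parameter is searched.

```python
import numpy as np


def woa_minimize(f, lb, ub, n_whales=30, max_iter=100, b_spiral=1.0, seed=0):
    """Minimal whale optimization algorithm for one scalar in [lb, ub]."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, n_whales)             # Step 2: initial population
    best_x, best_f = None, np.inf
    for t in range(max_iter):
        fit = np.array([f(x) for x in X])         # Step 3: fitness of each agent
        j = int(np.argmin(fit))
        if best_x is None or fit[j] < best_f:
            best_x, best_f = X[j], fit[j]
        a = 2.0 - 2.0 * t / max_iter              # Step 4, eqs. (24)-(26): 2 -> 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p, l = rng.random(), rng.uniform(-1.0, 1.0)
            if p >= 0.5:                          # spiral update, eq. (27)
                d_prime = abs(best_x - X[i])
                X[i] = d_prime * np.exp(b_spiral * l) * np.cos(2 * np.pi * l) + best_x
            elif abs(A) >= 1:                     # random search, eq. (29)
                x_rand = X[rng.integers(n_whales)]
                X[i] = x_rand - A * abs(C * x_rand - X[i])
            else:                                 # encircling prey, eq. (22)
                X[i] = best_x - A * abs(C * best_x - X[i])
        X = np.clip(X, lb, ub)                    # keep agents inside the bounds
    fit = np.array([f(x) for x in X])             # evaluate the final positions
    j = int(np.argmin(fit))
    if fit[j] < best_f:
        best_x, best_f = X[j], fit[j]
    return float(best_x), float(best_f)
```

For instance, `woa_minimize(lambda g: (g - 0.3) ** 2, -1.0, 0.9)` should return a point close to 0.3. In the INGBM (1, 1) setting, `f` would be the MAPE fitness of the fitted model, and the interval would be chosen so that it excludes $\gamma = 1$.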

Note that, as usual, the fitness function is defined as the objective function MAPE, which is given in the next section. The flowchart of the INGBM (1, 1) model is shown in Figure 1.
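The fitness evaluated by the optimizer can be sketched as a function of $\gamma$ alone. As a stand-in, this sketch uses the traditional estimator of Section 2.1 rather than the improved parameters of Section 3.1, and it returns infinity for $\gamma = 1$ or numerically invalid fits so that a minimizer such as WOA steers away from them. The function name `mape_fitness` and the demonstration series are hypothetical, and the coarse grid scan at the end only illustrates how an optimizer queries the fitness; it is not WOA.

```python
import numpy as np


def mape_fitness(gamma, x0):
    """Simulation-period MAPE (%) of an NGBM(1,1) fit with power index gamma.

    Uses the traditional estimator, eqs. (4)-(8), as a stand-in; returns +inf
    for gamma == 1 or when the fit produces NaN/inf values.
    """
    x0 = np.asarray(x0, dtype=float)
    if np.isclose(gamma, 1.0):
        return np.inf
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])
    B = np.column_stack((-z1, z1 ** gamma))
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(len(x0))
    with np.errstate(all="ignore"):              # invalid values are handled below
        base = x0[0] ** (1 - gamma) - b / a
        x1_hat = (base * np.exp(-a * (1 - gamma) * k) + b / a) ** (1 / (1 - gamma))
        x0_hat = np.diff(x1_hat)                 # simulated x0(2..n)
        err = np.abs(x0_hat - x0[1:]) / x0[1:]
    if not np.all(np.isfinite(err)):
        return np.inf
    return float(np.mean(err) * 100)


if __name__ == "__main__":
    data = [10 * 1.1 ** i for i in range(7)]      # hypothetical series
    grid = np.linspace(-0.5, 0.5, 11)
    scores = [mape_fitness(g, data) for g in grid]
    print("best gamma on the grid:", grid[int(np.argmin(scores))])
```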

4. Validation of the Nonlinear Grey Bernoulli Model

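This section provides two examples to demonstrate the efficacy of the proposed model compared with four competing models: GM (1, 1), DGM (1, 1), NGBM (1, 1), and ONGBM (1, 1). To evaluate the prediction accuracy of these grey models, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used, defined as

$$\mathrm{MAPE} = \frac{1}{m}\sum_{k}\left|\frac{\hat{x}^{(0)}(k) - x^{(0)}(k)}{x^{(0)}(k)}\right| \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k}\left(\hat{x}^{(0)}(k) - x^{(0)}(k)\right)^{2}},$$

where the sums run over the $m$ points of the period being evaluated (simulation or prediction).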

Lewis [41] grades prediction performance according to MAPE; the criteria are listed in Table 1.
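The two error measures, together with the MAPE grading, can be sketched as follows. The thresholds (below 10% highly accurate, 10% to 20% good, 20% to 50% reasonable, above 50% inaccurate) are the ones commonly attributed to Lewis and are assumed here to match Table 1; how a value exactly on a boundary is graded is an arbitrary choice in this sketch.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)) * 100)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def lewis_grade(mape_percent):
    """Grade a MAPE value with the commonly cited Lewis thresholds
    (assumed to match Table 1)."""
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent < 20:
        return "good"
    if mape_percent < 50:
        return "reasonable"
    return "inaccurate"
```

For example, a MAPE of 12.78% falls in the “good” band under these thresholds.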

Case 1. Forecasting education-in-practice-intensive universities: the example from [42] is used to test the efficacy and applicability of the grey models. Data points 1 to 7 are used to build the different grey models, and the remaining data are used to test their prediction accuracy. The parameters of the five models are listed in Table 2, and the parameter values of the proposed model obtained by WOA are plotted in Figure 2.
Consequently, the simulation and prediction results are shown in Table 3.

Case 2. Forecasting subway passenger volume: the data from [43] are divided empirically into two groups: the data from 2005 to 2012 are used to build the five grey models, and the remaining data are used to test their prediction accuracy.
First, the parameter values of the five grey models are listed in Table 4. The trajectory of the WOA search for the optimal nonlinear parameter of the INGBM (1, 1) model is shown in Figure 3.
Furthermore, the simulation and prediction results are shown in Table 5.
From Tables 1–5, the following conclusions can be drawn:

(1) In Case 1, the INGBM (1, 1) model outperforms the other grey models in both the simulated and the predicted periods, with the lowest MAPE values of 1.05% and 12.78%, respectively. Notably, in the predicted period the MAPE values of all models exceed 10% (19.17%, 18.86%, 14.17%, 15.44%, and 12.78%, respectively), which indicates that none of the models works particularly well in this case, although INGBM (1, 1) still performs best. The fitting errors of all models are relatively small while the prediction errors are relatively large, which indicates that the models overfit this data set slightly. This issue may be mitigated by adding penalty terms.

(2) In Case 2, the MAPE values of the five grey models in the simulated period are 2.61%, 2.67%, 2.23%, 2.23%, and 2.19%, respectively. According to the MAPE criteria in Table 1, all these models can make effective predictions, and the proposed model, with the smallest value, is the most accurate. A favorable predictor should perform well in the simulated period and achieve satisfactory accuracy in the verification period; here, the proposed model again has the lowest MAPE in the predicted period. In this case, both the fitting and the prediction errors of all models are small, which shows that no overfitting has occurred. Moreover, the nonlinear models (NGBM (1, 1), ONGBM (1, 1), and INGBM (1, 1)) perform better than the linear models (GM (1, 1) and DGM (1, 1)), which indicates that nonlinear grey models capture the nonlinear characteristics of the data well.

In terms of computational cost, grey models are designed for small samples, so their running time is usually very short. For example, in Case 1, the time costs of GM (1, 1), DGM (1, 1), NGBM (1, 1), and INGBM (1, 1) are 0.1638 s, 0.1489 s, 0.1744 s, and 0.1862 s, respectively; all are less than 1 s and within the acceptable range. In summary, the INGBM (1, 1) model improves the prediction accuracy of the traditional NGBM (1, 1) model by optimizing the model parameters. Next, the proposed model is applied to a practical problem.

5. Application

Universities play an irreplaceable role in China’s effort to become a strong country in science and technology. As the core institutions for cultivating talent and achieving technological innovation, they shoulder important responsibilities in the National Innovation System. As expected, the number of R&D institutions of higher education has increased rapidly in the past few years. Accurate forecasts of this number will provide a reference for the Ministry of Education of the People’s Republic of China and the government to make plans and strategies in advance. However, the effects of related factors on the number of R&D institutions of higher education are quite uncertain, and reliable observations are limited because of China’s rapid development; traditional models (e.g., regression analysis) are therefore unsuitable for this case because of the small sample size and uncertain factors. The proposed model, INGBM (1, 1), which is designed for few observations, is better suited to this case.

The data, collected from the National Bureau of Statistics of China and listed in Table 6, are divided empirically into two groups: the data from 2011 to 2016 are used to build the five prediction models, and the remaining data are used to assess their accuracy.

As in Cases 1 and 2, all the parameters of these models are computed and listed in Table 7. The trajectory of the WOA search for the power index is shown in Figure 4.

As a consequence, the simulated and predicted results are shown in Table 8.

In this case, excluding the first item of the predicted results, the RMSE values (see Figure 5) of the five grey models are 0.14, 0.14, 0.04, 0.04, and 0.03 for simulation and 0.87, 0.85, 0.39, and 0.40 for prediction, respectively. The MAPE values (see Figure 6) of these models are 1.17%, 1.18%, 0.28%, 0.27%, and 0.25% for simulation and 5.54%, 5.46%, 2.32%, 2.39%, and 1.72% for prediction, respectively. Therefore, in the simulation period, the proposed model outperforms the other grey models, with the lowest RMSE value of 0.03 and a MAPE value of 0.27%. The ONGBM (1, 1) model follows, with a relatively low MAPE value of 0.28%. As noted in [44], a sound forecasting method should perform well not only in simulation but also in the prediction stage. Table 8 shows that the proposed model again outperforms the other grey models, with a lower RMSE value of 0.40 and a MAPE value of 1.72%. Interestingly, ONGBM (1, 1) ranks second because its MAPE value is only slightly higher than that of INGBM (1, 1), which implies that the NGBM (1, 1) improved through background-value optimization can serve as an alternative model for predicting the number of R&D institutions of higher education. In this case, neither the prediction errors nor the fitting errors of the models are large, which shows that no overfitting occurs. At the same time, the nonlinear grey models predict better than the linear models, which shows that nonlinear grey models can effectively capture the nonlinear characteristics of the data. Finally, the improved model achieves the highest accuracy, which indicates that the proposed improvement strategy is effective.

To further verify the advantages of WOA, three other intelligent optimizers, the grey wolf optimizer (GWO) [45], particle swarm optimizer (PSO) [46], and ant lion optimizer (ALO) [47], are used for comparison. All four are well-established optimizers, each with its own characteristics and advantages. For all four algorithms, the population size is set to 100 and the maximum number of iterations to 100. The population is initialized 100 times to compare the final MAPE values and the corresponding nonlinear parameters and to calculate the average running time. The MAPE values and the corresponding nonlinear parameters of the four optimization algorithms after 30 runs are shown in Figure 7, and the time consumption is shown in Table 9.

Figure 7 and Table 9 show that WOA runs stably and that its running time, 9.9931 s, is relatively short. Overall, WOA is a reasonable choice of optimizer.

6. Conclusion

This paper aims to further improve the prediction accuracy of the nonlinear grey Bernoulli model (NGBM (1, 1)); the result is the nonlinear grey Bernoulli model with improved parameters, abbreviated as INGBM (1, 1). Unlike the traditional NGBM (1, 1) model, this study does not work with the original differential equation directly; instead, the differential equation is transformed into a linear form. Moreover, because the “misplaced replacement” is the root cause of the mismatch when the differential equation is converted to the difference equation, the model parameters are optimized to match these two equations better and reduce prediction error. In particular, the whale optimization algorithm is used to determine the optimal power index of INGBM (1, 1) automatically. Three examples are employed to validate the effectiveness of the proposed model by comparison with commonly used grey models. In all cases, the proposed model outperforms the other grey models, which implies that INGBM (1, 1) can effectively solve nonlinear problems with small sample sizes and provide valuable information for decision-makers to plan strategies in advance.

Although INGBM (1, 1) performs well, some limitations remain to be addressed in future work: (1) the model may overfit in some special cases; (2) more accurate parameter values may be obtained by using multiple optimizers.

Data Availability

The data used to support the findings of this study are included in this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (11661001) and the Project of Enhancing the Basic Scientific Research Ability of Young and Middle-Aged Teachers in Guangxi Universities (2021KY0740).