1 Introduction

A default occurs when a borrower fails to repay a loan in full. Default risk identification has become one of the most important issues in the banking industry (Dia et al., 2022; Granja et al., 2022; Oreski & Oreski, 2014; Orth, 2012). Credit scoring is a common method of assessing the credit risk of loan clients and is also the basis of credit ratings, loan decisions, and risk management for financial institutions (Crone & Finlay, 2012). For example, in application scoring, lenders often use statistical or artificial-intelligence-based predictive models to estimate an applicant's probability of default. The financial crisis of 2008 caused huge losses around the world, and banks and financial institutions have paid close attention to credit risk prediction ever since (Altman et al., 2020).

Industry 4.0 is opening up a new era of information technology, bringing great convenience and challenges to risk management. The term Industry 4.0 refers to the technologies used to develop value chains, shorten manufacturing lead times, and improve product quality and organizational performance (Kamble et al., 2018). It provides new paradigms and opportunities for the industrial management of SMEs. Supported by big data technology and artificial intelligence algorithms, it is more convenient and cheaper than traditional enterprise management systems. However, SMEs still face credit risks in Industry 4.0. In this era, SME credit data comes from a wide range of sources: governments hold SMEs' industrial and commercial registration, judicial, and tax records; banks hold credit information such as lending, repayment, guarantee, and default histories; SMEs generate their own operation, sales, and employment data; and social media platforms disclose information such as public opinion related to SMEs. Big data and data informatization make information acquisition very convenient, but also often lead to noisy information of high dimensionality (Duan et al., 2019). Credit data is often composed of multiple dimensions, the total number of indicators is large, and there are many missing values. Therefore, feature selection on high-dimensional credit data for SMEs is particularly important before establishing credit scoring models.

Feature selection approaches include the filter, wrapper, and hybrid methods (Hinton & Salakhutdinov, 2006; Sefidian & Daneshpour, 2018). Unlike the other two, the filter method operates independently of the classifier, so it runs quickly and efficiently but cannot measure the default-discriminating performance of the selected indicators (Freeman et al., 2015; Nakariyakul, 2018). The wrapper approach treats indicator selection as a search optimization problem: different indicator subsets are generated by an optimization algorithm, a classification model evaluates those subsets, and the best one is selected. In general, the wrapper approach outperforms the filter approach (Chen et al., 2020). The hybrid technique is a compromise between the filter and wrapper techniques, with low computational complexity and high model accuracy (Lappas & Yannacopoulos, 2021). These methods have played an important role in credit risk assessment, but few studies have applied them to high-dimensional data reduction in big data settings.

In this paper, a novel binary opposite whale optimization algorithm (BOWOA) is proposed to screen features that have significant discrimination power for SMEs' credit risk. This method combines the advantages of the binary whale optimization algorithm (BWOA) with opposition-based learning (OBL). It can quickly perform feature screening of high-dimensional credit data for SMEs.

The contributions of this paper include the development of a novel wrapper-type credit risk feature selection methodology that combines the BOWOA with the Kolmogorov–Smirnov (KS) method. In short, we combine the advantages of the two to construct a model that can quickly identify the crucial default features of SMEs. The proposed model enables us to select the smallest subset of indicators carrying the most information from the original indicator set. Another contribution lies in mining the relationship between the credit characteristics and credit risk of SMEs in the context of Industry 4.0. The empirical analysis was conducted using Chinese credit data for 2,044 SMEs, with robustness tests carried out using data for 2,157 SMEs and 3,111 SMEs. The proposed method can greatly reduce the computational complexity of traditional artificial intelligence wrapper techniques. This study also offers some policy implications: bank analysts can refer to the model to quickly check the credit default features of SMEs, and managers of SMEs can use the model to conduct risk assessments of their own companies to better understand their risk status, make credit decisions, and reduce their credit risk. Moreover, as feature selection is a common problem in data mining, this model can also be used for attribute reduction of high-dimensional data in other fields.

In the remainder of the paper, related work is reviewed in Sect. 2. Then, the methodology is described in Sect. 3. Next, the empirical results are presented in Sect. 4. Finally, the discussion and conclusions are provided in Sects. 5 and 6, respectively.

2 Literature review

SMEs are the most common form of business and play an important role in economic development. A large body of literature has analyzed the financing status of SMEs (Bagale et al., 2021; Beck et al., 2005), and financial institutions and bank managers have long paid attention to the effective prediction of SMEs' default risk (Ciampi et al., 2021; Edmister, 1972; Laitinen, 1993; Medina-Olivares et al., 2022; Shi et al., 2018). However, SMEs face more difficulties than larger companies in their risk management processes (Ciampi, 2015). The technology and management models used by large companies are usually not suitable for SMEs because they are either too complex or too expensive (Pereira et al., 2013), and compared with larger companies, SMEs often face different challenges and dilemmas (Wadhwa, 2012). The financing of large enterprises has benefited from the development of information technology and artificial intelligence, but small enterprises have benefited less because their loans are still constrained by large amounts of missing credit data. Researchers examining the opportunities and challenges faced by SMEs have found that their production, operation, and scale expansion rely heavily on access to capital investment and the returns on those investments. One important obstacle to the sustainable development of SMEs is the difficulty of securing the credit funds required for operation. Moreover, SMEs are more vulnerable to risk exposure than large companies and should therefore be more engaged in risk management (Blanc Alquier & Lagasse Tignol, 2006): they face more challenges in accessing resources, have less diversified economic activity and weaker financial structures, and tend to face layoffs in the event of a crisis (Kim & Vonortas, 2014). Although credit risk management is crucial to sustainability and business value for SMEs in the era of Industry 4.0, effective risk management models for them are far from widely adopted (Pereira et al., 2013). All these factors are encouraging more and more researchers, policymakers, and lenders to focus on ways of improving the precision of risk prediction and lowering the default rates of small enterprises (Oliveira et al., 2017). It is therefore necessary to build a complete credit risk management system for SMEs, and credit risk feature selection is the cornerstone of such a system.

The feature selection technique has been widely used to reduce the dimensionality of data, lower the computational complexity of models, and improve their accuracy (Mehmanchi et al., 2021). In general, feature selection involves picking an optimal subset of indicators from the original set of indicators. This optimal subset contains fewer indicators but still fully reflects the information of the original set (Shi et al., 2018). If the original set has $n$ indicators, there are $2^n$ candidate subsets, making the search a non-deterministic polynomial (NP) problem. There is no direct method of obtaining such a subset, and most algorithms rely on approximate solutions. Scholars usually divide feature selection techniques into three categories: filter, wrapper, and hybrid (Kohavi & John, 1997; Lappas & Yannacopoulos, 2021).

The filter method is a statistical feature selection algorithm with low computational complexity. It directly filters the optimal feature subset using the statistical characteristics of the sample data. Therefore, this kind of algorithm is very suitable for data with a large sample size. Chi and Zhang (2017) built a feature selection model for SMEs using the rank-sum test, which avoids the assumption that the training data follows a normal distribution. By maximizing the Gini coefficient with nonlinear programming, Zhang et al. (2022) found the crucial credit risk indicators of Chinese SMEs. They also found that a single indicator may have a strong default discrimination ability but, when combined with other indicators, it may not necessarily have the same ability. Thus, it is important to screen the optimal combination of indicators.

The wrapper method was proposed by John et al. (1994). Compared with the filter algorithm, the wrapper algorithm is more accurate, but its computational complexity is greater: to find the best subset of features, wrapper algorithms must repeatedly train classification models, which makes them computationally prohibitive in the big data era. Maldonado and Weber (2009) constructed a wrapper algorithm with the support vector machine (SVM) model as a kernel function to screen the optimal subset of features for identifying default status. Chen and Xiang (2017) used the group least absolute shrinkage and selection operator (LASSO) method to select a group of optimal features. To identify the key indicators of credit risk, Bellotti and Crook (2009) used the SVM model, compared it to logistic regression, and found the SVM model to be competitive for feature selection.

The hybrid method combines the advantages of the filter and wrapper algorithms. The idea of this kind of technique is to filter out a pool of features using the filter algorithm, and then use the wrapper method on the subsets from this pool. Therefore, the hybrid algorithm makes a trade-off between the accuracy of the model and computational complexity. Oreski and Oreski (2014) combined the genetic algorithm (GA) and neural networks, proposing a hybrid feature selection algorithm for evaluating credit risk. Chen and Li (2010) introduced a hybrid feature selection method based on the SVM, and their study shows that the hybrid method is robust. More recently, Lappas and Yannacopoulos (2021) constructed a machine learning framework combining expert experience with the GA for hybrid feature selection for credit risk evaluation.

The techniques used for searching the characteristic space in the wrapper and hybrid strategies are different. Because of the computational complexity of the problem, artificial intelligence optimization algorithms are commonly used, such as the GA, particle swarm optimization (PSO), and the whale optimization algorithm (WOA) (Bhattacharya et al., 2018; Kaur & Arora, 2018). The WOA is an optimization algorithm that simulates the natural behavior of humpback whales (a spiral motion and bubble-net foraging). It was proposed by Mirjalili and Lewis (2016) and is used for parameter optimization and large-scale function optimization. Compared with PSO, the WOA has higher storage efficiency because it only stores the globally optimal value during the iteration. Aljarah et al. (2016) applied the WOA to the optimization of weights and biases in artificial neural networks. Compared with PSO, the GA, and back propagation (BP), the WOA can not only avoid local optimization, but also improve convergence speed. Tharwat et al. (2017) used the WOA to find the optimal model parameters of the SVM so as to reduce classification error. To prevent the WOA from becoming trapped in a local optimum, Sayed et al. (2018) applied chaos theory, using the regularity and semi-randomness of the chaotic system to improve the WOA’s global optimization ability.

Although existing research has achieved great success in dealing with feature selection for credit risk, it still has some limitations. First, when selecting a subset of indicators, most studies consider only the accuracy of the model, not the total number of indicators in the subset, thus neglecting the computational cost and the data collection cost. Second, the computational complexity of existing wrapper algorithms is very high, and they are inefficient at handling large-scale data; in the Industry 4.0 era, there is little research on applying wrapper algorithms to large-scale, high-dimensional credit risk datasets. Therefore, a novel feature selection model, covering more optimization algorithms and performance metrics, should be proposed. It will guide banks' credit decision-makers and help them select the most appropriate methods for their own datasets.

3 Methodology

In this section, we put forward a novel credit risk feature selection model for SMEs. To begin with, the original data is standardized and sample expansion is performed on the default sample. Then, the BOWOA artificial intelligence optimization algorithm and the KS statistic are used to find the optimal subset of indicators that can accurately identify the default risk of SMEs. Finally, three measures, namely accuracy, area under the curve (AUC), and f1 score, are used to evaluate the predictive performance of the model. The reasons for these steps are discussed below.

The first step is to collect SMEs' credit data. Assuming that the sample consists of n indicators and m customers, the SMEs' credit data can be described as a matrix X with m rows and n columns. We aim to select k indicators from these n indicators such that default prediction using the reduced data matrix X′ is better than using the original dataset. The number k of indicators should be as small as possible, so that when banks collect data on small enterprises, they only need to collect k indicators instead of n: fewer indicators mean lower data collection costs for banks and faster computer modeling. In addition, since normalizing sample data can enhance the performance of machine learning classification models, the necessary data cleaning and normalization are performed after collection. Because the credit data of SMEs is often imbalanced, a model may over-identify SMEs that do not default. Studies have shown that sample expansion techniques have a positive effect on the predictive performance of models; therefore, sample expansion is performed on the original SMEs' credit data.

Then, a new BOWOA method is proposed to reduce features. The WOA is an emerging artificial intelligence optimization algorithm for high-dimensional problems (Zhang & Wen, 2021). To apply the WOA to discrete problems, Hussien et al. (2020) proposed the BWOA, which replaces the continuous whale agent with a binary whale agent. Lappas and Yannacopoulos (2021) showed that the OBL method can improve prediction performance on unbalanced credit samples. This study combines the advantages of OBL with the BWOA and proposes the BOWOA algorithm for feature screening of high-dimensional, unbalanced credit data.

Based on the motivation of this study, indicators with strong default discrimination power are needed, so a quantity is required to measure the default discriminatory ability of the model. This study uses the KS statistic for this purpose. Supposing the sample data consists of $n$ indicators, these $n$ indicators generate $2^n$ subsets of indicators. When $n$ is relatively small, all these subsets can be traversed to find the optimal indicator group: theoretically, if some machine learning algorithm were run on each of the $2^n$ subsets, the KS statistic for each subset could be obtained, and the subset with the highest KS statistic would clearly be the best group of indicators to select. Unfortunately, when $n$ is not small, it is impossible to traverse all $2^n$ subsets and obtain their KS statistics, as doing so would involve huge computational complexity. The complexity here stems from three aspects: the high dimensionality of the data, the combinatorial number of indicator groups, and the complexity of machine learning models.

The model is summarized in the following three steps. Step 1: establish the original indicator set, standardize the original data, and perform sample expansion on the default sample. Step 2: filter default features for SMEs using the BOWOA algorithm and the KS statistic with a linear scoring model. Step 3: evaluate the predictive performance of the model using accuracy, AUC, and f1 scores.

Algorithm 1 Pseudo-code of the proposed feature selection model

It is worth mentioning that a linear model is chosen to calculate the KS statistic, where wj is the weight of the j-th indicator in the linear function. In practice, users can replace the linear function with the SVM, random forest (RF), or other methods. For the linear scoring model \(\sum\limits_{j = 1}^{k} {w_{j} x_{ij} }\), the value xij of the j-th indicator for the i-th small enterprise is already known. To evaluate the KS statistic of a given subset of indicators, the weight coefficient wj of each indicator must be determined. If there is only one indicator (k = 1), its weight is 1. For multiple indicators \(\{ x_{1} ,x_{2} ,...,x_{k} \}\) (k > 1), the weight wj corresponding to the j-th indicator is defined as follows.

$$w_{j} = \frac{{KS(\{ x_{j} \} )}}{{\sum\limits_{j \le k} {KS(\{ x_{j} \} )} }}$$
(1)

In Eq. (1), KS({xj}) represents the value of the KS statistic of the j-th indicator alone. This formula allows quick calculation of the fitness function value of the BOWOA algorithm. Although the linear function looks simple, we found it to be effective in the empirical analysis.
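To make the fitness computation concrete, the following minimal Python sketch (our illustration, not code released with the paper) scores SMEs with the KS-weighted linear model of Eq. (1); the single-indicator KS statistic follows the empirical-CDF definition formalized later in Sect. 3.2.3, and standardized indicator data is assumed:

```python
import numpy as np

def ks_statistic(scores, default_flag):
    """KS distance between the score distributions of non-default
    (flag 1) and default (flag 0) SMEs; see Eqs. (16)-(17)."""
    thresholds = np.unique(scores)
    good = np.sort(scores[default_flag == 1])
    bad = np.sort(scores[default_flag == 0])
    f_good = np.searchsorted(good, thresholds, side="right") / len(good)
    f_bad = np.searchsorted(bad, thresholds, side="right") / len(bad)
    return np.max(np.abs(f_good - f_bad))

def linear_score(X, y):
    """Score each SME with the KS-weighted linear model.
    X: (m, k) standardized indicators; y: default flags (1 = non-default)."""
    ks_j = np.array([ks_statistic(X[:, j], y) for j in range(X.shape[1])])
    w = ks_j / ks_j.sum()   # Eq. (1): weight proportional to single-indicator KS
    return X @ w            # linear scoring model sum_j w_j * x_ij
```

Because each single-indicator KS value can be cached, the fitness of any candidate subset reduces to one matrix-vector product plus one KS evaluation, which is what keeps the wrapper search fast.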

3.1 Data standardization and SMOTE sample expansion

The original data should be normalized to eliminate the influence of different dimensions of features on the selection of indicators (Chai et al., 2019). The scoring standard of qualitative indicators is usually assessed through expert investigation and rational analysis (Bai et al., 2019). The standardization formulas for the positive, negative, and interval indicators are given below.

Equation (2) shows the standardization formula for positive indicators:

$$x_{ij} = \frac{{v_{ij} - \min (v_{ij} )}}{{\max (v_{ij} ) - \min (v_{ij} )}}$$
(2)

where xij denotes the standardized score of the i-th SME for the j-th indicator, and vij denotes the raw data for the i-th SME and j-th indicator.

Equation (3) shows the standardization formula for negative indicators.

$$x_{ij} = \frac{{{\text{max}}(v_{ij} ) - v_{ij} }}{{{\text{max}}(v_{ij} ) - {\text{min}}(v_{ij} )}}$$
(3)

Equation (4) shows the standardization formula for interval indicators.

$$x_{ij} = \left\{ \begin{array}{lll} 1 - \dfrac{q_{1} - v_{ij} }{\max \left( q_{1} - \min (v_{ij} ),\ \max (v_{ij} ) - q_{2} \right)}, & v_{ij} < q_{1} , & (a) \\[2ex] 1 - \dfrac{v_{ij} - q_{2} }{\max \left( q_{1} - \min (v_{ij} ),\ \max (v_{ij} ) - q_{2} \right)}, & v_{ij} > q_{2} , & (b) \\[2ex] 1, & q_{1} \le v_{ij} \le q_{2} , & (c) \end{array} \right.$$
(4)

where q1 and q2 denote the left and right boundaries of the ideal interval, respectively.
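A minimal Python sketch of Eqs. (2)–(4) (our illustration; column-wise min/max over the observed sample is assumed):

```python
import numpy as np

def standardize(v, kind="positive", q1=None, q2=None):
    """Normalize one raw indicator column v per Eqs. (2)-(4)."""
    vmin, vmax = v.min(), v.max()
    if kind == "positive":                        # Eq. (2)
        return (v - vmin) / (vmax - vmin)
    if kind == "negative":                        # Eq. (3)
        return (vmax - v) / (vmax - vmin)
    # interval indicator with ideal range [q1, q2], Eq. (4)
    denom = max(q1 - vmin, vmax - q2)
    x = np.ones(len(v))                           # case (c): inside [q1, q2]
    x[v < q1] = 1 - (q1 - v[v < q1]) / denom      # case (a)
    x[v > q2] = 1 - (v[v > q2] - q2) / denom      # case (b)
    return x
```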

The SMOTE (synthetic minority oversampling technique) method synthesizes new samples from existing minority-class samples and has been widely used in credit risk evaluation. The main idea is to interpolate between two default sample points to obtain a new sample point, without discarding original data or over-fitting.

Specifically, let x1 denote a default sample point and x2 denote another default sample point. Then, a new sample point x3 can be synthesized according to the following rules (Fernandez et al., 2018):

$$x_{3} = x_{1} + rand(0,1) \times (x_{2} - x_{1} )$$
(5)
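The interpolation rule of Eq. (5) can be sketched as follows (a simplified illustration: canonical SMOTE draws x2 from the k nearest minority neighbors of x1, and library implementations such as imblearn's SMOTE would normally be used in practice):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_expand(X_default, n_new):
    """Synthesize n_new default samples by interpolating between
    pairs of existing default samples, following Eq. (5)."""
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(len(X_default), size=2, replace=False)
        x1, x2 = X_default[i], X_default[j]
        synthetic.append(x1 + rng.random() * (x2 - x1))  # Eq. (5)
    return np.array(synthetic)
```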

3.2 Construction of indicator selection model based on BOWOA

3.2.1 Initialization of population of indicator subsets based on OBL

The population of subsets of indicators needs to be initialized before using the BOWOA to find the optimal subset. The OBL algorithm proposed by Tizhoosh (2005) has often been used to optimize algorithms to improve global search performance (Ali et al., 2012; Park & Lee, 2015). Thus, OBL is used to initialize the population of subsets of indicators. By introducing OBL, we ensure that the BOWOA not only considers the subset of common indicators but also the corresponding opposite subset of indicators. This lays the foundation for the global search of the BOWOA.

To search for the optimal indicator subset in binary space, it is necessary to extend the definition of the opposite point (Wang et al., 2011) to the binary opposite point in a binary space.

Definition 1

(Opposite Point). Let X = (x1, x2, …, xD) denote any point in a D-dimensional space, where each coordinate xi lies in the real interval [ai, bi]. Then the opposite point of X is OX = (ox1, ox2, …, oxD), where oxi = ai + bi − xi, i = 1, 2, …, D.

Definition 2

(Binary Opposite Point). Let X = (x1, x2, …, xD) denote any point in a D-dimensional binary space, where xi = 0 or 1. Then the opposite point of X is OX = (ox1, ox2, …, oxD), where oxi = 1 − xi, i = 1, 2, …, D.

Based on Definition 2, the basic steps for initializing the population of subsets of indicators are as follows. First, the indicator subset population IP is initialized: N indicator subsets Xi = (xi1, xi2, …, xiD) are randomly generated as the initial population IP, where i = 1, 2, …, N, D is the number of original indicators, each component \(x_{i}^{d} \in \{ 0,1\}\), and \(x_{i}^{d} = 1\) denotes keeping the d-th indicator in the subset. Second, the opposite indicator subset population OP is generated: according to Definition 2, the opposite indicator subset OXi of each Xi from the first step is obtained, giving OP = {OX1, OX2, …, OXN}. Third, the new initial indicator subset population FP is generated: the fitness values of the 2N indicator subsets from the two steps above are calculated and sorted in descending order, and the first N subsets are selected to form the new initial indicator subset population, recorded as FP.
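A compact sketch of this OBL initialization (our illustration; `fitness` stands for the KS-based fitness of Sect. 3.2.3, and the simple Bernoulli draw here could be replaced by the uniform-cardinality rule described in that section):

```python
import numpy as np

rng = np.random.default_rng(0)

def obl_init(N, D, fitness):
    """Opposition-based initialization of the indicator-subset population."""
    IP = rng.integers(0, 2, size=(N, D))   # random binary subsets
    OP = 1 - IP                            # binary opposite points (Definition 2)
    pool = np.vstack([IP, OP])             # 2N candidate subsets
    fit = np.array([fitness(x) for x in pool])
    return pool[np.argsort(-fit)[:N]]      # keep the N fittest as FP
```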

3.2.2 Indicator selection based on the BOWOA

Based on the initial population of subsets of indicators, FP, obtained in Sect. 3.2.1, this section uses the BOWOA to select indicators that are significant in identifying customer default status. The BOWOA seeks the optimal subset of indicators in two ways: exploitation and exploration (Mirjalili & Lewis, 2016). When finding the optimal subset, the BOWOA adopts the exploitation method. The process is as follows.

Let Φ be the original set of indicators, φ be the optimal subset of indicators, and Xgb be a subset that is approximate to the optimal subset φ.

(1) Search for the optimal subset of indicators φ based on the exploitation method

    In the exploitation phase, the BOWOA uses two approaches to find the optimal subset of indicators φ. First, “shrinking encircling” is defined, as shown in Eqs. (6)–(9):

    $$X_{i} (t + 1) = X_{gb} (t) - R|C \cdot X_{gb} (t) - X_{i} (t)|$$
    (6)
    $$R = 2a \cdot r_{1} - a$$
    (7)
    $$C = 2 \cdot r_{2}$$
    (8)
    $$a = 2 - \frac{2t}{{t_{max} }}$$
    (9)

    where t is the current iteration number and tmax is the maximum number of iterations. Xi(t) is the i-th subset of indicators at iteration t, R and C are coefficient vectors, and a is the convergence coefficient, which decreases linearly from two to zero. The value of a in Eq. (9) is controlled to imitate the shrinking encircling mechanism: as the convergence coefficient a decreases, the fluctuation range of R decreases, and the subset of indicators Xi(t) approaches Xgb(t).

    Second, “spiral updating” is used, as shown in Eq. (10):

    $$X_{i} (t + 1) = \left| {X_{gb} (t) - X_{i} (t)} \right| \cdot e^{bl} \cdot \cos (2\pi l) + X_{gb} (t)$$
    (10)

    where b is a constant parameter, and l is a random real number in the interval [-1, 1].

    In general, the above two ways of finding the optimal subset of indicators are implemented at the same time. To model this simultaneous behavior, it is assumed that there is a probability of 50% of choosing the spiral updating method, and an equal probability of choosing the shrinking encircling method (Hussien et al., 2020). The mathematical model is as follows:

    $$X_{i} (t + 1) = \left\{ \begin{array}{ll} X_{gb} (t) - R\left| {C \cdot X_{gb} (t) - X_{i} (t)} \right|, & p < 0.5 \\ \left| {X_{gb} (t) - X_{i} (t)} \right| \cdot e^{bl} \cdot \cos (2\pi l) + X_{gb} (t), & p \ge 0.5 \end{array} \right.$$
    (11)

    When p < 0.5, the BOWOA searches for the optimal subset of indicators through the shrinking encircling method; otherwise, it uses the spiral method. It is worth noting that, if the updated Xi(t + 1) is better than Xgb in terms of identifying default status, we need to update Xgb using Xi(t + 1) as the approximate subset to the optimal subset of indicators for the next iteration.

(2) Search for the optimal subset of indicators φ based on the exploration method

    It is easy to see that the optimal subset of indicators obtained by the exploitation method may be a local optimum rather than the global optimum. To alleviate this problem, Xgb is replaced with a random subset of indicators Xr(t) to force the BOWOA to do a global search. The mathematical equation is as follows:

    $$X_{i} (t + 1) = X_{r} (t) - R \cdot \left| {C \cdot X_{r} (t) - X_{i} (t)} \right|$$
    (12)

    Since this paper searches for the optimal subset of indicators in a binary space, Xi(t + 1) in Eqs. (6), (10), (11), and (12) needs to be transformed into binary form (Hussien et al., 2020). The mathematical functions are as follows:

    $$S(X_{i}^{k} (t + 1)) = \frac{1}{{1 + e^{{ - X_{i}^{k} (t + 1)}} }}$$
    (13)
    $$X_{i}^{k} (t + 1) = \left\{ \begin{array}{ll} 0, & rand < S(X_{i}^{k} (t + 1)) \\ 1, & rand \ge S(X_{i}^{k} (t + 1)) \end{array} \right.$$
    (14)

    where Eq. (13) is the sigmoid function that maps any real number into the interval (0, 1), Xik(t + 1) is the value of the i-th subset of indicators at the k-th position, and S(Xik(t + 1)) is the corresponding sigmoid value. Equation (14) transforms S(Xik(t + 1)) into a variable taking the value 0 or 1, so that the BOWOA searches for the optimal subset of indicators in a binary space. Algorithm 2 presents the pseudo-code of the BOWOA.

    Algorithm 2 Pseudo-code of the BOWOA
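Since Algorithm 2 is presented only as pseudo-code, the following Python sketch illustrates one BOWOA iteration under our reading of Eqs. (6)–(14); the rule of switching to exploration when |R| ≥ 1 follows the standard WOA convention, and `fitness` denotes the KS-based fitness function:

```python
import numpy as np

rng = np.random.default_rng(0)

def bowoa_step(pop, X_gb, t, t_max, fitness, b=1.0):
    """One BOWOA iteration over the binary population pop (N x D)."""
    a = 2 - 2 * t / t_max                            # Eq. (9): decreases 2 -> 0
    new_pop = np.empty_like(pop)
    for i, Xi in enumerate(pop):
        R = 2 * a * rng.random() - a                 # Eq. (7)
        C = 2 * rng.random()                         # Eq. (8)
        p, l = rng.random(), rng.uniform(-1, 1)
        if p < 0.5:
            if abs(R) < 1:                           # shrinking encircling, Eq. (6)
                V = X_gb - R * np.abs(C * X_gb - Xi)
            else:                                    # exploration via random agent, Eq. (12)
                Xr = pop[rng.integers(len(pop))]
                V = Xr - R * np.abs(C * Xr - Xi)
        else:                                        # spiral updating, Eq. (10)
            V = np.abs(X_gb - Xi) * np.exp(b * l) * np.cos(2 * np.pi * l) + X_gb
        S = 1 / (1 + np.exp(-V))                     # sigmoid transfer, Eq. (13)
        new_pop[i] = (rng.random(len(Xi)) >= S).astype(int)  # Eq. (14)
        if fitness(new_pop[i]) > fitness(X_gb):      # track the best subset found
            X_gb = new_pop[i].copy()
    return new_pop, X_gb
```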

    There are three reasons why the BOWOA algorithm is used for feature selection on high-dimensional credit data of SMEs. First, the BOWOA can select an optimal group of indicators with strong default identification ability; traditional feature selection methods, such as RF and logistic regression, can rank the importance of single indicators but cannot obtain the optimal group of indicators (Zhang et al., 2022). Second, compared with other optimization algorithms, the BOWOA can quickly find the global optimal solution, and its stochastic optimization helps it avoid falling into local optima. Third, in the age of big data, credit data is often high-dimensional and noisy, and the empirical analysis below shows that the BOWOA has advantages in processing high-dimensional and imbalanced data.

3.2.3 Model parameter selection

(1) Fitness function

    In artificial intelligence optimization methods, the fitness function lies at the core and directly affects the speed and effect of the optimization. Following the definition of the KS statistic given by Rezac and Rezac (2011), the KS statistic of the linear scoring model built on a candidate subset of indicators is chosen as the fitness value of the BOWOA optimization algorithm.

    The empirical score distributions of the default and non-default SMEs are calculated, followed by the maximum difference between the two distributions over all thresholds. Supposing that each SME has a credit score, the characteristic function of default can be defined as:

    $$D(k) = \left\{ \begin{array}{ll} 1, & {\text{if the }}k{\text{-th SME is non-default}} \\ 0, & {\text{if the }}k{\text{-th SME is default}} \end{array} \right.$$
    (15)

    The empirical cumulative distribution functions of the scores of the non-default and default SMEs are as follows.

    $$\begin{aligned} F_{n_{1} ,non\text{-}default} (q) &= \frac{1}{n_{1} }\sum\limits_{i = 1}^{n_{1} } I\left( (S_{i} \le q){\text{ and }}(D(i) = 1) \right) \\ F_{n_{2} ,default} (q) &= \frac{1}{n_{2} }\sum\limits_{i = 1}^{n_{2} } I\left( (S_{i} \le q){\text{ and }}(D(i) = 0) \right), \quad q \in [L,\,H] \end{aligned}$$
    (16)

    where Si is the credit score of the i-th SME, n1 is the number of non-default SMEs, n2 is the number of default SMEs, and the function I(x) equals 1 if and only if statement x is true. q is the threshold used for judging whether a small enterprise defaults, and L and H are the lower and upper bounds of all credit scores, respectively. The KS statistic is defined as

    $${\text{KS}} = \mathop {\max }\limits_{q \in [L,H]} \left| F_{n_{1} ,non\text{-}default} (q) - F_{n_{2} ,default} (q) \right|$$
    (17)

    As an example, Fig. 1 exhibits the estimated cumulative distribution functions for default and non-default SMEs, including a rough estimate of the KS statistic (marked by a blue line). The KS statistic measures the predictive model’s power to distinguish between default and non-default SMEs. In general, a larger KS value implies greater discrimination power.

    Fig. 1 Example of KS statistic
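As a quick worked illustration of Eqs. (16)–(17), the two-sample KS statistic can be computed directly (the score distributions below are toy assumptions; the sample sizes mirror the 1,816 non-default and 228 default SMEs of Sect. 4.1):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_good = rng.normal(0.60, 0.15, size=1816)  # toy non-default scores
scores_bad = rng.normal(0.45, 0.15, size=228)    # toy default scores

# Eq. (17): maximum vertical gap between the two empirical CDFs
ks = ks_2samp(scores_good, scores_bad).statistic
print(f"KS = {ks:.3f}")  # larger KS -> stronger default discrimination
```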

    In the credit rating of SMEs, the accuracy value and the AUC value are most often applied to judge the predictive performance of models. Few studies use the KS statistic, and we believe that the introduction of the KS statistic can give researchers a new perspective. The KS statistic measures the ability of a model to distinguish between default and non-default enterprises. This study chooses the KS statistic as the metric for three reasons. First, the credit data of SMEs is often imbalanced (Mahbobi et al., 2021). The KS statistic measures the default discrimination power of a model and can show the characteristics of imbalanced SME credit data. Second, the KS statistic is a non-parametric statistic. Its calculation depends neither on the distribution of the sample nor on the threshold used to judge default. This is more suitable for the era of Industry 4.0, with its high-dimensional and noisy data. Third, the KS statistic can reveal the default discrimination ability of a single indicator or a group of indicators (Yu et al., 2018), which makes it applicable to the feature selection problem studied in our work.

(2) Parameter settings for the BOWOA-KS model

    In the BOWOA-KS model, a binary sequence representing a subset of indicators must first be initialized randomly. To ensure that every indicator can be selected with equal probability during screening, the following random generation rule is adopted. Each time a binary sequence is generated, an integer N between 0 and n is first drawn from a uniform distribution; this represents the number of 1s in the binary sequence. Then N positions in the binary sequence are randomly selected and set to 1. The advantage is that every possible number of indicators occurs with the same probability, and each indicator also has an equal probability of being selected.
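This two-stage rule might be sketched as follows (our illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subset(n):
    """Binary sequence in which every cardinality 0..n is equally likely
    and the N selected positions are then chosen uniformly."""
    N = rng.integers(0, n + 1)                 # number of 1s, uniform on {0, ..., n}
    x = np.zeros(n, dtype=int)
    x[rng.choice(n, size=N, replace=False)] = 1
    return x
```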

    Usually, the number of iterations of the BWOA is in the range [5, 50] (Aljarah et al., 2016), and the initial population size is in the range [5, 100]. If the sample has $n$ indicators, then theoretically there are $2^n$ subsets; compared to $2^n$, an initial population size of 100 is too small, and we found that small initial populations failed to find the optimal solution. After weighing the accuracy of the model against the running speed of the program, we set the initial population of the model to 1,000 and the number of iterations to 50. The fitness function value of a population of 1,000 can be calculated quickly because the model scores SMEs' credit risk with a linear function: since the weights in the linear model are derived from the fitness value of each individual indicator, those values only need to be computed and saved once in advance.

3.3 Evaluation measures

The quality of the selected subset of indicators is evaluated in two main respects. The first is the total number of indicators in the subset: it is desirable for this number to be as small as possible, because fewer indicators reduce the cost of data collection and the time required for computer modeling. The second is how well credit risk can be predicted using the subset of indicators, which is captured by metrics such as accuracy, the f1 score, and the receiver operating characteristic (ROC) curve (Rezac & Rezac, 2011). These metrics are usually computed from a confusion matrix. Table 1 describes the confusion matrix for SME credit scoring.

$$TPR = \frac{TP}{TP + FN}, \quad {\text{Precision}} = \frac{TP}{TP + FP}, \quad FPR = \frac{FP}{FP + TN}$$
(18)
Table 1 Confusion matrix for SME credit prediction

where TP represents the number of non-default SMEs that are correctly identified, FN represents the number of non-default SMEs that are incorrectly identified, FP represents the number of default SMEs that are incorrectly identified, and TN represents the number of default SMEs that are correctly identified. The accuracy is then given by Eq. (19) (Sun et al., 2022):

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$$
(19)

Studies usually use TPR and precision to evaluate the performance of binary classification models. However, when these two measures give close results, it is difficult to judge the performance of the prediction model from either alone. The f1 score combines the strengths of precision and TPR as their harmonic mean:

$${\text{f1-score}} = 2 \cdot \frac{TPR \cdot {\text{Precision}}}{TPR + {\text{Precision}}}$$
(20)

The ROC curve is another measure, especially useful for models with imbalanced data (Abedin et al., 2021). It is an FPR-TPR diagram in which the abscissa indicates the FPR and the ordinate indicates the TPR. Because credit data is unbalanced, with many non-default samples and few default samples, a prediction model easily misjudges default samples as non-default samples; the ROC curve can effectively avoid the evaluation bias caused by this imbalance. For convenience, researchers usually use the area under the ROC curve (AUC) to evaluate prediction performance on unbalanced samples. In the following, this paper uses three evaluation measures, namely accuracy, the f1 score, and the AUC, to test the effectiveness of the BOWOA-KS model.
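A minimal sketch of computing the three measures with scikit-learn (our illustration; `y_true` codes non-default as 1, consistent with Table 1, and `y_prob` holds predicted probabilities of non-default):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    """Accuracy (Eq. 19), f1 score (Eq. 20), and AUC for one classifier."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),  # threshold-free, robust to imbalance
    }
```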

4 Empirical analysis

4.1 Sample and data sources

This paper conducts an empirical analysis using credit data on 2,044 SMEs from a Chinese commercial bank. The 2,044 SMEs consist of 228 default customers and 1,816 non-default customers, giving a sample imbalance ratio of about 8:1, which satisfies the characteristics of imbalanced data. Based on the rating criteria of Standard & Poor's, Fitch, and Moody's (Standard & Poor's Services, 2011; Fitch Ratings, 2013) and the indicators used by the China Postal Savings Bank (2009) and the Agricultural Bank of China (2011), a total of 44 indicators of SMEs were selected. These are organized into five layers: “C1 basic information of lenders”, “C2 repayment ability”, “C3 repayment willingness”, “C4 guarantee”, and “C5 macro environment”, and are listed in column c of Table 2. Descriptive statistics of the indicators are shown in Table 3. The standardization procedures for quantitative and qualitative indicators described in Sect. 3.1 are applied to the original data from Table 2; the standardized data are shown in column e of Table 4.

Table 2 Original data on 2,044 SMEs
Table 3 Descriptive statistics of data for 2,044 SMEs
Table 4 Standardized data of the 2,044 SMEs

4.2 Solution of feature selection problem

In this study, seven machine learning classification methods were used to test the results: K-nearest neighbors (KNN), logistic regression (LR), SVM, decision tree (DT), RF, gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost), all of which are commonly used in risk prediction (Abedin et al., 2021; Sun et al., 2022). These classification methods were used to test whether the subset of indicators obtained by the BOWOA-KS model is reliable. Other models were also examined for comparison: besides the BOWOA, the PSO algorithm was used as the optimization algorithm, and besides the KS statistic, the Gini coefficient and the AUC value were used as the fitness function. Accuracy and the f1 score were not used as fitness functions because, for linear models, these two metrics require a default threshold to be fixed; to avoid this subjectivity, only threshold-free measures were used. The RF method was also used to obtain the feature importance of each indicator in the raw dataset. For a given subset of indicators, grid search with cross-validation was used to determine the hyperparameters of the machine learning classification methods (Abedin et al., 2020). Table 5 lists the hyperparameters of each machine learning classification model for BOWOA-KS.

Table 5 Hyperparameters of artificial intelligence classification methods
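For illustration, the grid search described above might look as follows for the RF classifier (the parameter grid here is hypothetical; the grids actually used are those of Table 5, and `X_subset`, `y` stand for the data restricted to a selected indicator subset):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300, 500], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X_subset, y)             # cross-validated hyperparameter selection
best_rf = search.best_estimator_
```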

The accuracy, f1 score, AUC, and ROC curve for different subsets and machine learning methods were also obtained. The measurements of model performance are shown in Tables 6, 7 and 8, respectively. The accuracy distribution of the different machine learning methods with different numbers of indicators is shown in Fig. 2. The corresponding ROC curve of the different methods for the BOWOA-KS model is shown in Fig. 3.

Fig. 2 Accuracy of different machine learning methods with different numbers of indicators

Fig. 3 ROC curve of different methods used in the BOWOA-KS model

In order to determine the 15 features that have the greatest impact on the 2,044 SMEs’ default risk, the RF model is adopted to screen the 44 indicators. Figure 4 shows the top 15 features.

Fig. 4 15 most important features in the data for the 2,044 SMEs using RF

4.3 Model robustness test

To verify the robustness of the proposed model, it was applied to two other sets of small enterprise loans in China. One of the samples contains data on 2,157 SMEs from a bank in China, with a total of 60 credit indicators, and a default ratio of about 9%. The other sample comprises 3,111 SMEs and comes from a Chinese bank, with a total of 80 credit indicators, and a default ratio of about 2%. Both samples are imbalanced. The proposed method was used to analyze these two samples and the results are shown in Tables 9, 10 and 11.

The 15 most important features from the original sets of indicators (60 and 80 indicators) are calculated in the same way as above. The results are shown in Figs. 5 and 6.

Fig. 5 15 most important features for the dataset of 2,157 SMEs using RF

Fig. 6 15 most important features for the dataset of 3,111 SMEs using RF

5 Discussion

To assess the efficiency of the proposed BOWOA-KS model, its accuracy, AUC, and f1 scores were compared with those of other models, including on two additional credit datasets of Chinese small enterprises. Using the same initial parameters for the classifiers makes a fair comparison of the feature selection techniques possible. For the different machine learning classifiers, when the number of indicators in the optimal subset and the prediction performance of the classifier are considered together, the BOWOA-KS model performs better (see Tables 6, 7, 8, 9, 10 and 11).

Table 6 Accuracy of different models for different subsets
Table 7 f1 scores of different models for different subsets of indicators
Table 8 AUC of different models for different subsets of indicators
Table 9 Accuracy of different models for different subsets of indicators

At the same time, it is evident that no artificial intelligence method is omnipotent. The prediction performance levels of the different artificial intelligence classification methods are very close to each other. In terms of accuracy, SVM, RF, GBDT, and XGBoost perform better; if the f1 score is used for comparison, SVM and LR perform worse than the other models; and for the AUC value, RF, GBDT, and XGBoost perform better. It is worth noting that, compared to the original small enterprise credit indicator set (44 indicators in total), the BOWOA-KS model established in this paper selects far fewer indicators, yet the model's predictive power does not decline. This shows that the BOWOA-KS model is effective. Comparing the PSO algorithm and the BOWOA under the same parameter settings, we found that the indicator subset produced by the BOWOA contains fewer indicators and, except for the AUC values, has almost the same prediction performance metrics as the PSO. Strikingly, the BOWOA-KS model finally selects only eight of the original 44 indicators; that is, only 18% of the original set of indicators is used, but the prediction performance does not decrease. Compared with the optimal subset of indicators selected by the PSO algorithm, that selected by the BOWOA contains fewer indicators, meaning that the small enterprise default feature selection model proposed in this paper is effective and can reflect the information of the original indicator set with fewer indicators. This provides new ideas for the selection of default characteristics of small enterprises and will reduce banks' data collection costs.

The interpretation of the data in Tables 9, 10 and 11 shows that the model established in this paper is effective and robust: fewer indicators can be used to reflect the default characteristics of the original indicator set. Moreover, the sample and the subset of indicators largely affect the prediction performance of the machine learning classification model, while the performances of the different artificial intelligence classification methods show no great differences. For the dataset of 3,111 small enterprises, the prediction performance of each classifier is better than for the 2,157 small enterprises. The reason is that the default rate among the 3,111 small enterprises is only 2%, so the model overfits the non-default samples, resulting in inflated performance metrics. In addition, for these two samples, the number of indicators selected by the BOWOA is smaller than the number selected by the PSO algorithm, which again confirms that the BOWOA is effective. Although the BOWOA-KS model selects more indicators than the BOWOA-AUC and BOWOA-GINI models, careful observation reveals that each artificial intelligence classifier in the BOWOA-KS model selects a subset of indicators with higher predictive performance. This means that no model can select a subset with both a small number of indicators and high predictive performance, so decision-makers need to make a trade-off between the number of indicators and the predictive performance of the model. In general, the subset of indicators selected by the BOWOA-KS model is better than those selected by the other models.

Table 10 f1 scores of different models for different subsets of indicators
Table 11 AUC of different models for different subsets of indicators

From the above discussion, we can identify the key components that affect the performance of the BOWOA-KS model: the quality of the credit evaluation data, the machine learning classification algorithm, and the performance evaluation criteria. First, the most important factor affecting model performance is the quality of the credit evaluation data. The accuracy of the credit risk prediction for the 3,111 SMEs is better than for the other two datasets, i.e., those with 2,044 SMEs and 2,157 SMEs. Data quality generally covers three aspects: information capacity, missing values, and the imbalance ratio. It can be observed that the more imbalanced the sample, the more accurate the model appears to be; a potential reason is that the model over-identifies non-defaulting SMEs in an unbalanced sample. In addition, the more information the sample contains, the higher the accuracy of the model, which is consistent with our expectations. Second, the machine learning classification algorithm also affects prediction performance, and no single algorithm is suitable for all data; different datasets are best served by different classification algorithms. Third, when multiple performance evaluation criteria are considered simultaneously, the highest AUC and the highest accuracy on the same dataset will not necessarily come from the same model, indicating that the evaluation criteria also affect the assessed performance of the model. This analysis shows that the quality of credit data plays a dominant role in the prediction performance of any model, which motivates us to work on establishing a multi-source SME credit database in the era of Industry 4.0. In addition, when analysts use a machine learning classification model to predict default risk, they should choose a classification model that suits their data.

We also calculated the 15 most important features of the other two samples using the RF method (shown in Figs. 5 and 6). Different sample sets have different features associated with default. For the sample of 2,044 SMEs, external environmental factors such as the Engel coefficient are significantly related to whether a small company defaults; for the sample of 2,157 SMEs, the age of the company's manager and the number of years the company has been operating are significant factors; and for the sample of 3,111 SMEs, indicators such as a company's business scope and whether it has faced many legal proceedings in the past three years are significantly linked to default. We recommend that banks consider these indicators comprehensively when making lending decisions.

6 Conclusions and future work

SMEs are the driving force behind a country's economic development. In the Industry 4.0 era, the development of small enterprises is particularly important. The most crucial features of this era are intelligence, automation, and big data. Undeniably, the information age has brought many benefits to SMEs. At the same time, the development of SMEs still faces some challenges, especially in financing. During the COVID-19 pandemic, the cash flow of many SMEs has been limited, and some have been forced to close due to capital problems. An SME’s most common method of solving a cash flow shortage is to apply for a loan from a bank. However, the credit scoring system for SMEs is incomplete, and the method used for large enterprises cannot be applied directly to them. In this era of big data, knowing how to quickly and accurately identify indicators that are strongly related to SMEs’ credit risk from high-dimensional datasets is the key to easing SMEs’ financing constraints. Feature selection provides a possible way to solve this problem. By using an appropriate feature selection method, analysts can improve the accuracy of credit risk prediction models, reduce the computational complexity, and better understand the key default features of small enterprises.

This paper proposes the BOWOA-KS model, a wrapper algorithm that can quickly screen the default characteristics of SMEs. The BOWOA-KS model is used to conduct an empirical study on three different datasets in China. First, the methodology is verified using actual bank data for 2,044 SMEs. Then robustness tests are carried out using data for 2,157 SMEs and 3,111 SMEs. The results reveal that the proposed method can greatly reduce the computational complexity of traditional artificial intelligence wrapper techniques. We find that the optimal number of indicators and the optimal classification model differ among the datasets. If the total number of selected features and the prediction performance of the model are considered together, the BOWOA-KS model is shown to be reliable: it can quickly and accurately identify the indicators that are significantly related to the default characteristics of SMEs.

In the age of Industry 4.0, the amount of information available is huge, and credit data has a very high dimensionality and contains a lot of noise. This makes it necessary to simplify the data structure in credit evaluation. The model for SMEs’ default feature selection provided in this paper can reduce the dimensionality of credit data effectively. When applying the model, operators should pay attention to the characteristics of their datasets, so as to select the most appropriate artificial intelligence optimization algorithm and model parameters. Meanwhile, financing constraints for SMEs in the post-COVID-19 era are severe. Constructing a novel credit evaluation model for SMEs could effectively reduce the information uncertainty of the market and aid the recovery of small enterprises.

This empirical study and methodology generate several policy implications. In the context of Industry 4.0, artificial intelligence tools and big data acquisition technologies bring new opportunities for the fast and accurate calculation of SMEs' credit risks. Learning about the top features most likely to be exhibited by successful loans can help bankers not only identify creditworthy SMEs, but also determine which data they should collect for their databases. Additionally, financial institutions can refer to the proposed method to reduce the dimensionality of their credit data, lowering both data collection and modeling costs. This will also help SMEs focus on the factors that will improve their creditworthiness and access to loans from financial institutions. At the same time, this work can benefit governmental agencies and policymakers by pinpointing the characteristics that lead to the success of SMEs in the economy, helping them develop policies that guard SMEs against financial risks.

Despite the improvements it makes in terms of sample expansion, feature reduction, and the selection of performance evaluation indicators for classifier prediction, this research can be further expanded in the future. An interesting extension of this research would be to apply the BOWOA-KS models to other datasets and other types of credit applicants, which can help to evaluate the applicability of the methods. This study found the optimal subset of indicators based on the BOWOA artificial intelligence optimization algorithm. It would be interesting for scholars to use and compare other optimization algorithms in future research. Finally, for imbalanced datasets, the SMOTE algorithm was used to expand the data. Future research could compare the performance of different data expansion algorithms and find the best expansion imbalance ratio.