Abstract

To improve the accuracy of bearing fault diagnosis for multiple fault types with small samples, this paper proposes a new approach that combines adaptive local iterative filtering (ALIF), multiscale entropy features, and kernel sparse representation classification (KSRC). ALIF is used to adaptively decompose the nonlinear, nonstationary vibration signals into a sum of intrinsic mode functions (IMFs). Multiscale versions of sample entropy, fuzzy entropy, and permutation entropy are computed from the first three IMFs, giving a total of 180 features per sample. After normalization, the features are used to train and test the KSRC classifier. The proposed approach is evaluated on two experimental data sets: one concerns different types of bearing faults in a centrifugal pump, and the other, from Case Western Reserve University (CWRU), covers 12 bearing fault states. The experimental results show that the proposed approach is effective for bearing fault diagnosis and that high accuracy can be obtained from high dimensional features with small samples.

1. Introduction

Rolling bearings are widely used in rotating machinery, and their working condition bears on the maintenance of machines and the safety of workers. Since faults in bearings are always accompanied by vibration, which is easy to measure, many works focus on fault diagnosis based on vibration analysis. Generally, the procedure for bearing fault diagnosis is composed of four steps: (1) preprocessing of the nonlinear and nonstationary vibration signals based on adaptive mode decomposition, (2) extraction of features that are relatively insensitive to the data length and immune to noise, (3) dimension reduction of the feature matrix based on principal component analysis (PCA), Laplacian scores (LS), etc., and (4) fault pattern identification with a classifier. For example, Zhao et al. [1] computed the multiscale permutation entropy of subbands obtained by wavelet packet decomposition (WPD) and employed a hidden Markov model (HMM) to identify the fault pattern of the rolling bearing. Yang et al. [2] extracted the energy entropy of the intrinsic mode functions (IMFs) obtained by empirical mode decomposition (EMD) [3] as features and employed an artificial neural network (ANN) to identify the fault types. Li et al. [4] utilized local mean decomposition (LMD) [5] for preprocessing, improved multiscale fuzzy entropy as features, Laplacian scores for feature selection, and an improved support vector machine based binary tree for bearing fault diagnosis. Yang et al. [6] combined variational mode decomposition (VMD) [7] and local linear embedding (LLE) with a support vector machine (SVM) to diagnose mechanical faults of the rotor-bearing-casing system. These studies have achieved good results in bearing fault diagnosis to some extent; however, some problems still exist and need to be investigated further.

The first problem is concerned with adaptive mode decomposition [8]. WPD needs a prespecified basis function and cannot decompose the signals adaptively. The representative approach for adaptive decomposition is EMD, which can decompose a complicated signal into a sum of IMFs, yet it suffers from the problems of end effect and mode mixing. Some modifications have been proposed following EMD, such as LMD and VMD. Recently, a new approach called adaptive local iterative filtering (ALIF) was proposed by Cicone in 2016 [9, 10]. It follows the structure of EMD and has the advantages of an adaptive filter adjusted with the Fokker-Planck (FP) equation and an adaptive filter length. The authors of [11] have successfully applied ALIF and approximate entropy to wind turbine bearings. Consequently, ALIF is expected to be more suitable for processing the faulty vibration signals of rolling bearings.

The next problem is how to extract efficient features for classification with high accuracy [12]. The traditional features come from the time domain, the frequency domain, and the time-frequency domain. To deal with the nonlinear dynamic characteristics of bearing fault signals, entropy has been introduced [13], such as approximate entropy (ApEn) [11, 14], sample entropy (SaEn) [15, 16], fuzzy entropy (FuEn) [17, 18], and permutation entropy (PE) [19, 20]. However, they all estimate the complexity of signals at a single scale, which may not be conducive to the extraction of signal features. To overcome this drawback, multiscale entropy (MSE) was proposed by Costa et al. to measure the complexity of signals over a range of scales [21, 22]. Based on MSE, multiscale sample entropy (MSaE), multiscale fuzzy entropy (MFE), and multiscale permutation entropy (MPE) have been proposed, and they have been shown to perform better than SaEn, FuEn, and PE in the diagnosis of rolling bearing faults [23-25]. However, it is not ideal to use the entropy features directly for classification because of the influence of noise and interference harmonics in the vibration signals. Hence, ALIF is utilized to decompose the original signals into a sum of IMFs, which reduces the interference of noise and harmonics and highlights the fault information. The three multiscale entropy features are then computed from the IMFs containing the most fault information. Considering the advantages of the three multiscale entropy features in feature extraction and the characteristics of the subsequent classifier, all of them are employed in this paper.

Before classification, feature selection such as LS or dimensionality reduction such as PCA and LLE is usually performed. Classifiers such as HMM [1], ANN [2], SVM [6], and the multiclass relevance vector machine [20] are then used to identify the fault type. Though their theories are well established, their inherent limitations restrict them to some extent. For example, ANN and variable predictive model based class discrimination (VPMCD) [26] need large training samples to obtain high classification accuracy; also, the SVM is a binary classifier, which requires a multiclass strategy such as one-against-one, one-versus-all, or a binary tree. Moreover, such a pipeline is a two-stage procedure that combines feature reduction with a classifier. In addition, the training samples in practical applications are few, but each carries many features. Hence, a sparse representation classifier (SRC) [27] is introduced to achieve the two stages at one time and realize feature selection through regularization. The classifier was first proposed to recognize frontal-view human faces under varying expression and illumination. Its advantage is that high classification accuracy requires a sufficiently large number of features rather than a large number of samples. To improve the classification accuracy with high dimensional features, the kernel approach was introduced, and KSRC was proposed and applied in face recognition [28, 29]. Hence, KSRC is employed in this paper to identify bearing fault states in combination with ALIF-enhanced multiple entropy features.

The organization of this paper is as follows. The theoretical backgrounds of ALIF, multiscale entropy features, and KSRC are briefly introduced in Sections 2, 3, and 4, respectively. The proposed method based on these theoretical backgrounds is illustrated in Section 5. Experimental data sets are employed to verify the proposed method in Section 6, and the conclusions are drawn in Section 7.

2. Adaptive Local Iterative Filtering

Given a nonstationary, nonlinear signal $x(t)$, it can be reconstructed as the sum of several IMFs and the residue:
$$x(t)=\sum_{i=1}^{M}\mathrm{IMF}_i(t)+r(t), \tag{1}$$
where $\mathrm{IMF}_i(t)$ represents the $i$-th IMF, $M$ is the number of IMFs, and each IMF should satisfy two conditions [3]: (1) the number of extrema and the number of zero crossings in the whole data set must be equal or differ by one at most; and (2) at any point, the mean value of the upper envelope connecting all the local maxima and the lower envelope connecting all the local minima is zero. Generally, the decomposition process consists of two loops: the inner loop and the outer loop, where the former is used for IMF extraction, while the latter is used to determine the number of IMFs and the residual. In the EMD algorithm, cubic spline interpolation is employed for the upper and the lower envelope functions, which is susceptible to singularities. Consequently, iterative filtering computes the moving average of the signal by the convolution
$$L(x)(t)=(x\otimes w)(t)=\int_{-l}^{l}x(t+\tau)\,w(\tau)\,d\tau \tag{2}$$
in lieu of the envelope functions. In (2), $\otimes$ represents the convolution operator, $w(\tau)$, constrained with $\int_{-l}^{l}w(\tau)\,d\tau=1$, is a low pass filter, and $l$ is the mask length. Afterwards, by the sifting process, the first IMF is generated:
$$\mathrm{IMF}_1(t)=\lim_{n\to\infty}S_{1,n}(x_n)(t), \tag{3}$$
where $n$ is the iteration number, $S_{1,n}(x_n)=x_n-L_{1,n}(x_n)=x_{n+1}$, and $x_1=x$. Since $n$ cannot reach infinity in (3), (4) is adopted as a stop criterion for the iterations:
$$\frac{\|L_{i,n}(x_n)\|_2}{\|x_n\|_2}<\xi, \tag{4}$$
where $L_{i,n}(x_n)$ represents the moving average at the $n$-th iteration of the $i$-th IMF and $\xi$ is a prespecified parameter. If $\xi$ is large, rough decomposition results may be obtained. However, if $\xi$ is too small, the computation will be expensive and noise will be introduced. In this paper, $\xi$ is set to 0.001 after trials. In the next step, the second IMF is obtained by repeating the previous iterative process on the residual signal $r_1(t)=x(t)-\mathrm{IMF}_1(t)$. In the same manner, all the subsequent IMFs are produced by
$$\mathrm{IMF}_i(t)=\lim_{n\to\infty}S_{i,n}(x_n)(t),\qquad x_1=r_{i-1}, \tag{5}$$
with $r_{i-1}(t)=x(t)-\sum_{j=1}^{i-1}\mathrm{IMF}_j(t)$. Finally, if the remaining signal does not satisfy the two conditions of an IMF, it is treated as the residual $r(t)$ and the iteration stops.
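The two-loop structure described above can be sketched in a few lines. This is only an illustration of plain iterative filtering with a fixed mask length per IMF, not of ALIF itself: the triangular filter, the reflective boundary extension, the user-supplied list `masks`, and the function names are all assumptions made for the example, and the stopping measure is taken to be the norm of the moving average relative to the norm of the current signal.

```python
import numpy as np

def moving_average(x, l):
    """Moving average of x with a normalized triangular mask of half-length l."""
    w = np.bartlett(2 * l + 3)[1:-1]      # 2l + 1 strictly positive weights
    w = w / w.sum()                       # low-pass weights sum to one
    xp = np.pad(x, l, mode="reflect")     # mirror extension (assumed boundary rule)
    return np.convolve(xp, w, mode="valid")

def extract_imf(x, l, xi=1e-3, max_iter=500):
    """Inner loop: subtract the moving average until it is negligible."""
    cur = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        avg = moving_average(cur, l)
        ratio = np.linalg.norm(avg) / (np.linalg.norm(cur) + 1e-12)
        cur = cur - avg
        if ratio < xi:                    # stop criterion with threshold xi
            break
    return cur

def iterative_filtering(x, masks):
    """Outer loop: one IMF per mask length, the remainder is the residual."""
    residual = np.asarray(x, dtype=float)
    imfs = []
    for l in masks:
        imf = extract_imf(residual, l)
        imfs.append(imf)
        residual = residual - imf
    return np.array(imfs), residual
```

By construction, the extracted components and the residual always sum back to the input signal, which is a convenient sanity check when experimenting with mask lengths.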

The ALIF method is improved from the iterative filtering technique, which could adaptively adjust the filter with the FP equation and adaptively compute the filter length. Consequently, the above equation (2) can be rewritten aswhich is subjected towhere , , is the filter at time , and is the mask length varying with .

To show the advantage of ALIF in signal decomposition, a simulation is performed. Signals of the rolling bearing are written asin which is the periodic exponential decay signals, is the surplus function, and ; and are harmonic interferences. Then the mixed signals with its components in the time interval are shown in Figure 1.

The decomposition results of ALIF and EMD are shown in Figures 2 and 3. In Figure 2, IMF1, IMF2, and IMF3 correspond to the components , , and . Moreover, the decomposed components from IMF4 to IMF8 are residuals. However, the IMF1 by EMD corresponding to the is distorted in Figure 3, and IMF2 and IMF4 are corresponding to and because IMF3 is a false component. The absolute error is employed to compare the decomposition results of ALIF with EMD. From the comparison in Figure 4, ALIF outperforms EMD.

3. Multiscale Entropy Features

After the signals are preprocessed with ALIF, entropy features are extracted from the IMFs in preparation for fault diagnosis. A fault type can be better represented with more features; however, for the sake of computational efficiency, only sample entropy, fuzzy entropy, and permutation entropy are introduced here.

3.1. Sample Entropy

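Considering a time series $\{x(i),\ i=1,2,\ldots,N\}$, the $m$-dimensional vector at time $i$ can be constructed as
$$X_i^m=\{x(i),x(i+\tau),\ldots,x(i+(m-1)\tau)\},\quad i=1,2,\ldots,N-m\tau, \tag{9}$$
where $\tau$ is the time delay. The distance between $X_i^m$ and $X_j^m$ is defined as
$$d_{ij}^m=d[X_i^m,X_j^m]=\max_{0\le k\le m-1}\left|x(i+k\tau)-x(j+k\tau)\right|. \tag{10}$$
Set the threshold $r$, and the ratio of distances less than $r$ is defined as
$$B_i^m(r)=\frac{1}{N-m\tau-1}\,\mathrm{num}\{d_{ij}^m<r,\ j\ne i\}, \tag{11}$$
with the mean
$$B^m(r)=\frac{1}{N-m\tau}\sum_{i=1}^{N-m\tau}B_i^m(r). \tag{12}$$
Repeat the above steps for $m+1$ and obtain the mean $B^{m+1}(r)$; then the sample entropy is
$$\mathrm{SaEn}(m,r)=\lim_{N\to\infty}\left[-\ln\frac{B^{m+1}(r)}{B^m(r)}\right]. \tag{13}$$
Since $N$ is a finite value, (13) can be rewritten as
$$\mathrm{SaEn}(m,r,N)=-\ln\frac{B^{m+1}(r)}{B^m(r)}. \tag{14}$$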
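As a rough illustration of the steps above, the following sketch computes the finite-length sample entropy with NumPy. It assumes a unit time delay and expresses the threshold as a fraction `r` of the standard deviation of the input; the default values `m=2` and `r=0.2` are common choices used here only for illustration and are not taken from the paper's parameter table.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy with unit time delay; r is a fraction of the std of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def match_count(dim):
        # The same n - m start indices are used for dimensions m and m + 1.
        emb = np.array([x[i:i + dim] for i in range(n - m)])
        # Maximum absolute difference between every pair of embedded vectors.
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        # Count pairs closer than the threshold, dropping the self-matches.
        return np.sum(dist < tol) - len(emb)

    # The normalizing constants cancel in the ratio of the two means.
    return -np.log(match_count(m + 1) / match_count(m))
```

The pairwise distance matrix needs memory quadratic in the signal length, so this form is suited to short segments; a regular signal such as a sine wave should give a much smaller value than white noise.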

3.2. Fuzzy Entropy

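The distance in (10) is used to measure the fuzzy similarity as follows:
$$D_{ij}^m=\exp\left(-\frac{(d_{ij}^m)^n}{r}\right), \tag{15}$$
where $n$ and $r$ are the gradient and the width of the exponential function. Define the function at $m$:
$$\phi^m(n,r)=\frac{1}{N-m\tau}\sum_{i=1}^{N-m\tau}\left(\frac{1}{N-m\tau-1}\sum_{j=1,\,j\ne i}^{N-m\tau}D_{ij}^m\right), \tag{16}$$
and the function at $m+1$:
$$\phi^{m+1}(n,r)=\frac{1}{N-m\tau}\sum_{i=1}^{N-m\tau}\left(\frac{1}{N-m\tau-1}\sum_{j=1,\,j\ne i}^{N-m\tau}D_{ij}^{m+1}\right), \tag{17}$$
and then the fuzzy entropy is
$$\mathrm{FuEn}(m,n,r)=\lim_{N\to\infty}\left[\ln\phi^m(n,r)-\ln\phi^{m+1}(n,r)\right]. \tag{18}$$
When $N$ is a finite number, (18) is rewritten as
$$\mathrm{FuEn}(m,n,r,N)=\ln\phi^m(n,r)-\ln\phi^{m+1}(n,r). \tag{19}$$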
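A minimal sketch of the fuzzy entropy computation is given below. It reuses the maximum-difference distance of the sample entropy section and assumes an exponential similarity of the form exp(-d^n / r), a unit time delay, and a width `r` given as a fraction of the standard deviation of the input; these choices and the default values are assumptions for illustration. Some formulations also subtract the local mean from each embedded vector, which is not done here.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy with unit time delay and exponential similarity (assumed form)."""
    x = np.asarray(x, dtype=float)
    length = len(x)
    width = r * np.std(x)

    def phi(dim):
        # Same length - m start indices for both embedding dimensions.
        emb = np.array([x[i:i + dim] for i in range(length - m)])
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        sim = np.exp(-(dist ** n) / width)   # soft membership instead of a hard threshold
        np.fill_diagonal(sim, 0.0)           # exclude self-similarity
        k = len(emb)
        return sim.sum() / (k * (k - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

Because the distance can only grow when a coordinate is added, the value returned by this sketch is never negative.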

3.3. Permutation Entropy

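Let $X_i=\{x(i),x(i+\tau),\ldots,x(i+(m-1)\tau)\}$; then $X_i$ has a permutation $\pi=(j_1,j_2,\ldots,j_m)$ if it satisfies
$$x(i+(j_1-1)\tau)\le x(i+(j_2-1)\tau)\le\cdots\le x(i+(j_m-1)\tau), \tag{20}$$
where $1\le j_k\le m$, and $j_{k-1}<j_k$ when $x(i+(j_{k-1}-1)\tau)=x(i+(j_k-1)\tau)$.

For each permutation $\pi$, the relative frequency can be defined as
$$p(\pi)=\frac{\mathrm{num}\{X_i\ \text{of type}\ \pi\}}{N-(m-1)\tau}, \tag{21}$$
where $\mathrm{num}\{X_i\ \text{of type}\ \pi\}$ represents the number of $X_i$ belonging to the type $\pi$. Then the definition of PE with dimension $m$ can be written as
$$\mathrm{PE}(m)=-\sum_{\pi}p(\pi)\ln p(\pi). \tag{22}$$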
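The ordinal-pattern counting described above can be sketched as follows. The function name, the default values of `m` and `tau`, the use of the natural logarithm, and the optional normalization by ln(m!) are assumptions made for the example.

```python
import numpy as np
from collections import Counter
from math import factorial

def permutation_entropy(x, m=3, tau=1, normalize=False):
    """Permutation entropy of order m with time delay tau."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    # A stable sort orders equal values by their position, which settles ties.
    patterns = Counter(
        tuple(np.argsort(x[i:i + (m - 1) * tau + 1:tau], kind="stable"))
        for i in range(n_vec)
    )
    p = np.array(list(patterns.values()), dtype=float) / n_vec
    pe = -np.sum(p * np.log(p))
    return pe / np.log(factorial(m)) if normalize else pe
```

A monotonic series contains a single ordinal pattern and therefore gives zero, while white noise gives a normalized value close to one.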

3.4. Coarse Grained Process

The multiple scales are realized through the coarse-grained process for better feature extraction. The length of each coarse-grained time series equals the length of the original time series divided by the corresponding scale factor, as illustrated in Figure 5. Hence, the coarse-grained time series at a scale factor of $s$ can be constructed according to
$$y_j^{(s)}=\frac{1}{s}\sum_{i=(j-1)s+1}^{js}x(i),\quad 1\le j\le\left\lfloor\frac{N}{s}\right\rfloor. \tag{23}$$
Then SaEn, FuEn, and PE of each coarse-grained time series are calculated based on (14), (19), and (22), respectively, and plotted as functions of the scale factor $s$:
$$\mathrm{MSaE}(x,s,m,r)=\mathrm{SaEn}(y^{(s)},m,r), \tag{24}$$
$$\mathrm{MFE}(x,s,m,n,r)=\mathrm{FuEn}(y^{(s)},m,n,r), \tag{25}$$
$$\mathrm{MPE}(x,s,m,\tau)=\mathrm{PE}(y^{(s)},m,\tau). \tag{26}$$
In this paper, the prespecified parameters are set as in Table 1. In particular, SD represents the standard deviation (std.) of the original signals.
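The coarse-graining step is simple enough to show directly. In the sketch below, `coarse_grain` averages consecutive nonoverlapping windows, and `multiscale` applies any single-scale entropy function to each coarse-grained series; both function names and the `entropy_fn` argument are hypothetical.

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive nonoverlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale                      # leftover points at the end are dropped
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def multiscale(x, entropy_fn, max_scale=20):
    """Evaluate a single-scale entropy function on scales 1..max_scale."""
    return np.array([entropy_fn(coarse_grain(x, s)) for s in range(1, max_scale + 1)])
```

With 20 scales, each entropy measure yields a 20-point curve per IMF. If the threshold is meant to depend on the standard deviation of the original signal, it should be computed once from the original series and passed into `entropy_fn` rather than recomputed on each coarse-grained series.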

4. Kernel Sparse Representation Classifier

4.1. Sparse Representation Classification

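Let a matrix $A_i$ represent the features of the $i$-th class for the auxiliary training samples, namely, $A_i=[a_{i,1},a_{i,2},\ldots,a_{i,n_i}]\in\mathbb{R}^{m\times n_i}$, where $m$ is the feature dimension and $n_i$ is the number of auxiliary training samples of the $i$-th class. An auxiliary testing sample $y\in\mathbb{R}^m$ from the same class can be approximately expressed as
$$y=\sum_{j=1}^{n_i}x_{i,j}\,a_{i,j}=A_ix_i. \tag{27}$$
Considering all $k$ object classes and the whole set of auxiliary training samples with $n=\sum_{i=1}^{k}n_i$, the total matrix can be formed as
$$A=[A_1,A_2,\ldots,A_k]\in\mathbb{R}^{m\times n}. \tag{28}$$
Consequently, the linear representation of $y$ in terms of all auxiliary training samples is expressed as
$$y=Ax, \tag{29}$$
where $x=[0,\ldots,0,x_{i,1},\ldots,x_{i,n_i},0,\ldots,0]^T\in\mathbb{R}^n$ is a coefficient vector, in which the entries are zero if they do not belong to the $i$-th class.

The sparse solution to $y=Ax$ can be achieved by optimizing the following $\ell_1$-minimization problem:
$$\hat{x}_1=\arg\min_x\|x\|_1\quad\text{subject to}\quad Ax=y. \tag{30}$$
When considering small noise, a noise term $z$ with $\|z\|_2\le\varepsilon$ is introduced to (29) and the formula can be modified as
$$y=Ax+z. \tag{31}$$
The flexible $\ell_1$-minimization problem for a sparse solution is
$$\hat{x}_1=\arg\min_x\|x\|_1\quad\text{subject to}\quad\|Ax-y\|_2\le\varepsilon. \tag{32}$$
When a new sample $y$ is to be tested, it can be expressed as $\hat{y}_i=A\delta_i(\hat{x}_1)$, where $\delta_i(\hat{x}_1)$ is a vector in which the entries associated with class $i$ are nonzero but the rest are zeros, and $\delta_i$ is a function that selects the coefficients related to the $i$-th class. Finally, the object class of the new testing sample can be identified with the residual between $y$ and $\hat{y}_i$:
$$\mathrm{identity}(y)=\arg\min_i r_i(y),\qquad r_i(y)=\|y-A\delta_i(\hat{x}_1)\|_2. \tag{33}$$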

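The algorithm for SRC is summarized as follows.
(1) Input: a matrix of auxiliary training samples $A=[A_1,A_2,\ldots,A_k]\in\mathbb{R}^{m\times n}$ for $k$ classes, an auxiliary testing sample $y\in\mathbb{R}^m$, and an optional error tolerance $\varepsilon>0$.
(2) Normalize the columns of $A$ to have unit $\ell_2$-norm.
(3) Solve the $\ell_1$-minimization problem: $\hat{x}_1=\arg\min_x\|x\|_1$ subject to $\|Ax-y\|_2\le\varepsilon$.
(4) Compute the residuals $r_i(y)=\|y-A\delta_i(\hat{x}_1)\|_2$, for $i=1,\ldots,k$.
(5) Output: $\mathrm{identity}(y)=\arg\min_i r_i(y)$.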
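The steps of the SRC algorithm can be sketched with NumPy alone. Note the substitution: instead of the constrained problem, the sketch solves the penalized (lasso) form with a basic iterative soft-thresholding loop, so the weight `lam` plays the role of the error tolerance. The function names, `lam`, and the iteration count are assumptions for illustration.

```python
import numpy as np

def ista(A, y, lam=0.01, n_iter=500):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - step * (A.T @ (A @ x - y))        # gradient step on the quadratic term
        x = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)  # soft threshold
    return x

def src_predict(A, labels, y, lam=0.01):
    """Assign y to the class whose training columns give the smallest residual."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)   # unit l2-norm columns
    x = ista(A, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```

Here `A` holds one training sample per column and `labels` gives the class of each column. Because the sparsity penalty acts on individual coefficients, features that do not help the reconstruction receive little weight, which is the sense in which regularization takes over the role of a separate feature selection stage.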

4.2. Kernel Sparse Representation Classification

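By means of the kernel trick, SRC is extended to KSRC for nonlinearity. Suppose a nonlinear mapping $\phi:\mathbb{R}^m\to\mathcal{F}$, $a\mapsto\phi(a)$, which realizes the transformation of the auxiliary training samples from the original feature space $\mathbb{R}^m$ into the kernel feature space $\mathcal{F}$. Similar to SRC, the $\ell_1$-norm minimization problem of (32) can be reformulated as
$$\hat{x}_1=\arg\min_x\|x\|_1\quad\text{subject to}\quad\|\phi(y)-\Phi x\|_2\le\varepsilon, \tag{34}$$
where $\Phi=[\phi(a_1),\phi(a_2),\ldots,\phi(a_n)]$. Since $\phi(y)$ and $\Phi$ are unknown, (34) cannot be solved directly. But, according to Theorem 1, (34) can be transformed as
$$\hat{x}_1=\arg\min_x\|x\|_1\quad\text{subject to}\quad\|k(\cdot,y)-Kx\|_2\le\varepsilon, \tag{35}$$
where
$$K=\Phi^T\Phi, \tag{36}$$
$$k(\cdot,y)=\Phi^T\phi(y)=[k(a_1,y),k(a_2,y),\ldots,k(a_n,y)]^T, \tag{37}$$
and a kernel function is defined as $k(a,b)=\phi(a)^T\phi(b)$. In this paper, the linear kernel $k(a,b)=a^Tb$ is used. The solution of (35) is obtained with a software package for disciplined convex programming [35]. Finally, the new testing sample can be assigned to one prespecified class by minimizing the residual between $k(\cdot,y)$ and $K\delta_i(\hat{x}_1)$:
$$\mathrm{identity}(y)=\arg\min_i r_i(y),\qquad r_i(y)=\|k(\cdot,y)-K\delta_i(\hat{x}_1)\|_2. \tag{38}$$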

Theorem 1. For any , there must exist such that we have , as long as is satisfied.

The algorithm for KSRC is summarized as follows.
(1) Input: a matrix of auxiliary training samples $A\in\mathbb{R}^{m\times n}$ for $k$ classes, an auxiliary testing sample $y\in\mathbb{R}^m$, and an optional error tolerance $\varepsilon>0$.
(2) Normalize the columns of $A$ to have unit $\ell_2$-norm.
(3) Calculate $K$ and $k(\cdot,y)$ by (36) and (37).
(4) Solve the $\ell_1$-minimization problem in (35).
(5) Compute the residuals defined in (38).
(6) Output: $\mathrm{identity}(y)=\arg\min_i r_i(y)$.
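A kernelized variant of the classification steps can be sketched in the same spirit. As an assumption for this example, the constrained problem is again replaced by its penalized form and solved with iterative soft thresholding rather than with a disciplined convex programming package; `linear_kernel`, `ksrc_predict`, `lam`, and the iteration count are hypothetical names and values.

```python
import numpy as np

def ista(K, k_y, lam=0.01, n_iter=2000):
    """Minimize 0.5 * ||K x - k_y||^2 + lam * ||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(K, 2) ** 2
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        g = x - step * (K.T @ (K @ x - k_y))
        x = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)
    return x

def linear_kernel(X, Z):
    """Gram matrix between the columns of X and the columns of Z."""
    return X.T @ Z

def ksrc_predict(A, labels, y, kernel=linear_kernel, lam=0.01):
    """Classify y from residuals computed in the kernel representation."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)   # unit l2-norm columns
    K = kernel(A, A)                                   # n x n kernel matrix
    k_y = kernel(A, y[:, None]).ravel()                # kernel vector of the test sample
    x = ista(K, k_y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(k_y - K[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```

Only kernel evaluations between samples are needed, so another kernel can be tried by passing a different `kernel` function with the same signature.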

5. Illustration of the Proposed Method

Since the proposed method can simultaneously perform feature selection and multiclass classification, the corresponding procedure based on ALIF-enhanced multiscale entropy features and KSRC is set up, and the steps are as follows.
(1) Collect vibration signals of bearings in healthy and different defective states, in addition to different defect sizes for each defective type.
(2) Decompose the vibration signals into a sum of IMFs with ALIF. The first three IMFs, which contain prominent fault information, are selected to extract multiscale entropy features, and these are used to construct feature vectors after normalization, where each feature is normalized with respect to all features in the same sample.
(3) Set the number of training samples and testing samples. The training samples are randomly selected for KSRC. Note that the number of training samples includes both the auxiliary training samples and the auxiliary testing samples.
(4) After successful training, KSRC is used to test samples and identify the fault patterns with different severity levels.
The illustration of the proposed approach is shown in Figure 6.
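To make the feature layout in step (2) concrete, the sketch below flattens the entropy curves of one sample into a single vector and rescales it. With three IMFs, three entropy measures, and 20 scales, the vector has 3 x 3 x 20 = 180 entries. The array layout and the min-max rescaling over the features of one sample are assumptions for this example; the paper's exact normalization formula is not reproduced here.

```python
import numpy as np

def build_feature_vector(entropy_curves):
    """Flatten (IMFs, entropy types, scales) curves and rescale them to [0, 1]."""
    f = np.asarray(entropy_curves, dtype=float).reshape(-1)
    # Assumed normalization: min-max over all features of this one sample.
    return (f - f.min()) / (f.max() - f.min())
```

Stacking such vectors as columns gives the training matrix expected by a sparse representation classifier.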

6. Experimental Verification

To validate the capability of the proposed approach, two cases concerning bearing faults are investigated. One is about the bearing in the centrifugal pump considering different fault types [16]. The other is about rolling bearings in the test rig from CWRU with different fault categories and severity levels [36].

6.1. Bearing Fault of the Centrifugal Pump

The centrifugal pump test system is shown in Figure 7, and the experimental details can be found in [16]. Five commonly occurring states of the centrifugal pump were set: normal, bearing roller wearing (BRW), bearing inner race wearing (BIRW), bearing outer race wearing (BORW), and centrifugal pump impeller wearing (PIW). Vibration signals of the five states are shown in Figure 8, and the corresponding first five IMFs by ALIF are shown in Figure 9. The comparison shows that the first three IMFs contain most of the energy. Hence, multiscale entropy features including MSaE, MFE, and MPE over 20 scales are extracted from the first three components by ALIF according to the parameters prespecified in Table 1, and a total of 180 entropy features (3 IMFs, 3 entropy measures, 20 scales) are obtained for each sample.

Labels of fault types are specified for classification, and the description of the bearing fault states can be found in Table 2. Each fault type has 50 samples, and the average of each entropy feature over the fifty samples of each fault type is shown in Figure 10. In each test, 10 random samples of each fault type are selected to train the KSRC classifier, and the remaining 40 samples are used for testing. The accuracy of the testing samples is defined as
$$\mathrm{Acc}_{\mathrm{test}}=\frac{N_{ct}}{N_t}\times100\%, \tag{40}$$
and the accuracy of the training samples is
$$\mathrm{Acc}_{\mathrm{train}}=\frac{N_{cr}}{N_r}\times100\%, \tag{41}$$
where $N_{ct}$ is the number of correctly classified testing samples, $N_t$ is the number of testing samples, $N_{cr}$ is the number of correctly classified training samples, and $N_r$ is the number of training samples. As listed in Table 3, the mean testing classification accuracy by (40) over ten repetitions is 96.95% with std. 0.98%, and the mean training classification accuracy by (41) is 100% with std. 0. The accuracies of the ten tests are listed in Table 4, and the maximum accuracy reaches 98%. The corresponding classification result of the proposed method at accuracy 98% is shown in Figure 11. By comparison, the mean accuracy in [16] varies from 94.58% to 97.08% according to the ratio of the std. of the added noise in ensemble empirical mode decomposition (EEMD); moreover, the ratio of training samples in [16] is 40%, whereas it is 20% in this paper. To show the advantage of high dimensional features in KSRC for improving the accuracy of bearing fault diagnosis, a comparison is performed as listed in Table 5. The contribution to accuracy increases in the order MSaE, MFE, and MPE, both individually and in pairs.
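The evaluation protocol described above (a random per-class split, repeated several times, with the mean and std. of the accuracy reported) can be sketched as follows. The helper names are hypothetical, and the sample indices are assumed to be grouped by class, so that class c occupies positions c * n_per_class to (c + 1) * n_per_class - 1.

```python
import numpy as np

def split_indices(n_per_class, n_classes, n_train, rng):
    """Randomly pick n_train samples per class for training; the rest are for testing."""
    train, test = [], []
    for c in range(n_classes):
        idx = rng.permutation(n_per_class) + c * n_per_class
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)

def accuracy(predicted, true):
    """Percentage of correctly classified samples."""
    return 100.0 * np.mean(np.asarray(predicted) == np.asarray(true))
```

For five states with 50 samples each and 10 training samples per state, every test uses 50 training and 200 testing samples. Collecting `accuracy` over ten calls with fresh splits and applying `np.mean` and `np.std` gives the summary statistics of the kind reported in the tables.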

6.2. Artificially Seeded Damage Bearing

The bearing data are obtained from the Bearing Data Centre of CWRU, and the bearing test system is shown in Figure 12. The drive end bearing 6205-2RS JEM SKF is investigated, which is seeded with single point faults using electrodischarge machining. There are four states: normal, ball fault (BF), inner race fault (IRF), and outer race fault (ORF) (at the 6 o'clock position). Vibration signals are collected from the accelerometers placed at the drive end of the motor housing with a sampling frequency of 12 kHz. The defective bearing at 0 HP is investigated in this case. The sampling time is 10 s in each state, and the collected vibration signals are divided into nonoverlapping segments. Vibration signals of normal, BF, IRF, and ORF at defect size 0.007 inch and load 0 HP are shown in Figure 13, and the first five IMFs by ALIF are shown in Figure 14.
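The segmentation step can be illustrated with a short sketch. A 10 s record sampled at 12 kHz contains 120,000 points; if it is cut into 50 nonoverlapping segments (the number of samples per state used in this case study), each segment holds 2400 points. The segment length is inferred from these figures rather than stated in the text, and the function name is hypothetical.

```python
import numpy as np

def segment(signal, n_segments):
    """Split a 1-D record into equal nonoverlapping segments (one per row)."""
    signal = np.asarray(signal, dtype=float)
    seg_len = len(signal) // n_segments          # trailing points that do not fit are dropped
    return signal[:seg_len * n_segments].reshape(n_segments, seg_len)
```

Each row of the returned array is then decomposed and converted into one feature vector.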

Considering the defect size, labels of fault types corresponding to 12 bearing fault states are specified for classification, and the description of the bearing fault states can be found in Table 6. Each state has 50 samples, and MSaE, MFE, and MPE over 20 scales of the first three components by ALIF are averaged over the fifty samples as shown in Figure 15. The three entropy features taken individually do not separate the different fault types distinctly; hence a combination is considered. To assess the accuracy of the proposed approach, ten repeated tests are performed with randomly selected samples. In each test, 10 random samples of each state are selected to train the KSRC classifier, and the remaining 40 samples are used for testing. As listed in Table 7, the mean testing classification accuracy by (40) over the ten repetitions is 98.48% with std. 0.7%, and the mean training classification accuracy by (41) is 100% with std. 0. The accuracies of the ten tests are listed in Table 8, and the maximum accuracy reaches 99.38% at 0 HP. For better illustration, the corresponding classification result of the proposed method at accuracy 99.38% with 0 HP is shown in Figure 16.

In addition, another condition at 2 HP is considered to test the flexibility of the proposed approach with different loads. Ten random samples of load 0 HP are used to train KSRC and all samples of load 2 HP are used to test. Ten tests are performed as above and the results are listed in Table 9. The mean diagnostic accuracy at 2 HP is 89.73% with std. 2.02%, and the maximum accuracy is 92.83% with the corresponding illustration in Figure 17. The high accuracy rate of diagnosis demonstrates the usefulness of the proposed approach under different loads.

Since the multiscale entropy features of the different fault states in Figure 15 are not easy to distinguish from one another, a comparison is performed considering different combinations of features. Ten tests with different random testing samples are carried out as above. The mean and the std. of the accuracy corresponding to the different features are listed in Table 10. The results verify the advantage of the combination of MSaE, MFE, and MPE.

Besides, a list of studies using the CWRU bearing data is collected in Table 11, arranged according to the number of classified states. The comparison between Table 11 and our work shows that entropy-based features, especially MPE, are widely employed for bearing fault diagnosis and that good results can be obtained with them. When more classes are considered, a combination of multiscale entropy features is a possible solution. Though the proposed approach does not reach the 100% accuracy reported in [34], which also considers 12 classified states, it avoids the problems of feature selection and parameter optimization of SVM. Compared with the remaining studies, the proposed approach can deal with more classified states with high accuracy.

7. Conclusion

To improve the accuracy of bearing fault diagnosis for multiple fault states with small samples, a novel bearing fault diagnosis method based on ALIF-enhanced multiscale entropy features and KSRC is proposed in this paper. Adaptive local iterative filtering decomposes the nonlinear and nonstationary vibration signals adaptively into a sum of IMFs with different scales. MSaE, MFE, and MPE values of the first three IMFs by ALIF are computed and normalized. Further, KSRC accurately identifies multiple fault types of rolling bearings with the normalized entropy features and realizes feature selection through regularization. The proposed method is evaluated with experimental data concerning bearing faults in the centrifugal pump and multiple bearing faults from CWRU. The comparison shows that high dimensional features with small samples can achieve high accuracy of bearing fault diagnosis at 0 HP as well as under the different working condition of 2 HP. The results demonstrate that the proposed method is feasible and effective in bearing fault diagnosis.

Data Availability

The bearing data used to support the findings of this study have been deposited as follows. The data for Section 6.1 (bearing fault of the centrifugal pump) are available at https://www.researchgate.net/profile/Chen_Lu15/publications and the data for Section 6.2 (artificially seeded damage bearing) are available at http://csegroups.case.edu/bearingdatacenter/pages/apparatus-procedures. In addition, the references concerning the data have been clearly cited in the paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 51505100). The authors would like to thank the editors and the anonymous reviewers for their valuable suggestions which have greatly improved the paper.