
A semi-supervised fault diagnosis method for axial piston pump bearings based on DCGAN


Published 3 September 2021 © 2021 IOP Publishing Ltd
Citation: You He et al 2021 Meas. Sci. Technol. 32 125104. DOI: 10.1088/1361-6501/ac1fbe


Abstract

Recently, deep learning has developed rapidly in the fault diagnosis technology of axial piston pumps. However, when the training data are scarce and the label information is insufficient, many traditional intelligent fault diagnosis models fail. To solve these problems, an intelligent fault diagnosis method for axial piston pumps is proposed based on a deep convolutional generative adversarial network (DCGAN). Firstly, the continuous wavelet transform (CWT) and DCGAN are used to enhance the fault features and expand the dataset, respectively. Secondly, depending on the number of labeled samples, DCGAN and semi-supervised GAN (SGAN) are used to extract deep features from the image domain. Finally, a clustering algorithm classifies the extracted features to realize the fault diagnosis of the axial piston pump bearing. To verify the feasibility of the proposed method, an experimental investigation and a public dataset are adopted. The evaluation indicators of the clustering results are close to 1, showing that the proposed method offers high diagnostic accuracy, superior generalization ability and excellent anti-noise ability.


1. Introduction

The axial piston pump plays an extremely critical role in industrial applications [1–3]. The bearing is a key component of the axial piston pump, and its health condition directly affects the pump's stable operation [4]. Currently, most fault diagnosis methods are based on vibration signals, which are an important basis for equipment condition recognition [5]. However, background noise and the periodic pulses of the pistons affect the accuracy of these methods. In addition, traditional intelligent fault diagnosis methods do not work well without enough labeled data. Therefore, a robust fault diagnosis method is necessary for the axial piston pump [6].

With the rapid development of machine learning, the disadvantages of traditional fault diagnosis methods have become increasingly prominent, such as long processing time, weak generalization ability, and serious interference from human factors [7]. To address these problems, researchers have worked to remove human factors from fault feature extraction and selection tasks, and have improved the ability of machine learning to extract fault features automatically [8]. Jing et al [9] employed deep convolutional neural networks (CNNs) to extract combined features of the gearbox vibration signal. Guo et al [10] optimized the network by adding a multi-scale module to the original CNN structure; this module could not only extract feature information at multiple scales but also reduce the interference of contaminated data. Zhang et al [11] improved the training efficiency of the residual learning network and made the network more suitable for processing mechanical vibration signals. Guo et al [12] improved a deep CNN with a hierarchical adaptive learning rate to maximize classification accuracy. Shao et al [13] combined long short-term memory (LSTM) with CNN and proposed a multi-channel LSTM-CNN fault diagnosis method; this network structure is able to extract multiple hidden-layer features of the data.

Generative adversarial network (GAN) was proposed by Goodfellow et al in 2014, allowing the creation of a new dataset from a small amount of available data [14]. The generated data are not only close to the original data but also combine the characteristics of the original data. This network provided new ideas for application to classification problems [15]. Feng et al [16] proposed a multiclass spatial-spectral GAN (MSGAN) based on GAN and employed it for hyperspectral image classification; compared with other advanced methods, satisfactory classification performance was obtained with limited training samples. Xuan et al [17] proposed a multiview-GAN for pearl image classification. Liu et al [18] drew inspiration from GAN and proposed the categorical adversarial autoencoder (CatAAE) for unsupervised fault diagnosis of bearings; this method achieved accurate classification without labeling samples in advance. Gao et al [19] combined dynamic simulation with GAN for bearing fault diagnosis, using GAN to augment the obtained data with satisfactory results. Fu et al [20] combined GAN with a stacked denoising autoencoder (SDAE) and proposed a GAN-SDAE algorithm to improve the accuracy of fault diagnosis. Note that feature extraction in the above GAN applications is achieved through independent external algorithms, such as calculating multiple signal indicators.

As a special implementation of GAN, the deep convolutional generative adversarial network (DCGAN) integrates the automatic feature extraction function into GAN. This model completes image generation and feature extraction without other algorithms. DCGAN is particularly prominent in image generation ability and is widely used to complete imbalanced datasets. Cai et al [21] used DCGAN to balance a highway dataset, which assisted real-time crash prediction to facilitate traffic management. Yang et al [22] combined DCGAN and the simple recurrent unit (SRU) to classify imbalanced network data for intrusion detection. In [23, 24], DCGAN was applied to complete image datasets, which solved the instability of medical image classification models caused by dataset imbalance. In terms of image processing, Heo et al [25] combined DCGAN with U-net and added pixel loss and feature loss to realize automatic sketch coloring. However, there are currently only a few applications of DCGAN in fault diagnosis. Li et al [26] combined a filter model with the DCGAN model to identify various gear faults at one time. Liang et al [27] proposed a multi-time-scale deep convolutional generative adversarial network (MTS-DCGAN) for anomaly detection on industrial time scales, which is more robust.

In most of these studies, DCGAN was employed to expand the dataset. In this paper, DCGAN is used for both feature extraction and sample augmentation. Experimental results show that, with only a small amount of labeled data, the proposed method achieves high scores on the clustering indicators. In summary, the main contributions are as follows:

  • (a)  
    A two-stage semi-supervised learning method is proposed for bearing fault diagnosis of axial piston pump. In the proposed method, both labeled data and unlabeled data are utilized to train the model, and semi-supervised learning corrects and optimizes the results of the unsupervised learning stage.
  • (b)  
    DCGAN is also used to expand the original data to avoid overfitting the model due to insufficient training data. In addition, three clustering indicators are introduced to make a persuasive evaluation of the fault diagnosis results.
  • (c)  
This method is verified on measured vibration signal data of the axial piston pump and on the Case Western Reserve University dataset. Compared with advanced unsupervised methods, this method proposes different fault diagnosis strategies based on the amount of labeled data, which makes fault diagnosis more flexible and effective.

The rest of the article is organized as follows. In section 2, the basic theory of DCGAN, data preprocessing and result evaluation method are briefly introduced. Section 3 describes the semi-supervised fault diagnosis method proposed in this paper in detail. Section 4 states the acquisition process of the data used in the experiment and verifies the effectiveness of the proposed method on two datasets. Finally, section 5 concludes the paper.

2. Preliminary

This section mainly introduces the model structures and algorithms involved in the proposed method. The original GAN, DCGAN, continuous wavelet transform (CWT) and clustering algorithm are all introduced in this section, providing the theoretical basis for the proposed method.

2.1. GAN

GAN can be regarded as a coding network that includes two independent neural networks, namely the discriminator and the generator. On the one hand, the pre-defined probability distribution $P(x)$ is learned by the generator network to generate a new probability distribution $P_G(x)$. On the other hand, the discriminator distinguishes $P(x)$ from $P_G(x)$. When the discriminator cannot discriminate between $P(x)$ and $P_G(x)$, the generator has learned the probability distribution $P(x)$ [28]. From the perspective of fault diagnosis, the generator can learn the potential features of the original signals and generate similar signals. The discriminator tries to find the difference between the generated signal and the original signal and determine whether it is real or fake. In this process, the discriminator also learns to distinguish the differences between different types of signals. As shown in figure 1, the general process of GAN is that the generator takes the noise distribution Z as input to generate a fake signal I', and the discriminator judges whether I' is real or fake. The generator is eventually optimized to generate signals that are realistic enough to fool the discriminator, while the discriminator is optimized for a better ability to distinguish signals. The process of adversarial training is described mathematically as a minimax game, and eventually the two networks reach a Nash equilibrium given by the following function:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim P_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_{\text{noise}}(z)}\left[\log\left(1 - D(G(z))\right)\right] \quad (1)$$

where $P_{\text{data}}(x)$ denotes the distribution of real data and $P_{\text{noise}}(z)$ denotes the noise distribution.

Figure 1. GAN overview.

According to equation (1), the loss functions of the discriminator and the generator are written as:

$$L_{\text{D}} = -\mathbb{E}_{i \sim P_{\text{data}}}\left[\log D\left(i;\theta_{\text{D}}\right)\right] - \mathbb{E}_{Z \sim P_{\text{noise}}}\left[\log\left(1 - D\left(G\left(Z;\theta_{\text{G}}\right);\theta_{\text{D}}\right)\right)\right] \quad (2)$$

$$L_{\text{G}} = \mathbb{E}_{Z \sim P_{\text{noise}}}\left[\log\left(1 - D\left(G\left(Z;\theta_{\text{G}}\right);\theta_{\text{D}}\right)\right)\right] \quad (3)$$

where $L_{\text{D}}$ represents the loss function of the discriminator, $L_{\text{G}}$ represents the loss function of the generator, $Z$ and $i$ are the inputs of the generator and discriminator, respectively, and $\theta_{\text{G}}$ and $\theta_{\text{D}}$ are the parameters of the generator and discriminator, respectively.
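As a minimal illustration of equations (2) and (3) (a sketch, not the authors' code), the two losses can be written with TensorFlow 2.x, which the experiments in section 4 report using. The generator loss below uses the non-saturating form that is common in practice; equation (3) is the saturating variant.

```python
# Sketch of the GAN losses in equations (2) and (3) (TensorFlow 2.x).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # L_D: push D(i) toward 1 for real inputs and D(G(Z)) toward 0.
    real_loss = bce(tf.ones_like(real_logits), real_logits)
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss

def generator_loss(fake_logits):
    # Non-saturating L_G: push D(G(Z)) toward 1.
    return bce(tf.ones_like(fake_logits), fake_logits)
```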

2.2. DCGAN

DCGAN is a variant of GAN. In order to utilize the image processing capabilities of CNNs, the structures of the discriminator and generator are replaced with CNNs [29]. Figure 2 shows the structure of DCGAN. Some adjustments have been made to guarantee stable training of the discriminator and generator, as follows:

  • (a)  
Pooling layers are replaced by strided convolutions in the discriminator and transposed convolutions in the generator.
  • (b)  
The fully connected structure of the hidden layers in the generator and the discriminator is eliminated; only the fully connected layer at the output of the discriminator is kept.
  • (c)  
    Batch normalization (BN) is applied to all hidden layers except the input and output layers of the discriminator and generator, so that the mean and variance of the output of each hidden layer are 0 and 1, respectively.
  • (d)  
The rectified linear unit (ReLU) activation function is used on the input and hidden layers of the generator, and the Tanh activation function is used at the generator output to speed up training.
  • (e)  
    The LeakyReLU activation function is utilized on each layer of the discriminator.

Figure 2. The structure of DCGAN.

Although DCGAN makes some improvements over GAN, some problems remain. For example, the risk of mode collapse increases with prolonged training. Therefore, it is necessary to check whether the model is normal after training.

Compared with GAN, DCGAN improves training stability, but its operating principle is unchanged. The discriminator in a trained DCGAN can learn the inherent features of the samples, and these features can reflect the essence of the image samples. Therefore, DCGAN is selected for unsupervised feature extraction of samples.

2.3. CWT

Compared with one-dimensional time-domain or frequency-domain analysis, two-dimensional processing methods have more advantages in nonlinear signal analysis. Many time-frequency analysis methods have been used to process time series signals, and CWT is one of them. The reason CWT is employed for time-frequency transformation in this study is that CWT performs analysis with a variable time-frequency window [30]. The expression of CWT is written as:

$$\text{CWT}(a,\tau) = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} x(t)\,\psi^{*}\!\left(\frac{t-\tau}{a}\right)\mathrm{d}t \quad (4)$$

where $a$ is the scale factor, $\tau$ is the shifting factor, $\psi^{*}$ denotes the complex conjugate of the mother wavelet, and the factor $\frac{1}{\sqrt{a}}$ is used to ensure energy conservation.

There are many wavelet choices among complex and real functions, such as Morlet, Gaussian, Laplace and Shannon. Because the Morlet wavelet shape is similar to the impact signature of many mechanical faults, it is widely used in the detection of mechanical system fault signals. The modified complex Morlet wavelet basis function is defined in the wavelet toolbox of MATLAB [31] as follows:

$$\psi_{\text{cmor}}(t) = \frac{1}{\sqrt{\pi f_b}}\,e^{-t^{2}/f_b}\,e^{\mathrm{j}2\pi f_c t} \quad (5)$$

where $f_b$ and $f_c$ are the bandwidth and central frequency parameters of the modified complex Morlet mother wavelet $\psi_{\text{cmor}}(t)$, respectively.
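For illustration, a hedged sketch of this CWT step using the PyWavelets library is given below; the wavelet name 'cmor1.5-1.0' (i.e. $f_b$ = 1.5, $f_c$ = 1.0) and the scale range are assumptions for demonstration, not the paper's exact settings.

```python
# Sketch: complex Morlet CWT of one vibration segment (PyWavelets assumed).
import numpy as np
import pywt

def segment_to_scalogram(segment, fs=48_000, n_scales=96):
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, 'cmor1.5-1.0',
                         sampling_period=1.0 / fs)
    scalogram = np.abs(coeffs)           # magnitude as grayscale intensity
    return scalogram / scalogram.max()   # normalize to [0, 1]
```

The resulting array can then be resized with bilinear interpolation to the 96 × 96 image size used later in this paper.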

2.4. Clustering

The clustering algorithm selects center points according to certain rules, determines the cluster assignments, and divides all the received data into categories. The choice of clustering algorithm affects whether the evaluation result is reliable, so the k-means++ algorithm is selected as an advanced clustering method to analyze all feature vectors [32]. It improves on the original k-means algorithm: compared with k-means, the k-means++ algorithm mainly optimizes the selection rules for the cluster centers. The specific algorithm steps are shown in table 1.

Table 1. K-means++ algorithm.

Step 1: Randomly select a point from the data points $U$ as the initial cluster center $c_1$;
Step 2: Calculate the shortest distance $s(x)$ between each point $s \in U$ and the existing cluster centers (that is, the distance between each sample and its nearest cluster center); then select the next cluster center, choosing each point with probability $\frac{s(x)^2}{\sum_{s \in U} s(x)^2}$;
Step 3: Repeat Step 2 until $M$ centers are selected;
Step 4: For each point $s$ in the dataset, calculate its distance to the $M$ selected cluster centers and assign it to the category of the nearest cluster center;
Step 5: For each category $c_i$, recalculate its cluster center ${c_i} = \frac{1}{|c_i|}\sum_{s \in c_i} s$;
Step 6: Repeat Steps 4 and 5 until the cluster center positions no longer change.
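As a sketch of this step, scikit-learn's KMeans implements the k-means++ seeding of table 1 through its default init='k-means++' option; the cluster count of 4 reflects the four health conditions studied later.

```python
# Sketch: k-means++ clustering of the extracted feature vectors.
from sklearn.cluster import KMeans

def cluster_features(feature_vectors, n_clusters=4):
    km = KMeans(n_clusters=n_clusters, init='k-means++',
                n_init=10, random_state=0)
    labels = km.fit_predict(feature_vectors)  # one cluster label per sample
    return labels, km.cluster_centers_
```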

3. Proposed method

3.1. Samples segmentation and standardization

In order to meet the sample requirements of subsequent training, the vibration signal obtained from the sensors needs to be divided into multiple signals according to some rules. The principle is that each signal contains at least one rotation period of vibration information, and the number of points of each signal is at least:

$$Q = \frac{60kf}{V} \quad (6)$$

where $Q$ means the number of points contained in each segment of the slice signal; $k$ is a positive integer, indicating that the signal is selected to rotate $k$ circles; $V$ means the pump speed in r min−1 ; $f$ means the sampling frequency in Hz.

The vibration signal of the axial piston pump has obvious pulses. For deep learning networks, these pulses may be used as a basis for distinguishing different features. Therefore, each sample must be standardized to limit the fluctuation of the amplitude, which improves the generalization ability of the model. Z-score standardization is the most commonly used standardization method. Samples processed by this method conform to the standard normal distribution, and the conversion function is written as:

$$x_{*} = \frac{x_{*}^{'} - \mu\left(x_{*}^{'}\right)}{\sigma\left(x_{*}^{'}\right)} \quad (7)$$

where $x_{*}$ represents each sample after standardization, and $\mu(x_{*}^{'})$ and $\sigma(x_{*}^{'})$ represent the mean value and standard deviation of the original sample $x_{*}^{'}$, respectively.
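A minimal sketch of the segmentation rule in equation (6) and the z-score standardization in equation (7) is given below; the helper names are illustrative, and the example values follow the test bench settings reported in section 4 (V = 1000 r min−1, f = 48 kHz).

```python
# Sketch: slicing per equation (6) and z-score standardization per equation (7).
import numpy as np

def segment_length(k, V, f):
    # Q = 60*k*f / V points, i.e. k shaft revolutions of vibration data.
    return int(k * 60 * f / V)

def slice_and_standardize(signal, Q):
    n = len(signal) // Q
    segments = signal[:n * Q].reshape(n, Q)
    mu = segments.mean(axis=1, keepdims=True)
    sigma = segments.std(axis=1, keepdims=True)
    return (segments - mu) / sigma  # each segment: zero mean, unit variance

Q = segment_length(k=1, V=1000, f=48_000)  # 2880 points per revolution
```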

3.2. Samples augmentation

The lack of training samples leads to overfitting and low generalization ability of the trained model, so the samples need to be expanded by some necessary measures. On the one hand, DCGAN can generate images with the same features as real images after training. On the other hand, DCGAN has been widely applied in other image processing tasks, and satisfactory results were obtained. Therefore, DCGAN is selected for sample augmentation in this paper. The training method of the DCGAN used for sample augmentation is the same as that described in section 2. When the generator and the discriminator reach the Nash equilibrium, the discriminator cannot accurately distinguish between the original samples and the generated samples, and the generated samples and the original samples together constitute the complete sample set.

3.3. Feature vector acquisition

Similar to other coding networks, DCGAN has the ability to encode data. In the training process of DCGAN, the discriminator learns to discriminate whether generated data of different categories are true, and in doing so it learns the features of data from different categories. In this paper, a global average pooling (GAP) layer follows the last convolutional layer of the discriminator, replacing the original flatten layer, so that the feature of each filter can be extracted. The specific process is shown in table 2.

Table 2. Feature vector extraction process for $d$ images.

For $d$ images:
Step 1: Feed each image forward through the discriminator;
Step 2: Pass the output of the last convolutional layer through the global average pooling layer, which outputs $m$ 1 × 1 feature maps;
Step 3: Concatenate the $m$ 1 × 1 feature maps into a 1 × $m$ vector;
Step 4: Normalize this vector with the L2 norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{m} |x_i|^2}$. The normalized vector is labeled the $F$ vector, which is the final feature vector.
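A hedged sketch of the procedure in table 2 follows, assuming a Keras discriminator whose GAP layer is named 'gap' (the layer name is an illustrative assumption):

```python
# Sketch: extract L2-normalized F vectors from the discriminator's GAP layer.
import numpy as np
import tensorflow as tf

def extract_f_vectors(discriminator, images):
    feat_model = tf.keras.Model(
        inputs=discriminator.input,
        outputs=discriminator.get_layer('gap').output)  # shape (d, m)
    feats = feat_model.predict(images)
    norms = np.linalg.norm(feats, ord=2, axis=1, keepdims=True)
    return feats / norms  # one F vector per image
```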

3.4. Semi-supervised learning

With the increase of fault information, the amount of labeled data increases. These labeled data are used to correct and optimize the results of unsupervised fault diagnosis. When part of the labeled data is acquired, unsupervised learning is replaced by semi-supervised learning to obtain better feature extraction capability. The semi-supervised GAN structure is shown in figure 3 [33]. Compared with the fully unsupervised DCGAN, the final output of the semi-supervised GAN (SGAN) discriminator is no longer one value but K + 1 values. The first K outputs correspond to the probabilities that real data belong to each of the K categories, and the last output represents the probability that the input is identified as generated data by the discriminator. For the first K outputs of the discriminator (that is, the supervised outputs), a cross-entropy loss is added, and the unsupervised loss function is the same as that of the original GAN. The specific loss function is as follows:

$$L = L_{\text{supervised}} + L_{\text{unsupervised}}$$
$$L_{\text{supervised}} = -\mathbb{E}_{x,y \sim P_{\text{data}}}\left[\log P\left(y \mid x,\, y < K+1\right)\right]$$
$$L_{\text{unsupervised}} = -\mathbb{E}_{x \sim P_{\text{data}}}\left[\log\left(1 - P\left(y = K+1 \mid x\right)\right)\right] - \mathbb{E}_{x \sim G}\left[\log P\left(y = K+1 \mid x\right)\right] \quad (8)$$

Figure 3. The structure of semi-supervised GAN.

where $L$ represents the total loss function, and $G$ represents the generated data distribution.

After fully unsupervised learning, the separation between the different fault categories should be obvious. If there is no obvious separation, adding labeled data is a useful way to separate the different fault categories. However, the effect of semi-supervised learning is not necessarily better than that of unsupervised learning; it depends on the amount of labeled data. If the amount of labeled data is small, semi-supervised learning may be inferior to unsupervised learning, because with so few labels the supervised training part is not stable. Eventually, with more labeled data, the model can separate the different categories of fault data well, and semi-supervised learning can stop.
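To make the K + 1 output scheme concrete, the following is a minimal sketch of the SGAN discriminator loss of equation (8) with K = 4 health conditions. Treating the supervised term as a cross-entropy over the first K logits, and splitting the batch into labeled, unlabeled and generated samples, are simplifying assumptions rather than the authors' exact implementation.

```python
# Sketch: SGAN discriminator loss, K real classes + 1 'generated' class.
import tensorflow as tf

K = 4  # number of bearing health conditions
sce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def sgan_d_loss(labeled_logits, labels, unlabeled_logits, fake_logits):
    # Supervised part: cross-entropy over the first K outputs.
    supervised = sce(labels, labeled_logits[:, :K])
    # Unsupervised part: real data should avoid class K ('generated'),
    # generated data should be assigned to class K.
    p_fake_real = tf.nn.softmax(unlabeled_logits)[:, K]
    p_fake_gen = tf.nn.softmax(fake_logits)[:, K]
    unsupervised = (-tf.reduce_mean(tf.math.log(1.0 - p_fake_real + 1e-8))
                    - tf.reduce_mean(tf.math.log(p_fake_gen + 1e-8)))
    return supervised + unsupervised
```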

3.5. Overview of the proposed method

The proposed method is shown in figure 4. The specific process of the proposed method is as follows:

  • (a)  
Data preparation. After the vibration signals for the various bearing faults are obtained from the sensors, some processing is needed to make them suitable for model training. The original data are processed in three steps, namely standardization, slicing and CWT, and finally the corresponding image data are obtained.
  • (b)  
Data augmentation. Since the amount of processed image data cannot meet the requirements of model training, data expansion is required, and DCGAN is chosen to augment the data. After the DCGAN is trained, it must be checked to be free of mode collapse and checkerboard artifacts. The model can then be used to generate different types of image data, and the generated image data and the original image data are combined to compose the complete data.
  • (c)  
Network training. The complete image data are divided into a training dataset and a test dataset at a ratio of 4:1, and the training data are fed into the network to finish the training. After training is completed, the test data are used to test the generalization ability of the model.
  • (d)  
    Feature extraction. Each sample in the complete image data is fed into the trained model, and the output of the last convolutional activation layer of the discriminator is extracted as the feature vector of each sample.
  • (e)  
    Clustering. The feature vectors extracted by the network corresponding to all samples are processed by L2 norm and fed into the clustering algorithm. These feature vectors are classified according to the method described in section 2.
  • (f)  
Principal component analysis (PCA) visual evaluation. In order to reflect the final result more intuitively, PCA is used to reduce the dimension of the result and show it in a two-dimensional graph (a sketch of this step follows the list).
  • (g)  
    Semi-supervised learning optimization. After obtaining some labeled data, they are added to the semi-supervised learning model to modify and optimize the classification results of unsupervised learning, which makes the fault diagnosis more accurate.
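As referenced in step (f), a brief sketch of the two-component PCA visualization; the feature vectors and predicted labels come from the preceding steps.

```python
# Sketch: project F vectors to two principal components and plot them.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_pca(feature_vectors, labels):
    pts = PCA(n_components=2).fit_transform(feature_vectors)
    plt.scatter(pts[:, 0], pts[:, 1], c=labels, s=8, cmap='tab10')
    plt.xlabel('PC 1')
    plt.ylabel('PC 2')
    plt.show()
```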

Figure 4. The proposed method.

4. Fault diagnosis of axial piston pump

4.1. Data description

All processing of the acquired data is performed on a PC with an Nvidia GeForce GTX 1050 Ti GPU, an Intel Core i7-9700 3 GHz CPU, 32 GB RAM, TensorFlow 2.0, MATLAB 2019 and CUDA 10.1.

4.1.1. Test bench dataset.

A test bench for bearing faults of the axial piston pump is built to examine the feasibility of the proposed method. The three categories of bearing faults are shown in figure 5: inner race fault, outer race fault and ball fault. The local fault of the bearing is machined by wire cutting; the axial groove is 0.5 mm wide and 0.5 mm deep. The vibration signal is measured by acceleration sensors installed on each part of the axial piston pump. The installation positions of all sensors and the vibration signal acquisition device are shown in figure 6. According to the Nyquist sampling theorem and the frequency range of the signal acquisition system, a sampling frequency of 48 kHz is adopted. The machine runs at 1000 r min−1, the shaft rotation frequency is 16.7 Hz, and the piston frequency is 150.3 Hz. The data are collected when the output pressure is 0 MPa and 1 MPa.

Figure 5. The three localized faults of the bearing in the axial piston pump.

Figure 6. Test bench.

Various categories of vibration signals are shown in figure 7. The vibration signal of the normal condition is shown in figure 7(a); there is obvious periodicity in the amplitude under normal conditions, caused by the reciprocating motion of the pistons in the pump cylinder when the axial piston pump is operating. Figures 7(b)–(d) show the vibration signals of the inner race fault, the outer race fault and the ball fault, respectively. The number of points contained in each sample affects the results of subsequent operations. On the one hand, it affects the size of the image after CWT and bilinear interpolation: for a constant scaling factor in bilinear interpolation, more points mean larger images, which directly burdens subsequent model calculations. On the other hand, too many points make each sample contain much redundant information, increasing useless calculation time. Therefore, while satisfying equation (6) and after comparing the results of multiple experiments, each sample is set to contain 4800 points. The segmented signal is transformed by CWT to obtain grayscale images. To reduce the burden on the computer, bilinear interpolation is used for all image data scaling; after many attempts, scaling the image data to 96 × 96 pixels is found to be the most suitable.

Figure 7. The waveforms of the test bench dataset.

When these operations are completed, the image data are augmented. When the output pressure is 0 MPa, the number of image samples is expanded to 2497, of which the numbers of normal samples, inner race fault samples, outer race fault samples and ball fault samples are 609, 651, 643 and 594, respectively. Table 3 shows the 96 × 96 pixel image samples obtained by slicing and CWT of the original vibration signal. Table 4 shows the image samples generated by DCGAN.

Table 3. 96 × 96 pixels image samples processed by CWT.

Baseline | Outer race | Inner race | Ball

Table 4. Output image samples of trained generator in DCGAN.

Baseline | Outer race | Inner race | Ball

When the output pressure is 1 MPa, the number of image samples is expanded to 3084, of which the numbers of normal samples, inner race fault samples, outer race fault samples and ball fault samples are 723, 785, 798 and 778, respectively. Table 5 shows the 96 × 96 pixel original image samples, and table 6 shows the image samples generated by DCGAN.

Table 5. 96 × 96 pixel image samples at 1 MPa output pressure.

Baseline | Outer race | Inner race | Ball

Table 6. Generated samples at 1 MPa output pressure.

Baseline | Outer race | Inner race | Ball

Cosine similarity is introduced to assess the similarity between the generated samples and the original samples. Cosine similarity represents each image as a vector and expresses the similarity between images as the cosine distance between the vectors. The average cosine similarity between the generated images and the original images is 0.86 when the output pressure is 0 MPa and 0.89 when the output pressure is 1 MPa. The cosine similarity scores under both conditions are above 0.8, indicating that the generated images are very similar to the original images and can meet the requirements of sample augmentation.
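A sketch of how such an average cosine similarity can be computed is shown below; pairing the ith generated image with the ith original image is an assumption, since the paper reports only the averaged score.

```python
# Sketch: average cosine similarity between generated and original images.
import numpy as np

def mean_cosine_similarity(generated, originals):
    g = generated.reshape(len(generated), -1).astype(float)
    o = originals.reshape(len(originals), -1).astype(float)
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    o /= np.linalg.norm(o, axis=1, keepdims=True)
    return float(np.mean(np.sum(g * o, axis=1)))
```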

4.1.2. CWRU dataset.

In addition to the data obtained from the test bench, the bearing dataset published by Case Western Reserve University is also utilized to verify the effect of this method [34]. This dataset includes two different types of bearing data, one is the fan-end bearing signal, and the other is the drive-end bearing signal. The bearing vibration signal of the axial piston pump contains natural periodic pulses, fault periodic pulses and background noise. In order to approximately simulate this vibration signal, the vibration signal at the fan-end of Case Western Reserve University is selected. Since the fan blade is close to the fan-end bearing, its working vibration is the main source of the natural periodic pulse. The air vibration caused by the fan blade operation becomes the main source of background noise. The abnormal vibration caused by the fault of the bearing itself is the source of the fault periodic pulse. Therefore, the expression of the fan-end signal is written as:

$$R(t) = N(t) + F(t) + B(t) \quad (9)$$

where $R(t)$ represents the benchmark signal, $N(t)$ is the natural periodic pulses, $F(t)$ represents the fault periodic pulses, and $B(t)$ is the background noise. The sampling frequency of the selected data is 12 kHz, the number of sampling points is 120 000, and the other information of the data is shown in table 7.

Table 7. CWRU bearing data information.

Motor speed (r min−1) | Motor load (horsepower) | Relative position of outer race fault
1797 | 0 | 6 o'clock direction

Figure 8 shows the normal raw signal, inner race fault raw signal, outer race fault raw signal and ball fault raw signal in the adopted CWRU dataset. In general, the signals contain a lot of noise, and it is difficult to distinguish the different faults represented by the four signals from the time-domain waveforms.

Figure 8. The waveforms of the CWRU dataset.

For the CWRU dataset, the data of each health condition is sliced in units of 2400 points. The segmented data is also transformed into 96 × 96 grayscale image samples after CWT and bilinear interpolation. These grayscale images are shown in table 8.

Table 8. 96 × 96 pixels CWRU images.

Baseline | Outer race | Inner race | Ball

These image samples of CWRU dataset are expanded to 2677, of which the number of normal samples, inner race fault samples, outer race fault samples, and ball fault samples are 656, 656, 683, and 682, respectively. The image samples generated by DCGAN are shown in table 9.

Table 9. The generated image samples for CWRU dataset.

Baseline | Outer race | Inner race | Ball

The average cosine similarity between the generated sample and the original sample of the CWRU is 0.91, which also meets the requirement of sample expansion.

4.2. Setting of the DCGAN

The DCGAN built in this paper contains two CNN structures: a generator and a discriminator. With reference to the structure of DCGAN in image processing tasks, this paper constructs the DCGAN for this fault diagnosis task. After many experiments, the network structure and its hyperparameters are determined. The network of the discriminator is shown in figure 9. The discriminator consists of four two-dimensional convolutional layers (Conv2D), a GAP layer and a fully connected layer (FC). Each convolutional layer is followed by a BN layer and a LeakyReLU activation layer with α = 0.2. A dropout layer follows these layers to prevent the network from overfitting, with a random discard rate of 0.3. Note that the final flatten layer of the original DCGAN discriminator is replaced with a GAP layer in this paper; each value in the output of this GAP layer is regarded as a feature of the corresponding filter, and finally the real/fake probability of the received image is output. Figure 10 shows the network of the generator. The generator consists of four deconvolution layers (DC2D) and a fully connected layer (FC). A BN layer and a ReLU activation layer follow each deconvolution layer. Note that the fully connected layer of the generator is applied to project and reshape the noise data. The size of the image output by the generator is the same as that of the image data fed into the DCGAN.

Figure 9. Structure of discriminator.

Figure 10. Structure of generator.

The specific structures and parameters of the discriminator and generator are shown in tables 10 and 11, respectively. The input of the discriminator and the output of the generator have the same size as the image samples, namely 96 × 96 pixel grayscale images. The sizes of the convolution kernels in the discriminator and generator are 5 × 5 and 2 × 2, respectively. The Adam optimizer (learning rate = 0.002, $\beta_1$ = 0.5, $\beta_2$ = 0.9) is selected to train the discriminator and generator.

Table 10. Structure of discriminator.

Layer type | Activation function | Kernel size | Strides | Output size
Input | — | — | — | (96,96,1)
Conv2D_1 | LeakyReLU(0.2) | 5 × 5 × 64 | (2,2) | (48,48,64)
Conv2D_2 | LeakyReLU(0.2) | 5 × 5 × 128 | (2,2) | (24,24,128)
Conv2D_3 | LeakyReLU(0.2) | 5 × 5 × 256 | (2,2) | (12,12,256)
Conv2D_4 | LeakyReLU(0.2) | 5 × 5 × 512 | (2,2) | (6,6,512)
GAP | — | — | — | (512)
FC | — | — | — | (1)

Table 11. Structure of generator.

Layer type | Activation function | Kernel size | Strides | Output size
Input | — | — | — | (100,1)
FC | — | — | — | (6 × 6 × 512)
Reshape | — | — | — | (6,6,512)
DC2D_1 | ReLU | 2 × 2 × 256 | (2,2) | (12,12,256)
DC2D_2 | ReLU | 2 × 2 × 128 | (2,2) | (24,24,128)
DC2D_3 | ReLU | 2 × 2 × 64 | (2,2) | (48,48,64)
DC2D_4 | ReLU | 2 × 2 × 1 | (2,2) | (96,96,1)
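A hedged Keras sketch of the two networks in tables 10 and 11 is given below. The 'same'/'valid' padding choices reproduce the listed output sizes, the dropout placement follows the description above, and the tanh output activation follows DCGAN adjustment (d) in section 2.2 (table 11 lists ReLU for DC2D_4); these details are reconstructions, not the authors' released code.

```python
# Sketch: discriminator and generator per tables 10 and 11 (Keras assumed).
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    m = tf.keras.Sequential([tf.keras.Input(shape=(96, 96, 1))])
    for filters in (64, 128, 256, 512):   # 96 -> 48 -> 24 -> 12 -> 6
        m.add(layers.Conv2D(filters, 5, strides=2, padding='same'))
        m.add(layers.BatchNormalization())
        m.add(layers.LeakyReLU(0.2))
        m.add(layers.Dropout(0.3))
    m.add(layers.GlobalAveragePooling2D(name='gap'))  # 512-d F vector
    m.add(layers.Dense(1))                            # real/fake logit
    return m

def build_generator():
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        layers.Dense(6 * 6 * 512),
        layers.Reshape((6, 6, 512))])
    for filters in (256, 128, 64):        # 6 -> 12 -> 24 -> 48
        m.add(layers.Conv2DTranspose(filters, 2, strides=2))
        m.add(layers.BatchNormalization())
        m.add(layers.ReLU())
    m.add(layers.Conv2DTranspose(1, 2, strides=2, activation='tanh'))
    return m
```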

4.3. Fault diagnosis results

4.3.1. Evaluation index.

In order to quantitatively evaluate the effect of each method, three evaluation indexes are introduced in this study: the adjusted Rand index (ARI), normalized mutual information (NMI) and purity. The first two are commonly used evaluation indexes for clustering; purity is a relatively new evaluation indicator. These indexes are applied to measure the similarity between the clustering results and the true distribution. Suppose there is a dataset $X$ with $N$ samples and two partitions of these samples, namely $\mathbb{C} = \{c_1, c_2, \ldots, c_k\}$ (the cluster result) and $\mathbb{P} = \{p_1, p_2, \ldots, p_k\}$ (the true partition). Let $n_{ij} = |c_i \cap p_j|$ be the number of common members of groups $c_i$ and $p_j$, with $b_i = \sum_{j} n_{ij}$ and $d_j = \sum_{i} n_{ij}$. The ARI is defined as:

$$\text{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{b_i}{2}\sum_{j}\binom{d_j}{2}\right]\Big/\binom{N}{2}}{\frac{1}{2}\left[\sum_{i}\binom{b_i}{2} + \sum_{j}\binom{d_j}{2}\right] - \left[\sum_{i}\binom{b_i}{2}\sum_{j}\binom{d_j}{2}\right]\Big/\binom{N}{2}} \quad (10)$$

The normalized mutual information between the two partitions is defined as:

$$\text{NMI}(\mathbb{C}, \mathbb{P}) = \frac{I(\mathbb{C}; \mathbb{P})}{\max\left\{H(\mathbb{C}),\, H(\mathbb{P})\right\}} \quad (11)$$

where $I(\mathbb{C}; \mathbb{P})$ is the mutual information between $\mathbb{C}$ and $\mathbb{P}$, and $H(\cdot)$ denotes the entropy of a partition.

Purity, simply summarized, represents the proportion of samples in each cluster that belong to the cluster's majority class. The specific formula is as follows:

$$\text{purity} = \frac{1}{n}\sum\nolimits_{k}\max_{j}\left|w_k \cap c_j\right| \quad (12)$$

where $w_k$ denotes the $k$th cluster, $n$ is the total number of members (samples), and $c_j$ denotes the $j$th of the $C$ classes.

The above three indicators measure the clustering results from different aspects and indirectly indicate the effects of the different methods. The closer these three indicators are to 1, the better the clustering effect. In short, ARI can be regarded as the accuracy of clustering and gives the proportion of proper classification. Purity and NMI are affected by the number of cluster centers, so a single index cannot accurately evaluate the clustering effect; this is the reason why this article uses three indexes to evaluate the results.
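As a sketch, ARI and NMI are available directly in scikit-learn, and purity can be implemented from equation (12); the helper below is illustrative.

```python
# Sketch: ARI, purity and NMI for a predicted clustering.
from sklearn.metrics import (adjusted_rand_score, confusion_matrix,
                             normalized_mutual_info_score)

def purity(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)   # rows: classes, cols: clusters
    return cm.max(axis=0).sum() / cm.sum()  # majority class per cluster

def evaluate(y_true, y_pred):
    return (adjusted_rand_score(y_true, y_pred),
            purity(y_true, y_pred),
            normalized_mutual_info_score(y_true, y_pred))
```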

4.3.2. The unsupervised learning results for test bench dataset.

For the proposed method, 500 iterations of training are carried out to optimize the network. The model is qualitatively assessed through two-component PCA. The clustering result at an output pressure of 0 MPa is shown in figure 11; the clustering takes the DCGAN F vector as input. The qualitative assessment is necessary for the method, because the first stage of data separation in unsupervised fault diagnosis is to distinguish healthy samples from unhealthy samples. As figure 11 shows, for the best trained model, the normal samples are clearly separated from the fault samples. Compared with the real distribution of all samples, some samples of each category are clustered incorrectly. Even so, the results of the proposed method are satisfactory.

Figure 11. K-means++ PCA (0 MPa), DCGAN F vector output. (a) Real. (b) Predicted.

When the training is complete, the DCGAN F vector with 96 × 96 output images and with k-means++ clustering reaches the best performance. As shown in table 12, ARI, purity, and NMI are 0.886, 0.955, and 0.889, respectively. (The following descriptions of the three indicators in this article follow this order.) These results show that the DCGAN-based fault diagnosis method creates nearly pure clusters, and the number of cluster centers is also quite appropriate.

Table 12. The unsupervised fault diagnosis result of the axial piston pump bearing (0 MPa).

Model | ARI | Purity | NMI
DCGAN | 0.886 | 0.955 | 0.889

Similarly, the clustering results at an output pressure of 1 MPa are shown in figure 12. In the figure, only a small number of samples are misclassified. Overall, the classification result is more satisfactory than that at an output pressure of 0 MPa. As shown in table 13, the three evaluation indicators are 0.980, 0.993 and 0.969, respectively. The indicator results confirm the analysis under the two pressure conditions. The classification result is better at an outlet pressure of 1 MPa because, under load, the fault characteristics generated at the fault location are more obvious.

Figure 12. K-means++ PCA (1 MPa), DCGAN F vector output. (a) Real. (b) Predicted.

Table 13. The unsupervised fault diagnosis result of the axial piston pump bearing (1 MPa).

Model | ARI | Purity | NMI
DCGAN | 0.980 | 0.993 | 0.969

4.3.3. The semi-supervised learning results for test bench dataset.

The unsupervised results for the test bench dataset at an output pressure of 1 MPa are relatively high, but the model could still benefit from adding labeled data to the training. In response to this, increasing the percentage of labeled image samples in GAN training is considered to improve the fault diagnosis results. When labeled data are added, the model enters the semi-supervised learning stage.

For semi-supervised fault diagnosis, the three indicators are also employed to evaluate the model. The evaluation results for increasing percentages of labeled data are shown in table 14. Before the labeled data reach 15%, the results improve as the amount of labeled data increases, but they are not better than the results of unsupervised fault diagnosis. When the amount of labeled samples reaches 15%, the semi-supervised model becomes more effective: the point at which the semi-supervised effect exceeds the unsupervised effect for the axial piston pump bearing data lies between 8% and 15%. When 30% of the data are labeled, ARI and NMI are greatly improved to 0.985 and 0.972, respectively, whereas they are only 0.886 and 0.889 in the case of unsupervised learning. Therefore, more labeled data make the fault diagnosis result better. This is not difficult to understand: with more labeled data the model is closer to fully supervised learning, and the classification accuracy improves accordingly.

Table 14. Test bench data, SGAN k-means++ clustering results.

Labeled data (%) | ARI | Purity | NMI
0 | 0.886 | 0.955 | 0.889
2 | 0.762 | 0.797 | 0.724
8 | 0.864 | 0.892 | 0.859
15 | 0.932 | 0.973 | 0.936
30 | 0.985 | 0.988 | 0.972

4.3.4. The result for CWRU dataset.

The same model is applied to the CWRU dataset. After 500 iterations of training, the PCA results for the k-means++ clustering of the final model are shown in figure 13. Obviously, the cluster boundaries of the four health conditions are extremely clear, and only a small number of samples are classified into the wrong category. Compared with the PCA results of the test bench data, the results on the CWRU dataset are better. The reason for this performance may be that the fault features in the CWRU dataset are more obvious and easier to learn and separate.

Figure 13. K-means++ PCA, DCGAN F vector output for the CWRU dataset. (a) Real. (b) Predicted.

The scores of the three evaluation indicators corresponding to this clustering result are shown in table 15: 0.984, 0.994 and 0.980, respectively. These indicator results show that almost perfect clusters are obtained. The results are better than those of this method applied to the test bench dataset, and also confirm the PCA visualization comparison. Combining all the results, the proposed method can still perform well when the signals contain strong noise. The clustering results of the CWRU dataset in the unsupervised stage are already excellent, so it is no longer necessary to optimize the results through semi-supervised learning.

Table 15. The unsupervised fault diagnosis result of CWRU dataset.

Model | ARI | Purity | NMI
DCGAN | 0.984 | 0.994 | 0.980

4.4. Comparison with VAE and infoGAN

4.4.1. Comparison result for test bench dataset.

A baseline comparison against VAE and infoGAN is performed on the two datasets to verify the superiority of this method. Similarly, the above three indicators are applied to assess the clustering results of these methods. The z-mean outputs of the VAE are the feature vectors fed into the clustering algorithm.

For the test bench dataset, only the comparison results at 0 MPa are discussed. After 1000 iterations, the best results of VAE and infoGAN are obtained and shown in table 16. For VAE, the final results are 0.668, 0.903 and 0.772; it is obvious that the results of VAE are much worse than those of DCGAN. For infoGAN, the three indicators are 0.879, 0.952 and 0.845, respectively; compared with the results of DCGAN, infoGAN is slightly inferior. The measured axial piston pump bearing fault signal contains little noise. Therefore, even without the latent-space information used by infoGAN, powerful representations of the samples are provided by DCGAN.

Table 16. The comparison results of test bench dataset fault diagnosis.

Model | ARI | Purity | NMI
VAE | 0.668 | 0.903 | 0.772
InfoGAN | 0.879 | 0.952 | 0.845

Similarly, PCA is also used to visually evaluate the results of these two methods. Note that the explained variance of all PCAs is close to 90%, so the visualizations faithfully reflect the general structure of the data. The results of VAE and infoGAN are shown in figures 14 and 15, respectively. The normal samples cannot be distinguished well from all samples by these two models. In the visualization results of VAE, part of the inner race fault samples and ball fault samples are classified as outer race faults; in particular, a large number of normal samples are classified as outer race fault samples, which leads to the poor results of VAE. In the visualization results of infoGAN, a small number of samples of all types are misclassified into other categories. Combining the evaluation results and the visualization results, DCGAN has a better effect than VAE and infoGAN in the unsupervised fault diagnosis stage.

Figure 14. K-means++ PCA, VAE feature vector output. (a) Real. (b) Predicted.

Figure 15. K-means++ PCA, infoGAN feature vector output. (a) Real. (b) Predicted.

4.4.2. Comparison result for CWRU dataset.

After 1000 iterations, the best results of VAE and infoGAN are obtained and shown in table 17. For VAE, the scores of the three indicators reach 0.953, 0.972 and 0.936, respectively; for infoGAN, the scores are only 0.692, 0.837 and 0.758, respectively. The result of VAE is close to that of DCGAN, indicating that DCGAN has only a slight advantage on this dataset. However, the result of infoGAN is disappointing, with a large gap from that of DCGAN.

Table 17. The comparison results of CWRU dataset fault diagnosis.

Model | ARI | Purity | NMI
VAE | 0.953 | 0.972 | 0.936
InfoGAN | 0.692 | 0.837 | 0.758

In addition to these indicator results, the PCA visualizations are shown in figures 16 and 17. In the visualization results of VAE, the separation of normal samples from other samples is extremely obvious, but the other three types of samples are still mixed. Although the results of the evaluation indicators are almost the same as those of DCGAN, VAE falls short of DCGAN in separating each type of sample. From figure 17, infoGAN can only separate the normal samples from all samples; many samples are clustered incorrectly, failing to achieve correct clustering. In general, the performance of DCGAN is more prominent than that of the other two unsupervised methods.

Figure 16. K-means++ PCA, VAE feature vector output. (a) Real. (b) Predicted.

Figure 17. K-means++ PCA, infoGAN feature vector output. (a) Real. (b) Predicted.

5. Conclusion

In this paper, a deep semi-supervised learning method based on DCGAN is proposed for axial piston pump bearing fault diagnosis. According to the number of labels provided, different strategies are given: when labels are extremely scarce, the unsupervised fault diagnosis method is adopted; when the number of labels reaches a certain level, the labels are fed into the semi-supervised fault diagnosis model to obtain better diagnosis results. In order to verify the effectiveness of the proposed method, experiments are conducted on two datasets, and the conclusions are drawn as follows.

  • (a)  
The proposed method performs well on the provided test bench dataset, which contains strong pulses, indicating that the method is well suited to diagnosing signals with many pulses.
  • (b)  
    The proposed method obtains almost pure clusters on the CWRU dataset with much noise, showing that the method has excellent anti-noise ability and strong robustness.
  • (c)  
As prior knowledge increases, the amount of labeled data also increases. The labels are then fed into the SGAN for semi-supervised learning, which achieves higher clustering indicators than unsupervised learning.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 51805376), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LD21E050001), the Wenzhou Major Science and Technology Innovation Project of China (ZG2020051) and the Open Foundation of the State Key Laboratory of Fluid Power and Mechatronic Systems (GZKF-201719).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.
