
ISA Transactions

Volume 93, October 2019, Pages 341-353

Practice article
Learning transferable features in deep convolutional neural networks for diagnosing unseen machine conditions

https://doi.org/10.1016/j.isatra.2019.03.017

Highlights

  • A deep transfer learning framework for diagnosing unseen faults in target applications.

  • Exploring the feature transferability in disparate levels of the pre-trained model.

  • Enables building and training a large network efficiently with limited data for new tasks.

Abstract

Recent years have witnessed the increasing popularity and development of deep learning across various fields. Deep networks, and in particular convolutional neural networks (CNNs), have also achieved many state-of-the-art results in the intelligent fault diagnosis of mechanical systems. However, most existing studies assume that the training data and the test data follow the same distribution, which rarely holds in real diagnosis tasks. To tackle this problem, a transfer learning framework based on a pre-trained CNN, which leverages the knowledge learned from the training data to facilitate diagnosing a new but similar task, is presented in this work. First, the CNN is trained on large datasets to learn hierarchical features from the raw data. Then, the architecture and weights of the pre-trained CNN are transferred to new tasks with proper fine-tuning instead of training a network from scratch. To adapt the pre-trained CNN to a specific case, three transfer learning strategies are discussed and compared to investigate the applicability as well as the significance of feature transferability at different levels of a deep structure. The case studies show that the proposed framework can transfer the features of the pre-trained CNN to boost the diagnosis performance on unseen machine conditions in terms of diverse working conditions and fault types.

Introduction

The condition-based maintenance (CBM) strategy has received great attention in modern industries owing to its prominent advantages in availability, sustainability, safety, and reliability. Unlike breakdown maintenance and time-based maintenance, CBM schedules maintenance timing and planning according to the actual running state of the equipment. Since decisions about machine conditions are based on appropriate analyses of the monitoring data, diagnostics is a core building block of CBM [1]. Essentially, machine diagnostics can be formulated as a pattern classification and recognition problem [2].

A little over a decade ago, artificial intelligence techniques brought about the idea that machine health conditions can be efficiently assessed by a well-trained classifier instead of a diagnosis specialist, which is commonly known as intelligent fault diagnosis [3], [4]. Backed by an automatic operating procedure, high-efficiency data processing, and relatively satisfactory diagnosis accuracy, this strategy led to a series of works dealing either with feature extraction and selection or with the design and optimization of classification algorithms in the field of diagnostics for mechanical equipment [5], [6], [7]. With reference to the recent literature, most of the research on intelligent fault diagnosis can be categorized into two groups: manual feature extraction methods and deep learning methods (as shown in Fig. 1(a)). Manual feature extraction methods require the general steps of data preprocessing, feature extraction and selection, and pattern recognition. Influential data mining algorithms, covering noise filtering, dimensionality reduction, instance reduction, and imbalanced data preprocessing [8], [9], [10], have been widely studied in fault diagnosis problems. However, relevant studies have revealed two inherent weaknesses of manual feature extraction methods. (1) The proper design of feature extraction and selection approaches largely relies on prior knowledge about the objects of analysis. (2) The prevailing pattern recognition algorithms for automatically identifying machine health conditions, such as the artificial neural network (ANN) and the support vector machine (SVM), are shallow learning architectures and are therefore unable to approximate highly complex non-linear functions [11], [12], [13]. Tackling some of the weaknesses of the traditional framework, the second group, deep learning methods, has enjoyed increasing popularity in recent years for a wide range of problems, such as image processing, computer vision, video processing, and natural language processing [14]. One of the great advantages of this framework is that feature learning is automatic. Based on labeled data and back-propagation, deep models can capture the essential features for diagnosing machine conditions, rather than depending on the classification capacity of handcrafted features or pattern recognition algorithms. Deep neural networks, especially convolutional neural networks (CNNs), have also led to many state-of-the-art results in the fault diagnosis of diverse mechanical objects, such as spindle bearings, planetary gearboxes and rotor systems [15], [16], [17], [18], [19], [20], [21], [22], [23], [24].

Although the marvelous success of deep learning in a variety of diagnosis applications has often been reported, the topic is still largely open. First, these works mostly assume the same distribution for the training data and the test data. The diagnostic ability evidently degenerates when the training and testing data have different feature distributions. Second, large amounts of data are often required to train deep learning models, whereas labeled fault samples are generally scarce in actual industrial tasks. Consequently, it would be highly desirable to develop methods that can leverage knowledge from pre-existing tasks (source domain) to facilitate model training and diagnosis in an unseen machine diagnosis problem (target domain) that is similar to, but not the same as, the existing task (namely transfer learning), as shown in Fig. 1(b). This would allow knowledge to be adequately utilized across different tasks, so that new diagnosis applications can be addressed more flexibly and, more importantly, the learning ability can be improved with a limited number of samples.

Currently, transfer learning based algorithms and frameworks have been widely studied in different research areas, such as image classification [25], [26], text classification [27], acoustic event recognition [28] and biometrics [29]. According to the latest surveys in this community [30], [31], transfer learning techniques can be mainly categorized into three classes: inductive transfer learning, transductive transfer learning and unsupervised transfer learning. In this work, we focus on inductive transfer learning for fault diagnosis tasks, where sufficient labeled data in the source domain are available for model pre-training and some labeled data in the target domain are employed to induce a transferred model for the target tasks. Combined with the rise of deep learning, the signal characteristics can be adaptively and hierarchically encoded in the weights of the deep network. Instead of training a deep neural network from scratch, the network parameters can then be reused in new tasks for feature and knowledge transfer. In the field of object recognition, Oquab et al. [32] designed a deep CNN based transfer learning method that reuses the convolutional layers learned on a large-scale annotated dataset (ImageNet) to compute mid-level image representations for target tasks. To reduce the effects of different image statistics (types of objects, viewpoints) across domains, new adaptation layers (two fully connected layers) were added to replace the output layers of the pre-trained network and trained with a limited amount of target data. The results on two image datasets show the high feature transferability of the ImageNet-trained CNN for object and action recognition tasks. Also using CNNs, Yosinski et al. [33] further investigated how well the features of each layer transfer from the source domain to the target domain in image classification problems. This research reveals two issues, i.e., optimization difficulties when separating certain layers from the whole network without considering the fragile co-adapted features on successive layers, and the specialization of higher-layer features to the source domain, which may increase the difficulty of bridging the gaps between the source and target tasks.
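
As a rough illustration of this adaptation-layer idea (not the exact setup of Oquab et al. [32]), the following sketch freezes the convolutional feature extractor of a torchvision ImageNet-pretrained AlexNet and replaces its output layers with two newly initialized fully connected layers; the backbone choice, layer sizes and 10-class target task are assumptions made only for illustration.

# Hedged sketch of the adaptation-layer idea: freeze a pre-trained
# convolutional feature extractor and train a small new head on limited
# target data. AlexNet and the layer sizes are illustrative assumptions.
import torch.nn as nn
from torchvision import models

backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Keep the ImageNet-trained convolutional layers fixed.
for p in backbone.features.parameters():
    p.requires_grad = False

# Replace the original output layers with two new fully connected
# adaptation layers for a hypothetical 10-class target task.
num_target_classes = 10
backbone.classifier = nn.Sequential(
    nn.Linear(256 * 6 * 6, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, num_target_classes),
)
# Only the new classifier parameters are updated during fine-tuning.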

However, studies on transfer learning with pre-trained deep networks in fault diagnosis are few. Zhang et al. trained a shallow ANN with sufficient source data from the Case Western Reserve University (CWRU) bearing data center for bearing fault diagnosis tasks, and then transferred the parameters and modified the structure for new but similar tasks under different working conditions using a small amount of target data [34]. Other scenarios, such as diverse fault types across domains, also lead to a large domain discrepancy, so investigating transfer fault diagnosis under more scenarios and with more fault datasets is crucial and necessary. Moreover, the transferability of features in the deep structure has not been well examined with mechanical fault data, and the proper transfer strategy with respect to fault data quality remains unclear. Motivated by this, this work presents three strategies, namely fine-tuning of the fully connected layers, fine-tuning of the whole pre-trained model, and fine-tuning of the feature descriptor layers, to explore feature transferability in diagnosis applications. With two diagnosis datasets (i.e., an open-access gearbox fault dataset [35] and our single-stage cylindrical straight gearbox fault dataset), the performance of the presented transfer learning strategies is investigated under the assumption that only limited samples are available in the target task, where two scenarios, i.e., diverse working conditions and diverse fault types, are considered. The main contributions of this work are summarized as follows:

(1) A novel deep transfer learning framework is presented on the basis of the pre-trained CNN, consisting of three strategies: (i) transferring the feature descriptor (convolutional layers) and fine-tuning the classifier (fully connected layers), (ii) transferring and fine-tuning both the feature descriptor and the classifier, and (iii) transferring the classifier and fine-tuning the feature descriptor (an illustrative sketch of these strategies follows this list).

(2) Through exploring the feature transferability at disparate levels of the pre-trained model, the questions of ‘how to transfer’ and ‘what to transfer’ [36] are answered for actual industrial diagnosis applications, where approaches that can handle new but similar scenarios (target tasks) without large amounts of new labeled data or intensive expert knowledge are highly desirable.

(3) The presented framework is successfully applied in two different scenarios, (i) feature transfer between varying working conditions to properly identify the fault, and (ii) feature transfer between diverse fault types to correctly isolate the new fault by its location.
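
As a non-authoritative illustration of the three strategies listed in contribution (1), the PyTorch-style sketch below chooses which parameters of a pre-trained network are frozen and which are fine-tuned on target data; the module names `features` and `classifier` and the optimizer settings are assumptions for illustration rather than the paper's implementation.

# Hedged sketch of the three transfer strategies. It assumes a pre-trained
# model split into `features` (convolutional feature descriptor) and
# `classifier` (fully connected layers); names and hyperparameters are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

def apply_transfer_strategy(model: nn.Module, strategy: int, lr: float = 1e-3):
    """Freeze/unfreeze parts of `model` and return an optimizer for fine-tuning."""
    if strategy == 1:
        # (i) transfer and freeze the feature descriptor, fine-tune the classifier
        frozen, tuned = model.features, model.classifier
    elif strategy == 2:
        # (ii) transfer and fine-tune both the feature descriptor and the classifier
        frozen, tuned = nn.ModuleList(), model
    elif strategy == 3:
        # (iii) transfer and freeze the classifier, fine-tune the feature descriptor
        frozen, tuned = model.classifier, model.features
    else:
        raise ValueError("strategy must be 1, 2 or 3")

    for p in frozen.parameters():
        p.requires_grad = False   # kept exactly as learned on the source domain
    for p in tuned.parameters():
        p.requires_grad = True    # updated with the limited target-domain data

    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)

In strategy 2 nothing is frozen, so the returned optimizer simply fine-tunes every parameter of the transferred network on the target data.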

The remaining part of this paper is organized as follows. In Section 2, the research background is briefly described. In Section 3, we present the proposed transfer learning strategies for the mechanical diagnosis application. Section 4 presents the two case studies and the base systems. The results and a thorough discussion are given in Sections 5 and 6, respectively. Finally, the conclusions are drawn in Section 7.

Section snippets

End-to-end diagnosis framework via CNN

The main advantage of deep networks is their automatic feature learning ability, which adaptively captures and extracts fault-sensitive features from the raw signal, providing an end-to-end framework for mechanical fault diagnosis. This work focuses on a widely used type of deep network, the CNN. A typical CNN consists of multiple combinations of one or more convolutional layers and pooling layers, followed by one or more fully connected layers.
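
The minimal sketch below, written for 1-D raw vibration segments in PyTorch, illustrates such a structure with a convolutional feature descriptor followed by fully connected layers; the segment length, channel counts, kernel sizes and number of classes are assumptions chosen for illustration and do not reproduce the network used in this work.

# Minimal sketch of an end-to-end 1-D CNN for raw vibration segments.
# Layer sizes (segment length, channel counts, number of classes) are
# illustrative assumptions, not the architecture used in the paper.
import torch
import torch.nn as nn

class DiagnosisCNN(nn.Module):
    def __init__(self, num_classes: int = 8, segment_len: int = 2048):
        super().__init__()
        # Feature descriptor: stacked convolution + pooling blocks.
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Classifier: fully connected layers on the flattened feature maps.
        with torch.no_grad():
            feat_dim = self.features(torch.zeros(1, 1, segment_len)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 100),
            nn.ReLU(),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: a batch of 4 single-channel raw signal segments.
logits = DiagnosisCNN()(torch.randn(4, 1, 2048))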

Feature extraction:

A transfer learning framework for machine fault diagnosis

Essentially, a deep CNN model can be divided into two parts. The initial convolutional and max-pooling blocks learn signal features and appropriately represent the input signal, and can be treated as a feature descriptor (purple region in the left part of Fig. 2). The subsequent fully connected layers are trained to make a decision for a supervised classification problem, and can be seen as a classifier (buff region in the right part of Fig. 2).
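
A minimal sketch of reusing the pre-trained parameters in a target model is given below; it builds on the illustrative DiagnosisCNN class from the earlier sketch, and the file name and class counts are hypothetical. The final classifier layer is excluded from the transfer because its output size changes with the target label set.

# Hedged sketch: transfer the pre-trained weights into a target model whose
# classifier head is re-sized for the target task. `DiagnosisCNN`, the file
# name, and the class counts are illustrative assumptions.
import torch

source_model = DiagnosisCNN(num_classes=8)
# ... train source_model on the source-domain dataset here ...
torch.save(source_model.state_dict(), "source_cnn.pt")

# Target model with the same feature descriptor but a new output size.
target_model = DiagnosisCNN(num_classes=5)
pretrained_state = torch.load("source_cnn.pt")
target_state = target_model.state_dict()

# Keep only the tensors whose shapes still match (this drops the final
# layer, which must be re-learned for the new label set).
transferable = {k: v for k, v in pretrained_state.items()
                if k in target_state and target_state[k].shape == v.shape}
target_model.load_state_dict(transferable, strict=False)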

Diagnosis case 1

The gearbox fault dataset from the 2009 challenge data of Prognostics and Health Management (PHM) society is used as the first diagnosis case. This fault dataset is representative of generic industrial gearbox fault data. The details of the gearbox structure, the positions of the apparatuses that are used to collect the data, and an overview of the gearbox are presented in Fig. 4. The experiments were performed on two sets of gears, i.e. spur gears and helical gears, at 30, 35, 40, 45 and

Results

This section presents case studies to verify the effectiveness and efficiency of the investigated techniques from two main aspects: a comparison between the traditional framework and the transfer learning framework under the domain discrepancy problem, and an investigation of the performance of the different transfer strategies.

Discussion

For transfer strategy 1, the convolution kernels in the pre-trained CNN are learned entirely from the source data and are therefore tailored to the source tasks. Keeping the weights of the convolution kernels frozen and only fine-tuning the classifier cannot fully adapt the feature representation to the target task when the signal characteristics exhibit distinct differences across the domains. Different working conditions or machine failure types generally result in different

Conclusions and future works

Transfer learning is a promising tool for improving the performance on target problems by exploiting knowledge from previous tasks that are different but similar. Unlike traditional machine learning techniques for fault diagnosis, transfer learning focuses on transferring existing knowledge or skills to novel tasks to compensate for the lack of sufficient data to train a model from scratch, especially when some faults rarely occur in reality. By exploring the transferable

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 11572167). The authors would like to express their sincere gratitude to lab associate Mr. Shaohua Li for his contribution to the experiments. Special thanks are also due to the editors and reviewers for their review work.

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (40)

  • Wang S. et al.

    Convolutional neural network-based hidden Markov models for rolling element bearing fault identification

    Knowl-Based Syst

    (2018)
  • Jia F. et al.

    A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines

    Neurocomputing

    (2018)
  • Khatami A. et al.

    A sequential search-space shrinking using CNN transfer learning and a Radon projection pool for medical image retrieval

    Expert Syst Appl

    (2018)
  • Jiang D. et al.

    Machine condition classification using deterioration feature extraction and anomaly determination

    IEEE Trans Reliab

    (2011)
  • Henriquez P. et al.

    Review of automatic fault diagnosis systems using audio and vibration signals

    IEEE Trans Syst Man Cybern Syst

    (2014)
  • Han T. et al.

    Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery

    Trans Inst Meas Control

    (2018)
  • Sabina B. et al.

    Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation

    Anal Chem

    (2006)
  • Cateni S. et al.

    A procedure for building reduced reliable training datasets from real-world data

    Acta Press

    (2014)
  • García S. et al.

    Tutorial on practical tips of the most influential data preprocessing algorithms in data mining

    Knowl-Based Syst

    (2016)
  • Jiao J. et al.

    A multivariate encoder information based convolutional neural network for intelligent fault diagnosis of planetary gearboxes

    Knowl-Based Syst

    (2018)