
ISA Transactions

Volume 93, October 2019, Pages 341-353

Practice article
Learning transferable features in deep convolutional neural networks for diagnosing unseen machine conditions

https://doi.org/10.1016/j.isatra.2019.03.017

Highlights

  • A deep transfer learning framework for diagnosing unseen faults in target applications.

  • Exploring the feature transferability in disparate levels of the pre-trained model.

  • Enables building and training a large network efficiently with limited data for new tasks.

Abstract

Recent years have witnessed the increasing popularity and development of deep learning across various fields. Deep networks, and in particular convolutional neural networks (CNNs), have also achieved many state-of-the-art results in the intelligent fault diagnosis of mechanical systems. However, most existing studies assume that the training data and the test data follow the same distribution, which rarely holds in real diagnosis tasks. To tackle this problem, a transfer learning framework based on a pre-trained CNN, which leverages the knowledge learned from the training data to facilitate diagnosing a new but similar task, is presented in this work. First, the CNN is trained on large datasets to learn hierarchical features from the raw data. Then, the architecture and weights of the pre-trained CNN are transferred to new tasks with proper fine-tuning instead of training a network from scratch. To adapt the pre-trained CNN to a specific case, three transfer learning strategies are discussed and compared to investigate the applicability as well as the significance of feature transferability at different levels of a deep structure. The case studies show that the proposed framework can transfer the features of the pre-trained CNN to boost the diagnosis performance on unseen machine conditions in terms of diverse working conditions and fault types.

Introduction

The condition-based maintenance (CBM) strategy has received great attention in modern industries owing to its prominent advantages in availability, sustainability, safety, and reliability. Unlike breakdown maintenance and time-based maintenance, CBM schedules maintenance timing and planning according to the actual running state of the equipment. Since decisions about machine conditions are based on appropriate analyses of the monitoring data, diagnostics is a core building block of CBM [1]. Essentially, machine diagnostics can be formulated as a pattern classification and recognition problem [2].

A little over a decade ago, artificial intelligence techniques brought about the idea that machine health conditions can be efficiently assessed by a well-trained classifier instead of a diagnosis specialist, which is commonly known as intelligent fault diagnosis [3], [4]. Backed by an automatic operating procedure, high-efficiency data processing, and relatively satisfactory diagnosis accuracy, this strategy led to a series of works dealing either with feature extraction and selection or with the design and optimization of classification algorithms in the field of diagnostics for mechanical equipment [5], [6], [7]. With reference to the recent literature, most of the research on intelligent fault diagnosis can be categorized into two groups: manual feature extraction methods and deep learning methods (as shown in Fig. 1(a)). Manual feature extraction methods require the general steps of data preprocessing, feature extraction and selection, and pattern recognition. Influential data mining algorithms, covering noise filtering, dimensionality reduction, instance reduction, and imbalanced data preprocessing [8], [9], [10], have been widely studied in fault diagnosis problems. However, relevant studies have revealed two inherent weaknesses of manual feature extraction methods. (1) The proper design of feature extraction and selection approaches largely relies on prior knowledge about the objects of analysis. (2) The prevailing pattern recognition algorithms for automatically identifying machine health conditions, such as the artificial neural network (ANN) and the support vector machine (SVM), are shallow learning architectures and are therefore unable to approximate highly complex non-linear functions [11], [12], [13]. Tackling some of the weaknesses of the traditional framework, the second group, deep learning methods, has enjoyed increasing popularity in recent years for a wide range of problems, such as image processing, computer vision, video processing, and natural language processing [14]. One of the great advantages of this framework is that feature learning is automatic. Based on labeled data and back-propagation, deep models can capture the essential features for diagnosing machine conditions, rather than depending on the classification capacity of handcrafted features or pattern recognition algorithms. Deep neural networks, especially convolutional neural networks (CNNs), have also led to many state-of-the-art results in the fault diagnosis of diverse mechanical objects, such as spindle bearings, planetary gearboxes and rotor systems [15], [16], [17], [18], [19], [20], [21], [22], [23], [24].

Although the marvelous success of deep learning in a variety of diagnosis applications has often been reported, the topic is still largely open. First, these works mostly assume the same distribution for the training data and the test data. The diagnostic ability evidently degenerates when the training and testing data have different feature distributions. Second, large amounts of data are often required to train deep learning models, whereas labeled fault samples are generally scarce in actual industrial tasks. Consequently, it would be highly desirable to develop methods that can leverage knowledge from pre-existing tasks (source domain) to facilitate model training and diagnosis in an unseen machine diagnosis problem (target domain) that is similar to, but not the same as, the existing task (namely transfer learning), as shown in Fig. 1(b). This would allow knowledge to be adequately utilized across different tasks, so that new diagnosis applications can be addressed more flexibly and, more importantly, the learning ability can be improved with a limited number of samples.

Currently, transfer learning based algorithms and frameworks have been widely studied in different research areas, such as image classification [25], [26], text classification [27], acoustic event recognition [28] and biometrics [29]. According to the latest surveys in this community [30], [31], transfer learning techniques can be mainly categorized into three classes: inductive transfer learning, transductive transfer learning and unsupervised transfer learning. In this work, we focus on inductive transfer learning for fault diagnosis tasks, where sufficient labeled data in the source domain are available for model pre-training and some labeled data in the target domain are employed to induce a transferred model for the target tasks. Combined with the rise of deep learning, the signal characteristics can be adaptively and hierarchically encoded in the weights of the deep network. Instead of training a deep neural network from scratch, the network parameters can then be reused in new tasks for feature and knowledge transfer. In the field of object recognition, Oquab et al. [32] designed a deep CNN based transfer learning method that reuses the convolutional layers learned on a large-scale annotated dataset (ImageNet) to compute mid-level image representations for target tasks. To reduce the effects of different image statistics (types of objects, viewpoints) across domains, new adaptation layers (two fully connected layers) were added to replace the output layers of the pre-trained network and trained with a limited amount of target data. The results on two image datasets show the high feature transferability of the ImageNet-trained CNN for object and action recognition tasks. Also using CNNs, Yosinski et al. [33] further investigated how well the features of each layer transfer from the source domain to the target domain in image classification problems. This research reveals two issues, i.e., optimization difficulties when separating certain layers from the whole network without considering the fragile co-adapted features on successive layers, and the specialization of higher-layer features to the source domain, which may increase the difficulty of bridging the gaps between the source and target tasks.
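
As a rough illustration of this adaptation-layer idea (not the exact setup of Oquab et al. [32]), the following sketch freezes the convolutional feature extractor of a torchvision ImageNet-pretrained AlexNet and replaces its output layers with two newly initialized fully connected layers; the backbone choice, layer sizes and 10-class target task are assumptions made only for illustration.

# Hedged sketch of the adaptation-layer idea: freeze a pre-trained
# convolutional feature extractor and train a small new head on limited
# target data. AlexNet and the layer sizes are illustrative assumptions.
import torch.nn as nn
from torchvision import models

backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Keep the ImageNet-trained convolutional layers fixed.
for p in backbone.features.parameters():
    p.requires_grad = False

# Replace the original output layers with two new fully connected
# adaptation layers for a hypothetical 10-class target task.
num_target_classes = 10
backbone.classifier = nn.Sequential(
    nn.Linear(256 * 6 * 6, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, num_target_classes),
)
# Only the new classifier parameters are updated during fine-tuning.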

However, studies on transfer learning with pre-trained deep networks in fault diagnosis are few. Zhang et al. trained a shallow ANN with sufficient source data from the Case Western Reserve University (CWRU) bearing data center for bearing fault diagnosis tasks, and then transferred the parameters and modified the structure for new but similar tasks under different working conditions using a small amount of target data [34]. Other scenarios, such as diverse fault types across domains, also lead to a large domain discrepancy, so investigating transfer fault diagnosis under more scenarios and with more fault datasets is crucial and necessary. Moreover, the transferability of features in the deep structure has not been well examined with mechanical fault data, and the proper transfer strategy with respect to fault data quality remains unclear. Motivated by this, this work presents three strategies, namely fine-tuning of the fully connected layers, fine-tuning of the whole pre-trained model, and fine-tuning of the feature descriptor layers, to explore feature transferability in diagnosis applications. With two diagnosis datasets (i.e., an open-access gearbox fault dataset [35] and our single-stage cylindrical straight gearbox fault dataset), the performance of the presented transfer learning strategies is investigated under the assumption that only limited samples are available in the target task, where two scenarios, i.e., diverse working conditions and diverse fault types, are considered. The main contributions of this work are summarized as follows:

(1) A novel deep transfer learning framework is presented on the basis of the pre-trained CNN, consisting of three strategies: (i) transferring the feature descriptor (convolutional layers) and fine-tuning the classifier (fully connected layers), (ii) transferring and fine-tuning both the feature descriptor and the classifier, and (iii) transferring the classifier and fine-tuning the feature descriptor (an illustrative sketch of these strategies follows this list).

(2) Through exploring the feature transferability at disparate levels of the pre-trained model, the questions of ‘how to transfer’ and ‘what to transfer’ [36] are answered for actual industrial diagnosis applications, where approaches that can handle new but similar scenarios (target tasks) without large amounts of new labeled data or intensive expert knowledge are highly desirable.

(3) The presented framework is successfully applied in two different scenarios, (i) feature transfer between varying working conditions to properly identify the fault, and (ii) feature transfer between diverse fault types to correctly isolate the new fault by its location.
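
As a non-authoritative illustration of the three strategies listed in contribution (1), the PyTorch-style sketch below chooses which parameters of a pre-trained network are frozen and which are fine-tuned on target data; the module names `features` and `classifier` and the optimizer settings are assumptions for illustration rather than the paper's implementation.

# Hedged sketch of the three transfer strategies. It assumes a pre-trained
# model split into `features` (convolutional feature descriptor) and
# `classifier` (fully connected layers); names and hyperparameters are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

def apply_transfer_strategy(model: nn.Module, strategy: int, lr: float = 1e-3):
    """Freeze/unfreeze parts of `model` and return an optimizer for fine-tuning."""
    if strategy == 1:
        # (i) transfer and freeze the feature descriptor, fine-tune the classifier
        frozen, tuned = model.features, model.classifier
    elif strategy == 2:
        # (ii) transfer and fine-tune both the feature descriptor and the classifier
        frozen, tuned = nn.ModuleList(), model
    elif strategy == 3:
        # (iii) transfer and freeze the classifier, fine-tune the feature descriptor
        frozen, tuned = model.classifier, model.features
    else:
        raise ValueError("strategy must be 1, 2 or 3")

    for p in frozen.parameters():
        p.requires_grad = False   # kept exactly as learned on the source domain
    for p in tuned.parameters():
        p.requires_grad = True    # updated with the limited target-domain data

    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)

In strategy 2 nothing is frozen, so the returned optimizer simply fine-tunes every parameter of the transferred network on the target data.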

The remaining part of this paper is organized as follows. In Section 2, the research background is briefly described. In Section 3, we present the proposed transfer learning strategies for the mechanical diagnosis application. Section 4 presents the two case studies and the base systems. The results and a thorough discussion are given in Sections 5 and 6, respectively. Finally, the conclusions are drawn in Section 7.

Section snippets

End-to-end diagnosis framework via CNN

The main advantage of deep networks is their automatic feature learning ability, which adaptively captures and extracts fault-sensitive features from the raw signal, providing an end-to-end framework for mechanical fault diagnosis. This work focuses on a widely used type of deep network, the CNN. A typical CNN consists of multiple combinations of one or more convolutional layers and pooling layers, followed by one or more fully connected layers.
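
The minimal sketch below, written for 1-D raw vibration segments in PyTorch, illustrates such a structure with a convolutional feature descriptor followed by fully connected layers; the segment length, channel counts, kernel sizes and number of classes are assumptions chosen for illustration and do not reproduce the network used in this work.

# Minimal sketch of an end-to-end 1-D CNN for raw vibration segments.
# Layer sizes (segment length, channel counts, number of classes) are
# illustrative assumptions, not the architecture used in the paper.
import torch
import torch.nn as nn

class DiagnosisCNN(nn.Module):
    def __init__(self, num_classes: int = 8, segment_len: int = 2048):
        super().__init__()
        # Feature descriptor: stacked convolution + pooling blocks.
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Classifier: fully connected layers on the flattened feature maps.
        with torch.no_grad():
            feat_dim = self.features(torch.zeros(1, 1, segment_len)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 100),
            nn.ReLU(),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: a batch of 4 single-channel raw signal segments.
logits = DiagnosisCNN()(torch.randn(4, 1, 2048))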

Feature extraction:

A transfer learning framework for machine fault diagnosis

Essentially, a deep CNN model can be divided into two parts. The initial convolutional and max-pooling blocks learn signal features and appropriately represent the input signal, and can be treated as a feature descriptor (purple region in the left part of Fig. 2). The subsequent fully connected layers are trained to make a decision for a supervised classification problem, and can be seen as a classifier (buff region in the right part of Fig. 2).
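
A minimal sketch of reusing the pre-trained parameters in a target model is given below; it builds on the illustrative DiagnosisCNN class from the earlier sketch, and the file name and class counts are hypothetical. The final classifier layer is excluded from the transfer because its output size changes with the target label set.

# Hedged sketch: transfer the pre-trained weights into a target model whose
# classifier head is re-sized for the target task. `DiagnosisCNN`, the file
# name, and the class counts are illustrative assumptions.
import torch

source_model = DiagnosisCNN(num_classes=8)
# ... train source_model on the source-domain dataset here ...
torch.save(source_model.state_dict(), "source_cnn.pt")

# Target model with the same feature descriptor but a new output size.
target_model = DiagnosisCNN(num_classes=5)
pretrained_state = torch.load("source_cnn.pt")
target_state = target_model.state_dict()

# Keep only the tensors whose shapes still match (this drops the final
# layer, which must be re-learned for the new label set).
transferable = {k: v for k, v in pretrained_state.items()
                if k in target_state and target_state[k].shape == v.shape}
target_model.load_state_dict(transferable, strict=False)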

Diagnosis case 1

The gearbox fault dataset from the 2009 challenge data of Prognostics and Health Management (PHM) society is used as the first diagnosis case. This fault dataset is representative of generic industrial gearbox fault data. The details of the gearbox structure, the positions of the apparatuses that are used to collect the data, and an overview of the gearbox are presented in Fig. 4. The experiments were performed on two sets of gears, i.e. spur gears and helical gears, at 30, 35, 40, 45 and

Results

This section presents case studies to verify the effectiveness and efficiency of the investigated techniques from two main aspects: a comparison between the traditional framework and the transfer learning framework under the domain discrepancy problem, and an investigation of the performance of the different transfer strategies.

Discussion

For transfer strategy 1, the convolution kernels in the pre-trained CNN are learned entirely from the source data and are therefore tailored to the source tasks. Keeping the weights of the convolution kernels frozen and only fine-tuning the classifier cannot fully adapt the feature representation to the target task when the signal characteristics exhibit distinct differences across the domains. Different working conditions or machine failure types generally result in different

Conclusions and future works

Transfer learning is a promising tool for improving the performance on target problems by exploiting knowledge from previous tasks that are different but similar. Unlike traditional machine learning techniques for fault diagnosis, transfer learning focuses on transferring existing knowledge or skills to novel tasks to compensate for the lack of sufficient data to train a model from scratch, especially when some faults rarely occur in reality. By exploring the transferable

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 11572167). The authors would like to express their sincere gratitude to lab associate Mr. Shaohua Li for his contribution to the experiments. Special thanks are also due to the editors and reviewers for their review work.

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (40)

  • Wang S. et al.

    Convolutional neural network-based hidden Markov models for rolling element bearing fault identification

    Knowl-Based Syst

    (2018)
  • Jia F. et al.

    A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines

    Neurocomputing

    (2018)
  • Khatami A. et al.

    A sequential search-space shrinking using CNN transfer learning and a Radon projection pool for medical image retrieval

    Expert Syst Appl

    (2018)
  • Jiang D. et al.

    Machine condition classification using deterioration feature extraction and anomaly determination

    IEEE Trans Reliab

    (2011)
  • Henriquez P. et al.

    Review of automatic fault diagnosis systems using audio and vibration signals

    IEEE Trans Syst Man Cybern Syst

    (2014)
  • Han T. et al.

    Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery

    Trans Inst Meas Control

    (2018)
  • Sabina B. et al.

    Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation

    Anal Chem

    (2006)
  • Cateni S. et al.

    A procedure for building reduced reliable training datasets from real-world data

    Acta Press

    (2014)
  • García S. et al.

    Tutorial on practical tips of the most influential data preprocessing algorithms in data mining

    Knowl-Based Syst

    (2016)
  • Jiao J. et al.

    A multivariate encoder information based convolutional neural network for intelligent fault diagnosis of planetary gearboxes

    Knowl-Based Syst

    (2018)