Fault detection with Conditional Gaussian Network

https://doi.org/10.1016/j.engappai.2015.07.020

Abstract

This paper presents a new representation of Principal Component Analysis (PCA) for fault detection as a Conditional Gaussian Network (CGN), a special case of Bayesian networks. PCA and its associated quadratic statistics, such as T² and SPE, are integrated into a single CGN. The proposed framework projects a new observation into an orthogonal space and gives probabilities on the state of the system. It can do so even when some data in the test sample are missing. This paper also gives the probability thresholds to use in order to match the decisions of the quadratic statistics. The proposed network is validated and compared to the standard PCA scheme for fault detection on the Tennessee Eastman Process and the Hot Forming Process.

Introduction

System failures can have serious consequences for people, the environment and equipment, and fixing them can be expensive and even dangerous. To avoid these situations, it is essential for modern complex systems to detect any change in nominal operation early, before it becomes critical. To this end, several detection methods have been developed and improved in recent years. These methods can be broadly grouped into two main approaches: model-based methods and data-driven methods. Model-based methods are powerful, efficient and widely used. They rely on an analytical representation of the system (a detailed physical model). However, obtaining this representation for complex, large-scale systems is often impossible or very difficult, and requires a lot of time and money. For this reason, data-driven methods have received significant attention. Unlike model-based methods, they use only measurements taken directly from the system (or transformations of them) at different times (historical data).

Several data-driven methods for fault detection have been proposed (Yin et al., 2012, Ding, 2012, Qin, 2012, Venkatasubramanian et al., 2003, Chiang et al., 2001). Many of them are based on a rigorous statistical treatment of system data. One example is the Subspace aided APproach (SAP), a set of powerful data-driven tools developed to address the difficulty of building an accurate physical model for complex systems. Partial Least Squares (PLS), Principal Component Analysis (PCA) and their variants (dynamic, non-linear, kernel, and probabilistic) are statistical methods widely used for data reduction and fault detection.

PCA is a well-known and powerful data-driven technique, widely used for fault detection and in many other fields because of the simplicity of its model building and its efficiency in handling large amounts of data. To determine at any moment whether the system is In Control (IC) or Out of Control (OC), PCA is, according to Ding et al. (2010) and Qin (2003), associated with statistics that have quadratic forms. These statistics are used not only with PCA but also with many other data-driven and model-based methods. Two well-known examples are the T² and SPE (Squared Prediction Error) statistics. They are generally combined, since they complement each other and thus improve fault sensitivity.
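The conventional scheme described above, PCA combined with the T² and SPE statistics, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the synthetic data and the mixing matrix are hypothetical, and the control limits are taken as empirical quantiles of the training statistics, a simplification of the analytical limits that are more commonly used.

```python
import numpy as np

def statistics(model, X):
    """Return the T2 and SPE statistics for each row of X."""
    Z = (np.atleast_2d(X) - model["mean"]) / model["std"]
    scores = Z @ model["P"]                    # projection onto the principal subspace
    t2 = np.sum(scores**2 / model["lam"], axis=1)
    residual = Z - scores @ model["P"].T       # part left in the residual subspace
    spe = np.sum(residual**2, axis=1)
    return t2, spe

def fit_pca_monitor(X, n_components, alpha=0.01):
    """Fit PCA on in-control data X (rows are observations)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Eigendecomposition of the correlation matrix, largest eigenvalues first.
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_components]
    model = {"mean": mean, "std": std, "P": eigvec[:, order], "lam": eigval[order]}
    # Assumption: empirical (1 - alpha) quantiles serve as control limits.
    t2, spe = statistics(model, X)
    model["t2_limit"] = np.quantile(t2, 1 - alpha)
    model["spe_limit"] = np.quantile(spe, 1 - alpha)
    return model

def is_out_of_control(model, X):
    """Flag an observation when either statistic exceeds its limit."""
    t2, spe = statistics(model, X)
    return (t2 > model["t2_limit"]) | (spe > model["spe_limit"])

# Synthetic in-control data: five variables driven by two latent factors.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.6, 0.4, 0.2],
                   [0.2, 0.4, 0.6, 0.8, 1.0]])
X_train = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, 5))
model = fit_pca_monitor(X_train, n_components=2)

# A bias on one sensor breaks the correlation structure of the data.
faulty = X_train[0].copy()
faulty[3] += 10.0
print(is_out_of_control(model, faulty))
```

Combining the two tests with a logical OR reflects the complementary roles of the statistics: T² monitors variation inside the principal subspace, while SPE monitors what the PCA model does not explain.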

Meanwhile, in recent decades, Bayesian networks (BNs) have also been proposed for fault detection (Yu and Rashid, 2013, Verron et al., 2010a, Huang, 2008, Roychoudhury et al., 2006, Schwall and Gerdes, 2002, Lerner et al., 2000). BNs are powerful tools that can be designed by experts and/or learned from data. They offer a probabilistic/statistical framework that is able to integrate information from different sources, which may be of interest for fault detection. Indeed, using and fusing all the information available on the system (such as causal influences (e.g. graphical representations of variable dependencies), probabilistic fault detection decisions, maintainability information, component reliability and so on) could lead to better decisions. From this perspective, we propose to use a BN to model PCA fault detection techniques.

Another important challenge is handling missing observations on-line. The most widely used approaches are based on imputation methods, which try to fill in the missing values. However, these methods are time consuming and depend strongly on the missing rate of the original sample. The proposed network, unlike most Bayesian networks proposed for fault detection, is able to respect a false alarm rate, model the PCA fault detection scheme and handle missing observations automatically, without delay or imputation. The main contributions of this paper can be summarized in three points: (1) a generalized form of the quadratic statistics (e.g. T², SPE) under a probabilistic tool, (2) a probabilistic framework for fault detection that manages both PCA (systematic and residual subspaces) and the statistics under a single BN using discrete and Gaussian nodes, and (3) probabilities about the system state that can be provided even when on-line data are missing (a non-imputation method to handle unobserved variables).
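To make the non-imputation idea concrete, the sketch below computes a state probability from a partially observed sample by marginalizing out the missing variables. It is an illustration under assumptions that do not come from the paper: a discrete state node with two values, an in-control Gaussian N(mean, cov), an out-of-control Gaussian N(mean, c * cov) with a hypothetical inflation factor `c`, and missing entries encoded as NaN. For a Gaussian, marginalizing a variable simply removes its row and column from the mean and covariance, so no value has to be filled in.

```python
import numpy as np

def posterior_in_control(x, mean, cov, c=5.0, prior_ic=0.5):
    """P(IC | observed entries of x) in a two-state conditional Gaussian model.

    Assumed model: IC ~ N(mean, cov), OC ~ N(mean, c * cov), with c > 1.
    NaN entries of x are treated as unobserved and marginalized out.
    """
    x = np.asarray(x, dtype=float)
    obs = ~np.isnan(x)
    k = int(obs.sum())
    if k == 0:
        return prior_ic                      # nothing observed: fall back on the prior
    d = x[obs] - mean[obs]
    cov_oo = cov[np.ix_(obs, obs)]           # marginal covariance of observed variables
    t2 = d @ np.linalg.solve(cov_oo, d)      # quadratic statistic on the observed part
    # log N(x_o | IC) - log N(x_o | OC); the shared constants cancel.
    log_ratio = 0.5 * k * np.log(c) - 0.5 * t2 * (1.0 - 1.0 / c)
    log_odds = log_ratio + np.log(prior_ic) - np.log(1.0 - prior_ic)
    return 1.0 / (1.0 + np.exp(-log_odds))

mean = np.zeros(3)
cov = np.array([[1.0, 0.8, 0.2],
                [0.8, 1.0, 0.3],
                [0.2, 0.3, 1.0]])

print(posterior_in_control([0.1, -0.2, 0.3], mean, cov))     # fully observed sample
print(posterior_in_control([0.1, np.nan, 0.3], mean, cov))   # second variable missing
print(posterior_in_control([4.0, np.nan, -4.0], mean, cov))  # large deviation, one missing
```

The same function serves complete and incomplete samples, which is the practical appeal of treating missing values as unobserved nodes rather than imputing them.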

The remainder of this paper is structured as follows. Section 2 gives a brief description of the definitions and tools needed to develop our proposals. Section 3 introduces the development of PCA under CGNs for fault detection. This is followed by a comparison between our proposal and standard PCA on two case studies. Finally, conclusions and outlooks are outlined in the last section.

Section snippets

Definition

A Bayesian Network (BN) (Jensen and Nielsen, 2007) is a probabilistic graphical model. It consists of the following:

  • a directed acyclic graph G, G=(V, E), where V is the set of vertices of G (nodes), and E is the set of edges of G (arcs),

  • a finite probabilistic space (Ω,Z,p), with Ω a non-empty space, Z a collection of subsets of Ω, and p a probability measure on Z with p(Ω)=1,

  • a set of random variables x = {x1, …, xm} associated with the vertices of the graph G and defined on (Ω,Z,p),

The proposed probabilistic framework

In this section, we propose original CGNs for fault detection. These networks handle PCA and its accompanying quadratic statistics simultaneously. For clarity, we first introduce PCA under a CGN, then propose a probabilistic framework for statistics such as T² and SPE, and finally give the proposed CGNs for fault detection. These CGNs can be used as an alternative to the PCA scheme for fault detection. Note, however, that like PCA they may be suitable for some applications and
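The idea of matching a quadratic-statistic decision with a probability threshold can be illustrated on a simplified two-state model. The derivation below is a sketch under assumptions that are not taken from the paper: the state node has equal priors, the in-control state is N(μ, Σ) over p variables, and the out-of-control state is N(μ, cΣ) with a hypothetical inflation factor c > 1. Under these assumptions the posterior probability of the in-control state is a decreasing function of T², so any control limit CL on T² maps to an equivalent probability threshold.

```latex
\begin{align*}
T^2 &= (x-\mu)^{\top}\Sigma^{-1}(x-\mu) \\
\ln\frac{\mathcal{N}(x;\mu,\Sigma)}{\mathcal{N}(x;\mu,c\Sigma)}
  &= \frac{p}{2}\ln c - \frac{T^2}{2}\left(1-\frac{1}{c}\right) \\
P(\mathrm{IC}\mid x)
  &= \frac{1}{1+\exp\left(-\frac{p}{2}\ln c + \frac{T^2}{2}\left(1-\frac{1}{c}\right)\right)} \\
T^2 \le CL
  &\iff P(\mathrm{IC}\mid x) \ge
     \frac{1}{1+\exp\left(-\frac{p}{2}\ln c + \frac{CL}{2}\left(1-\frac{1}{c}\right)\right)}
\end{align*}
```

The probabilistic control limits defined in the paper apply to its own CGNs and may differ from this simplified form.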

Tennessee Eastman Process

To compare our proposal with the conventional PCA fault detection scheme (see Section 2.2), we test both on the Tennessee Eastman Process (TEP), an industrial chemical process (see Fig. 9). Its simulation, provided by the Eastman Chemical Company, is widely used as a benchmark problem for control techniques and for comparing fault detection and/or diagnosis methods.

The TEP consists of five major units, namely a reactor, a condenser, a compressor, a separator and a stripper

Conclusions and outlooks

The main contribution of this paper is a new tool for fault detection. Firstly, we have transposed standard PCA (systematic and residual subspaces) into a BN, more precisely a CGN. Secondly, we have proposed a probabilistic framework for statistics such as T² and SPE. To do so, it was necessary to define probabilistic control limits that match the decisions made by comparing the quadratic statistics to their thresholds. Finally, we have introduced a CGN

Acknowledgments

Mohamed Amine Atoui is supported by a Ph.D. grant from “la Région Pays de la Loire”. The authors gratefully acknowledge the contribution of the reviewers' comments.

References (31)

  • L. Chiang et al. Fault Detection and Diagnosis in Industrial Systems (2001)
  • Ding, S.X., Zhang, P., Jeinsch, T., Ding, E., Engel, P., Gui, W. A survey of the application of basic data-driven and...
  • Ding, S.X. Data-driven design of model-based fault diagnosis systems. In: Proceedings of IFAC ADCHEM, 2012, pp....
  • R.O. Duda et al. Pattern Classification (2001)
  • N. Friedman et al. Bayesian network classifiers. Mach. Learn. (1997)