An improved PCA scheme for sensor FDI: Application to an air quality monitoring network

doi:10.1016/j.jprocont.2005.09.007

Journal of Process Control

Volume 16, Issue 6, July 2006, Pages 625-634

https://doi.org/10.1016/j.jprocont.2005.09.007

Abstract

In this paper, a sensor fault detection and isolation procedure based on principal component analysis (PCA) is proposed for an air quality monitoring network. The PCA model of the network is optimal with respect to a reconstruction error criterion. Sensor fault detection is carried out in various residual subspaces using a new detection index. For our application, this index improves detection performance compared with the classical detection index SPE. The reconstruction approach makes it possible both to isolate the faulty sensors and to estimate the fault amplitudes.

Introduction

Many human activities produce primary pollutants such as nitrogen oxides (NO₂ and NO) and volatile organic compounds (VOC), which form secondary pollutants such as ozone in the lower atmosphere through chemical or photochemical reactions. The acceptable concentrations of these pollutants, which are harmful to human health and the environment, are defined by European standards. Air quality monitoring networks have two main missions: managing the measurement network (recording pollutant concentrations and a range of meteorological parameters related to pollution events) and disseminating data to keep the population and public authorities permanently informed with reference to the norms. To fulfil these missions, the validity of the delivered information must be established before any use of the measurements, mainly those of pollutants. Moreover, because the geographical area monitored by a network is large, a fault diagnosis procedure also helps to optimize maintenance actions. Sensor validation is therefore an issue of great importance for the development of reliable environmental monitoring and management systems. In collaboration with AIRLOR, an air quality monitoring network (France), the aim of this work is to develop a method for sensor validation, mainly for the sensors measuring the concentrations of ozone and nitrogen oxides.

The task of sensor validation is to detect, isolate, and identify faulty sensors by examining the sensor measurements. Sensor validation is usually performed either with “outlier” detection methods, which can only identify extreme values outside the measurement range, or manually by an operator. Unfortunately, the latter approach is too subjective and often impractical in real time because the high process dimensionality produces a large amount of collected data.

However, the availability of many sensors in the network provides valuable redundancy for fault detection and isolation, because some sensor measurements are highly correlated under normal conditions. The analytical redundancy approach consists of checking the consistency between the measurements and the estimates provided by the relationships that exist between the various process variables [3]. This analysis may make it possible to detect and isolate the faulty sensors. In most practical situations, fault diagnosis needs to be performed in the presence of disturbances, noise and modelling errors. The mathematical relations between the plant variables generally take two forms [4]. Analytical redundancy methods use an explicit input–output model, usually derived from system identification. However, in the considered sensor network, this explicit formulation of the redundancy relationships may be difficult to obtain (owing to the complexity and high dimensionality of the process) because it must be guided mainly by the performance criteria of the diagnosis system. As an alternative, implicit modelling approaches, which are data-driven techniques (like principal component analysis), are particularly well suited to revealing linear relationships among the plant variables without formulating them explicitly. Moreover, the number of these relations can be determined by minimizing a criterion based on the best reconstruction of the variables with respect to the number of principal components in the PCA model [12]. PCA has other useful features: it can handle high-dimensional and correlated process variables, provides a natural solution to the errors-in-variables problem, and includes disturbance decoupling [4].

The widely used detection index SPE (squared prediction error) indicates how much each sample deviates from the PCA model; that is, the SPE performs fault detection in the residual space. However, Harkat et al. [6] have shown experimentally on real data that the SPE is sensitive to modelling errors. Modelling errors can be projected into the residual space, which results in some residuals having a higher variance than the others. The SPE is then heavily biased in favour of the equations with the largest residual variance, whereas the residuals with the smallest variances are the most useful for sensor fault diagnosis because they are associated with linear relationships.
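As an illustration of what the SPE measures, the following minimal sketch computes it for a toy three-variable model. The loading matrix `P_hat`, the samples `x_ok` and `x_bad`, and the function name `spe` are assumptions made for this example, not quantities taken from the paper; the samples are assumed to be already centred and scaled.

```python
import numpy as np

def spe(x, P_hat):
    """Squared prediction error of a scaled sample x.

    P_hat is the (m x l) matrix of retained principal directions
    (hypothetical toy values below).
    """
    residual = x - P_hat @ (P_hat.T @ x)   # part of x not explained by the model
    return float(residual @ residual)

# Toy model with m = 3 variables and l = 2 retained components.
# The single residual direction is [1, -1, 0] / sqrt(2), i.e. the
# model encodes the linear relation x1 = x2.
P_hat = np.array([[1 / np.sqrt(2), 0.0],
                  [1 / np.sqrt(2), 0.0],
                  [0.0,            1.0]])

x_ok = np.array([1.0, 1.0, 0.5])    # consistent with x1 = x2
x_bad = np.array([1.0, 3.0, 0.5])   # sensor 2 biased by +2

print(spe(x_ok, P_hat))    # close to 0
print(spe(x_bad, P_hat))   # close to 2
```

A sample that satisfies the relation captured by the model gives an SPE near zero, while the biased sample leaves energy in the residual space. In practice the SPE is compared with a control limit, which is not shown here.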

After a fault has been detected, it is important to isolate the faulty sensor. In the PCA framework, the well-known isolation approaches are residual enhancement, contribution plots and variable reconstruction [13]. The residual enhancement technique generates structured residuals that respond selectively to subsets of faults [5]. Such residuals may be obtained by algebraic transformation, or by a direct technique that involves a bank of partial PCA models [4]. However, for high-dimensional processes, it is not always possible to find an algebraic transformation that yields the desired isolation properties, because these properties are defined only according to the occurrence of the faults in the residuals, without taking into account the sensitivities of the residuals to the faults. These comments also hold for the direct method. For fault isolation, contribution plots show the contribution of each process variable to the detection statistic [8]. The process variable with the highest contribution is assumed to be the most likely faulty one. However, contribution plots may not explicitly identify the cause of an abnormal condition [9], and they sometimes lead to incorrect conclusions [13]. An alternative approach to fault isolation is the variable reconstruction method proposed by Dunia et al. [2]. It is based on the idea that the influence of a fault on the detection index is eliminated when the faulty variable is reconstructed, using the PCA model, from the fault-free variables.
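The reconstruction idea can be sketched as follows. With C = P̂P̂ᵀ, the estimate of variable i that minimizes the residual is obtained from the other variables as z_i = (Σ_{j≠i} c_ij x_j) / (1 − c_ii). The toy loading matrix, the sample and the function name `reconstruct` are assumptions made for this example; the paper's own reconstruction equation is not reproduced in this excerpt.

```python
import numpy as np

def reconstruct(x, P_hat, i):
    """Estimate variable i of the scaled sample x from the other variables."""
    C = P_hat @ P_hat.T                     # projection onto the principal subspace
    if np.isclose(C[i, i], 1.0):
        # Variable i has no component in the residual space, so the
        # model carries no redundancy that could be used to rebuild it.
        raise ValueError(f"variable {i} cannot be reconstructed")
    others = np.arange(x.size) != i         # mask selecting all variables but i
    return float(C[i, others] @ x[others] / (1.0 - C[i, i]))

# Hypothetical toy model: 3 variables, 2 retained components, relation x1 = x2.
P_hat = np.array([[1 / np.sqrt(2), 0.0],
                  [1 / np.sqrt(2), 0.0],
                  [0.0,            1.0]])

x_bad = np.array([1.0, 3.0, 0.5])           # sensor 2 (index 1) biased by +2
z = reconstruct(x_bad, P_hat, 1)            # estimate of the fault-free value
fault_estimate = x_bad[1] - z               # estimated fault amplitude
print(z, fault_estimate)                    # close to 1.0 and 2.0
```

Reconstructing each variable in turn and checking which reconstruction removes the alarm is what allows the faulty sensor to be isolated. The third variable in this toy model is not involved in any redundancy relation, so the function refuses to reconstruct it.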

In this paper, a sensor fault detection and isolation procedure based on principal component analysis is proposed for an air quality monitoring network. Taking into account the high redundancy among the process variables, this procedure relies on the variable reconstruction approach to design the PCA model of the network, to isolate the faulty sensors and to estimate the fault amplitudes. To improve fault detection, a new fault detection index is proposed. It monitors the last principal components (those with the smallest variances) in different residual subspaces, starting with a single direction (the last principal direction) and gradually increasing the dimension of the residual subspace up to the full residual space. It is difficult to prove mathematically that the new index has better fault sensitivity than the SPE, because the two indices perform fault detection in different residual spaces, where the sensitivities of the residuals to the faults may differ. An experimental data set was therefore used to evaluate the performance of the proposed index. For our application, this index improves fault detection compared with the classical detection index SPE.
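The exact definition of the new index is not given in this excerpt. The sketch below only illustrates the idea of nested residual subspaces under one plausible reading: the squared scores on the residual directions are accumulated, starting from the last principal direction. The function name, the toy matrix `P_tilde` and the sample `x` are assumptions, and any filtering or thresholding used by the authors is omitted.

```python
import numpy as np

def nested_residual_indices(x, P_tilde):
    """Cumulative sums of squared residual scores.

    P_tilde is the (m x r) matrix of residual directions ordered by
    decreasing eigenvalue, so its last column is the last principal
    direction. Entry i-1 of the result uses the last i components.
    This is an assumed reading, not the paper's stated formula.
    """
    t = P_tilde.T @ x                 # scores on the residual directions
    return np.cumsum(t[::-1] ** 2)    # start from the smallest-variance direction

# Hypothetical toy model: 3 variables, 1 retained component, 2 residual directions.
P_tilde = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])
x = np.array([2.0, 0.3, -0.1])

D = nested_residual_indices(x, P_tilde)
spe_value = float(np.sum((P_tilde.T @ x) ** 2))
print(D)            # approximately [0.01, 0.10]
print(spe_value)    # approximately 0.10
```

Under this reading, the index computed over the full residual space coincides with the SPE, while the lower-dimensional indices look only at the directions with the smallest variances.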

First, we present the principle of PCA modelling based on the variable reconstruction approach. Then, after summarizing the principle of sensor fault detection in the PCA framework, we propose our new detection index. To isolate the faulty sensors, we combine the proposed index with the variable reconstruction principle. In Section 4, we present the application of the proposed method to sensor fault detection and isolation in an air quality monitoring network in Lorraine. Finally, conclusions and future work are presented in the last section.

Section snippets

PCA modelling

Principal component analysis is one of the most popular statistical methods for extracting information from measured data. It finds the directions of significant variability in the data by forming linear combinations of the variables.

Let us consider $x(k) = [\begin{matrix} x_{1}(k) & x_{2}(k) & \dots & x_{m}(k) \end{matrix}]^{T}$, the vector formed with the m observed plant variables at time instant k. Define the data matrix $X = [\begin{matrix} x(1) & x(2) & \dots & x(N) \end{matrix}]^{T} \in R^{N \times m}$ with N samples x(k) (k = 1, …, N), which is representative of normal process operation.
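A minimal sketch of how a PCA model could be built from such a data matrix is given below. It assumes that the variables are scaled to zero mean and unit variance and that the number of retained components is supplied by the user; neither choice is stated in this excerpt. The synthetic data and all names (`fit_pca`, `P_hat`, `P_tilde`) are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model to X (N samples x m variables) from normal operation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                      # assumed scaling: zero mean, unit variance
    corr = Xs.T @ Xs / (X.shape[0] - 1)        # sample correlation matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P_hat = eigvecs[:, :n_components]          # principal directions
    P_tilde = eigvecs[:, n_components:]        # residual directions
    return mean, std, eigvals, P_hat, P_tilde

# Synthetic stand-in for the measurements: 4 correlated variables
# driven by 2 hidden sources plus a small amount of noise.
rng = np.random.default_rng(0)
N, m = 500, 4
sources = rng.normal(size=(N, 2))
mixing = rng.normal(size=(2, m))
X = sources @ mixing + 0.01 * rng.normal(size=(N, m))

mean, std, eigvals, P_hat, P_tilde = fit_pca(X, n_components=2)
print(eigvals)   # variances along the principal directions, largest first
```

The columns of `P_tilde` span the residual space in which detection indices such as the SPE operate.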

PCA determines an optimal

Proposed sensor FDI scheme

The aim of the proposed FDI (Fault Detection and Isolation) scheme is to detect the faulty sensors using the index ${\bar{D}}_{i}$ and to identify the faulty sensor using the variable reconstruction approach. The measurement delivered by the faulty sensor is then reconstructed using Eq. (6).

The algorithm for implementing the proposed fault detection and isolation scheme is as follows:

(i) Perform a standard PCA on the data matrix X; determine the model by a proper selection of the number of PC.
(ii) Calculate ${\bar{D}}_{i} (i = 1,$

Application to an air quality monitoring network

In this section, the proposed FDI scheme is applied to sensor fault detection, isolation and reconstruction for an air quality monitoring network.

Conclusion

Acknowledgement

The authors would like to thank AIRLOR, which made available the atmospheric pollution data used in this article.

References (14)

T. Kourti et al., Process analysis, monitoring and diagnosis, using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems (1995)
W. Li et al., Recursive PCA for adaptive process monitoring, Journal of Process Control (2000)
S.J. Qin et al., Determining the number of principal components for best reconstruction, Journal of Process Control (2000)
G.E.P. Box, Some theorems on quadratic forms applied in the study of analysis of variance problems: Effect of inequality of variance in one-way classification, The Annals of Mathematical Statistics (1954)
R. Dunia et al., Identification of faulty sensors using principal component analysis, AIChE Journal (1996)
P.M. Frank, Analytical and qualitative model-based fault diagnosis: a survey and some new results, European Journal of Control (1993)
J. Gertler, T. McAvoy, Principal component analysis and parity relations—a strong duality, in: IFAC Symposium on Fault...

There are more references available in the full text version of this article.

