Elsevier

Applied Energy

Volume 235, 1 February 2019, Pages 117-128

Design of machine learning models with domain experts for automated sensor selection for energy fault detection

https://doi.org/10.1016/j.apenergy.2018.10.107

Highlights

  • A method is presented to engage experts to obtain domain knowledge.

  • Virtual sensors are created from expert knowledge and sensor data.

  • Automated approach selects an optimal set of real and virtual sensors.

  • Machine learning is used to automatically build a data-driven fault detection model.

  • High fault detection performance can be achieved using only a few virtual sensors.

Abstract

Data-driven techniques that extract insights from sensor data reduce the cost of improving system energy performance through fault detection and system health monitoring. To lower cost barriers to widespread deployment, a methodology is proposed that takes advantage of existing sensor data, encodes expert knowledge about the application system to create ‘virtual sensors’, and applies statistical and mathematical methods to reduce the time required for manual configuration. The approach combines sensor data points with encoded expert knowledge that is generic to the application system but independent of a particular deployment, thereby reducing the need to tailor the system to individual deployments. This paper not only presents a method that detects faults from measured energy data, but also (1) describes an engagement method with experts in the energy system domain to identify data, (2) integrates domain knowledge with the data, (3) automatically selects from among the large pool of potential input data, and (4) uses machine learning to automatically build a data-driven fault detection model. Demonstration on a commercial building chiller plant shows that only a small number of virtual sensors is necessary for fault detection with high accuracy rates. This corresponds to the use of only five of the 52 original sensor data points. With as few as four features, classification F1 scores exceed 90% on the training set and 80% on the testing set. The results are implementable and realizable using off-the-shelf tools. The goal is to design with domain experts an energy monitoring system that can be configured once and then widely deployed with little additional cost or effort.

Introduction

Renewable energy technologies and building energy management systems can have hundreds or more existing sensor data points used for control and monitoring. Furthermore, innovations in “Internet of Things” (IoT) devices have led to connected power meters, lights, occupancy sensors, and appliances that are capable of data collection and communication. These data present a valuable opportunity to extract meaningful information and take data-driven action.

The motivation for transforming data from these devices into actionable information is to improve operations, monitor system health, increase energy generation, and decrease energy waste. The development and widespread use of energy conservation and renewable energy technologies are critical to minimizing negative environmental consequences. To that end, increasing profitability for users and decreasing the costs of these technologies enable market penetration and widespread adoption. On the energy demand side, commercial buildings consume 19% of US primary energy [1], and an estimated 15–30% of the energy used in commercial buildings is wasted by poorly maintained, degraded, and improperly controlled equipment [2].

However, one cannot achieve scalable deployments of analytics and applications across systems if deploying solutions requires vendors and domain experts to install sensors and information technology infrastructures that require tailoring each solution for each deployment. Today, even well-established commercial offerings are not deployed at scale because costs are prohibitive. Thus, a major challenge to scalability is reducing hardware and software installation costs, manual configuration requirements, and manual monitoring.

To address this challenge, this paper describes a novel framework for fault detection that combines existing data from control systems with expert knowledge that is general to the type of energy system to create new virtual sensors. These are ‘virtual’ in the sense that they do not correspond to values taken directly off a physical sensor but are calculated from values taken from physical sensor(s). The use of expert knowledge that is generic to the energy system application but independent of individual instances reduces the need for configuration to individual deployments. Techniques from statistics and machine learning automate the process of selecting inputs to the fault detection system. The input data is converted into a simple form suitable for off-the-shelf machine learning algorithms.
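To make the notion of a virtual sensor concrete, the sketch below computes three quantities from hypothetical chiller readings: the chilled-water temperature difference, the cooling load from a heat balance (Q = m_dot × cp × ΔT), and a coefficient of performance (cooling load divided by electrical power). The point names, the specific-heat constant, and the choice of these three quantities are illustrative assumptions, not the virtual sensor list developed with the experts in this paper.

```python
from dataclasses import dataclass

# Approximate specific heat of water in kJ/(kg*K); an assumed constant.
CP_WATER = 4.186


@dataclass
class ChillerReading:
    """One time step of hypothetical physical sensor values for a chiller."""
    chw_supply_temp_c: float  # chilled-water supply temperature, deg C
    chw_return_temp_c: float  # chilled-water return temperature, deg C
    chw_flow_kg_s: float      # chilled-water mass flow rate, kg/s
    power_kw: float           # chiller electrical power, kW


def virtual_sensors(r: ChillerReading) -> dict:
    """Derive hypothetical virtual sensors from one set of physical readings."""
    delta_t = r.chw_return_temp_c - r.chw_supply_temp_c
    # Heat removed from the chilled-water loop: Q = m_dot * cp * dT, in kW.
    cooling_load_kw = r.chw_flow_kg_s * CP_WATER * delta_t
    # Coefficient of performance; reported as NaN when the chiller draws no power.
    cop = cooling_load_kw / r.power_kw if r.power_kw > 0 else float("nan")
    return {"delta_t_c": delta_t, "cooling_load_kw": cooling_load_kw, "cop": cop}


reading = ChillerReading(chw_supply_temp_c=7.0, chw_return_temp_c=12.0,
                         chw_flow_kg_s=20.0, power_kw=100.0)
print(virtual_sensors(reading))
```

Because such formulas depend only on the type of equipment and not on a particular building, the same definitions could in principle be reused across deployments, which is the property the framework relies on.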

This work’s novelty is the process to obtain relevant domain knowledge, create and select virtual sensors, and build a data-driven fault detection system. This method is not tied to a particular type of energy system. Furthermore, the designer of this fault detection system does not require knowledge about the energy system. Deployment at a specific instance of an energy system also does not require detailed knowledge about the specification and configuration of that instance.

Energy systems are complex, with many potential data points that can be tracked. These include signals, inputs, responses, and states related to thermodynamic, meteorological, control, mechanical, and electrical properties of the system. We present a method for selecting the most useful sensors for fault detection. The desirability of fault detection and the problem of choosing from a large pool of potential sensors are common across energy systems, be they HVAC, energy generators, distribution systems, or energy end uses, so our data-driven methodology can be extended to other energy systems. People who are experts in neither the energy system nor a specific instance of it can follow our proposed scheme to create a fault detection system.

Energy applications for machine learning include assessing solar and wind energy. Solar irradiance was predicted by applying logistic regression to important meteorological variables found through analysis with Boosted Regression Trees [3]. To estimate wind turbine power output, five models were compared based on measure-correlate-predict methods using artificial neural networks, Support Vector Machines for regression and random forests [4].

For energy consumption in buildings, various machine learning models were regressed on five commonly accessible building and climate features to estimate annual commercial building energy consumption across the US [5]. Deep learning was applied to forecast 24 h cooling loads, both in a supervised manner and in an unsupervised manner to extract features as inputs for cooling load prediction [6]. Real-time building energy consumption was predicted by combining time-series sliding window analysis with metaheuristic optimization-based machine learning [7]. Machine learning has also been used to create building control strategies by constructing approximate model predictive control laws using multivariate regression and dimensionality reduction algorithms [8]. By learning occupant behaviour through supervised and unsupervised learning, rules were created to infer real-time room setpoints for office cooling [9]. For fault detection, machine learning frameworks have been proposed to identify and classify islanding and grid disturbances [10] and to detect faults in wind turbines [11]. This paper generalizes and adapts [11] for fault detection in buildings.

In the fault detection literature, one paper in a three-part review series on fault detection and diagnostics for processes [12] describes a common set of criteria with which to compare and evaluate fault detection and diagnostic methods and provides a taxonomy of the methods. Other review papers survey specific categories of fault detection, such as supervisory methods, model-based techniques, and trends in applications of model-based techniques [13]. Another review surveys fault diagnostics with multivariate statistical models [14]. A further paper comments on the era of “big data” in the context of analyzing process history data [15]. An overview of the analytical methods and features used in commercial fault detection and diagnostics offerings for buildings is provided in [16]; it documents that rule-based approaches are still common but that process-history-based approaches are increasingly used.

Fault detection approaches using machine learning have also been applied to HVAC systems. In buildings, machine learning has been applied to fault diagnostics in air handling units modeled as a Bayesian probabilistic model [17] and to fault detection in chiller plants modeled as a Gaussian mixture model [18]. Using physical models, state estimation techniques have been applied to identify faults from the deviation of actual measurements from the values predicted by the physical model [19]. HVAC subsystems have also been modeled as agents in a directed graphical model that is trained on HVAC data under normal running conditions [20]. Neural networks have been employed to predict physical properties, with weighting factors calculated for the networks to set thresholds above which faults are detected, and with subtractive clustering analysis used for fault diagnosis [21]. In terms of black box modeling, three data mining techniques have been compared for outlier detection to detect faults: density-based spatial clustering of applications with noise (DBSCAN), K-means, and classification and regression trees [22]. Grey box methods include the use of Statistical Process Control (SPC) to measure and analyze variations, Kalman filtering to provide predictions and determine SPC control limits, and system analysis of fault propagation across subsystems [23]. Rule-based techniques include a cloud-based fault detection system built on expert rules [24].

In contrast to these studies, this paper not only presents a method that detects faults from measured energy data, but also (1) describes an engagement method with experts in the energy system domain to identify data, (2) integrates domain knowledge with the data, (3) automatically selects from among the large pool of potential input data, and (4) uses machine learning to automatically build a data-driven fault detection model. This scheme is not specific to HVAC systems but has also been demonstrated on wind turbines [11]. Most previous works rely on experts to directly select data input and design the fault detection system, for example, by specifying rules or deciding how to model the system being monitored. In contrast, we present a methodology that automates the data selection and fault detection design.

One challenge with data-intensive energy monitoring systems is the implementation and processing cost associated with the large number of sensors required. This paper describes a framework that combines expert knowledge about an application with data readily available from an existing control system. Automated tools are used to identify data points relevant for health monitoring, and machine learning is then used to identify the most effective model parameters for configuring the system design. This procedure is demonstrated by developing a health monitoring system for fault detection in a commercial building HVAC system.

The additional expert knowledge features are based on (1) an understanding of the measured physical values and performance metrics, (2) the time series behaviour of the sensor measurements, and (3) statistical features. Feature selection methods are then applied to this expanded feature set to select the most important overall features and to validate whether the new features selected are more useful for prediction than the original ones. In the health monitoring field and in application domains, variable and feature selection is often based on expert judgement. In the machine learning field, variable and feature selection commonly involves mathematical and statistical techniques, as summarized by Guyon and Elisseeff [25] and Li et al. [26]; this is the approach employed in this paper.
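The second and third feature categories can be sketched as rolling-window statistics and step changes computed over a single sensor series. The window length, the particular statistics (step change, rolling mean, rolling population standard deviation), and the function name `expand_features` are hypothetical choices for illustration; the paper's own feature definitions are not reproduced here.

```python
from statistics import mean, pstdev


def expand_features(series, window=4):
    """Derive hypothetical time-series and statistical features from one sensor.

    For each index t that has a full trailing window, returns a dict with:
      value     - the raw reading x[t] (kept so selection can compare against it)
      diff      - step change x[t] - x[t-1] (time series behaviour)
      roll_mean - mean of the last `window` readings (statistical)
      roll_std  - population std of the last `window` readings (statistical)
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    rows = []
    for t in range(window - 1, len(series)):
        win = series[t - window + 1 : t + 1]
        rows.append({
            "value": series[t],
            "diff": series[t] - series[t - 1],
            "roll_mean": mean(win),
            "roll_std": pstdev(win),
        })
    return rows


rows = expand_features([10.0, 10.0, 10.0, 10.0, 14.0], window=4)
for row in rows:
    print(row)
```

Keeping the raw value next to its derived features lets the subsequent selection step decide, on the data, whether a derived feature is more informative than the original reading.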

Next, machine learning models are trained to predict faults using both the expanded feature set and the original data set from the control system. The data from the control system is collected from the supervisory control and data acquisition (SCADA) system, the industrial computer system used for automation and control.
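Results in this paper are reported as F1 scores. The sketch below shows how such a score is computed for the binary fault/no-fault task, using a nearest-centroid classifier as a deliberately simple stand-in for the off-the-shelf machine learning models. The classifier choice, the toy data, and the function names are assumptions, not the models or data used in the paper.

```python
from math import dist


def fit_centroids(X, y):
    """'Train' a nearest-centroid model: mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, X):
    """Assign each sample to the class whose centroid is closest (Euclidean)."""
    return [min(centroids, key=lambda lab: dist(x, centroids[lab])) for x in X]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (fault) class; 0.0 if no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy two-feature data: label 0 = no fault, 1 = fault.
X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_train = [0, 0, 1, 1]
X_test = [[0.1, 0.0], [1.0, 0.9], [0.6, 0.6]]
y_test = [0, 1, 0]

centroids = fit_centroids(X_train, y_train)
y_pred = predict(centroids, X_test)
print(y_pred, f1_score(y_test, y_pred))
```

F1 balances precision (how many flagged samples are real faults) against recall (how many real faults are flagged), which matters when fault and no-fault samples are imbalanced.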

This paper is organised as follows: Section 2 describes the proposed methodology. Section 3 describes a case study to design a fault detection system for a commercial building chiller plant. Section 4 applies the methodology to the case study. Section 5 reports the experimental results. Section 6 summarises the conclusions.

Section snippets

Overview

A data-driven scheme is presented for generalized fault detection using a combination of encoded expert knowledge and statistically derived features from sensor data. The proposed method, illustrated in Fig. 1, uses machine learning and statistical techniques to optimally choose a subset of sensor data from the existing data in the system. This sensor data is supplemented with expert engineering knowledge that is generic to the class of equipment. The method is applied to a chiller plant that

Sensor data collection

The methodology described in Section 2 is demonstrated on the design of a fault detection system for a chiller plant. Data for normal operations and for operations under fault conditions was generated using a physics-based model of a commercial building chiller plant. It is undesirable to introduce faults in the real plant to collect measured data because doing so would interfere with normal operations of the facility and could damage expensive equipment. The physics-based model was

Engineering knowledge features

The original features in the Building Automation System (BAS) are supplemented with new virtual sensor features created from the original features using knowledge of chiller plants. This knowledge includes the quantities that the original features correspond to, the locations where data for those features is collected, and an understanding of the operation of the chiller plant. To collect this list of virtual sensor features, experts were interviewed in the

Selected features

Table 4 lists the top ten features as selected by the MIQ and MID schemes to classify no-fault operations and fault operations. Because the two feature selection schemes score candidate features with different criteria, some variation in rankings between the MIQ and MID schemes is expected. There is some overlap between the features selected under the MIQ and MID schemes.
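Assuming MID and MIQ denote the mutual information difference and quotient criteria of minimum-redundancy maximum-relevance (mRMR) selection, each step adds the candidate whose relevance (mutual information with the fault label) minus (MID) or divided by (MIQ) its mean mutual information with the already-selected features is largest. The sketch below implements that greedy loop for already-discretized features with a plug-in mutual information estimate; the discretization, the function names, and the toy data are assumptions, and the paper's implementation settings are not reproduced here.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(features, labels, k, scheme="MID"):
    """Greedy mRMR selection of up to k feature names.

    features: dict name -> list of discrete values; labels: class per sample.
    MID maximizes relevance - redundancy; MIQ maximizes relevance / redundancy.
    """
    if scheme not in ("MID", "MIQ"):
        raise ValueError("scheme must be 'MID' or 'MIQ'")
    relevance = {f: mutual_info(v, labels) for f, v in features.items()}
    selected = [max(relevance, key=relevance.get)]  # first pick: most relevant
    while len(selected) < min(k, len(features)):
        best, best_score = None, float("-inf")
        for f in features:
            if f in selected:
                continue
            redundancy = sum(mutual_info(features[f], features[s])
                             for s in selected) / len(selected)
            if scheme == "MID":
                score = relevance[f] - redundancy
            else:  # MIQ; a small epsilon avoids division by zero
                score = relevance[f] / (redundancy + 1e-12)
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected


labels = [0, 0, 0, 1, 1, 1, 1, 1]
feats = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],       # strongly relevant
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # duplicate of "a": relevant but redundant
    "b": [0, 0, 1, 1, 0, 0, 1, 1],       # weakly relevant, independent of "a"
}
print(mrmr(feats, labels, k=2, scheme="MID"))  # prints ['a', 'b']
```

A relevance-only ranking would place `a_copy` level with `a`; the redundancy term is what moves the weaker but independent `b` ahead of the duplicate, which is why such schemes tend to return compact feature sets.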

In the feature selection process, the new derived features are favoured over the original features. Under the MIQ scheme, all of the

Conclusion

When trying to extract meaningful information from existing data and deployed sensors, system designers may be faced with an overwhelming number of data points. This paper presents a data-driven approach to designing and configuring energy monitoring systems. The proposed approach automatically selects data to input into energy monitoring systems using statistical methods. A method based on information theory is used to recommend features that are strong predictors of a fault, are highly

Acknowledgement

The authors would like to thank Rongxin Yin, Thierry Stephane Nouidui, Kevin Leahy, and Ioannis Konstantakopoulos.

Lawrence Berkeley National Laboratory’s contribution to this work was supported by the Assistant Secretary for Energy Efficiency and Renewable Energy, Building Technologies Office, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

References (32)

  • Z. Du et al.

    Fault detection and diagnosis for buildings and HVAC systems using combined neural networks and subtractive clustering analysis

    Build Environ

    (2014)
  • I. Khan et al.

    Fault detection analysis of building energy consumption using data mining techniques

    Energy Procedia

    (2013)
  • K. Bruton et al.

    Development and alpha testing of a cloud based automated fault detection and diagnosis tool for air handling units

    Automat Constr

    (2014)
  • U.S. Energy Information Administration. Annual Energy Outlook 2013;...
  • S. Katipamula et al.

    Review article: methods for fault detection, diagnostics, and prognostics for building systems – a review, part I

    HVAC&R Res

    (2005)
  • S. Díaz et al.

    Performance assessment of five MCP models proposed for the estimation of long-term wind turbine power outputs at a target site using three machine learning techniques

    Appl Energy

    (2018)
1 Work done while at University of California, Berkeley and at Lawrence Berkeley National Laboratory.
