Article

Fuzzy Clustering-Based Deep Learning for Short-Term Load Forecasting in Power Grid Systems Using Time-Varying and Time-Invariant Features

1
School of Electrical Engineering, Computing and Mathematics Sciences, Curtin University, Bentley, WA 6102, Australia
2
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(5), 1391; https://doi.org/10.3390/s24051391
Submission received: 17 January 2024 / Revised: 5 February 2024 / Accepted: 19 February 2024 / Published: 21 February 2024
(This article belongs to the Section Intelligent Sensors)

Abstract

Accurate short-term load forecasting (STLF) is essential for power grid systems to ensure reliability, security and cost efficiency. Thanks to advanced smart sensor technologies, time-series data related to power load can be captured for STLF. Recent research shows that deep neural networks (DNNs) are capable of achieving accurate STLF since they are effective in predicting nonlinear and complicated time-series data. To perform STLF, existing DNNs use the time-varying dynamics of either past load consumption or past power-correlated features such as weather, meteorology or date. However, the existing DNN approaches do not use the time-invariant features of users, such as building space, building age, insulation material, number of building floors or building purpose, to enhance STLF. In fact, those time-invariant features are correlated with user load consumption, so integrating them can enhance STLF. In this paper, a fuzzy clustering-based DNN is proposed that uses both time-varying and time-invariant features to perform STLF. The fuzzy clustering first groups users with similar time-invariant behaviours. DNN models are then developed using past time-varying features. Since the time-invariant features have already been learned by the fuzzy clustering, the DNN model does not need to learn them; therefore, a simpler DNN model can be generated. In addition, the DNN model only learns the time-varying features of users in the same cluster, so the DNN can learn more effectively and achieve more accurate predictions. The performance of the proposed fuzzy clustering-based DNN is evaluated by performing STLF on data that include both time-varying and time-invariant features. Experimental results show that the proposed fuzzy clustering-based DNN outperforms the commonly used long short-term memory networks and convolution neural networks.

1. Introduction

Power grid systems supply power to millions of users, whose load demands are dynamic and complex. Therefore, efficient and reliable power grid systems are essential for maintaining power stability, avoiding power system outages and meeting user load demands without power interruptions [1,2]. A sufficient power utilization scheme with accurate short-term load forecasting (STLF) is necessary for power grid systems [3,4,5]. One percent of forecasting error can cause operation losses of 10 million or more [6]. Since 40% of electrical power is supplied to buildings through the power grid system, accurate STLF benefits all stakeholders of the energy market and results in substantial savings for users [7]. Accurate STLF also yields significant economic savings and ensures power grid reliability and security [8]. Accurate forecasting is essential for the system controller to maintain grid system stability [9,10,11,12,13].
To perform STLF, physics-based models consisting of system equations can be used. Those physics-based models explicitly describe the system dynamics. However, developing them requires extensive knowledge of the internal components of the systems or buildings which are related to power consumption. Data-driven models, in contrast, are learned from measured data, and developing them does not require extensive knowledge of the systems or buildings. Thanks to advanced smart sensor technologies, smart meters can be used to capture the loads consumed by users in real time. Smart sensors can also be used to capture weather information such as temperature, wind speed/direction and sea level pressure, which are correlated with load consumption. These time-series data are captured in real time in order to perform accurate and reliable STLF [14,15]. Among the data-driven models, deep neural networks (DNNs) have commonly been used, since DNNs consist of complex, multiple-neuron layers which are effective for modelling nonlinear and chaotic load consumption data [16]. Recent DNN techniques for STLF can be classified into two categories: (i) single time-varying feature, where the DNN uses past load consumption to predict future load consumption, and (ii) multi-time-varying features, where the DNN uses past dynamic information such as past weather conditions, past meteorology information and past seasonal and calendar information, in addition to past load consumption, to predict future load consumption.
For the single time-varying feature, the DNNs forecast future load consumption using past load consumption. A long short-term memory (LSTM)-based DNN was developed using past load consumption sequences of appliances in order to forecast future load consumption [17]. Peng et al. [18] applied linear regression and LSTM to forecast future load consumption using past load consumption. Hafeez et al. [19] proposed a Boltzmann machine-based DNN to predict future load consumption using past load consumption. Aly et al. [20] proposed a clustering technique which used past load consumption in order to classify the future load consumption demands of users. Based on past power demands, various models have been developed to predict load consumption for various users. Rafati et al. [21] proposed a dense neural network to model the nonlinear and dynamic characteristics of past electrical load in order to predict future load consumption. Sekhar et al. [22] proposed a hybrid DNN by combining LSTM and a convolution neural network (CNN) to perform load prediction using past load information. Hybrid DNNs based on CNN, LSTM and decision tree have been proposed by Wan et al. [23] and Massaoudi et al. [24] to improve prediction accuracy. Tavassoli-Hojati et al. [25] proposed a self-partitioning local neuro-fuzzy model, where the model is trained by analysing both the linear and nonlinear characteristics of past load time-series features. Wei et al. [26] proposed a decomposition algorithm based on detrend singular spectrum fluctuation analysis to extract the trend and periodic components in past load data. An LSTM was trained with the extracted components. Yang et al. [27] proposed a decomposition approach to extract the time-series components of past load consumption. The decomposition approach captures useful past load consumption components to train DNNs.
For the multi-time-varying features, DNN models are developed by correlating future load consumption with past load consumption and past dynamic information such as past seasonal time information, past weather or meteorological conditions. Liang et al. [28] developed a hybrid DNN based on empirical mode decomposition and a regression neural network; the features used in the DNN included past temperature, past meteorology conditions and past load consumption. Ahmad et al. [29] proposed a novel DNN which included the features of past load information and past meteorology conditions. Kwon et al. [30] proposed a DNN where both past weather information and past load consumption were used as the DNN inputs. An adaptive neuro fuzzy inference system was proposed to predict the future load consumption of the Rajasthan region of India using past load consumption and past acute climatic conditions [31]. Zor et al. [32] proposed a DNN where the DNN inputs were based on past load consumption and past meteorological variables at a large hospital in the eastern Mediterranean. Eseye et al. [33] developed a hybrid machine-learning technique where the features included past weather, past load consumption, past seasonality and calendar information. Eseye et al. [34] proposed a novel feature selection based on a genetic algorithm to select significant features to improve load consumption forecasting accuracy. Hu et al. [35] proposed a back propagation-based neural network to predict the load consumption of the process industry where past load consumption, past production planning information and past humidity were used as DNN inputs. Yaprakdal et al. [36] proposed a feedforward neural network to predict the future load consumption, where the time-varying features included past load consumption, past temperature, past direct horizontal radiation and past diffuse horizontal radiation. Tziolis et al. [37] proposed a Bayesian neural network model where time-varying features such as past load consumption, past humidity, past dew point temperature, past horizontal irradiance and past wind speed were used as the network inputs.
The aforementioned DNN models only use the dynamics of single time-varying features or multi-time-varying features in order to forecast future load consumption. They use those dynamic features as the DNN inputs. They do not use the static information of the time-invariant features such as year built, building space, number of occupants in the building or building purpose. In fact, these time-invariant features are related to load consumption. When both time-invariant and time-varying features are used as the DNN inputs, more information is available for the DNNs to perform STLF; therefore, more accurate predictions are likely to be achieved. For example, building age relates to load consumption [38]. Newer buildings consume less energy since they are constructed with better insulation material. Older buildings consume more energy since their insulation material is generally poorer than in new buildings, so more electricity is consumed by heaters or air-conditioners. As another example, more electricity is used in a larger building space, while less electricity is consumed in a smaller building. Hence, building space correlates with load consumption [39]. Buildings with more users consume more energy; less energy is consumed in buildings with fewer users [40,41]. Occupant characteristics such as age, education, income and residency length are also correlated with load consumption [42,43]. Building purpose also relates to power consumption. Commercial or industrial buildings use more energy; residential buildings use less energy [44]. When more correlated features are included, more accurate predictions are likely to be achieved. Therefore, we can use time-invariant features to improve STLF since they are also correlated with load consumption.
In this paper, a fuzzy clustering-based DNN is proposed that uses both time-varying and time-invariant features to perform STLF. Clusters are generated to classify users with respect to time-invariant features, where the fuzzy c-means algorithm [45] is used since this algorithm is commonly used to cluster samples with time-invariant features [46]. Each cluster groups existing users with similar time-invariant features, which capture the static information. A separate DNN is developed for each cluster from the time-varying features of the existing users in that cluster, who have similar time-invariant features. The time-varying features of users in the same cluster are shared and are used to develop a DNN model for this particular cluster. Since the time-invariant features are already used to cluster users, the DNN model does not need to include the time-invariant features and the model is simpler. In addition, the DNN model only needs to learn time-varying features and it predicts time-varying dynamics for users in the same cluster, who have similar time-invariant features. Therefore, more accurate predictions of time-varying dynamics are likely to be achieved by the proposed model, compared to the commonly used DNN models, which need to address time-varying dynamics for all users. The proposed fuzzy clustering-based DNN is integrated with an LSTM and a CNN, which are commonly used for STLF when time-varying features are used [17,28,29,30,31,33,34,36]. The performance of the proposed fuzzy clustering-based DNN was evaluated on Miller’s data [47,48], which include both time-varying features such as load consumption, air temperature and wind speed and time-invariant features such as building size and floor count. Experimental results show that the proposed fuzzy clustering-based DNN forecasts the load consumption of new users more accurately when the data of those new users are not available to train the DNN.
The main contributions of this research article are listed below.
(1) To perform STLF, the existing approaches only use time-varying dynamics such as past load consumption or past power-correlated features [46,49,50,51,52,53,54]. No existing approach uses time-invariant features such as building space or building age to perform STLF. A novel approach is proposed in this paper to incorporate both time-varying and time-invariant features in order to improve STLF accuracy.
(2) A novel STLF approach, namely fuzzy clustering-based DNN, is proposed by incorporating fuzzy clustering and deep learning. The fuzzy clustering addresses time-invariant features and the deep learning addresses time-varying features. This incorporation improves existing DNN models, which only address time-varying features.
(3) The proposed fuzzy clustering-based DNN is evaluated on Miller’s dataset [47,48], which is used for evaluating load consumption predictors. The dataset contains both time-invariant and time-varying features. The results demonstrate that better STLF can be achieved by the proposed fuzzy clustering-based DNN.
(4) To evaluate the prediction performance of the proposed fuzzy clustering-based DNN, its prediction performance is compared with some recently published STLF approaches.
The rest of the article is structured as follows: Section 2 describes the purposes of STLF and describes how a DNN model can be developed for STLF. Section 3 describes the mechanism of the proposed fuzzy clustering-based DNN. It also describes how the fuzzy clustering addresses the time-invariant features and the DNN model addresses the time-varying features. Section 4 shows the load consumption data, which is used for evaluating the proposed method; it shows how the proposed method is implemented, and the prediction results are also shown, compared with other existing methods. A conclusion is drawn in Section 5.

2. Load Consumption Forecasting

The STLF performed by the DNN model is given as (1):
$$\hat{x}(t+m) = \Theta\left(\bar{F}(t-p,t), W\right) + e(t+m) \tag{1}$$
where
$$\bar{F}(t-p,t) = \left[x(t), \bar{y}(t), x(t-1), \bar{y}(t-1), \ldots, x(t-p), \bar{y}(t-p)\right], \quad \bar{y}(t-k) = \left[y_1(t-k), y_2(t-k), \ldots, y_N(t-k)\right] \text{ with } k = 0, 1, \ldots, p \tag{2}$$
In (1), the DNN model, $\Theta$, forecasts the future load consumption, $\hat{x}(t+m)$, $m$ time samples ahead. $W$ is the parameter set of $\Theta$, which needs to be optimized with respect to the prediction accuracy. $e(t+m)$ is the noise residual at time $(t+m)$. $\bar{F}(t-p,t)$ in (2) is the past information set, which is windowed from $p$ samples in the past up to the current time, $t$. $\bar{y}(t-k)$ with $k = 0, 1, \ldots, p$ denotes the forecasting feature vector which contains the $i$th forecasting feature, $y_i(t-k)$ with $i = 1, 2, \ldots, N$, such as past weather information, past climate information, past seasonal information, user information and building information. $x(t-k)$ is the past load consumption. Both $x(t-k)$ and $\bar{y}(t-k)$ are correlated with the future load consumption. Therefore, $\bar{F}(t-p,t)$, containing both $x(t-k)$ and $\bar{y}(t-k)$, is used to forecast $\hat{x}(t+m)$.
To optimize $\Theta$, $W$ is determined from the training dataset collected from the $M$ existing users, namely $D = [d^{(1)}, d^{(2)}, \ldots, d^{(M)}]$, where $d^{(i)}$ in (3) is the data collected for the $i$th user with $i = 1, 2, \ldots, M$, which contains $n$ samples of past load consumption and the past information set.
$$d^{(i)} = \left\{\bar{F}^{(i)}(t-p-j, t-j), \; x^{(i)}(t+m-j)\right\} \text{ with } j = 0, 1, \ldots, (n-1) \tag{3}$$
where $\bar{F}^{(i)}(t-p-j, t-j)$ is the past information set windowed from time $(t-p-j)$ to $(t-j)$ for the $i$th user; $x^{(i)}(t+m-j)$ is the load consumption at time $(t+m-j)$ for the $i$th user. $\bar{F}^{(i)}(t-p-j, t-j)$ is further written as:
$$\bar{F}^{(i)}(t-p-j, t-j) = \left\{x^{(i)}(t-k-j), y_1^{(i)}(t-k-j), y_2^{(i)}(t-k-j), \ldots, y_N^{(i)}(t-k-j) \text{ with } k = 0, 1, \ldots, p\right\} \tag{4}$$
which contains the past load consumption and past forecasting features within the time window between $(t-p-j)$ and $(t-j)$ for the $i$th user.
Based on the past information set and the load consumption in $D$, $W$ in $\Theta$ can be determined by solving the optimization problem in (5).
$$\min_{W} \sum_{i=1}^{M} \sum_{j=0}^{n-1} \left\| \Theta\left(\bar{F}^{(i)}(t-p-j, t-j), W\right) - x^{(i)}(t+m-j) \right\|^2 \tag{5}$$
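The windowing in (3) and (4) and the squared-error objective in (5) can be illustrated with a minimal sketch. The function names `build_samples` and `squared_error`, the toy load and temperature values, and the persistence baseline standing in for $\Theta$ are all illustrative assumptions and are not part of the proposed method.

```python
from typing import List, Sequence, Tuple

# One window entry is (x(s-k), feature vector y(s-k)); names are illustrative.
Window = List[Tuple[float, Tuple[float, ...]]]


def build_samples(
    load: Sequence[float],
    features: Sequence[Sequence[float]],
    p: int,
    m: int,
) -> List[Tuple[Window, float]]:
    """Build (past information set, future load) pairs for one user.

    load[s] plays the role of x(s) and features[s] the role of the feature
    vector at time s. Each window covers times s-p .. s, newest first as in
    (2), and is paired with the target x(s+m).
    """
    if len(load) != len(features):
        raise ValueError("load and features must have the same length")
    samples = []
    for s in range(p, len(load) - m):
        window = [(load[s - k], tuple(features[s - k])) for k in range(p + 1)]
        samples.append((window, load[s + m]))
    return samples


def squared_error(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Sum of squared residuals, the quantity minimized over W in (5)."""
    return sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, targets))


# Toy series (assumed values): six load readings, one temperature feature each.
load = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
temp = [[20.0], [21.0], [19.0], [22.0], [23.0], [21.0]]
samples = build_samples(load, temp, p=2, m=1)

# Persistence baseline as a stand-in model: predict x(s+m) by the newest load.
baseline = [window[0][0] for window, _ in samples]
loss = squared_error(baseline, [target for _, target in samples])
```

With $p = 2$ and $m = 1$ the six readings yield three training pairs; a trained DNN would replace the persistence baseline when evaluating the loss.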
The forecasting framework is shown in Figure 1. The DNN model, $\Theta$, is developed from the training dataset, $D$, which contains the data from the $M$ users, $d^{(1)}, d^{(2)}, \ldots, d^{(M)}$. Some past information features are time-invariant, such as building space, year built, number of building floors and building purpose. Those time-invariant features are related to the load consumption of new users, so we can use them to improve the prediction accuracy for new users. For example, a larger building space uses more electricity, while less electricity is consumed in a smaller building. In addition, building age correlates with energy consumption, since older buildings are mostly constructed with older material which has less insulation capability. Hence, more energy is required to warm or cool such buildings during winter or summer. For modern buildings, better insulation material is used and less energy is consumed. Furthermore, building purpose is related to user behaviour regarding power consumption. Residential buildings use more energy at night and less energy during the day. In contrast, commercial or industrial buildings use more energy during the day and less energy at night. Therefore, building space, building age and building purpose are time-invariant features which correlate with load consumption. Section 3 discusses how time-invariant features are used to improve STLF.

3. Fuzzy Clustering-Based Deep Learning Model

All forecasting features, $y_i(t) \in \bar{y}(t)$ in (2), with $i = 1, 2, \ldots, N$, are divided into the two sets of features in (6), namely time-invariant features, $\bar{y}_I$, and time-varying features, $\bar{y}_V(t)$, where $C$ is the number of time-invariant features and $(N-C)$ is the number of time-varying features. All elements in $\bar{y}_I$ are constants since they are time-invariant.
$$\bar{y}(t) = \left[\bar{y}_I, \bar{y}_V(t)\right] \text{ with } \bar{y}_I = \left[y_1, y_2, \ldots, y_C\right], \quad \bar{y}_V(t) = \left[y_{C+1}(t), y_{C+2}(t), \ldots, y_N(t)\right] \tag{6}$$
Given that the first $C$ features are time-invariant constants, the past information of the $i$th user in (4) can be rewritten as:
$$\bar{F}^{(i)}(t-p,t) = \left\{x^{(i)}(t-k), y_1^{(i)}, y_2^{(i)}, \ldots, y_C^{(i)}, y_{C+1}^{(i)}(t-k), y_{C+2}^{(i)}(t-k), \ldots, y_N^{(i)}(t-k) \text{ with } k = 0, 1, \ldots, p\right\} \tag{7}$$
Substituting $\bar{y}_I^{(i)}$ and $\bar{y}_V^{(i)}(t-k)$ into (7), $\bar{F}^{(i)}(t-p,t)$ can be rewritten as:
$$\bar{F}^{(i)}(t-p,t) = \left\{x^{(i)}(t-k), \bar{y}_I^{(i)}, \bar{y}_V^{(i)}(t-k) \text{ with } k = 0, 1, \ldots, p\right\} \tag{8}$$
where the terms with the subscripts from 1 to $C$ in (7) are the time-invariant feature data for the $i$th user. These terms are collected in a vector, $\bar{y}_I^{(i)} = [y_1^{(i)}, y_2^{(i)}, \ldots, y_C^{(i)}]$; those from $(C+1)$ to $N$ are the time-varying feature data, which are written as a vector, $\bar{y}_V^{(i)}(t-k) = [y_{C+1}^{(i)}(t-k), y_{C+2}^{(i)}(t-k), \ldots, y_N^{(i)}(t-k)]$ with $k = 0, 1, \ldots, p$.
The time-varying set for the $i$th user is grouped as:
$$\bar{Y}_V^{(i)} = \left[\bar{y}_V^{(i)}(t), \bar{y}_V^{(i)}(t-1), \ldots, \bar{y}_V^{(i)}(t-p)\right] \tag{9}$$
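As a rough illustration of the split in (6) to (9), the sketch below separates one user's feature rows into the time-invariant vector and the time-varying set. The function name `split_features`, the feature ordering and the numeric values are hypothetical; the only assumption carried over from the text is that the first $C$ features are constants.

```python
from typing import List, Sequence, Tuple


def split_features(
    rows: Sequence[Sequence[float]], c: int
) -> Tuple[List[float], List[List[float]]]:
    """Split one user's feature rows into (time-invariant vector, time-varying set).

    rows[k] holds [y_1, ..., y_N] at time t-k for k = 0..p, as in (7).
    The first c entries of every row are assumed to be the constants of (6).
    """
    if not rows:
        raise ValueError("at least one feature row is required")
    invariant = list(rows[0][:c])
    for row in rows:
        if list(row[:c]) != invariant:
            raise ValueError("the first c features must not change over time")
    varying = [list(row[c:]) for row in rows]  # the set in (9), newest first
    return invariant, varying


# Hypothetical user: [floor area, floor count, air temperature, wind speed]
rows = [
    [1200.0, 3.0, 25.0, 4.0],  # time t
    [1200.0, 3.0, 24.0, 5.0],  # time t-1
    [1200.0, 3.0, 23.0, 6.0],  # time t-2
]
y_inv, y_var = split_features(rows, c=2)
```

Here `y_inv` is the vector later passed to the clustering step, and `y_var` is the set used to train the cluster's DNN.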
In this section, a fuzzy clustering-based DNN model is proposed to forecast the load consumption of new users. Clusters are generated to classify users with respect to time-invariant features using the time-invariant vectors $\bar{y}_I^{(i)}$ with $i = 1, 2, \ldots, M$. Each cluster groups users with similar time-invariant features. Each DNN model is developed from the time-varying sets of the users in the same cluster, who have similar time-invariant features. Hence, all $\bar{Y}_V^{(i)}$ in the same cluster are shared and used to develop one DNN model.
Since the time-invariant features are already used to cluster users, the DNN model does not need to include the time-invariant features and the model only uses the time-varying features to forecast future load consumption; therefore, a simpler model can be generated. In addition, the model only needs to learn the time-varying features and predict time-varying dynamics in the clusters which have similar time-invariant features. Therefore, the learning is simpler and more accurate predictions of time-varying dynamics are likely to be achieved by the proposed model, compared to the commonly used DNN models which address both the time-varying and time-invariant dynamics. Section 3.1 discusses the clustering method for classifying users based on time-invariant features. Section 3.2 discusses the deep-learning models based on time-varying features to forecast future load consumption.

3.1. Clustering of Time-Invariant Features

When the time-invariant vectors of all users are given, clusters can be generated to group users who have similar electricity usage behaviours. Given $N_c$ clusters with $2 \le N_c \le M$, we determine which cluster the $i$th user belongs to, where $i = 1, 2, \ldots, M$. Here, $\hat{u}_k(\bar{y}_I^{(i)})$ in (10) is defined as the membership of the $i$th user to the $k$th cluster, where $k = 1, 2, \ldots, N_c$. The membership indicates how strongly $\bar{y}_I^{(i)}$ belongs to the $k$th cluster. If $\hat{u}_k(\bar{y}_I^{(i)})$ is large, the $i$th user behaves similarly to the users in the $k$th cluster. Therefore, $\bar{y}_I^{(i)}$ is in the $k$th cluster if $\hat{u}_k(\bar{y}_I^{(i)}) > \hat{u}_j(\bar{y}_I^{(i)})$ for all $j \neq k$ with $j \in \{1, 2, \ldots, N_c\}$.
$$\hat{u}_k(\bar{y}_I^{(i)}) = \left[\sum_{j=1}^{N_c} \left(\frac{d_{ik}}{d_{ij}}\right)^{\frac{2}{m_f - 1}}\right]^{-1} \tag{10}$$
where $d_{ik}$ is the $\bar{A}$-norm distance between $\bar{y}_I^{(i)}$ and the $k$th cluster centre, and $m_f$ is the weighting exponent with $1 \le m_f < \infty$. $d_{ik}$ is given by:
$$d_{ik}^2 = \left\|\bar{y}_I^{(i)} - \bar{v}_k\right\|_{\bar{A}}^2 = \left(\bar{y}_I^{(i)} - \bar{v}_k\right)^T \bar{A} \left(\bar{y}_I^{(i)} - \bar{v}_k\right) \tag{11}$$
where $\bar{A}$ is a positive definite $C \times C$ weight matrix, matching the $C$ time-invariant features, and $\bar{v}_k$ denotes the centre of the $k$th cluster, which is given by:
$$\bar{v}_k = \frac{\sum_{i=1}^{M} \left(\hat{u}_k(\bar{y}_I^{(i)})\right)^{m_f} \bar{y}_I^{(i)}}{\sum_{i=1}^{M} \left(\hat{u}_k(\bar{y}_I^{(i)})\right)^{m_f}} \text{ with } 1 \le k \le N_c \tag{12}$$
To determine the cluster centres, $\bar{V} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{N_c})$, the generalized least-squared error in (13) is minimized over all $\bar{y}_I^{(i)}$ [45].
$$J_{C_{fuzz}}(\bar{V}) = \sum_{k=1}^{N_c} \sum_{i=1}^{M} d_{ik}^2 \left(\hat{u}_k(\bar{y}_I^{(i)})\right)^{m_f} \tag{13}$$
In (13), $\hat{u}_k(\bar{y}_I^{(i)})$ is the membership of $\bar{y}_I^{(i)}$ to the $k$th cluster and $d_{ik}$ is the $\bar{A}$-norm distance between the $i$th user and the $k$th cluster centre. The weight attached to each $d_{ik}^2$ is $(\hat{u}_k(\bar{y}_I^{(i)}))^{m_f}$, which is the $m_f$th power of the membership of $\bar{y}_I^{(i)}$ in cluster $k$. Therefore, minimizing (13) ensures that all users are close to their corresponding cluster centres. If $m_f = 1$, $J_{C_{fuzz}}$ gives equal emphasis to all distances. If $m_f$ is larger, $J_{C_{fuzz}}$ is dominated by large distances, since their powers dominate those of small distances.
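A small worked sketch of the membership formula (10) may help to show how the weighting exponent shapes the memberships. The function name `memberships` and the example distances are assumptions for illustration. The sketch requires $m_f > 1$ because the exponent $2/(m_f - 1)$ is undefined at $m_f = 1$, and it assigns full membership when a user coincides with a centre, which is a common convention rather than something stated in the text.

```python
from typing import List, Sequence


def memberships(distances: Sequence[float], m_f: float) -> List[float]:
    """Memberships of one user in every cluster, following (10).

    distances[k] is d_ik, the distance from the user to cluster centre k.
    """
    if m_f <= 1.0:
        raise ValueError("(10) needs m_f > 1; the exponent is undefined at m_f = 1")
    on_centre = [d == 0.0 for d in distances]
    if any(on_centre):
        # Assumed convention: a user sitting on a centre fully belongs to it.
        return [1.0 / sum(on_centre) if hit else 0.0 for hit in on_centre]
    exponent = 2.0 / (m_f - 1.0)
    return [
        1.0 / sum((d_k / d_j) ** exponent for d_j in distances)
        for d_k in distances
    ]


d = [1.0, 2.0]               # the user is twice as far from cluster 2 as from cluster 1
crisp = memberships(d, 2.0)  # approximately [0.8, 0.2]
fuzzy = memberships(d, 3.0)  # approximately [0.667, 0.333]
```

For the same distances, the memberships always sum to one, and a larger $m_f$ spreads them more evenly across the clusters.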
To minimize $J_{C_{fuzz}}(\bar{V})$, the FCM algorithm was proposed in [45]. The FCM algorithm is one of the most commonly used methods for identifying cluster centres and the memberships of each sample to each cluster. Recent research shows that the FCM algorithm is an effective approach for clustering data [46,49], particularly in solving recent engineering problems such as predicting power system risks [50], bearing fault diagnosis [55], power equipment image segmentation [51], PV array fault diagnosis [52], classifying load consumption for users [53] and classifying groundwater quality [54]. Therefore, we use the FCM algorithm illustrated in Algorithm 1 to minimize (13) in order to determine the optimal cluster centres, $\bar{V} = \{\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{N_c}\}$, in (12). The fuzzy partition coefficient, $V_{pc}$, indicates the clustering performance.
In the FCM algorithm, the inputs are the time-invariant features of the $M$ users. The first two steps set the parameters and randomly initialize a membership matrix which indicates how much each user belongs to each cluster. Step 3 computes the cluster centres using (12). Step 5 computes the membership of each user to each cluster using (10) and generates the updated membership matrix. Step 6 checks whether the change in the membership matrix between successive iterations is smaller than a threshold. If it is, the fuzzy partition coefficient is computed; both the computed fuzzy partition coefficient and the cluster centres computed in Step 3 are returned as the output of the FCM algorithm. Otherwise, the algorithm returns to Step 3, recomputes the cluster centres and repeats iteratively.
Algorithm 1  Fuzzy C-Means (FCM) Algorithm
  • Input: All time-invariant vectors $\bar{y}_I^{(i)}$, with $i = 1, 2, \ldots, M$.
  • Output: The centres of the clusters, $\bar{V} = \{\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{N_c}\}$; the fuzzy partition coefficient, $V_{pc}$.
  •   Step 1: Set the algorithmic parameters, $N_c$, $m_f$, $\bar{A}$, and the threshold, $\epsilon$.
  •   Step 2: Randomly initialize a membership matrix, $\hat{U}^{iter} = [\hat{u}_{ik}^{iter}]$, with
  •             the iteration $iter = 1$.
  •   Step 3: Compute the cluster centres $\bar{V} = \{\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{N_c}\}$ using (12).
  •   Step 4: Set $iter = iter + 1$.
  •   Step 5: Compute an updated membership matrix, $\hat{U}^{iter} = [\hat{u}_{ik}^{iter}]$,
  •             with $\hat{u}_{ik}^{iter} = \hat{u}_k(\bar{y}_I^{(i)})$ using (10).
  •   Step 6: Check whether $\hat{U}^{iter}$ has converged relative to $\hat{U}^{iter-1}$:
  •                   If $\|\hat{U}^{iter} - \hat{U}^{iter-1}\|$ is smaller than $\epsilon$,
  •                         then compute the fuzzy partition coefficient, $V_{pc}$,
  •                                $V_{pc} = \frac{1}{M} \sum_{k=1}^{N_c} \sum_{i=1}^{M} \left(\hat{u}_{ik}^{iter}\right)^2$
  •                         and go to Step 7.
  •                   Else set $\hat{U}^{iter-1}$ to $\hat{U}^{iter}$ and go to Step 3.
  •   Step 7: Return the cluster centres, $\bar{V}$, and the fuzzy partition coefficient, $V_{pc}$.
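The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that the text does not fix: $\bar{A}$ is taken as the identity matrix (Euclidean distance), the convergence norm is the largest absolute change in any membership, the random initialization is seeded for repeatability, and an iteration cap `max_iter` is added as a safeguard. The function name `fcm` and the toy building data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance, i.e. the A-norm with A assumed to be the identity."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def _memberships(point: Vector, centres: Sequence[Vector], m_f: float) -> List[float]:
    """Membership of one point in every cluster, as in (10)."""
    d = [_dist(point, v) for v in centres]
    hits = [dk == 0.0 for dk in d]
    if any(hits):  # the point sits exactly on a centre
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    e = 2.0 / (m_f - 1.0)
    return [1.0 / sum((dk / dj) ** e for dj in d) for dk in d]


def fcm(
    points: Sequence[Vector],
    n_c: int,
    m_f: float = 2.0,
    eps: float = 1e-6,
    max_iter: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]], float]:
    """Return (cluster centres, membership matrix, fuzzy partition coefficient)."""
    rng = random.Random(seed)
    # Step 2: random memberships, each row normalized to sum to one.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_c)]
        total = sum(row)
        u.append([r / total for r in row])
    dim = len(points[0])
    centres: List[List[float]] = []
    for _ in range(max_iter):
        # Step 3: centres as membership-weighted means, as in (12).
        centres = []
        for k in range(n_c):
            w = [u[i][k] ** m_f for i in range(len(points))]
            total_w = sum(w)
            centres.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total_w for d in range(dim)]
            )
        # Step 5: updated memberships from the new centres.
        new_u = [_memberships(p, centres, m_f) for p in points]
        # Step 6: stop when no membership changed by eps or more.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:
            break
    v_pc = sum(x * x for row in u for x in row) / len(points)
    return centres, u, v_pc


# Hypothetical time-invariant vectors: [floor area, floor count]
buildings = [[1000.0, 2.0], [1100.0, 2.0], [9000.0, 10.0], [9500.0, 12.0]]
centres, u, v_pc = fcm(buildings, n_c=2)
labels = [row.index(max(row)) for row in u]
```

On this toy data the two small buildings and the two large buildings end up in different clusters. In practice the features would usually be scaled first (or a non-identity $\bar{A}$ chosen), since floor area would otherwise dominate the distance.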
After the cluster centres are determined, they are used to determine the memberships to each cluster when the time-invariant vector $\bar{y}_I^{(i)}$ of the $i$th user is given. The $i$th user belongs to the $k$th cluster if the membership belonging to the $k$th cluster is larger than that belonging to every other $j$th cluster, where
$$\hat{u}_k(\bar{y}_I^{(i)}) > \hat{u}_j(\bar{y}_I^{(i)}) \tag{14}$$
with $j \neq k$ and $j, k \in \{1, 2, \ldots, N_c\}$, and the membership of $\bar{y}_I^{(i)}$ to the $k$th cluster is
$$\hat{u}_k(\bar{y}_I^{(i)}) = \left[\sum_{j=1}^{N_c} \left(\frac{\left(\bar{y}_I^{(i)} - \bar{v}_k\right)^T \bar{A} \left(\bar{y}_I^{(i)} - \bar{v}_k\right)}{\left(\bar{y}_I^{(i)} - \bar{v}_j\right)^T \bar{A} \left(\bar{y}_I^{(i)} - \bar{v}_j\right)}\right)^{\frac{1}{m_f - 1}}\right]^{-1} \tag{15}$$
Each $\bar{y}_I^{(i)}$ belongs to one of the $N_c$ clusters. The time-varying sets of all users in a single cluster are used to develop a model to predict the future load consumption. $\bar{Y}_V^{(p(j,k))}$ with $k = 1, 2, \ldots, O_j$ are in the $j$th cluster, where $\hat{p}_j$ denotes the index vector which indicates the time-invariant vectors in the $j$th cluster.
$$\hat{p}_j = \{p(j,1), p(j,2), \ldots, p(j,O_j)\} \tag{16}$$
where $O_j$ is the number of elements in the $j$th cluster and the $p(j,k)$th time-invariant vector with $k = 1, 2, \ldots, O_j$ is in the $j$th cluster. All $p(j,k)$ in $\hat{p}_j$ are different, where $1 \le p(j,k) \le M$. Since there are $N_c$ clusters, $N_c$ models are developed using the time-varying sets.
Fuzzy Deep Learning in Algorithm 2 illustrates how the $N_c$ models are developed when the time-invariant vectors and time-varying sets are given. The first two steps generate $N_c$ cluster centres using the FCM in Algorithm 1. Step 3 determines the cluster to which each time-invariant vector belongs, based on (15). Step 4 determines the index vector of the time-varying sets in each cluster using (16). Step 5 develops one model per cluster using the time-varying sets in the corresponding cluster. In this paper, two commonly used deep-learning approaches are used, namely LSTM and CNN, described in Section 3.2.1 and Section 3.2.2, respectively.
Algorithm 2  Fuzzy Deep Learning
  • Input: All time-invariant vectors $\bar{y}_I^{(i)}$ and time-varying sets $\bar{Y}_V^{(i)}$, with $i = 1, 2, \ldots, M$.
  • Output: $N_c$ DNN models, $\Theta_i$, with $i = 1, 2, \ldots, N_c$, which forecast future
  •                 load consumption.
  •   Step 1: Initialize the parameters, $N_c$, $m_f$, $\bar{A}$, the norm $\|\cdot\|_{\bar{A}}$, and the threshold, $\epsilon$.
  •   Step 2: Generate the $N_c$ cluster centres, $\bar{V} = \{\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{N_c}\}$, using
  •             the FCM in Algorithm 1.
  •   Step 3: Determine the membership of $\bar{y}_I^{(i)}$ to the $k$th cluster, $\hat{u}_k(\bar{y}_I^{(i)})$, using (15).
  •   Step 4: Determine the index vector, $\hat{p}_j = \{p(j,1), p(j,2), \ldots, p(j,O_j)\}$, using (16),
  •             with $j = 1, 2, \ldots, N_c$, which indexes the time-varying sets in the $j$th cluster.
  •   Step 5: Use all time-varying sets $\bar{Y}_V^{(p(j,k))}$ with $k = 1, 2, \ldots, O_j$ to develop the
  •             DNN model, $\Theta_j$, using deep learning.
  •   Step 6: Return the DNN models $\Theta_j$ with $j = 1, 2, \ldots, N_c$.
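Steps 3 to 5 of Algorithm 2 can be sketched as below, assuming the cluster centres have already been produced by FCM. Two simplifications are assumptions of this sketch and not the paper's method: $\bar{A}$ is the identity matrix, so the largest membership in (15) corresponds to the nearest centre, and a one-parameter least-squares model (`fit_scaled_mean`) stands in for the LSTM or CNN so that the example stays short and runnable. All names and numbers are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Sequence[float], float]  # (flattened time-varying window, future load)
Model = Callable[[Sequence[float]], float]


def nearest_cluster(y_inv: Vector, centres: Sequence[Vector]) -> int:
    """Index of the cluster with the largest membership.

    With A assumed to be the identity, the largest membership in (15)
    corresponds to the smallest squared distance.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(y_inv, v)) for v in centres]
    return d2.index(min(d2))


def fit_scaled_mean(samples: Sequence[Sample]) -> Model:
    """Placeholder for the DNN: predict w * mean(window), with w by least squares."""
    z = [sum(win) / len(win) for win, _ in samples]
    x = [target for _, target in samples]
    denom = sum(zi * zi for zi in z)
    w = sum(zi * xi for zi, xi in zip(z, x)) / denom if denom else 0.0
    return lambda window: w * sum(window) / len(window)


def train_per_cluster(
    invariants: Sequence[Vector],
    user_samples: Sequence[Sequence[Sample]],
    centres: Sequence[Vector],
    fit: Callable[[Sequence[Sample]], Model] = fit_scaled_mean,
) -> Tuple[Dict[int, List[int]], Dict[int, Model]]:
    """Return the index vectors of (16) and one fitted model per non-empty cluster."""
    index_vectors: Dict[int, List[int]] = {j: [] for j in range(len(centres))}
    for i, y_inv in enumerate(invariants):          # Steps 3 and 4
        index_vectors[nearest_cluster(y_inv, centres)].append(i)
    models: Dict[int, Model] = {}
    for j, members in index_vectors.items():        # Step 5
        pooled = [s for i in members for s in user_samples[i]]
        if pooled:
            models[j] = fit(pooled)
    return index_vectors, models


# Hypothetical data: three users, two clusters.
invariants = [[1000.0, 2.0], [1100.0, 2.0], [9000.0, 10.0]]
centres = [[1050.0, 2.0], [9000.0, 10.0]]
user_samples = [
    [([10.0, 12.0], 11.0)],
    [([20.0, 22.0], 21.0)],
    [([100.0, 120.0], 220.0)],
]
index_vectors, models = train_per_cluster(invariants, user_samples, centres)
small_forecast = models[0]([30.0, 32.0])
large_forecast = models[1]([50.0, 50.0])
```

The design point the sketch tries to convey is that samples are pooled only within a cluster, so each model sees data from users with similar time-invariant features and never receives those features as inputs.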
The flow involving FCM in Algorithm 1 and Fuzzy Deep Learning in Algorithm 2 is summarized in Figure 2. FCM generates the centres of the $N_c$ clusters using the time-invariant vectors; Fuzzy Deep Learning generates the $N_c$ DNN models using the time-varying sets. As mentioned in Section 1, existing methods only use time-varying sets to develop DNN models for STLF. In fact, DNN models can be trained on both time-invariant vectors and time-varying sets, when both are available. Such DNN models have more inputs than the proposed fuzzy clustering-based DNN, since the proposed fuzzy clustering-based DNN is only trained on the time-varying sets. Therefore, the proposed fuzzy clustering-based DNN is simpler than those DNN models.
After the $N_c$ cluster centres and the $N_c$ models are generated, the fuzzy clustering-based DNN in Figure 3 can be used to forecast future load consumption when the time-invariant vector and time-varying set of a new user, namely $\bar{y}_I^{new}$ and $\bar{Y}_V^{new}$, are given. Suppose that the membership of $\bar{y}_I^{new}$ to the $i$th cluster is larger than its membership to any other cluster. The new user then belongs to the $i$th cluster with the cluster centre $\bar{v}_i$. The corresponding $i$th model, $\Theta_i$, uses $\bar{Y}_V^{new}$ to predict the future load consumption, $\hat{x}^{new}(t+m)$. If the largest membership is smaller than a threshold value, the DNN trained on both time-varying sets and time-invariant vectors is used instead. Section 3.2 describes how those models are developed.
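The routing of a new user to a cluster model, including the fallback when the largest membership is below a threshold, can be sketched as follows. The threshold value of 0.6, the identity $\bar{A}$, the function names and the toy models are all assumptions made for illustration; the text does not specify them.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
ClusterModel = Callable[[Vector], float]           # input: time-varying window
FallbackModel = Callable[[Vector, Vector], float]  # input: invariant vector, window


def cluster_memberships(
    y_inv: Vector, centres: Sequence[Vector], m_f: float = 2.0
) -> List[float]:
    """Memberships as in (15), with A assumed to be the identity matrix."""
    q = [sum((a - b) ** 2 for a, b in zip(y_inv, v)) for v in centres]
    hits = [qk == 0.0 for qk in q]
    if any(hits):  # the new user coincides with a centre
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    e = 1.0 / (m_f - 1.0)
    return [1.0 / sum((qk / qj) ** e for qj in q) for qk in q]


def forecast_new_user(
    y_inv_new: Vector,
    window_new: Vector,
    centres: Sequence[Vector],
    cluster_models: Dict[int, ClusterModel],
    fallback: FallbackModel,
    threshold: float = 0.6,  # assumed value, not taken from the text
    m_f: float = 2.0,
) -> float:
    """Use the model of the best-matching cluster, or the fallback if unsure."""
    u = cluster_memberships(y_inv_new, centres, m_f)
    best = max(range(len(u)), key=u.__getitem__)
    if u[best] < threshold or best not in cluster_models:
        return fallback(y_inv_new, window_new)
    return cluster_models[best](window_new)


# Hypothetical centres and stand-in models.
centres = [[0.0, 0.0], [10.0, 0.0]]
cluster_models = {
    0: lambda w: sum(w) / len(w),
    1: lambda w: 2.0 * sum(w) / len(w),
}
fallback = lambda y, w: w[0]

near = forecast_new_user([1.0, 0.0], [10.0, 12.0], centres, cluster_models, fallback)
ambiguous = forecast_new_user([5.0, 0.0], [10.0, 12.0], centres, cluster_models, fallback)
```

A user close to the first centre is served by that cluster's model, while a user midway between the centres has a membership of 0.5 in each cluster and is passed to the fallback model.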

3.2. DNNs for Predicting Time-Varying Features

Both LSTM and CNN are implemented on the proposed fuzzy clustering-based DNN since they have been developed for power forecasting when time-varying features such as past weather, load consumption, climate and meteorological variables are given [17,28,29,30,31,33,34,36].

3.2.1. Long Short-Term Memory Network

The LSTM network is suitable for time-series predictions since it benefits from long-term memory cells [56]. The LSTM network in Figure 4 is developed to forecast future load consumption, $x(t+m)$, $m$ time units ahead, when the past time-varying features, $\bar{y}_V(t-p)$, $\bar{y}_V(t-(p-1))$, … and $\bar{y}_V(t)$, are given, where $p$ denotes the number of temporal lags. The LSTM network consists of $N_h$ layers: an input layer which feeds in the multi-dimensional past time-varying features, an LSTM layer with $(p+1)$ neurons and a dense net which determines $x(t+m)$ at the last layer. Each LSTM neuron is fed with $(N-C)$ past time-varying features.
The LSTM nodes in Figure 4 are interconnected in order to update the neuron states with previous inputs. Each LSTM neuron has two inputs, namely the previous short-term state, $\bar{h}_{t-(p-i),j}$, and the previous long-term state, $\bar{c}_{t-(p-i),j}$, where $0 \le i \le (p-1)$ and $1 \le j \le N_h$. It also has two outputs, namely the future short-term state, $\bar{h}_{t-(p-i)+1,j}$, and the future long-term state, $\bar{c}_{t-(p-i)+1,j}$. The LSTM neurons select some of the previous short-term state and long-term state and pass those to the later LSTM neurons. At the last layer, the dense net forecasts $x(t+m)$ by combining the values of all forecasting elements in $\bar{h}_{t,N_h}$.
Figure 5 illustrates how the LSTM neuron computes the future short- and long-term states from the previous ones. To simplify the state expression, the hidden layer index is omitted. The previous and future short-term states are denoted as $\bar{h}_{t-1}$ and $\bar{h}_t$, respectively; the previous and future long-term states are denoted as $\bar{c}_{t-1}$ and $\bar{c}_t$, respectively. The figure shows that the LSTM neuron consists of a main connected layer and three gate controller layers. The upper layer involves a control state which computes the future long-term state, $\bar{c}_t$, by analysing the current input, $\bar{z}_t$, the previous short-term state, $\bar{h}_{t-1}$, and the previous long-term state, $\bar{c}_{t-1}$. The lower layer involves $\bar{f}_t$ for the forget gate, $\bar{i}_t$ for the input gate, $\tilde{c}_t$ for the input node and $\bar{o}_t$ for the output gate. The LSTM states are changed by the three gate operations, namely removing, writing and reading. The computations for $\bar{f}_t$, $\bar{i}_t$, $\tilde{c}_t$, $\bar{o}_t$, $\bar{c}_t$ and $\bar{h}_t$ are performed by (17a) to (17f), respectively:
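$$\text{Forget gate:}\quad \bar{f}_t = \sigma\left(W_{zf}^{T}\,\bar{z}_t + W_{hf}^{T}\,\bar{h}_{t-1} + \bar{b}_f\right) \tag{17a}$$
$$\text{Input gate:}\quad \bar{i}_t = \sigma\left(W_{zi}^{T}\,\bar{z}_t + W_{hi}^{T}\,\bar{h}_{t-1} + \bar{b}_i\right) \tag{17b}$$
$$\text{Input node:}\quad \tilde{c}_t = \tanh\left(W_{zc}^{T}\,\bar{z}_t + W_{hc}^{T}\,\bar{h}_{t-1} + \bar{b}_c\right) \tag{17c}$$
$$\text{Output gate:}\quad \bar{o}_t = \sigma\left(W_{zo}^{T}\,\bar{z}_t + W_{ho}^{T}\,\bar{h}_{t-1} + \bar{b}_o\right) \tag{17d}$$
$$\text{Long-term state:}\quad \bar{c}_t = \bar{f}_t \otimes \bar{c}_{t-1} + \bar{i}_t \otimes \tilde{c}_t \tag{17e}$$
$$\text{Short-term state:}\quad \bar{h}_t = \bar{o}_t \otimes \tanh\left(\bar{c}_t\right) \tag{17f}$$
where $\sigma$ denotes the logistic activation function; $\otimes$ denotes element-wise multiplication; $W_{zf}^{T}$, $W_{zi}^{T}$, $W_{zc}^{T}$ and $W_{zo}^{T}$ are the weight matrices of the four gates connecting to $\bar{z}_t$; $W_{hf}^{T}$, $W_{hi}^{T}$, $W_{hc}^{T}$ and $W_{ho}^{T}$ are the weight matrices of the four gates connecting to the previous short-term state $\bar{h}_{t-1}$; $\bar{b}_f$, $\bar{b}_i$, $\bar{b}_c$ and $\bar{b}_o$ are the bias terms for the four gates.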
The forget gate, $\bar{f}_t$, decides which parts of the previous long-term state, $\bar{c}_{t-1}$, are kept; the input gate, $\bar{i}_t$, and the input node, $\tilde{c}_t$, then decide which parts of the input, $\bar{z}_t$, are added to the long-term state, $\bar{c}_t$. The output gate generates $\bar{o}_t$, which decides which parts of the long-term state are output as the short-term state, $\bar{h}_t$, at the current time. $\bar{f}_t$, $\bar{i}_t$ and $\bar{o}_t$ are outputs of the $\sigma$ function and range from 0 to 1. $\tilde{c}_t$ is the output of the tanh function and lies between −1 and 1. After the input sequence is processed by the gate operations, the long-term memory, $\bar{c}_t$, and the short-term memory, $\bar{h}_t$, are passed to the next or upper LSTM neurons.
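A single LSTM update following the standard form of (17a) to (17f) can be sketched in plain Python. This is only an illustration under stated assumptions: weight matrices are stored as nested lists indexed `[input][unit]`, the helper names (`affine`, `lstm_step`, `zero_params`) are hypothetical, and the all-zero parameters are chosen purely so that the result can be checked by hand; a trained network would use learned weights.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def affine(W_z, z, W_h, h, b):
    """W_z^T z + W_h^T h + b; W_z is [n_inputs][n_units], W_h is [n_units][n_units]."""
    return [
        sum(W_z[k][u] * z[k] for k in range(len(z)))
        + sum(W_h[k][u] * h[k] for k in range(len(h)))
        + b[u]
        for u in range(len(b))
    ]

def lstm_step(z_t, h_prev, c_prev, p):
    """One LSTM update: returns the new short-term and long-term states."""
    f_t = sigmoid(affine(p["W_zf"], z_t, p["W_hf"], h_prev, p["b_f"]))  # forget gate
    i_t = sigmoid(affine(p["W_zi"], z_t, p["W_hi"], h_prev, p["b_i"]))  # input gate
    c_in = [math.tanh(a)
            for a in affine(p["W_zc"], z_t, p["W_hc"], h_prev, p["b_c"])]  # input node
    o_t = sigmoid(affine(p["W_zo"], z_t, p["W_ho"], h_prev, p["b_o"]))  # output gate
    # Long-term state: keep part of the old state, add part of the input node.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_in)]
    # Short-term state: gated, squashed long-term state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def zero_params(n_inputs, n_units):
    """All-zero weights and biases (illustrative only)."""
    p = {}
    for gate in "fico":
        p["W_z" + gate] = [[0.0] * n_units for _ in range(n_inputs)]
        p["W_h" + gate] = [[0.0] * n_units for _ in range(n_units)]
        p["b_" + gate] = [0.0] * n_units
    return p

params = zero_params(n_inputs=2, n_units=1)
h_t, c_t = lstm_step([0.3, -0.7], [0.0], [2.0], params)
```

With all-zero weights every gate equals 0.5 and the input node equals 0, so the step simply halves the long-term state: `c_t` is `[1.0]` and `h_t` is `[0.5 * tanh(1.0)]`.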

3.2.2. Convolution Neural Network

Besides the LSTM, CNNs are also suitable for predicting one-dimensional time-series data. Since sequential time-series data can be treated as a one-dimensional image, a window-based convolution operation can be used to extract useful information [57]. Figure 6 illustrates the proposed CNN framework, which is a multi-head convolution network [58,59]. The framework consists of several CNN heads, which are developed for time-series prediction. The time series of each time-varying feature is processed by one CNN head. Since the time-varying features are indexed from $(C+1)$ to $N$, the $i$th time-varying feature within a window between $(t-p)$ and $t$, namely $\bar{y}_i(t, t-p)$ in (18), is processed by the CNN$_{(i-C)}$-Head, where $t$ is the current time, $(t-p)$ is the past time with a lag of $p$ samples and $(C+1) \le i \le N$.
$$\bar{y}_i(t, t-p) = \{\, y_i(t),\; y_i(t-1),\; \ldots,\; y_i(t-p) \,\} \tag{18}$$
Each CNN$_{(i-C)}$-Head is responsible for capturing useful information from $\bar{y}_i(t, t-p)$, which is correlated with the future load consumption, $x(t+m)$. Since the features $\bar{y}_i(t, t-p)$ have different natures and scales, each $\bar{y}_i(t, t-p)$ is processed independently so that useful information from each feature can be captured. The individual predictions of the CNN$_{(i-C)}$-Heads are gathered by a dense network in order to predict the future load consumption, $x(t+m)$.
The CNN head in Figure 7 consists of an input layer, several convolution layers, several pooling layers, a concatenate layer and a dense layer. The input layer feeds in the time-varying feature, $\bar{y}_i(t, t-p)$. The convolution layer extracts important information from $\bar{y}_i(t, t-p)$. Each convolution layer consists of multiple sliding windows which scan the input time series. The sliding window extracts useful information from the time series by capturing repeated patterns at different regions of the time series. Since the sliding windows in the convolution layer focus on the corresponding features, useful information from each feature can be kept. An activation function is applied to the convolution output to learn the nonlinear patterns of each feature. The pooling layer is used after the convolution layer to reduce the time-series size. After several convolution and pooling operations, the processed time series is concatenated and passed to the dense layer. The future information is passed to the dense network of the CNN framework in Figure 6 in order to predict the future load consumption, $x(t+m)$.
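The convolution, activation and pooling operations of one CNN head can be illustrated on a single 24-sample window. The sketch below is not the authors' TensorFlow model: the ReLU activation, the pooling size of 2, the hand-picked kernel and the toy input are assumptions made for illustration, and the function names are hypothetical.

```python
def conv1d_valid(series, kernel, bias=0.0):
    """Slide the kernel over the series without padding (cross-correlation, as in deep learning)."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[s + j] for j in range(k)) + bias
        for s in range(len(series) - k + 1)
    ]

def relu(values):
    """Assumed activation function; the activation used in the source is not specified here."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Keep the maximum of each non-overlapping block of `size` samples."""
    return [max(values[s:s + size]) for s in range(0, len(values) - size + 1, size)]

# Toy 24-sample window of one standardized time-varying feature.
window = [float(h % 6) for h in range(24)]
kernel = [-1.0, 0.0, 1.0]  # one filter with a window size of 3

feature_map = relu(conv1d_valid(window, kernel))  # 24 - 3 + 1 = 22 samples
pooled = max_pool1d(feature_map)                  # 22 // 2 = 11 samples
```

A head with 32 filters would repeat this with 32 different learned kernels and concatenate the 32 pooled outputs before the dense layer.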

4. Forecasting Performance Evaluations

This section presents the validation results obtained by the proposed fuzzy clustering-based DNNs, namely the fuzzy LSTM and the fuzzy CNN, which integrate the fuzzy clustering with the LSTM network in Section 3.2.1 and the CNN in Section 3.2.2, respectively. Section 4.1 presents the load consumption data, which are used to evaluate the forecasting performance. Section 4.2 discusses how the fuzzy LSTM and fuzzy CNN are implemented. Section 4.3 presents the forecasting results.

4.1. Load Consumption Data

The performance of the proposed fuzzy LSTM and fuzzy CNN is evaluated using Miller’s dataset, which is used for developing load consumption predictors or for large building energy anomaly detection [47,48]. The data used for evaluating the proposed models were collected from two sites, City Building in Cardiff (City-Building) and University College London (University). The data were collected from 2016 to 2017. The data collected in 2016 are used to develop the models and those collected in 2017 are used to validate the prediction capabilities of the models. The numbers of buildings in City-Building and University are 89 and 51, respectively. Hourly meter reading data were captured from power meters installed in the two sites. Each building has one or more power meters measuring load consumption. The total hourly load consumption in a building is the sum of meter readings captured by all the installed meters. The buildings have various purposes, such as education, office and entertainment. The proportions of building purposes are summarized in Table 1.
Each building has a corresponding weather data file which records hourly data for outdoor temperature, humidity, cloud coverage and weather conditions. Those weather data influence load consumption. The hourly weather data were collected from the National Center for Environmental Information (NCEI) National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database (ISD) (https://www.ncei.noaa.gov/products/land-based-station/integrated-surface-database). The dataset used to develop the proposed fuzzy LSTM and fuzzy CNN consists of forecasting features collected from different domains, namely building, weather, calendar and load consumption, as shown in Table 2.
In total, there are fourteen features. Two are in the building domain, seven are in the weather domain, four are in the calendar domain and one is in the load consumption domain. Both time-invariant features [39] and time-varying features [60] are correlated to load consumption.
Since some data went missing during data collection, data analysis cannot be performed directly by statistical or analytical tools [47]. Insertion and estimation of missing values are necessary prior to developing prediction models. Missing values of the time-varying features are estimated by interpolation: each missing value is filled with the value of its closest neighbour. To improve the robustness of the model, data standardization is performed for each feature in the dataset. Data standardization in (19) is applied to each forecasting feature to ensure that the data are internally consistent and that the effect of outliers is reduced.
$$\tilde{x} = \frac{x - \bar{x}}{\sigma} \tag{19}$$
where $\bar{x}$ and $\sigma$ are the mean and standard deviation of the forecasting feature in the dataset, respectively.
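The two preprocessing steps can be sketched for a single feature as follows. This is a minimal illustration rather than the authors' code: missing samples are represented by `None`, ties between equally close neighbours go to the earlier sample, the population standard deviation is used, and the function names and toy temperatures are hypothetical.

```python
import statistics

def fill_nearest(values):
    """Replace each None with the closest non-missing neighbour (earlier sample wins ties)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("feature has no observed values")
    return [
        v if v is not None else values[min(known, key=lambda k: abs(k - i))]
        for i, v in enumerate(values)
    ]

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mean) / std for v in values]

# Toy hourly outdoor temperatures with two missing readings.
temperature = [10.0, None, 14.0, 16.0, None]
filled = fill_nearest(temperature)   # [10.0, 10.0, 14.0, 16.0, 16.0]
scaled = standardize(filled)         # zero mean, unit standard deviation
```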

4.2. Implementation of Forecasting Models

All algorithms are coded as Python scripts and run on an HP ZB 15G7 computer with 32 GB of memory and an RTX 3000 GPU card with 6 GB of memory. The prediction models are all developed with the TensorFlow library. The prediction models are developed to forecast the future load consumption an hour ahead using a time window of 24 h, from the current time to the past 23 h. Since an accurate amount of fossil fuel needs to be reserved hourly, this short-term prediction is necessary for fossil fuel power generators; an insufficient fuel reserve results in insufficient power for users.
The fuzzy c-means clustering in Algorithm 1 is implemented to generate the centres of the fuzzy clusters, where the dataset of the time-invariant features is used. The fuzzy deep learning in Algorithm 2 is used to generate the prediction models, either the fuzzy LSTM or the fuzzy CNN, when the additional dataset of time-varying features is given. The threshold values of both the fuzzy LSTM and the fuzzy CNN are set at 0.5. After the fuzzy LSTM and fuzzy CNN models are developed using the training dataset collected in 2016, the test dataset collected in 2017 is used to validate the prediction capability of the developed models.
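The 24 h window and one-hour-ahead target can be turned into training samples as sketched below. The layout is an assumption made for illustration (one feature vector per hour, one load value per hour), and the function name `make_windows` is hypothetical.

```python
def make_windows(features, load, window=24, horizon=1):
    """Pair each `window`-hour block of feature vectors with the load `horizon` hours later."""
    samples = []
    for t in range(window - 1, len(load) - horizon):
        past = features[t - window + 1 : t + 1]  # hours t-23 ... t
        target = load[t + horizon]               # hour t+1
        samples.append((past, target))
    return samples

# Toy series of 30 hours with a single time-varying feature per hour.
load = [float(h) for h in range(30)]
features = [[value] for value in load]

samples = make_windows(features, load)
first_window, first_target = samples[0]
```

For the 30-hour toy series this yields six samples; the first one uses hours 0 to 23 as input and the load at hour 24 as the target.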
The proposed fuzzy LSTM in Figure 4 is implemented with 12 time-varying features. The LSTM network has an input layer, an LSTM hidden layer and a dense layer as the output layer. The input layer has 24 LSTM neurons, where each LSTM neuron is connected to a sample of the 24 h window. The second layer has 32 LSTM neurons and the last LSTM neuron generates the future prediction for the corresponding feature. A dense block is set at the output layer, where its inputs are the feature predictions and its output predicts the future load consumption.
The proposed fuzzy CNN in Figure 6 is implemented with the 12 individual CNN-Heads in Figure 7, where the input of each CNN-Head is the time-varying feature. In each CNN-Head, the first layer is the input layer, which is connected with the time series of a 24 time-sample window. The second layer is the convolution layer with 32 convolution filters, of which each filter has a window size of 3. The third layer is the pooling layer, with 32 max-pooling filters. The fourth layer consists of 32 concatenation filters, of which each concatenation filter concatenates the outputs of a max-pooling filter. The fifth layer has a single concatenation filter which concatenates the outputs of the 32 concatenation filters. The last dense layer processes the concatenated outputs and predicts the future information. Each individual CNN-Head generates the future information for each time-varying feature. The outputs from the 12 individual CNN-Heads are gathered by the dense network and the future load consumption is predicted.

4.3. Numerical Results for STLF

4.3.1. Clustering of Time-Invariant Features

The FCM in Algorithm 1 is used to determine the cluster centres of users with respect to the time-invariant features, namely Building size and Floor count, where the training dataset is used to determine the clusters. The FCM algorithm first determines the optimal number of centres, which achieves the largest fuzzy partition coefficient. The cluster centres with the largest fuzzy partition coefficient are selected and are used to group relevant users. The corresponding time-varying data in the same cluster are used to develop the STLF model. The numbers of clusters and the corresponding fuzzy partition coefficients are shown in Figure 8a,b for City-Building and University, respectively. Both figures show that the highest fuzzy partition coefficient is achieved for both City-Building and University when the number of clusters is two. The user data in each cluster are used to develop an STLF model. Hence, two models are developed, one for each of the two clusters.
The clustering plots for City-Building and University are shown in Figure 9 and Figure 10, respectively. The figures show how the users of City-Building and University are assigned to clusters according to their numbers of floors and areas, with the number of clusters varying from 2 to 9. For example, the top left subplot in Figure 9 shows the users distributed over two clusters. The two cluster centres are illustrated by two red squares. The two classes of users are illustrated with blue and yellow colours for the two clusters, respectively. Users which are closer to the lower centre are labelled with blue; users which are closer to the upper centre are labelled with yellow. Similarly, the top middle subplot shows how the users are distributed over three clusters, each of which has a centre marked with a square. The three clusters contain users which are indexed with green, blue and yellow colours. Based on the cluster plot, we can identify which users belong to which clusters. When a new user moves to the site, the cluster plot can be used to identify which cluster contains the new user.
Figure 11 shows the clustering results of the test dataset for City-Building, where nine new users are randomly selected from the test dataset. The nine new users are classified by the clusters which are developed using the training dataset. The new users in the test dataset are excluded from the training dataset and are therefore not used to develop the clusters. Four trials are conducted. The figures show that new users with similar numbers of floors and areas are classified into the cluster whose centre is close to the user features. Similar results can be found when clustering new users for University. Figure 12 shows that new users with similar numbers of floors and areas are closer to their corresponding centres. Training data of users in the same cluster are used to develop a single forecasting model. Since there are two clusters, two forecasting models are developed based on the user data from each cluster. When a new user moves to University, the new user is classified into one of the clusters. The corresponding forecasting model is used to predict the future load consumption of this new user. Since existing users with similar behaviours are used to develop the model, more accurate prediction results can be achieved. Those forecasting results are presented in Section 4.3.2.
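Selecting the number of clusters by the fuzzy partition coefficient can be sketched with a small fuzzy c-means routine. This is an illustration under several assumptions, not Algorithm 1 itself: a Euclidean distance, a weighting exponent of 2, a deterministic initialisation from evenly spaced data points, the common definition of the partition coefficient as the mean of the squared memberships, and six made-up users described by two time-invariant features.

```python
import math

def memberships(point, centres, m_f=2.0):
    """Fuzzy c-means memberships of one point (Euclidean distance assumed)."""
    dists = [math.dist(point, c) for c in centres]
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m_f - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]

def fcm(points, n_clusters, m_f=2.0, n_iter=100, tol=1e-9):
    """Alternate membership and centre updates; requires n_clusters >= 2."""
    last = len(points) - 1
    # Assumed initialisation: evenly spaced data points serve as starting centres.
    centres = [points[k * last // (n_clusters - 1)] for k in range(n_clusters)]
    dims = range(len(points[0]))
    for _ in range(n_iter):
        U = [memberships(p, centres, m_f) for p in points]
        new_centres = []
        for k in range(n_clusters):
            w = [row[k] ** m_f for row in U]
            new_centres.append(tuple(
                sum(w_i * p[d] for w_i, p in zip(w, points)) / sum(w) for d in dims))
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, [memberships(p, centres, m_f) for p in points]

def partition_coefficient(U):
    """Mean of the squared memberships; values near 1 indicate a crisp partition."""
    return sum(u * u for row in U for u in row) / len(U)

# Made-up users: (standardized building size, standardized floor count).
users = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
fpc = {c: partition_coefficient(fcm(users, c)[1]) for c in (2, 3)}
best_c = max(fpc, key=fpc.get)
```

For this toy data the two well-separated groups give the larger coefficient at two clusters, so `best_c` is 2.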

4.3.2. Load Consumption Forecasting

Figure 13a,b show the prediction results obtained by the fuzzy LSTM and fuzzy CNN, respectively, on Day 8, Day 33 and Day 37 of 2017 for building ID 684, which is one of the buildings in City-Building. The results show the predictions with different time windows from 8 to 32, from 12 to 36 and from 14 to 38. All figures show predictions of an hour ahead. We can see that predictions are close to the actual load consumption when samples of the past 24 h are used for the predictions. Figure 14a,b show the predictions obtained by the fuzzy LSTM and fuzzy CNN, respectively, for the first month for building ID 684. The predictions and the actual load consumption (i.e., labels) are represented by green and red points, respectively. The results show that the predictions are generally close to the actual load consumption. In addition, the figures show that load consumption is higher at the beginning of the day than at the end of the day, which indicates that higher load consumption is generally required during the mornings and the afternoons.
Cross validation with 20 trials is performed to evaluate the performance of the prediction models. The performance of the proposed fuzzy LSTM and fuzzy CNN is compared with that of the commonly used LSTM and CNN, namely the non-fuzzy LSTM and non-fuzzy CNN. Unlike the proposed methods, the non-fuzzy LSTM and non-fuzzy CNN are developed by modelling both time-varying and time-invariant features, and clustering is not performed on the time-invariant features. These experiments validate whether or not the prediction accuracy can be enhanced by clustering the time-invariant features and by solely modelling the time-varying features. In each trial, nine building IDs are randomly selected from the test dataset, and those selected building IDs are excluded from the training dataset used to develop the prediction models. Due to the page limitation, the results of all trials cannot be illustrated; we present the first four trials.
The prediction results for the first four trials obtained by the fuzzy LSTM and fuzzy CNN for City-Building are shown in Figure 15 and Figure 16, respectively. The nine building IDs which have been used for testing are shown on the x-axis. Those nine building IDs are used for validation and are not used to develop the prediction models. Figure 16 shows both the mean absolute errors and mean square errors for the standardized load consumption. The results show that, generally, the proposed fuzzy CNN and fuzzy LSTM are able to obtain smaller errors compared with the non-fuzzy LSTM and non-fuzzy CNN. Both non-fuzzy approaches have to learn the time-varying and time-invariant features together, whereas the proposed methods only learn the time-varying features within each cluster; hence, more accurate results can generally be achieved by the proposed methods. Similar results can be found for University in Figure 17 and Figure 18 for the proposed fuzzy LSTM and fuzzy CNN, respectively. Generally, the proposed fuzzy LSTM and fuzzy CNN are able to obtain smaller prediction errors, although the same prediction errors are obtained for some building IDs. Table 3 shows the mean MAE and MSE obtained by the proposed fuzzy models and the non-fuzzy models. It shows further that the proposed fuzzy models are able to achieve smaller MAE and MSE; hence, more accurate predictions can be achieved. For the validation, a new user is first classified into the cluster whose users have similar behaviours to the new user. The load consumption of the new user is then predicted by the model which is developed specifically for that cluster. Therefore, more accurate predictions can generally be achieved by the proposed fuzzy models.
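The two error measures reported for the standardized load consumption can be computed as in the following sketch. The numbers are made up for illustration and are not taken from the experiments; the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between labels and predictions."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average squared difference between labels and predictions."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up standardized load values for one building.
actual = [0.5, -1.0, 0.0, 2.0]
predicted = [0.0, -1.0, 1.0, 1.0]

mae = mean_absolute_error(actual, predicted)  # (0.5 + 0 + 1 + 1) / 4 = 0.625
mse = mean_squared_error(actual, predicted)   # (0.25 + 0 + 1 + 1) / 4 = 0.5625
```

Averaging these per-building values over the test buildings and trials gives summary figures of the kind reported as mean MAE and mean MSE.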

5. Conclusions

In this paper, a novel STLF approach, namely the fuzzy clustering-based DNN, was proposed by integrating both time-varying and time-invariant features. The proposed fuzzy clustering-based DNN overcomes the limitation of commonly used STLF approaches, which only consider time-varying features and ignore time-invariant features. The proposed approach uses the fuzzy c-means algorithm to group users with similar time-invariant features. The DNN models do not need to learn the time-invariant features, and each DNN model only predicts the time-varying dynamics of users in the same cluster, which have similar time-invariant features. Both time-invariant and time-varying features are thus considered by the proposed method. Hence, more accurate predictions are likely to be achieved by the DNN models. The performance of the proposed method was evaluated using Miller’s data captured from City-Building and University with 140 buildings, which include both time-varying features and time-invariant features. The experimental results obtained by the proposed method were compared with those of the commonly used DNN models, including LSTM and CNN. The results showed that the proposed method outperformed the LSTM and CNN; smaller mean square and mean absolute errors were achieved by the proposed method. In particular, smaller prediction errors were achieved for new users whose data were not used to train the models. This paper only implemented the proposed fuzzy clustering-based DNN to perform STLF. Long-term load dynamics are also correlated to both time-varying features and time-invariant features. As well as being used for STLF, the proposed method can therefore also be applied to long-term load forecasting in the electricity market. We will use Miller’s data captured in 2016 to develop the models and will use the data captured in 2017 to validate the trained models. Long-term load forecasting can be performed in each season of 2017.
Along with Miller’s data, data recently captured from smart grid systems [61,62] will also be used to develop the models, so that the prediction capability of the proposed method can be further validated.

Author Contributions

Conceptualization, K.F.C.Y.; Methodology, K.Y.C.; Validation, K.Y.C.; Investigation, D.K.; Writing—original draft, K.Y.C.; Writing—review & editing, K.F.C.Y. and A.A.-S. All authors have read and agreed to the published version of the manuscript.

Funding

The second author is supported by funding from Projects of Strategic Importance (Project Code: 1-ZE1Y), Faculty of Science Dean’s Reserve (Project Code: 1-ZVT5) and Hong Kong Polytechnic University grants (4-ZZPT, 1-WZ0E).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Abbreviation:
DNN: Deep neural network
LSTM: Long short-term memory network
CNN: Convolutional neural network
FCM: Fuzzy C-means algorithm
STLF: Short-term load forecasting
Load consumption forecasting:
$N$: Number of forecasting features
$C$: Number of time-invariant features
$(N-C)$: Number of time-varying features
$x(t)$: Load consumption at time $t$
$e(t)$: Noise resident at time $t$
$x(t-p)$: Past load consumption with $p$ time sample lag
$\hat{x}(t+m)$: Predicted future load consumption with $m$ time samples ahead
$y_i(t)$: $i$th forecasting feature
$\bar{y}(t)$: Forecasting feature vector containing all $y_i(t)$
$\bar{F}(t-p, t)$: Past information set containing $\bar{y}(t)$ and $x(t)$ from time $t-p$ to $t$
$\bar{y}_I$: Time-invariant vector containing time-invariant features
$\bar{y}_V(t)$: Time-varying vector containing time-varying features
$\bar{Y}_V$: Time-varying set containing time-varying vectors
User data:
$D$: Training dataset
$M$: Number of users sharing the grid systems
$d^{(i)}$: Data of the $i$th user
$n$: Number of samples in each $d^{(i)}$
$y_j^{(i)}(t)$: $j$th forecasting feature for the $i$th user at time $t$
$x^{(i)}(t-p)$: Past load consumption with $p$ time sample lag for the $i$th user
$\bar{F}^{(i)}(t-p, t)$: Past information set for the $i$th user with the window between $(t-p)$ and $t$
$\bar{y}_I^{(i)}$: Time-invariant vector containing time-invariant features for the $i$th user
$\bar{y}_V^{(i)}(t)$: Time-varying vector containing time-varying features for the $i$th user
$\bar{Y}_V^{(i)}$: Time-varying set containing time-varying vectors for the $i$th user
Clustering:
$N_c$: Number of clusters
$\hat{u}_k(\bar{y}_I^{(i)})$: Membership of the $k$th cluster with respect to the time-invariant features of the $i$th user
$\hat{v}_k$: Centre of the $k$th cluster
$\bar{V}$: Set of cluster centres
$V_{pc}$: Fuzzy partition coefficient
$d_{ik}$: The $\bar{A}$-norm distance between $\bar{y}_I^{(i)}$ and the $k$th cluster centre
$\bar{A}$: Positive definite $n \times n$ weight matrix
$m_f$: Weighting exponent of the fuzzy clustering algorithm
$\hat{p}_j$: Index vector indicating the time-invariant vectors in the $j$th cluster
$O_j$: Number of elements in the $j$th cluster
Deep learning:
$\Theta$: Deep neural network model
$W$: Weights of the DNN model
$\bar{h}_t$: Short-term state of the LSTM
$\bar{c}_t$: Long-term state of the LSTM
$\bar{f}_t$: Forget gate of the LSTM
$\bar{i}_t$: Input gate of the LSTM
$\tilde{c}_t$: Input node of the LSTM
$\bar{o}_t$: Output gate of the LSTM
CNN$_i$-Head: CNN head for the $(C+i)$th time-varying feature in the CNN framework

References

  1. Tiwari, A.; Pindoriya, N.M. Automated Demand Response in Smart Distribution Grid: A Review on Metering Infrastructure. Electr. Power Syst. Res. 2022, 206, 166.
  2. Sulaiman, A.; Nagu, B.; Kaur, G.; Karuppaiah, P.; Alshahrani, H.; Reshan, M.S.A.; AlYami, S.; Shaikh, A. Artificial Intelligence-Based Secured Power Grid Protocol for Smart City. Sensors 2023, 23, 8016.
  3. Godinho, G.C.; Lima, D.A. The theory of a general quantum system interacting with a linear dissipative system. Electr. Power Syst. Res. 2020, 188, 106523.
  4. Giamarelos, N.; Papadimitrakis, M.; Stogiannos, M.; Zois, E.N.; Livanos, N.I.; Alexandridis, A. A machine learning model ensemble for mixed power load forecasting across multiple time horizons. Sensors 2023, 23, 5436.
  5. Chen, Z.; Amani, A.M.; Yu, X.H.; Jalili, M. Control and Optimisation of Power Grids Using Smart Meter Data: A Review. Sensors 2023, 23, 2118.
  6. Xiao, L.; Wang, J.; Yang, X.; Xiao, L. A hybrid model based on data preprocessing for electrical power forecasting. Electr. Power Energy Syst. 2015, 64, 311–327.
  7. Saviozzi, M.; Massucco, S.; Silvestro, F. Implementation of advanced functionalities for Distribution Management Systems: Load forecasting and modeling through Artificial Neural Networks ensembles. Electr. Power Syst. Res. 2019, 167, 230–239.
  8. Hou, H.; Liu, C.; Wang, Q.; Wu, X.; Tang, J.; Shi, Y.; Xie, C. Review of load forecasting based on artificial intelligence methodologies, models, and challenges. Electr. Power Syst. Res. 2022, 210, 108067.
  9. Nayak, P.C.; Nayak, B.P.; Prusty, R.C.; Panda, S. Sunflower optimization based fractional order fuzzy PID controller for frequency regulation of solar-wind integrated power system with hydrogen aqua equalizer-fuel cell unit. Energy Sources Part Recover. Util. Environ. Eff. 2022.
  10. Prusty, U.C.; Nayak, P.C.; Prusty, R.C.; Panda, S. An improved moth swarm algorithm based fractional order type-2 fuzzy PID controller for frequency regulation of microgrid system. Energy Sources Part Recover. Util. Environ. Eff. 2022.
  11. Mishra, S.; Nayak, P.C.; Prusty, R.C.; Panda, S. Modified multiverse optimizer technique-based two degree of freedom fuzzy PID controller for frequency control of microgrid systems with hydrogen aqua electrolyzer fuel cell unit. Neural Comput. Appl. 2022, 34, 18805–18821.
  12. Nayak, P.C.; Mishra, S.; Prusty, R.C.; Panda, S. Hybrid whale optimization algorithm with simulated annealing for load frequency controller design of hybrid power system. Soft Comput. 2023.
  13. Nayak, P.C.; Mishra, S.; Prusty, R.C.; Panda, S. Adaptive fuzzy approach for load frequency control using hybrid moth flame pattern search optimization with real time validation. Evol. Intell. 2022.
  14. Ramos, D.; Teixeira, B.; Faria, P.; Gomes, L.; Abrishambaf, O.; Vale, Z. Use of sensors and analyzers data for load forecasting: A two stage approach. Sensors 2020, 20, 3524.
  15. Pirbazari, A.M.; Farmanbar, M.; Chakravorty, A.; Rong, C. Short-term load forecasting using smart meter data: A generalization analysis. Processes 2020, 8, 484.
  16. Aslam, S.; Herodotou, H.; Mohsin, S.M.; Javaid, N.; Ashraf, N.; Aslam, S. A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids. Renew. Sustain. Energy Rev. 2021, 144, 110992.
  17. Kong, W.; Dong, Z.Y.; Hill, D.J.; Luo, F.; Xu, Y. Short-term residential load forecasting based on resident behaviour learning. IEEE Trans. Power Syst. 2018, 33, 1087–1088.
  18. Peng, Y.; Wang, Y.; Lu, X.; Li, H.; Shi, D.; Wang, Z.; Li, J. Short-term load forecasting at different aggregation levels with predictability analysis. In Proceedings of the IEEE Innovative Smart Grid Technologies-Asia, Chengdu, China, 21–24 May 2019; pp. 3385–3390.
  19. Hashemi, S.E.; Gholian-Jouybari, F.; Hajiaghaei-Keshteli, M. Electric load forecasting based on deep learning and optimized by heuristic algorithm in smart grid. Appl. Energy 2023, 269, 114915.
  20. Aly, H.H. A proposed intelligent short-term load forecasting hybrid models of ANN, WNN and KF based on clustering techniques for smart grid. Electr. Power Syst. Res. 2020, 182, 106191.
  21. Rafati, A.; Joorabian, M.; Mashhour, E. An efficient hour-ahead electrical load forecasting method based on innovative features. Energy 2020, 201, 117511.
  22. Sekhar, C.; Dahiya, R. Robust framework based on hybrid deep learning approach for short term load forecasting of building electricity demand. Energy 2023, 268, 12660.
  23. Wan, A.P.; Chang, Q.; AL-Bukhaiti, K.; He, J.B. Short-term power load forecasting for combined heat and power using CNN-LSTM enhanced by attention mechanism. Energy 2023, 282, 128274.
  24. Massaoudi, M.; Refaat, S.S.; Chihi, I.; Trabelsi, M.; Oueslati, F.S.; Abu-Rub, H. A novel stacked generalization ensemble-based hybrid LGBM-XGBMLP model for Short-Term Load Forecasting. Energy 2021, 214, 118874.
  25. Tavassoli-Hojati, Z.; Ghaderi, S.; Iranmanesh, H.; Hilber, P.; Shayesteh, E. A self-partitioning local neuro fuzzy model for short-term load forecasting in smart grids. Energy 2020, 199, 117514.
  26. Wei, N.; Yin, L.H.; Li, C.; Wang, W.; Qiao, W.B.; Li, C.J.; Zeng, F.H.; Fu, L. Short-term load forecasting using detrend singular spectrum fluctuation analysis. Energy 2022, 256, 124722.
  27. Yang, D.C.; Guo, J.; Li, Y.Z.; Sun, S.L.; Wang, S.Y. Short-term load forecasting with an improved dynamic decomposition-reconstruction-ensemble approach. Energy 2023, 263, 125609.
  28. Liang, Y.; Niu, D.X.; Hong, W.C. Short term load forecasting based on feature extraction and improved general regression neural network model. Energy 2019, 166, 653–663.
  29. Ahmad, A.; Javaid, N.; Mateen, A.; Awais, M.; Khan, Z.A. Short-term load forecasting in smart grids: An intelligent modular approach. Energies 2019, 12, 164.
  30. Kwon, B.S.; Park, R.J.; Song, K.B. Short-term load forecasting based on deep neural networks using LSTM layer. J. Electr. Eng. Technol. 2020, 15, 1501–1509.
  31. Rathor, R.D.; Bharagava, A. Day ahead regional electrical load forecasting using ANFIS techniques. J. Inst. Eng. Ser. B 2020, 101, 475–495.
  32. Zor, K.; Çelik, O.; Timur, O.; Teke, A. Short-term building electrical energy consumption forecasting by employing gene expression programming and GMDH networks. Energies 2020, 13, 1102.
  33. Fay, D.; Ringwood, J.V. Short-Term Forecasting of Heat Demand of Buildings for Efficient and Optimal Energy Management Based on Integrated Machine Learning Models. IEEE Trans. Ind. Inform. 2020, 16, 7743–7755.
  34. Eseye, A.T.; Lehtonen, M.; Tukia, T.; Uimonen, S.; Millar, R.J. Machine learning based integrated feature selection approach for improved electricity demand forecasting in decentralized energy systems. IEEE Access 2019, 7, 91463–91475.
  35. Hu, Y.H.; Li, J.G.; Hong, M.N.; Ren, J.Z.; Lin, R.J.; Liu, Y.; Liu, M.G.; Man, Y. Short term electric load forecasting model and its verification for process industrial enterprises based on hybrid GA-PSO-BPNN algorithm: A case study of papermaking process. Energy 2019, 170, 1215–1227.
  36. Yaprakdal, F. An Ensemble Deep-Learning-Based Model for Hour-Ahead Load Forecasting with a Feature Selection Approach: A Comparative Study with State-of-the-Art Methods. Energies 2022, 16, 57.
  37. Tziolis, G.; Spanias, C.; Theodoride, M.; Theocharides, S.; Lopez-Lorente, J.; Livera, A.; Makrides, G.; Georghiou, G.E. Short-term electric net load forecasting for solar-integrated distribution systems based on Bayesian neural networks and statistical post-processing. Energy 2023, 271, 127018.
  38. Aksoezen, M.; Daniel, M.; Hassler, U.; Kohle, N. Building age as an indicator for energy consumption. Energy Build. 2015, 87, 74–86.
  39. Taheri, M.; Rastogi, P.; Parry, C.; Wegienka, A. Energy and policy considerations for deep learning in NLP. In Proceedings of the 16th International Conference on International Building Performance Simulation Association, Rome, Italy, 2–4 September 2019; pp. 3863–3870.
  40. CSIRO. CSIRO Energise Insight: Household Types and Energy Use; Technical Report; CSIRO: Canberra, Australia, 2018.
  41. Frontier Economics Pty Ltd. Final Report for the Australian Energy Regulator; Technical Report; Frontier Economics Pty Ltd.: Melbourne, Australia, 2020.
  42. Xu, X.X.; Xiao, B.; Li, C.Z.D. Critical factors of electricity consumption in residential buildings: An analysis from the point of occupant characteristics view. J. Clean. Prod. 2020, 256, 120423.
  43. Santamouris, M.; Vasilakopoulou, K. Present and future energy consumption of buildings: Challenges and opportunities towards decarbonisation. E-Prime Adv. Electr. Eng. Electron. Energy 2021, 1, 100002.
  44. Xu, X.X.; Xiao, B.; Li, C.Z.D. Influence of built environment on building energy consumption: A case study in Nanjing, China. Environ. Dev. Sustain. 2024, 26, 5199–5222. [Google Scholar]
  45. Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203. [Google Scholar] [CrossRef]
  46. Hashemi, S.E.; Gholian-Jouybari, F.; Hajiaghaei-Keshteli, M. A fuzzy C-means algorithm for optimizing data clustering. Expert Syst. Appl. 2023, 227, 120377. [Google Scholar] [CrossRef]
  47. Miller, C.; Arjunan, P.; Kathirgamanathan, A.; Fu, C.; Roth, J.; Park, J.Y.; Balbach, C.; Gowri, K.; Nagy, Z.; Fontanini, A.; et al. The ASHRAE great energy predictor III competition: Overview and results. Sci. Technol. Built Environ. 2020, 26, 1427–1447. [Google Scholar] [CrossRef]
  48. Miller, C.; Kathirgamanathan, A.; Picchetti, B.; Arjunan, P.; Park, J.Y.; Nagy, Z.; Raftery, P.; Hobson, B.W.; Shi, Z.; Meggers, F. The building data genome project 2, energy meter data from the ASHRAE great energy predictor III competition. Sci. Data 2020, 7, 368. [Google Scholar] [CrossRef] [PubMed]
  49. Shi, Z.Y.; Chen, L.; Duan, J.W.; Chen, G.Y.; Zhao, K. Robust and fuzzy ensemble framework via spectral learning for random projection-based fuzzy-c-means clustering. Eng. Appl. Artif. Intell. 2023, 117, 105541. [Google Scholar] [CrossRef]
  50. Wu, J.; Wu, Z.; Mao, X.; Wu, F.; Tang, H.; Chen, L. Risk early warning method for distribution system with sources-networksloads- vehicles based on fuzzy C-mean clustering. Electr. Power Syst. Res. 2020, 180, 106059. [Google Scholar] [CrossRef]
  51. Hu, F.K.; Chen, H.B.; Wang, X.F. An intuitionistic kernel-based fuzzy c-means clustering algorithm With local information for power equipment image segmentation. IEEE Access 2020, 8, 4500–4514. [Google Scholar] [CrossRef]
  52. Zhao, Q.; Shao, S.; Lu, L.; Liu, X.; Zhu, H.L. A new PV array fault diagnosis method using fuzzy c-mean clustering and fuzzy membership algorithm. Emergies 2018, 11, 238. [Google Scholar] [CrossRef]
  53. Liu, F.; Dong, T.; Hou, T.; Liu, Y. A hybrid short-term load forecasting model based on improved fuzzy c-means clustering, random Forest and deep neural networks. IEEE Access 2021, 9, 59754–59765. [Google Scholar] [CrossRef]
  54. Mohammadrezapour, O.; Kisi, O.; Pourahmad, F. Fuzzy c-means and K-means clustering with genetic algorithm for identification of homogeneous regions of groundwater quality. Neural Comput. Appl. 2020, 32, 3763–3775. [Google Scholar] [CrossRef]
  55. Xiong, J.; Liu, X.; Zhu, X.T.; Zhu, H.B.; Li, H.Y.; Zhang, Q.H. Semi-Supervised Fuzzy C-Means Clustering Optimized by Simulated Annealing and Genetic Algorithm for Fault Diagnosis of Bearings. IEEE Access 2020, 8, 181976–181987. [Google Scholar] [CrossRef]
  56. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  57. Lara-Benitezy, P.; Carranza-Garcia, M.; Riquelme, J.C. An experimental review on deep learning architectures for time-series forecasting. Int. J. Neural Syst. 2021, 31, 2130001. [Google Scholar] [CrossRef] [PubMed]
  58. Canizo, M.; Triguero, I.; Conde, A.; Onieva, E. Multi-head CNN–RNN for multi-time-series anomaly detection: An industrial case study. Neurocomputing 2019, 363, 246–260. [Google Scholar] [CrossRef]
  59. Jiang, J.R.; Lee, J.E.; Zeng, Y.M. Time series multiple channel convolutional neural network with attention-based long short-term memory for predicting bearing remaining useful life. Sensors 2020, 20, 166. [Google Scholar] [CrossRef]
  60. Alipour, P.; Mukherjee, S.; Nateghi, R. Assessing climate sensitivity of peak electricity load for resilient power systems planning and operation: A study applied to the Texas region. Energy 2019, 185, 1143–1153. [Google Scholar] [CrossRef]
  61. Raghunath, K.M. Integrated Energy Management and Forecasting Dataset. IEEE Dataport 2023. [Google Scholar] [CrossRef]
  62. Zheng, X.T.; Xu, N.; Trinh, L.; Wu, D.Q.; Huang, T.; Sivaranjani, S.; Liu, Y.; Xie, L. A multi-scale time-series dataset with benchmark for machine learning in decarbonized energy grids. Sci. Data 2022, 9, 359. [Google Scholar] [CrossRef]
Figure 1. Forecasting framework.
Figure 2. Training fuzzy clustering-based DNN.
Figure 3. Implementing fuzzy clustering-based DNN after the training.
Figure 4. LSTM layer.
Figure 5. LSTM neuron.
Figure 6. CNN framework [58].
Figure 7. CNN-Head.
Figure 8. Cluster training.
Figure 9. Clusters for City-Building.
Figure 10. Clusters for University.
Figure 11. Cluster validations for City-Building.
Figure 12. Cluster validations for University.
Figure 13. Predictions of LSTM and CNN for Day 8, Day 33 and Day 37 (standardized values).
Figure 14. Actual load consumption (i.e., labels) and predictions of LSTM and CNN for the first month for building ID 684 (standardized values). (a) Actual load consumption (i.e., labels) and predictions of LSTM for building ID 684, (b) Actual load consumption (i.e., labels) and predictions of CNN for building ID 684.
Figure 15. Site ID 1 with LSTM predictions (std error).
Figure 16. Site ID 1 with CNN predictions (std error).
Figure 17. Site ID 4 with LSTM predictions (std error).
Figure 18. Site ID 4 with CNN predictions (std error).
Table 1. Portions of building purposes.
Purpose | Percentage (%)
Education | 37.71
Office | 18.77
Entertainment/public assembly | 12.47
Other | 10.64
Lodging/residential | 10.27
Public services | 10.15
Table 2. Data features.
Feature Index | Forecasting Feature | Description of the Data (Range/Unit) | Domain | Time Nature
1 | Building size | Floor area of the building, 564 to 1360 ft² | Building | Time-invariant
2 | Floor count | Number of floors, 2 to 16 | Building | Time-invariant
3 | Air temperature | Temperature of the air, −10.6 to 47.2 °C | Weather | Time-varying
4 | Cloud coverage | Portion of the sky covered by clouds, 0 to 9 oktas | Weather | Time-varying
5 | Dewpoint temperature | Temperature to which a parcel of air must be cooled, at constant barometric pressure and water vapour content, to reach saturation, −22.8 to 26.1 °C | Weather | Time-varying
6 | Precipitation depth per hour | Depth of liquid precipitation measured in an hour, −1 to 343 mm | Weather | Time-varying
7 | Sea level pressure | Air pressure relative to mean sea level, 973.5 to 1046.0 mb | Weather | Time-varying
8 | Wind direction | Angle, measured clockwise, between north and the direction of the blowing wind, 0 to 360° | Weather | Time-varying
9 | Wind speed | Rate of horizontal travel of air past a fixed point, 0 to 18.5 m/s | Weather | Time-varying
10 | Weekday | Sunday to Saturday, indexed 0 to 6 | Calendar | Time-varying
11 | Hour | 24 h, indexed 0 to 23 | Calendar | Time-varying
12 | Month | January to December, indexed 1 to 12 | Calendar | Time-varying
13 | Timestamp | Year:Month:Date:Hour | Calendar | Time-varying
14 | Previous 24 h load consumption | Load consumption, 0 to 12 kWavg | Load consumption | Time-varying
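Table 2 separates the two time-invariant features (building size and floor count) from the time-varying ones. As a minimal sketch of how the time-invariant features could be grouped with fuzzy c-means [45], the Python below clusters a handful of hypothetical buildings whose values fall within the ranges in Table 2. The building values, the standardization step, the fuzzifier m = 2 and the final hard assignment to the highest-membership cluster are all assumptions made for illustration, not the settings used in the paper.

```python
import math
import random


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c))
            for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(r, means, stds)] for r in rows]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means weighted by membership raised to the power m.
        centers = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(len(points))]
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                            for d in range(dim)])
        # Memberships are proportional to distance^(-2 / (m - 1)).
        new_U = []
        for p in points:
            inv = [max(math.dist(p, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            new_U.append([v / total for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(new_U, U)
                    for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:
            break
    return centers, U


# Hypothetical buildings: (floor area in ft^2, floor count).
buildings = [(600, 2), (650, 3), (700, 2), (1300, 15), (1250, 14), (1350, 16)]
centers, U = fuzzy_c_means(standardize(buildings), n_clusters=2)
# Assumed hard assignment: each building goes to its highest-membership cluster.
labels = [row.index(max(row)) for row in U]
print(labels)
```

In the framework described in the abstract, one DNN would then be trained per cluster on the time-varying features (rows 3 to 14 of Table 2) of the buildings in that cluster.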
Table 3. Test results for City-Building and University (std error).
Site | Method | MAE | MSE
City-Building | Proposed fuzzy LSTM | 0.051 | 0.003
City-Building | non-fuzzy LSTM | 0.218 | 0.042
City-Building | Proposed fuzzy CNN | 0.151 | 0.026
City-Building | non-fuzzy CNN | 0.415 | 0.174
University | Proposed fuzzy LSTM | 0.075 | 0.005
University | non-fuzzy LSTM | 0.127 | 0.016
University | Proposed fuzzy CNN | 0.134 | 0.0176
University | non-fuzzy CNN | 0.187 | 0.0349
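To make the comparison in Table 3 easier to read, the short Python sketch below states the usual definitions of MAE and MSE and computes the relative MAE reduction of each fuzzy model over its non-fuzzy counterpart, using only the MAE values listed in the table. The helper names (`mae`, `mse`, `relative_reduction`) are hypothetical, and the standard definitions are assumed rather than taken from the authors' evaluation code.

```python
def mae(y_true, y_pred):
    """Mean absolute error between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(fuzzy, baseline):
    """Percentage by which the fuzzy model lowers the baseline error."""
    return 100.0 * (baseline - fuzzy) / baseline


# MAE values copied from Table 3: (fuzzy model, non-fuzzy model).
TABLE3_MAE = {
    ("City-Building", "LSTM"): (0.051, 0.218),
    ("City-Building", "CNN"): (0.151, 0.415),
    ("University", "LSTM"): (0.075, 0.127),
    ("University", "CNN"): (0.134, 0.187),
}

for (site, model), (fuzzy, baseline) in TABLE3_MAE.items():
    gain = relative_reduction(fuzzy, baseline)
    print(f"{site} {model}: MAE reduced by {gain:.1f}%")
```

With the tabulated values, the reductions are roughly 77% and 64% for City-Building (LSTM and CNN) and roughly 41% and 28% for University.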
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation: Chan, K.Y.; Yiu, K.F.C.; Kim, D.; Abu-Siada, A. Fuzzy Clustering-Based Deep Learning for Short-Term Load Forecasting in Power Grid Systems Using Time-Varying and Time-Invariant Features. Sensors 2024, 24, 1391. https://doi.org/10.3390/s24051391
