Elsevier

Applied Soft Computing

Volume 9, Issue 2, March 2009, Pages 590-598

A novel fuzzy linear regression model based on a non-equality possibility index and optimum uncertainty

https://doi.org/10.1016/j.asoc.2008.08.005

Abstract

Various kinds of fuzzy regression models have been introduced in the literature, and many different methods have been proposed to estimate their fuzzy parameters. In this study, a new approach is introduced for finding the parameters of a linear fuzzy regression with fuzzy outputs and crisp input data. Based on a non-equality possibility index, a new objective function is designed and solved, through which a minimum acceptable degree of uncertainty (the h-level or h-cut) is found. Four numerical examples are presented to compare the proposed approach with other methods. Judged by the criterion used by Kim and Bishu, the new approach outperforms the other methods in the cases studied here. A realistic application of the proposed method is also presented, in which the total energy consumption of the Residential-Commercial sector in Iran is modeled using three inputs (exogenous variables): GDP, the number of households and an energy price index.

Introduction

A fuzzy regression (FR) is an input–output relation in which the input, the output, or both are fuzzy numbers. Unlike classic linear regression, in which the parameters are assumed to be random variables with probability distribution functions, in fuzzy regression the coefficients are governed by possibility theory [1]. Therefore, the input data (independent variable X), the output data (dependent variable Y) and, consequently, the relationship between them are relaxed.

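A general form of the linear fuzzy regression (LFR) is

$$\tilde{Y}_i^* = \tilde{A}_0\tilde{X}_{i0} + \tilde{A}_1\tilde{X}_{i1} + \cdots + \tilde{A}_n\tilde{X}_{in} \qquad (1)$$

or, in vector-product form,

$$\tilde{Y}_i^* = \tilde{A}\tilde{X}_i$$

where $\tilde{Y}_i^*$ and $\tilde{X}_i$ are the fuzzy output and the vector of input observations, respectively. Here $\tilde{A} = [\tilde{A}_0, \tilde{A}_1, \ldots, \tilde{A}_n]$ and $\tilde{X}_i = [\tilde{X}_{i0}, \tilde{X}_{i1}, \ldots, \tilde{X}_{in}]^t$, $i = 1, \ldots, m$, so that $\tilde{A}_j$, $j = 0, 1, \ldots, n$, are the fuzzy coefficients.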

Fuzzy regression analysis is a powerful tool for investigating and predicting data sets that measure a vague concept, i.e., one containing a degree of ambiguity, uncertainty or fuzziness [2], [3], [4], [5]. The main purpose of a fuzzy regression model is to find the best model with the least error. Depending on how the error is defined, these methods can be classified into two classes:

  • (1)

    Possibilistic approach, which tries to minimize the total fuzziness of the model by minimizing the total spread of its fuzzy coefficients, subject to the constraint that the data points of each sample fall within a specified feasible data interval [6], [7], [8].

    Fuzzy regression analysis was first introduced by Tanaka et al. [6], who based the idea on possibility theory. They formulated parameter estimation as a linear programming problem in which the inputs are crisp and the output is a fuzzy number. Later, the optimization model was solved subject to the constraint that the observations fall within the fuzzy sets computed by the model. The triangular fuzzy coefficients were then extended to Gaussian fuzzy numbers [9]. Modifications to the solution of the so-called exponential possibility regression problems have also been discussed, especially regarding the determination of the center of the possibility distribution [10]; it was shown that the estimated parameters are quite different if the main nonlinear problem is solved approximately by splitting it into two linear programming problems.

    Recently, Ge and Wang [12] examined the relationship between the threshold value (the h-parameter) and the input data when the data contain a considerable level of noise or uncertainty. They used the threshold value to measure the degree of fit in fuzzy linear regression and showed that the parameter h is inversely proportional to the input noise. Meanwhile, many researchers have recommended combining fuzzy regression models with other approaches, such as Neuro-Fuzzy modeling [2], TSK-FR modeling [5] and Monte-Carlo methods [13], to improve on the results of ordinary LFR.

  • (2)

    Least squares model, which minimizes the sum of squared errors of the estimated values with respect to their specifications [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21].

This approach is essentially a fuzzy extension of ordinary least squares: it obtains the best fit to the data from a distance measure defined between fuzzy numbers, using the information contained in the input–output data set.
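To make the idea of a distance between fuzzy numbers concrete, here is a minimal Python sketch. It assumes triangular fuzzy numbers written as (center, left spread, right spread) and uses a Diamond-type squared distance that compares the centers and the left and right endpoints of the supports. The names `TriFuzzy`, `diamond_distance_sq` and `fuzzy_sse` are hypothetical, and the exact distance used by the cited least-squares methods may differ.

```python
from typing import Iterable, NamedTuple


class TriFuzzy(NamedTuple):
    """Triangular LR fuzzy number: center a, left spread cl, right spread cr."""
    a: float
    cl: float
    cr: float


def diamond_distance_sq(x: TriFuzzy, y: TriFuzzy) -> float:
    """Squared Diamond-type distance: centers, left endpoints, right endpoints."""
    return ((x.a - y.a) ** 2
            + ((x.a - x.cl) - (y.a - y.cl)) ** 2
            + ((x.a + x.cr) - (y.a + y.cr)) ** 2)


def fuzzy_sse(observed: Iterable[TriFuzzy], predicted: Iterable[TriFuzzy]) -> float:
    """Sum of squared fuzzy distances; a least-squares fuzzy fit minimizes this."""
    return sum(diamond_distance_sq(o, p) for o, p in zip(observed, predicted))
```

For example, the squared distance between (2, 1, 1) and (3, 1, 2) is 1 + 1 + 4 = 6. A least-squares fuzzy regression of this kind would choose the coefficients that minimize `fuzzy_sse` over all observations.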

In the present study, the possibilistic approach is employed, and a new objective function is introduced for it. The main idea, borrowed from the second class of fuzzy regressions, is to minimize the distance between the outputs of the model and the measurements. The proposed objective function makes it possible to estimate an optimal confidence level, the h-level, simultaneously with the parameters.
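This excerpt does not state the paper's non-equality possibility index, so the sketch below uses a standard stand-in to show what an index of this kind computes: the Dubois–Prade possibility that one fuzzy number is greater than or equal to another, Poss(A ≥ B) = sup over x ≥ y of min(μ_A(x), μ_B(y)). It assumes symmetric triangular numbers given as (center, spread); `poss_geq` is a hypothetical name, and this is not claimed to be the authors' index.

```python
def poss_geq(a: float, ca: float, b: float, cb: float) -> float:
    """Possibility that A >= B for symmetric triangular A = (a, ca), B = (b, cb).

    If A's center is not below B's, x = a and y = b give membership 1 for both.
    Otherwise the supremum is the height where A's right side meets B's left side:
    1 - (b - a) / (ca + cb), clamped at 0 when the supports do not overlap.
    """
    if a >= b:
        return 1.0
    total_spread = ca + cb
    if total_spread == 0.0:
        return 0.0  # two crisp numbers with a < b
    return max(0.0, 1.0 - (b - a) / total_spread)
```

With A = (2, 1) and B = (3, 1), Poss(A ≥ B) = 1 − 1/2 = 0.5, so the statement "A is at least B" is only half possible.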

The remainder of the paper is organized as follows. Sections 2 and 3 give preliminary definitions of fuzzy numbers and fuzzy linear regression models, together with some important remarks on these models. In Section 4, the new approach is introduced, based on an index proposed to measure the inequality between two fuzzy numbers. In Section 5, numerical examples are used to compare the results of our approach with those of some existing methods, along with the criteria used to assess the performance of the methods. Finally, a realistic application is given in Section 6, and Section 7 concludes the paper.


Fuzzy numbers

Three introductory definitions are presented in this section. First, the fuzzy numbers used throughout this paper are defined, following the concept introduced by Dubois and Prade [22]:

Definition 1

à is a fuzzy number of the LR-type if there are a, cL > 0, cR > 0 in R so that:

μA˜(x)=LaxcLforxaRxacRforxawhere L and R are decreasing functions from R+ to [0,1], and L(x) = R(x) = 1, for x  0, and L(x) = R(x) = 0 for x  1. If L(x) = R(x) = 1  x, then the following notation represents a triangular fuzzy numbers:Ã
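A minimal Python sketch of Definition 1, assuming the triangular reference function L(u) = R(u) = max(0, 1 − u); the clamp to 0 enforces L(u) = 0 for u ≥ 1. Any other decreasing reference functions with the same boundary behavior can be passed in. The names `linear_shape` and `lr_membership` are hypothetical.

```python
from typing import Callable

Shape = Callable[[float], float]


def linear_shape(u: float) -> float:
    """Triangular reference function: 1 at u = 0, falling linearly to 0 at u = 1."""
    return max(0.0, 1.0 - u)


def lr_membership(x: float, a: float, c_l: float, c_r: float,
                  L: Shape = linear_shape, R: Shape = linear_shape) -> float:
    """Membership of x in the LR fuzzy number with peak a and spreads c_l, c_r > 0."""
    if x <= a:
        return L((a - x) / c_l)  # left branch
    return R((x - a) / c_r)      # right branch
```

For the number with peak 5, left spread 2 and right spread 4, the membership is 1 at x = 5, 0.5 at both x = 4 and x = 7, and 0 at x = 10.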

Fuzzy linear regression models

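The fuzzy regression analysis introduced by Tanaka et al. [6] considers crisp inputs and fuzzy outputs. The general model given by (1) is then represented as follows:

$$\tilde{Y}_i^* = \tilde{A}_0 X_{i0} + \tilde{A}_1 X_{i1} + \cdots + \tilde{A}_n X_{in} = \tilde{A} X_i$$

where $\tilde{Y}_i^*$ is the fuzzy output, $\tilde{A}_j$, $j = 0, 1, \ldots, n$, are the fuzzy coefficients, and $X_i$ is the vector of crisp inputs. The optimization process is formulated as follows:

$$\begin{aligned}
\min\;& Z(h) = \sum_{i=1}^{m}\sum_{j=0}^{n} c_j\,|x_{ij}| \\
\text{subject to}\;& \sum_{j=0}^{n} a_j x_{ij} + |L^{-1}(h)| \sum_{j=0}^{n} c_j\,|x_{ij}| \ge y_i + |L^{-1}(h)|\, e_i, \quad i = 1, 2, \ldots, m \\
& \sum_{j=0}^{n} a_j x_{ij} - |L^{-1}(h)| \sum_{j=0}^{n} c_j\,|x_{ij}| \le y_i - |L^{-1}(h)|\, e_i
\end{aligned}$$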
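A candidate solution of this linear program can be checked with the short Python sketch below. It does not solve the LP (that would need an LP solver); it only evaluates the objective Z(h) and tests both constraints. It assumes symmetric triangular coefficients Ã_j = (a_j, c_j), symmetric observed outputs (y_i, e_i), and the triangular reference function L(u) = 1 − u, so that |L⁻¹(h)| = 1 − h. The function names are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def tanaka_objective(c: Sequence[float], X: Matrix) -> float:
    """Z(h): total spread sum_i sum_j c_j * |x_ij| of the model outputs."""
    return sum(cj * abs(xij) for row in X for cj, xij in zip(c, row))


def tanaka_feasible(a: Sequence[float], c: Sequence[float], X: Matrix,
                    y: Sequence[float], e: Sequence[float], h: float,
                    tol: float = 1e-9) -> bool:
    """True if the h-cut of every model output covers the h-cut of the observation."""
    k = 1.0 - h  # |L^{-1}(h)| for the triangular reference function L(u) = 1 - u
    for row, yi, ei in zip(X, y, e):
        center = sum(aj * xij for aj, xij in zip(a, row))
        spread = sum(cj * abs(xij) for cj, xij in zip(c, row))
        upper_ok = center + k * spread >= yi + k * ei - tol
        lower_ok = center - k * spread <= yi - k * ei + tol
        if not (upper_ok and lower_ok):
            return False
    return True
```

For instance, with X = [[1, 1], [1, 2]] (the first column is the intercept input), observations (3, 0.5) and (5, 0.5), a = [1, 2], c = [0.5, 0.25] and h = 0.5, both constraints hold and Z = 1.75; setting c = [0, 0] makes the first constraint fail.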

The new approach

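There are three types of possibilistic linear regression analysis, the so-called Min, Max and Conjunction problems. Following the notation of Tanaka et al. [7], suppose that $\bar{A}_j$, $\underline{A}_j$ and $\hat{A}_j$, $j = 1, \ldots, n$, are estimated such that one of the following cases holds:

$$\begin{aligned}
&\tilde{Y}_i \subseteq_h \bar{Y}_i && \text{(the Min problem)} \\
&\tilde{Y}_i \supseteq_h \underline{Y}_i && \text{(the Max problem)} \\
&[\tilde{Y}_i]_h \cap [\hat{Y}_i]_h \neq \emptyset && \text{(the Conjunction problem)}
\end{aligned}$$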

In the Min problem, $\subseteq_h$ indicates that, at level h, the spread of $\bar{Y}_i$ covers that of $\tilde{Y}_i$. The same meaning is aimed
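The three relations can be tested on h-cuts. The Python sketch below assumes symmetric triangular numbers given as (center, spread), whose h-cut is [center − (1 − h)·spread, center + (1 − h)·spread]; the helper names are hypothetical.

```python
from typing import Tuple

Tri = Tuple[float, float]  # (center, spread) of a symmetric triangular number


def h_cut(center: float, spread: float, h: float) -> Tuple[float, float]:
    """Interval of values whose membership is at least h."""
    w = (1.0 - h) * spread
    return center - w, center + w


def included_at_h(inner: Tri, outer: Tri, h: float) -> bool:
    """inner is contained in outer at level h (inner's h-cut lies in outer's)."""
    lo_i, hi_i = h_cut(*inner, h)
    lo_o, hi_o = h_cut(*outer, h)
    return lo_o <= lo_i and hi_i <= hi_o


def conjunct_at_h(p: Tri, q: Tri, h: float) -> bool:
    """The h-cuts of p and q intersect (the Conjunction relation)."""
    lo_p, hi_p = h_cut(*p, h)
    lo_q, hi_q = h_cut(*q, h)
    return max(lo_p, lo_q) <= min(hi_p, hi_q)
```

In this reading, the Min problem requires `included_at_h(observed, estimate, h)`, the Max problem requires `included_at_h(estimate, observed, h)`, and the Conjunction problem requires `conjunct_at_h(observed, estimate, h)` for every observation.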

Numerical examples

In this section, four examples are used to compare the new approach (NA) with several other methods: Tanaka's method (TM) [6], [7], the Diamond method (DM) [11], the Savic and Pedrycz method (SP) [26], the Kim and Bishu method (KB) [25] and the Modarres approach (MA) [24].
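The Kim and Bishu criterion mentioned in the abstract is commonly stated as the area between the observed and estimated membership functions relative to the area under the observed one, E_i = ∫|μ_Y(x) − μ_Y*(x)| dx / ∫ μ_Y(x) dx. The sketch below approximates it with a midpoint rule, assuming symmetric triangular numbers (center, spread > 0); the grid size and the names are illustrative choices, not taken from the paper.

```python
from typing import Tuple

Tri = Tuple[float, float]  # (center, spread > 0)


def tri_mu(x: float, center: float, spread: float) -> float:
    """Membership of x in a symmetric triangular fuzzy number."""
    return max(0.0, 1.0 - abs(x - center) / spread)


def kim_bishu_error(observed: Tri, estimated: Tri, n: int = 20000) -> float:
    """Relative area between observed and estimated memberships (midpoint rule)."""
    (a1, c1), (a2, c2) = observed, estimated
    lo = min(a1 - c1, a2 - c2)  # union of the two supports
    hi = max(a1 + c1, a2 + c2)
    dx = (hi - lo) / n
    diff_area = 0.0
    obs_area = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        m_obs = tri_mu(x, a1, c1)
        diff_area += abs(m_obs - tri_mu(x, a2, c2)) * dx
        obs_area += m_obs * dx
    return diff_area / obs_area
```

Smaller values indicate a closer fit: identical numbers give 0, and an estimate whose support does not overlap the observation gives 2 when both have the same spread. Summing E_i over the observations to score a method is an assumption here.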

Example 1

The data used by Tanaka et al. [7], shown in Table 1, serve as the first example. Applying NA to these data yields the following regression:

$$\tilde{Y}^* = (5.042,\ 1.842)_L + (1.592,\ 0.112)_L\, x, \qquad h = 0.5721,$$
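To read the fitted model, the sketch below evaluates it at a crisp input x, taking the coefficients as printed above and assuming symmetric coefficients (center, spread) with the triangular reference function L(u) = 1 − u. The output center is then 5.042 + 1.592x, its spread is 1.842 + 0.112|x|, and its h-cut at the reported h = 0.5721 is the center plus or minus (1 − h) times the spread. The names are hypothetical.

```python
from typing import Tuple

# Coefficients of the Example 1 regression, taken as printed: (center, spread).
A0 = (5.042, 1.842)
A1 = (1.592, 0.112)
H = 0.5721


def predict(x: float) -> Tuple[float, float]:
    """Center and spread of the fuzzy output for a crisp input x."""
    center = A0[0] + A1[0] * x
    spread = A0[1] + A1[1] * abs(x)
    return center, spread


def predict_h_interval(x: float, h: float = H) -> Tuple[float, float]:
    """h-cut of the fuzzy output, assuming the triangular reference function."""
    center, spread = predict(x)
    w = (1.0 - h) * spread
    return center - w, center + w
```

At x = 2, for instance, the center is 8.226, the spread is 2.066, and the h-cut is roughly [7.342, 9.110].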

Application

The proposed method is used to forecast end-use energy in the Residential-Commercial sector of Iran. The data are taken from the Annual Energy Balance Sheet published by the Ministry of Energy and the Central Bank of Iran. The total energy consumption (EC) of households is taken as the dependent variable. Since the data given for the agriculture sector may include energy used by farmers in their homes in rural regions, half of the energy consumption given for the agriculture sector

Conclusion

Fuzzy regression has penetrated a wide range of data-processing fields and has found various applications in analyzing uncertain data. There are two main classes in this regard: the possibilistic approach and the least squares model. In this study, we extended the model with a combined approach, based on the idea of reducing the distance between the output of the possibilistic model and the measured output by increasing their conjunction. First, an index measuring the inequality of fuzzy
