A novel fuzzy linear regression model based on a non-equality possibility index and optimum uncertainty
Introduction
A fuzzy regression (FR) is an input–output relation in which the input, the output, or both are fuzzy numbers. Unlike classical linear regression, in which the parameters are assumed to be random variables with probability distribution functions, in fuzzy regression the coefficients are governed by possibility theory [1]. Therefore, the assumptions on the input data (independent variable, X), the output data (dependent variable, Y) and, consequently, the relationship between them are relaxed.
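A general form of the linear fuzzy regression (LFR) is represented by

$$\tilde{Y}_i = \tilde{A}_0 + \tilde{A}_1 \tilde{X}_{i1} + \cdots + \tilde{A}_n \tilde{X}_{in}, \tag{1}$$

or, in vector form,

$$\tilde{Y}_i = \tilde{A}\,\tilde{X}_i,$$

where $\tilde{Y}_i$ and $\tilde{X}_i$ are the fuzzy output and the vector of input observations, respectively. Hence, $\tilde{A} = (\tilde{A}_0, \tilde{A}_1, \ldots, \tilde{A}_n)$ and $\tilde{X}_i = (1, \tilde{X}_{i1}, \ldots, \tilde{X}_{in})^T$, i = 1, …, m, so that $\tilde{A}_j$, j = 0, 1, …, n, are the fuzzy coefficients.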
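To make the model concrete, the sketch below evaluates the fuzzy output for the common special case of crisp inputs and symmetric triangular coefficients (an assumption; the general form above also allows fuzzy inputs). In that case fuzzy arithmetic gives a symmetric triangular output whose center is the weighted sum of the coefficient centers and whose spread is the sum of the coefficient spreads weighted by the absolute input values. The names `SymTri` and `fuzzy_output` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SymTri:
    """Symmetric triangular fuzzy number: membership 1 at `center`,
    decreasing linearly to 0 at center - spread and center + spread."""
    center: float
    spread: float  # assumed >= 0


def fuzzy_output(coeffs: Sequence[SymTri], x: Sequence[float]) -> SymTri:
    """Evaluate Y = A_0 + A_1*x_1 + ... + A_n*x_n for crisp inputs x.

    For symmetric triangular coefficients and crisp inputs, fuzzy
    arithmetic gives a symmetric triangular output with
    center = sum(a_j * x_j) and spread = sum(c_j * |x_j|).
    """
    xs = [1.0, *x]  # the leading 1 multiplies the intercept A_0
    if len(xs) != len(coeffs):
        raise ValueError("expected one coefficient per input plus an intercept")
    center = sum(a.center * xj for a, xj in zip(coeffs, xs))
    spread = sum(a.spread * abs(xj) for a, xj in zip(coeffs, xs))
    return SymTri(center, spread)


# Example: A_0 = (1, 0.5), A_1 = (2, 0.25), input x_1 = -2.
print(fuzzy_output([SymTri(1.0, 0.5), SymTri(2.0, 0.25)], [-2.0]))
# prints SymTri(center=-3.0, spread=1.0)
```

Note that a negative input still widens the output, because the spread uses $|x_j|$.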
Fuzzy regression analysis is a powerful tool for investigating and predicting data sets that involve vague concepts containing a degree of ambiguity, uncertainty or fuzziness [2], [3], [4], [5]. The main purpose of fuzzy regression models is to find the model with the least error. Depending on how the error is defined, these methods can be classified into two classes:
- (1) Possibilistic approach, which minimizes the total fuzziness of the model by minimizing the total spread of its fuzzy coefficients, subject to the constraint that the data points of each sample lie within a specified feasible interval [6], [7], [8].
Fuzzy regression analysis was first introduced by Tanaka et al. [6], who based the idea on possibility theory. They modeled parameter estimation as a linear programming problem in which the inputs are crisp and the output is a fuzzy number. Later, the optimization model was solved subject to the constraint that the observations fall within the fuzzy sets computed by the model. The approach was then extended from triangular fuzzy coefficients to Gaussian fuzzy numbers [9]. Some studies have proposed modifications to the solution of the so-called exponential possibility regression problems, especially regarding the determination of the center of the possibility distribution [10]; it was shown that the estimated parameters differ considerably if the original nonlinear problem is solved approximately by splitting it into two linear programming problems.
Recently, Ge and Wang [12] studied the relationship between the threshold value (the h-parameter) and the input data when the data contain a considerable level of noise or uncertainty. They used the threshold value to measure the degree of fit in fuzzy linear regression and showed that the parameter h is inversely proportional to the input noise. Meanwhile, many researchers have combined fuzzy regression models with other approaches, such as Neuro-Fuzzy modeling [2], TSK-FR modeling [5] and Monte-Carlo methods [13], to improve on the results of ordinary LFR.
- (2) Least squares model, which minimizes the sum of squared errors of the estimated values [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21].
This approach is a fuzzy extension of ordinary least squares: it seeks the best fit to the input–output data set with respect to a distance measure defined between fuzzy numbers.
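As an illustration of the least squares class, the sketch below uses the distance between triangular fuzzy numbers associated with Diamond's fuzzy least squares [11], as it is commonly stated: the squared distance is the sum of squared differences of the left endpoints, modes and right endpoints. The representation `(left, mode, right)` and the function names are assumptions made for this example; a fitting routine would minimize `fuzzy_sse` over the model parameters.

```python
from typing import Sequence, Tuple

# A triangular fuzzy number written as (left endpoint, mode, right endpoint).
Tri = Tuple[float, float, float]


def diamond_sq_dist(u: Tri, v: Tri) -> float:
    """Squared distance between two triangular fuzzy numbers: the sum of
    squared differences of their left endpoints, modes and right endpoints."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def fuzzy_sse(observed: Sequence[Tri], estimated: Sequence[Tri]) -> float:
    """Least-squares criterion: total squared distance between observed
    and estimated fuzzy outputs (the quantity a fitting routine minimizes)."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    return sum(diamond_sq_dist(o, e) for o, e in zip(observed, estimated))


obs = [(0.0, 1.0, 2.0), (1.0, 2.0, 3.0)]
est = [(0.0, 1.0, 2.0), (1.0, 3.0, 4.0)]
print(fuzzy_sse(obs, est))  # 0 + (0^2 + 1^2 + 1^2) = 2.0
```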
In the present study, the possibilistic approach is employed, and a new objective function is introduced for it. The main idea, derived from the second class of fuzzy regression, is to minimize the distance between the model outputs and the measurements. The proposed objective function allows an optimal confidence level, namely the h-level, to be estimated simultaneously with the parameters.
The remainder of the paper is organized as follows. Sections 2 and 3 give preliminary definitions of fuzzy numbers and fuzzy linear regression models and describe some important remarks on these models. Section 4 introduces the new approach, based on an index proposed to measure the inequality between two fuzzy numbers. Section 5 applies numerical examples to compare the results of our approach with those of some existing methods, together with criteria used to assess the performance of the methods. Finally, a realistic application is given in Section 6, and Section 7 concludes the paper.
Fuzzy numbers
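Three introductory definitions are presented in this section. First, the definition of the fuzzy numbers used throughout this paper is given, based on the concept defined by Dubois and Prade [22].

Definition 1. $\tilde{A}$ is a fuzzy number of the LR-type if there are $a$, $c_L > 0$, $c_R > 0$ in $\mathbb{R}$ such that

$$\mu_{\tilde{A}}(x) = \begin{cases} L\left(\dfrac{a - x}{c_L}\right), & x \le a, \\[2mm] R\left(\dfrac{x - a}{c_R}\right), & x > a, \end{cases}$$

where $L$ and $R$ are the left and right reference (shape) functions.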
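A small sketch of an LR-type membership function, assuming the usual Dubois and Prade form in which the membership is $L((a-x)/c_L)$ to the left of $a$ and $R((x-a)/c_R)$ to the right. The two reference functions shown (linear, giving a triangular number, and Gaussian) are illustrative choices, not taken from the paper, and all names are hypothetical.

```python
import math
from typing import Callable

Shape = Callable[[float], float]


def lr_membership(x: float, a: float, c_left: float, c_right: float,
                  L: Shape, R: Shape) -> float:
    """Membership of x in the LR-type fuzzy number (a, c_left, c_right)."""
    if c_left <= 0 or c_right <= 0:
        raise ValueError("spreads c_L and c_R must be positive")
    if x <= a:
        return L((a - x) / c_left)
    return R((x - a) / c_right)


def linear_shape(t: float) -> float:
    """max(0, 1 - t): with L = R = linear_shape the number is triangular."""
    return max(0.0, 1.0 - t)


def gaussian_shape(t: float) -> float:
    """exp(-t^2): an alternative bell-shaped reference function."""
    return math.exp(-t * t)


# Non-symmetric triangular number with mode 2, left spread 1, right spread 2.
for x in (0.0, 1.5, 2.0, 3.0, 5.0):
    print(x, lr_membership(x, 2.0, 1.0, 2.0, linear_shape, linear_shape))
```

Because $c_L$ and $c_R$ are independent, the same function covers both symmetric and non-symmetric fuzzy numbers.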
Fuzzy linear regression models
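The fuzzy regression analysis introduced by Tanaka et al. [6] considers crisp inputs and fuzzy outputs. The general model given by (1) is then represented as follows:

$$\tilde{Y}_i = \tilde{A}_0 + \tilde{A}_1 X_{i1} + \cdots + \tilde{A}_n X_{in} = \tilde{A}\,X_i,$$

where $\tilde{Y}_i$ is the fuzzy output, $\tilde{A}_j$, j = 0, 1, …, n, are the fuzzy coefficients, and $X_i = (1, X_{i1}, \ldots, X_{in})^T$ is the vector of crisp inputs. The coefficients are then estimated by minimizing the total spread of the fuzzy coefficients, subject to the constraint that each observation lies within the estimated fuzzy output at level h.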
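Tanaka's estimation problem is usually written as a linear program: minimize the total spread $J = \sum_i c^T|X_i|$ over the centers $a$ and spreads $c \ge 0$, subject to each observation lying inside the h-level interval $a^T X_i \pm (1-h)\,c^T|X_i|$ of the corresponding output. The sketch below assumes symmetric triangular coefficients and, for simplicity, crisp observed outputs (the fuzzy-output case also involves the observed spreads in the constraints). It only evaluates the objective and constraints for a candidate solution; handing them to an LP solver such as `scipy.optimize.linprog` is not shown. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def h_interval(a: Sequence[float], c: Sequence[float],
               x: Sequence[float], h: float) -> Tuple[float, float]:
    """h-level interval of the fuzzy output for crisp input vector x.

    Coefficients are symmetric triangular (centers a_j, spreads c_j), so the
    output has center a^T x and spread c^T |x|; its h-cut is
    center +/- (1 - h) * spread.
    """
    xs = [1.0, *x]  # constant term for A_0
    center = sum(aj * xj for aj, xj in zip(a, xs))
    spread = sum(cj * abs(xj) for cj, xj in zip(c, xs))
    half = (1.0 - h) * spread
    return center - half, center + half


def total_spread(c: Sequence[float], X: Sequence[Sequence[float]]) -> float:
    """Objective J = sum_i c^T |x_i|: total spread of the fitted outputs."""
    return sum(sum(cj * abs(xj) for cj, xj in zip(c, [1.0, *x])) for x in X)


def satisfies_inclusion(a: Sequence[float], c: Sequence[float],
                        X: Sequence[Sequence[float]], y: Sequence[float],
                        h: float) -> bool:
    """True if every crisp observation y_i lies in its h-level interval."""
    for x, yi in zip(X, y):
        lo, hi = h_interval(a, c, x, h)
        if not lo <= yi <= hi:
            return False
    return True


X = [[1.0], [2.0], [3.0]]
y = [3.4, 4.8, 7.0]
a, c = [1.0, 2.0], [0.5, 0.5]
print(total_spread(c, X), satisfies_inclusion(a, c, X, y, 0.5),
      satisfies_inclusion(a, c, X, y, 0.9))  # 4.5 True False
```

Raising h shrinks every h-level interval by the factor $(1-h)$, so a candidate that is feasible at h = 0.5 can become infeasible at h = 0.9; this is why larger spreads are needed as h grows.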
The new approach
There are three types of possibilistic linear regression analysis, the so-called Min, Max and Conjunction problems. Following the notation of Tanaka et al. [7], suppose that the fuzzy coefficients $\tilde{A}_j$, j = 1, …, n, are estimated such that the observed output $\tilde{Y}_i$ and the estimated output $\tilde{Y}_i^*$ satisfy one of the following cases:
In the Min problem, $\tilde{Y}_i \subseteq_h \tilde{Y}_i^*$ indicates that the spread of $\tilde{Y}_i^*$ covers that of $\tilde{Y}_i$ at level h. The same meaning is aimed
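The relations behind the three problems can be checked on h-level sets. The sketch below assumes triangular fuzzy numbers written as `(left, mode, right)` and follows the way these problems are commonly stated after Tanaka et al. [7] (our reading, not this paper's code): the Min problem requires the observed output to be included in the estimated one at level h, the Max problem requires the reverse inclusion, and the Conjunction problem only requires the two h-level sets to overlap. All names are hypothetical.

```python
from typing import Tuple

# Triangular fuzzy number as (left endpoint, mode, right endpoint).
Tri = Tuple[float, float, float]


def h_cut(t: Tri, h: float) -> Tuple[float, float]:
    """Closed h-level set [l + h(m - l), r - h(r - m)] for 0 <= h <= 1."""
    l, m, r = t
    return l + h * (m - l), r - h * (r - m)


def included_at_h(u: Tri, v: Tri, h: float) -> bool:
    """u ⊆_h v: the h-level set of u lies inside the h-level set of v."""
    ul, ur = h_cut(u, h)
    vl, vr = h_cut(v, h)
    return vl <= ul and ur <= vr


def intersect_at_h(u: Tri, v: Tri, h: float) -> bool:
    """The h-level sets of u and v overlap (their intersection has height >= h)."""
    ul, ur = h_cut(u, h)
    vl, vr = h_cut(v, h)
    return max(ul, vl) <= min(ur, vr)


observed, estimated = (2.0, 3.0, 4.0), (0.0, 3.0, 6.0)
print(included_at_h(observed, estimated, 0.5))          # Min-type relation
print(included_at_h(estimated, observed, 0.5))          # Max-type relation
print(intersect_at_h(estimated, (5.0, 6.0, 7.0), 0.5))  # Conjunction-type
```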
Numerical examples
In this section, four examples are used to compare the new approach (NA) with several other methods: Tanaka’s method (TM) [6], [7], Diamond's method (DM) [11], the Savic and Pedrycz method (SP) [26], the Kim and Bishu method (KB) [25] and the Modarres approach (MA) [24].
Example 1. The data used by Tanaka et al. [7], shown in Table 1, serve as the first example. Applying NA to these data, the following regression is obtained:
Application
The proposed method is used to forecast the end-use energy in the Residential-Commercial sector of Iran. The data is taken from the Annual Energy Balance Sheet published by the Ministry of Energy and the Central Bank of Iran. The total Energy Consumption (EC) of the households is considered as the dependent variable. Since data given for the agriculture sector may include energy used in the rural regions by farmers in their houses, half of the energy consumption given for the agriculture sector
Conclusion
Fuzzy regression has penetrated a wide range of data-processing fields and has found various applications in analyzing uncertain data. There are two main classes in this regard: the possibilistic approach and the least squares model. In this study, following a combined approach, we extended the model based on the idea of reducing the distance between the output of the possibilistic model and the measured output by increasing their conjunction. First, a measuring index of inequality of fuzzy
References
- et al. Analysis and prediction of flow from local source in a river basin using a Neuro-fuzzy modeling tool. J. Environ. Manage. (2007)
- et al. Possibilistic linear regression analysis for fuzzy data. Eur. J. Operat. Res. (1989)
- et al. Possibilistic linear systems and their application to the linear regression model. Fuzzy Sets Syst. (1988)
- Fuzzy data analysis by possibilistic linear models. Fuzzy Sets Syst. (1987)
- Fuzzy least squares. Inform. Sci. (1988)
- et al. Dependency between degree of fit and input noise in fuzzy linear regression using non-symmetric fuzzy triangular coefficients. Fuzzy Sets Syst. (2007)
- et al. General fuzzy least squares. Fuzzy Sets Syst. (1997)
- S-curve regression model in fuzzy environment. Fuzzy Sets Syst. (1997)
- et al. A least-squares approach to fuzzy linear regression analysis. Comput. Statist. Data Anal. (2000)
- et al. Fuzzy least absolute deviation regression and the conflicting trends in fuzzy parameters. Comp. Math. Appl. (1994)