A Generalized Maximum Entropy (GME) estimation approach to fuzzy regression model
Introduction
Regression analysis is one of the most widely used data analysis techniques in engineering, the social sciences, biology, data mining, pattern recognition, and other fields. Its purpose is to model the relation between a dependent variable and a set of independent variables by means of a suitable mathematical model (e.g., linear, quadratic, or higher-order polynomial), in order to understand whether the independent variables predict the dependent variable and how they explain the empirical data being modeled. In the standard regression framework y = Xβ + ϵ, fitting the model Xβ to the empirical data y requires estimating a vector β of parameters from the data vector y and the model matrix X, which is in turn assumed to have full rank. The vector β can be estimated by the least squares method, which minimizes the sum of squared residuals between the model and the empirical data, expressed by the squared distance ∥y − Xβ∥².
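As a small illustration of the least squares criterion, the following sketch fits a one-predictor model y = b0 + b1·x in closed form and evaluates the residual sum of squares. The function names (`ols_simple`, `rss`) and the toy data are illustrative assumptions and do not come from the article.

```python
def ols_simple(x, y):
    """Least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: centered cross-product sum divided by centered sum of squares of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1


def rss(x, y, b0, b1):
    """Residual sum of squares, i.e. the squared distance ||y - X beta||^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Toy data (hypothetical), roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)
print(b0, b1, rss(x, y, b0, b1))
```

Any other choice of (b0, b1) yields a residual sum of squares at least as large as the fitted one, which is what "least squares" means here.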
Linear regression analysis has mainly been applied to standard crisp data (i.e., vectors and matrices of single-valued data). However, some researchers have extended the regression framework to more complex data (e.g., interval, symbolic, or fuzzy data) in order to better model situations in which the data contain vague and imprecise information [1], [2], [3]. In this context, fuzzy sets are a natural way to model imprecision and vagueness in empirical data [4]. Such data have been extensively studied in fuzzy statistics, a branch of statistical theory devoted to handling data characterized by a particular type of uncertainty, called fuzziness. Several fuzzy regression models and techniques are now available [5], [6], [7]. In particular, some of these models have been developed using the concept of the LR-fuzzy number [8], which may be considered one of the most important topics in Fuzzy Set Theory (FST). Moreover, LR-fuzzy numbers provide an elegant and compact way to describe a large variety of fuzzy data in different applied settings [9], [10].
In general, three main approaches can be used to handle LR-fuzzy data. They can be described according to the nature of the input data, the method, and the output data (input-method-output schema). In the first approach, named fuzzy-crisp-crisp, the fuzzy input data are transformed into crisp data (e.g., by means of a defuzzification procedure) and standard statistical methods (e.g., crisp least squares) are used to perform the data analysis [11]. The output of this procedure is also crisp data y. In the second approach, fuzzy-crisp-fuzzy, the fuzzy input data are analyzed by means of standard statistical methods that are extended to take into account the LR-fuzzy representation (e.g., fuzzy least squares) [3]. Unlike the first approach, the output of this procedure is fuzzy data. In the third approach, fuzzy-fuzzy-fuzzy, fuzzy input data are manipulated with suitable fuzzy statistical methods (e.g., Tanaka's minimum fuzziness criterion) in order to obtain fuzzy output data [12]. Whereas the first approach does not take into account the fuzzy characteristics of the empirical data, the second and third approaches manage fuzzy data with specific statistical procedures. Although both are fuzzy methods, the fuzzy-crisp-fuzzy approach differs from the third in that it manages fuzzy data by extending standard statistical methods to the fuzzy properties of the data structures. In this way, it can easily handle crisp and fuzzy data at the same time and, above all, it inherits the well-known properties of the standard statistical methods.
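To make the first (fuzzy-crisp-crisp) approach concrete, the sketch below defuzzifies triangular fuzzy observations by their centroid, after which any crisp method can be applied to the resulting numbers. The triangular shape, the centroid rule, and the names `TriFuzzy` and `centroid` are assumptions chosen for illustration; the article does not specify which defuzzification procedure is meant.

```python
from typing import NamedTuple


class TriFuzzy(NamedTuple):
    """Triangular fuzzy number: a center plus non-negative left/right spreads."""
    center: float
    left: float
    right: float


def centroid(a: TriFuzzy) -> float:
    # The centroid of a triangle is the mean of its vertices; on the x-axis
    # the vertices are center - left, center, and center + right.
    return a.center + (a.right - a.left) / 3.0


# Hypothetical fuzzy responses; the last one is asymmetric to the left.
fuzzy_y = [TriFuzzy(2.0, 0.5, 0.5), TriFuzzy(4.0, 0.3, 0.9), TriFuzzy(6.0, 1.2, 0.0)]
crisp_y = [centroid(a) for a in fuzzy_y]
print(crisp_y)
```

Note that the spreads influence the crisp value only through their difference, so most of the information about imprecision is discarded; this is the limitation of the first approach that the text points out.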
In some applications of linear regression models (for example, analyses with small samples and/or fat matrices, i.e., matrices without full rank, violations of distributional assumptions, ill-posed problems such as multicollinearity, or the use of prior information in parameter estimation), standard fuzzy statistical methods may be inappropriate. A case of particular interest is the presence of multicollinearity in the model matrix, which arises in a variety of empirical settings (e.g., [13], [14], [15]). In this situation, standard statistical methods such as fuzzy least squares can be distorted and may not yield accurate estimates. One possible remedy is to adopt an ad hoc data analysis procedure that transforms the collinear data matrix into a new, well-posed data matrix (e.g., a PCA-regression method). However, a serious limitation of this transformation is that it uses a subset of new orthogonal variables derived from the original set, which may artificially mask relevant information contained in the original (non-orthogonal) variables. For this reason, in this article we propose a novel fuzzy regression framework that is entirely based on the well-known Generalized Maximum Entropy (GME) estimation approach [16], [17] and the fuzzy-crisp-fuzzy perspective [18]. In this respect, unlike fuzzy least squares, the GME-fuzzy proposal guarantees accurate and undistorted estimation in such ill-posed situations.
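A quick way to see why multicollinearity is a problem is the variance inflation factor (VIF). For a model with an intercept and two predictors, the VIF of each predictor equals 1 / (1 − r²), where r is their sample correlation. The sketch below is only a diagnostic illustration with made-up data; it is not part of the GME-fuzzy method, and the function names are assumptions.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model with intercept."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_nearly_collinear = [2.1, 3.9, 6.1, 8.0, 9.9, 12.2]  # roughly 2 * x1
x2_unrelated = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

vif_bad = vif_two_predictors(x1, x2_nearly_collinear)
vif_ok = vif_two_predictors(x1, x2_unrelated)
print(vif_bad, vif_ok)
```

A VIF close to 1 indicates nearly orthogonal predictors, while a very large value signals that least squares coefficient variances are strongly inflated.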
The remainder of the article is organized as follows. Section 2 briefly describes the basic characteristics of LR-fuzzy data. Section 3 presents the GME-fuzzy regression approach for LR-fuzzy data together with its main features, and also describes some useful procedures for data fitting and model evaluation. Section 4 describes a Monte Carlo study assessing the stability and reliability of the proposed approach as compared to fuzzy least squares. Section 5 illustrates how the proposed GME-fuzzy regression works through an empirical case study. Finally, Section 6 concludes the article with final remarks and suggestions for future extensions of our proposal.
Section snippets
LR-fuzzy numbers
In this section we briefly recall some basic features of LR-fuzzy numbers. In general, a fuzzy set can be described by its α-sets, with α ∈ [0, 1], which are defined in terms of the universal set U and the membership function of the fuzzy set. If the α-sets are convex, the fuzzy set is called a convex fuzzy set. A fuzzy set also has a support, whereas the set of all its maximal points is called its core.
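The notions of membership function and α-set can be illustrated on the simplest LR-fuzzy number, a triangular one with core m and left/right spreads l and r. The triangular shape and the convention that an α-set collects the points with membership of at least α are assumptions made for this sketch; the article's own definition may differ in detail (for example, strict versus weak inequality).

```python
def membership(x, m, l, r):
    """Triangular membership function with core m and spreads l > 0, r > 0."""
    if m - l < x <= m:
        return 1.0 - (m - x) / l  # rising left branch
    if m < x < m + r:
        return 1.0 - (x - m) / r  # falling right branch
    return 0.0


def alpha_cut(alpha, m, l, r):
    """Interval of points with membership >= alpha, for 0 < alpha <= 1."""
    return (m - l * (1.0 - alpha), m + r * (1.0 - alpha))


# Hypothetical fuzzy number "about 5", more spread to the left than the right.
print(alpha_cut(0.5, 5, 2, 1))
print(alpha_cut(1.0, 5, 2, 1))
```

Every α-set of a triangular number is an interval, hence convex, and the interval shrinks to the core as α approaches 1 and widens toward the support as α approaches 0.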
GME-fuzzy regression models
In this section we provide a detailed description of the proposed GME-fuzzy approach. In particular, we first briefly explain the GME rationale within the simpler and more general case of the regression problem. Next, we describe from the GME perspective two simple but still relevant fuzzy regression models, namely the crisp-input/fuzzy-output and fuzzy-input/crisp-output models [20]. These models were chosen because the relation between crisp independent variables (input) and fuzzy
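For orientation, the GME rationale for the crisp linear model y = Xβ + ϵ is commonly written as follows in the maximum entropy literature (cf. [16], [17]). The notation is an assumption made here, since the article's own formulas are not part of this excerpt: z_{km} are M user-chosen support points for coefficient β_k, v_j are J support points for the errors, and p_{km}, w_{ij} are the unknown probabilities attached to them.

```latex
\begin{aligned}
\beta_k &= \sum_{m=1}^{M} z_{km}\, p_{km}, \qquad
\epsilon_i = \sum_{j=1}^{J} v_{j}\, w_{ij}, \\
\max_{p,\,w}\; H(p, w) &= -\sum_{k=1}^{K}\sum_{m=1}^{M} p_{km} \ln p_{km}
  \;-\; \sum_{i=1}^{n}\sum_{j=1}^{J} w_{ij} \ln w_{ij}, \\
\text{subject to}\quad
y_i &= \sum_{k=1}^{K} x_{ik} \sum_{m=1}^{M} z_{km}\, p_{km}
  + \sum_{j=1}^{J} v_{j}\, w_{ij}, \qquad i = 1, \dots, n, \\
\sum_{m=1}^{M} p_{km} &= 1 \quad (k = 1, \dots, K), \qquad
\sum_{j=1}^{J} w_{ij} = 1 \quad (i = 1, \dots, n).
\end{aligned}
```

Because the parameters are recovered as expectations over bounded supports, the problem remains well defined even when X is collinear or does not have full rank, which is the property the introduction appeals to.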
Monte Carlo studies
In this section we describe a series of Monte Carlo studies performed to assess the stability and reliability of the proposed GME-fuzzy approach against standard fuzzy least squares. In particular, we studied the performance of both approaches in two main conditions, namely a general case (GC) and an ill-posed case (IC). Unlike GC, in the second condition we corrupted the model matrix by increasing the collinearity among the explanatory variables. More technically, in a first
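The contrast between the two conditions can be sketched by generating two predictors with a controlled population correlation. This is only a hedged illustration: the actual simulation design of the study is not available in this excerpt, and the sample size, the seed, the correlation values 0.0 and 0.95, and the function names are all hypothetical.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def simulate_predictors(n, rho, rng):
    """Draw two standard-normal predictors with population correlation rho."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Mixing x1 with independent noise z yields corr(x1, x2) = rho.
    x2 = [rho * a + math.sqrt(1.0 - rho ** 2) * b for a, b in zip(x1, z)]
    return x1, x2


rng = random.Random(12345)  # fixed seed for reproducibility
x1_gc, x2_gc = simulate_predictors(500, 0.0, rng)   # general case (GC)
x1_ic, x2_ic = simulate_predictors(500, 0.95, rng)  # ill-posed case (IC)
r_gc = pearson(x1_gc, x2_gc)
r_ic = pearson(x1_ic, x2_ic)
print(r_gc, r_ic)
```

In a full study, each condition would be replicated many times and the competing estimators compared on the recovered coefficients.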
Case study
In this section we describe a real case study concerning atmospheric variables to illustrate the main features of the GME-fuzzy approach. The dataset has previously been published (see [39]) and contains six crisp variables (x1 = temperature, x2 = relative humidity, x3 = atmospheric pressure, x4 = rain, x5 = radiation, x6 = wind speed) and one fuzzy variable (carbon monoxide). Table 5 shows the original dataset. In order to analyze how the six explanatory variables concerning atmospheric
General findings
In this paper we proposed a novel estimation method for fuzzy regression models based on the Generalized Maximum Entropy (GME) approach. The proposed GME-fuzzy regressions make it possible to exploit the main advantages of this entropy-based estimation method (namely, correct estimation in ill-posed cases, use of external information in the estimation process, a distinctive variable selection procedure, and robustness to distributional violations). To better illustrate the GME-fuzzy
References (42)
et al., The fuzzy approach to statistical analysis, Comput. Stat. Data Anal. (2006)
et al., A least-squares approach to fuzzy linear regression analysis, Comput. Stat. Data Anal. (2000)
An application-oriented view of modeling uncertainty, Eur. J. Oper. Res. (2000)
et al., Insight of a fuzzy regression model, Fuzzy Sets Syst. (2000)
Least squares model fitting to fuzzy vector data, Fuzzy Sets Syst. (1987)
et al., A maximum entropy approach to estimation and inference in dynamic models or counting fish in the sea using maximum entropy, J. Econ. Dyn. Control (1996)
Linear regression analysis for fuzzy/crisp input and fuzzy/crisp output data, Comput. Stat. Data Anal. (2003)
et al., Using fuzzy number for measuring quality of service in the hotel industry, Tour. Manage. (2007)
et al., A survey analysis of service quality for domestic airlines, Eur. J. Oper. Res. (2002)
Least squares estimation of a linear regression model with LR fuzzy response, Comput. Stat. Data Anal.
New measures of weighted fuzzy entropy and their applications for the study of maximum weighted fuzzy entropy principle, Inf. Sci.
Symbolic regression analysis
Fuzzy regression methods – a comparative assessment, Fuzzy Sets Syst.
Possibility Theory: An Approach to Computerized Processing of Uncertainty
Fuzzy Logic with Engineering Applications
A fuzzy set theory based computational model to represent the quality of inter-rater agreement, Qual. Quant.
Multicollinearity and correlation among local regression coefficients in geographically weighted regression, J. Geogr. Syst.
Sensitivity of fit indices to fake perturbation of ordinal data: a sample by replacement approach, Multivar. Behav. Res.
Enrico Ciavolino is Researcher and Professor of Statistics at the University of Salento (Italy) and a member of the PhD in Human and Social Sciences. Since 2003 he has taught courses in Descriptive Statistics and Multivariate Statistics. His methodological research concerns multivariate analysis models and structural equation models based on parametric (maximum likelihood), non-parametric (partial least squares, PLS), and semi-parametric (Generalized Maximum Entropy) estimators. This research finds applications in psychology, economics, and more generally in decision support systems and the evaluation of services. His applied research concerns the evaluation of customer satisfaction in public utility services (hospitals, transport, education), the analysis of employee satisfaction (Job Satisfaction), models for decision support in the social and political fields, and multivariate models for gender studies.
Antonio Calcagnì is a PhD student in Psychometrics at the Department of Psychology and Cognitive Science, University of Trento (Italy). His research interests concern the application of Fuzzy Set Theory to psychometrics and the problem of measurement in psychology and related areas, such as fuzzy statistics. His methodological research finds applications in psychology and statistics, whereas his applied research concerns the evaluation of customer satisfaction in public services (transport and education) and the analysis of satisfaction (Job Satisfaction and Student Satisfaction).