Abstract
Clusterwise regression aims to cluster data sets in which each cluster is characterized by its own regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition that draws on an idea from robust regression. We start with a random weighting to determine an initial partition and then continue in the spirit of M-estimators. The residuals from all regressions are used to assign the observations to the different groups. As the target function, we use the coefficient of determination \(R^{2}_{w}\) for the overall model, suitably defined for weighted regression.
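A single run of such a procedure might look like the following minimal Python sketch. Several details are assumptions, because the abstract does not state them: the reweighting step uses hard weights (1 for the cluster with the smallest absolute residual, a small floor value otherwise) rather than the paper's weight function, and \(R^{2}_{w}\) is taken as one minus the residual sum of squares over the total sum of squares around the overall mean. The names `fit_all` and `clusterwise_fit` are hypothetical.

```python
import numpy as np

def fit_all(X, y, W):
    """Weighted least squares per cluster; column k of W holds cluster k's weights."""
    coefs = []
    for k in range(W.shape[1]):
        sw = np.sqrt(W[:, k])  # scaling rows by sqrt(w) turns WLS into ordinary LS
        coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        coefs.append(coef)
    return np.array(coefs)  # shape (n_clusters, n_coefficients)

def clusterwise_fit(X, y, n_clusters, rng, max_iter=100, floor=1e-8):
    """One run from a random weighting (illustrative, not the paper's exact scheme)."""
    n = len(y)
    W = rng.uniform(size=(n, n_clusters))  # random start weights
    labels = np.full(n, -1)
    for _ in range(max_iter):
        coefs = fit_all(X, y, W)
        resid = y[:, None] - X @ coefs.T  # residual of every observation under every fit
        new_labels = np.argmin(np.abs(resid), axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
        # Assumed reweighting: weight 1 in the assigned cluster, a tiny floor elsewhere
        W = np.full((n, n_clusters), floor)
        W[np.arange(n), labels] = 1.0
    coefs = fit_all(X, y, W)
    fitted_resid = y - np.sum(X * coefs[labels], axis=1)
    # Assumed overall R^2: 1 - SSR / SST around the overall mean of y
    r2 = 1.0 - np.sum(fitted_resid ** 2) / np.sum((y - y.mean()) ** 2)
    return labels, coefs, r2
```

With a design matrix `X` whose first column is the intercept, `clusterwise_fit(X, y, 2, np.random.default_rng(0))` returns the cluster labels, one coefficient vector per cluster, and the overall \(R^{2}\) of that run. The floor weight only keeps the least-squares problem well defined if a cluster becomes empty.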
Target functions for the clusterwise regression problem may have a large number of local optima, which optimization methods based on derivatives cannot handle. The approach commonly employed to overcome this problem is to start several times from random partitions and then to improve each resulting partition. Because our procedure is very fast, it can be used with many random starts. Finally, the solution with the highest coefficient of determination \(R^{2}_{w}\) for the overall model is chosen. The performance of the method is investigated by means of Monte Carlo simulations and compared with the finite-mixture approach to clusterwise regression. A sequence of bootstrap tests is proposed to determine the number of clusters.
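The multi-start strategy can be illustrated with a short Python sketch. It is not the paper's implementation: it uses the common variant that starts from random partitions and improves each one by alternating per-cluster least squares with reassignment to the smallest absolute residual, and it scores each run with an assumed overall \(R^{2}\) (one minus the residual sum of squares over the total sum of squares). The names `fit_from_partition` and `best_of_random_starts` are hypothetical.

```python
import numpy as np

def fit_from_partition(X, y, labels, n_clusters, max_iter=50):
    """Improve a start partition: refit each cluster, then reassign by smallest residual."""
    n, p = X.shape
    coefs = np.zeros((n_clusters, p))
    for _ in range(max_iter):
        for k in range(n_clusters):
            members = labels == k
            if members.sum() >= p:  # keep previous coefficients for too-small clusters
                coefs[k] = np.linalg.lstsq(X[members], y[members], rcond=None)[0]
        new_labels = np.argmin(np.abs(y[:, None] - X @ coefs.T), axis=1)
        if np.array_equal(new_labels, labels):
            break  # local optimum reached for this start
        labels = new_labels
    resid = y - np.sum(X * coefs[labels], axis=1)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)  # assumed overall R^2
    return r2, labels, coefs

def best_of_random_starts(X, y, n_clusters, n_starts=20, seed=0):
    """Run many random starts and keep the run with the highest overall R^2."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_starts):
        start = rng.integers(n_clusters, size=len(y))  # random start partition
        runs.append(fit_from_partition(X, y, start, n_clusters))
    best = max(runs, key=lambda run: run[0])
    return best, [run[0] for run in runs]
```

Inspecting the list of per-start \(R^{2}\) values returned alongside the best run shows how often different starts end in different local optima, which is the reason many cheap starts are preferable to a few expensive ones.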
Cite this article
Schlittgen, R. A weighted least-squares approach to clusterwise regression. AStA Adv Stat Anal 95, 205–217 (2011). https://doi.org/10.1007/s10182-011-0155-4