
A weighted least-squares approach to clusterwise regression

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

Clusterwise regression aims to cluster data sets in which each cluster is characterized by its own regression coefficients in a linear regression model. In this paper, we propose a method for determining such a partition that borrows an idea from robust regression. We start with a random weighting to determine an initial partition and then continue in the spirit of M-estimators. The residuals from all regressions are used to assign the observations to the different groups. As the target function, we use the coefficient of determination \(R^{2}_{w}\) for the overall model, suitably defined for weighted regression.
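The procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the random start weights are drawn from a Dirichlet distribution, observations are reassigned with hard 0/1 weights to the regression with the smallest squared residual, and \(R^{2}_{w}\) is assumed to be 1 - Σ_k Σ_i w_ik r_ik² / Σ_i (y_i - ȳ)², with each observation's weights summing to one. The function names `wls_fits`, `weighted_r2` and `clusterwise_fit` are hypothetical.

```python
import numpy as np

def wls_fits(X, y, W):
    """Fit one weighted least-squares regression per column of W (shape n x K)."""
    betas = []
    for k in range(W.shape[1]):
        sw = np.sqrt(W[:, k])
        # Scaling rows by sqrt(weight) turns WLS into an ordinary least-squares problem.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        betas.append(beta)
    return np.column_stack(betas)  # shape (p, K): one coefficient vector per cluster

def weighted_r2(X, y, W, betas):
    """ASSUMED definition: 1 - sum_ik W_ik * r_ik^2 / sum_i (y_i - ybar)^2."""
    resid = y[:, None] - X @ betas  # residual of every observation under every regression
    return 1.0 - np.sum(W * resid**2) / np.sum((y - y.mean()) ** 2)

def clusterwise_fit(X, y, K, rng, W0=None, max_iter=100):
    """One start: random weighting -> WLS fits -> reassignment by residuals, repeated."""
    n = X.shape[0]
    # Random start weighting with rows summing to 1 (Dirichlet draw is an assumption).
    W = rng.dirichlet(np.ones(K), size=n) if W0 is None else W0
    for _ in range(max_iter):
        resid = y[:, None] - X @ wls_fits(X, y, W)
        # ASSUMED rule: hard 0/1 weights, each observation goes to the
        # regression with the smallest squared residual.
        new_W = np.eye(K)[np.argmin(resid**2, axis=1)]
        if np.array_equal(new_W, W):
            break  # the partition no longer changes
        W = new_W
    betas = wls_fits(X, y, W)
    return W.argmax(axis=1), betas, weighted_r2(X, y, W, betas)
```

For instance, with a design matrix `X = np.column_stack([np.ones(n), x])`, calling `clusterwise_fit(X, y, K=2, rng=np.random.default_rng(0))` returns the cluster labels, the coefficient matrix (one column per cluster) and the assumed \(R^{2}_{w}\). A smooth weight function of the residuals would be closer to a classical M-estimator; the hard rule is used here only to keep the sketch short.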

Target functions for the clusterwise regression problem may have a large number of local optima, which cannot be handled by derivative-based optimization methods. The approach commonly used to overcome this problem is to start several times from random partitions and then improve each resulting partition. Because our procedure is very fast, it can be used with many random starts. Finally, the solution with the highest determination coefficient \(R^{2}_{w}\) for the overall model is chosen. The performance of the method is investigated in Monte Carlo simulations and compared with the finite-mixture approach to clusterwise regression. A sequence of bootstrap tests is proposed to determine the number of clusters.
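The multi-start strategy can be illustrated with a self-contained NumPy sketch that repeats a cheap single-start fit and keeps the solution with the highest \(R^{2}_{w}\). It relies on the same assumptions as a simple single-start scheme: Dirichlet start weights, hard reassignment to the regression with the smallest squared residual, and \(R^{2}_{w}\) taken as 1 minus the weighted residual sum of squares over the total sum of squares. The names `_wls`, `_fit_once` and `multistart` are hypothetical, and this is not the author's code.

```python
import numpy as np

def _wls(X, y, W):
    # One weighted least-squares fit per cluster; column k of W holds that cluster's weights.
    sw = np.sqrt(W)
    return np.column_stack([
        np.linalg.lstsq(X * sw[:, [k]], y * sw[:, k], rcond=None)[0]
        for k in range(W.shape[1])
    ])

def _fit_once(X, y, K, rng, max_iter=100):
    # ASSUMED scheme: random Dirichlet start weights, then alternate WLS fits
    # and hard reassignment to the regression with the smallest squared residual.
    W = rng.dirichlet(np.ones(K), size=X.shape[0])
    for _ in range(max_iter):
        resid = y[:, None] - X @ _wls(X, y, W)
        new_W = np.eye(K)[np.argmin(resid**2, axis=1)]
        if np.array_equal(new_W, W):
            break
        W = new_W
    betas = _wls(X, y, W)
    sse = np.sum(W * (y[:, None] - X @ betas) ** 2)
    r2 = 1.0 - sse / np.sum((y - y.mean()) ** 2)  # ASSUMED form of R^2_w
    return W.argmax(axis=1), betas, r2

def multistart(X, y, K, n_starts=50, seed=0):
    # Run many random starts and keep the one with the highest overall R^2_w.
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        result = _fit_once(X, y, K, rng)
        if best is None or result[2] > best[2]:
            best = result
    return best  # (labels, betas, r2)
```

Because a single start costs only K least-squares solves per iteration, the total cost grows linearly with `n_starts`, which is what makes many random starts affordable.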



Author information

Correspondence to Rainer Schlittgen.

Electronic Supplementary Material

The electronic supplementary material consists of three R files (13 kB, 3.58 kB, and 6.35 kB).


About this article

Cite this article

Schlittgen, R. A weighted least-squares approach to clusterwise regression. AStA Adv Stat Anal 95, 205–217 (2011). https://doi.org/10.1007/s10182-011-0155-4


  • DOI: https://doi.org/10.1007/s10182-011-0155-4
