An Exact CP Approach for the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12296)

Abstract

Clustering consists of finding hidden groups in unlabeled data that are as homogeneous and well separated as possible. Some contexts impose constraints on the clustering solution, such as restrictions on the size of each cluster; this setting is known as cardinality-constrained clustering. In this work, we present an exact approach to solve the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. We take advantage of the structure of the problem to improve several aspects of previous constraint programming approaches: lower bounds, domain filtering, and branching. Computational experiments on benchmark instances taken from the literature confirm that our approach improves solving capability over previously proposed exact methods for this problem.
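
To make the problem statement concrete, the following is a minimal sketch of the objective and the size constraint on a toy instance. It is not the constraint programming method of the paper: it simply enumerates every assignment, which is only feasible for a handful of points. The function names (`sum_of_squares`, `brute_force_cc_mssc`), the toy data, and the assumption that each cluster has a prescribed exact size are illustrative choices, not taken from the paper.

```python
from itertools import product


def sum_of_squares(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def brute_force_cc_mssc(points, sizes):
    """Enumerate all assignments; keep the cheapest one whose cluster sizes match `sizes`.

    Assumption: sizes[c] is the exact number of points required in cluster c.
    """
    k = len(sizes)
    best_cost, best_labels = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        # Cardinality constraint: discard assignments with the wrong cluster sizes.
        if any(labels.count(c) != sizes[c] for c in range(k)):
            continue
        cost = sum_of_squares(points, labels, k)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


# Hypothetical toy instance: two points near the origin, three near (5, 5).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
cost, labels = brute_force_cc_mssc(points, sizes=[2, 3])
print(cost, labels)
```

The enumeration grows as k^n, which is why exact approaches rely on lower bounds, domain filtering, and branching rather than exhaustive search.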

Notes

  1. Personal communication from one of the authors.

  2. https://archive.ics.uci.edu.

  3. Source code can be retrieved from: https://github.com/mnhaouas/card-const-MSSC.

  4. https://cp4clustering.github.io/.

  5. https://github.com/Behrouz-Babaki/MinSizeKmeans.

Acknowledgements

Financial support from a Natural Sciences and Engineering Research Council of Canada (NSERC) graduate scholarship is gratefully acknowledged.

Author information

Corresponding authors

Correspondence to Mohammed Najib Haouas, Daniel Aloise or Gilles Pesant.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Haouas, M.N., Aloise, D., Pesant, G. (2020). An Exact CP Approach for the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. In: Hebrard, E., Musliu, N. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2020. Lecture Notes in Computer Science, vol 12296. Springer, Cham. https://doi.org/10.1007/978-3-030-58942-4_17

  • DOI: https://doi.org/10.1007/978-3-030-58942-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58941-7

  • Online ISBN: 978-3-030-58942-4

  • eBook Packages: Computer Science, Computer Science (R0)
