Abstract
Clustering consists in finding hidden groups in unlabeled data that are as homogeneous and well separated as possible. Some contexts impose constraints on the clustering solutions, such as restrictions on the size of each cluster; this setting is known as cardinality-constrained clustering. In this work, we present an exact approach to solve the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. We take advantage of the structure of the problem to improve several aspects of previous constraint programming approaches: lower bounds, domain filtering, and branching. Computational experiments on benchmark instances taken from the literature confirm that our approach improves solving capability over previously proposed exact methods for this problem.
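To make the problem statement concrete, the Python sketch below computes the Euclidean minimum sum-of-squares objective (the sum, over all clusters, of squared distances from each point to its cluster centroid) and finds an optimal partition by exhaustive enumeration. It assumes the cardinality constraint fixes the exact size of every cluster (`sizes[j]` points in cluster `j`); other variants only bound the sizes. The function names `cluster_sse` and `brute_force_cc_mssc` are hypothetical. This is a naive baseline usable only on a handful of points, not the constraint programming approach described in the abstract, which avoids full enumeration through lower bounds, domain filtering, and branching.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def cluster_sse(points: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    if not points:
        return 0.0
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points)


def brute_force_cc_mssc(
    points: Sequence[Point], sizes: Sequence[int]
) -> Tuple[float, Optional[List[int]]]:
    """Exact but exponential search: try every labeling, keep those meeting the sizes.

    Assumption (illustrative only): cluster j must contain exactly sizes[j] points.
    """
    k = len(sizes)
    if sum(sizes) != len(points):
        raise ValueError("cluster sizes must sum to the number of points")
    best_cost = float("inf")
    best_labels: Optional[List[int]] = None
    # Enumerate all k^n assignments of points to clusters.
    for labels in product(range(k), repeat=len(points)):
        # Cardinality constraint: reject labelings with the wrong cluster sizes.
        if any(labels.count(j) != sizes[j] for j in range(k)):
            continue
        # MSSC objective: total within-cluster sum of squares.
        cost = sum(
            cluster_sse([p for p, lab in zip(points, labels) if lab == j])
            for j in range(k)
        )
        if cost < best_cost:
            best_cost, best_labels = cost, list(labels)
    return best_cost, best_labels
```

For example, `brute_force_cc_mssc([(0.0,), (1.0,), (10.0,), (11.0,), (12.0,)], [2, 3])` returns a cost of 2.5 with labels `[0, 0, 1, 1, 1]`. Because the loop visits all k^n labelings, it becomes impractical almost immediately, which is why exact methods rely on bounding and pruning instead.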
Notes
1. Personal communication from one of the authors.
2.
3. Source code can be retrieved from: https://github.com/mnhaouas/card-const-MSSC.
4.
5.
Acknowledgements
Financial support from a Natural Sciences and Engineering Research Council of Canada (NSERC) graduate scholarship is gratefully acknowledged.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Haouas, M.N., Aloise, D., Pesant, G. (2020). An Exact CP Approach for the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. In: Hebrard, E., Musliu, N. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2020. Lecture Notes in Computer Science, vol. 12296. Springer, Cham. https://doi.org/10.1007/978-3-030-58942-4_17
DOI: https://doi.org/10.1007/978-3-030-58942-4_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58941-7
Online ISBN: 978-3-030-58942-4
eBook Packages: Computer Science, Computer Science (R0)