Abstract
For N points in a planar Pareto Front (2D PF), k-means and k-medoids are solvable in \(O(N^3)\) time by dynamic programming algorithms. Standard local search approaches, the PAM and Lloyd’s heuristics, are investigated in the 2D PF case to solve large instances faster. Initialization strategies specific to 2D PF cases are implemented alongside the generic ones (Forgy’s, Hartigan’s, k-means++). After applying PAM and Lloyd’s local search iterations, the quality of the resulting local minima is compared with the optimal values. Numerical results are computed on generated instances, which have been made public. This study highlights that local minima of poor quality exist for 2D PF cases. A parallel or multi-start heuristic using four initialization strategies improves accuracy by avoiding poor local optima. Improving local search heuristics for the specific 2D PF case remains an open perspective.
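To make the multi-start idea concrete, the following Python sketch runs Lloyd's local search from several initializations on a synthetic 2D PF and keeps the best local minimum. It is only an illustration under stated assumptions, not the authors' implementation: the instance generator `generate_pareto_front`, the evenly spaced initialization `init_even_spacing` (a guess at what a 2D PF-specific strategy could look like), and the choice of three strategies rather than the paper's four are all hypothetical.

```python
import random

def generate_pareto_front(n, seed=0):
    """Hypothetical generator: n non-dominated points (x increasing, y decreasing)."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    ys = sorted((rng.random() for _ in range(n)), reverse=True)
    return list(zip(xs, ys))

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def lloyd(points, centers, max_iter=100):
    """Lloyd's heuristic: alternate assignment and centroid update until stable."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, cost(points, centers)

def init_forgy(points, k, rng):
    """Forgy: k distinct input points chosen uniformly at random."""
    return rng.sample(points, k)

def init_kmeanspp(points, k, rng):
    """k-means++: each new center is drawn with probability proportional to D^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def init_even_spacing(points, k, rng=None):
    """Assumed 2D PF-specific strategy: evenly spaced points along the sorted front."""
    ordered = sorted(points)
    n = len(ordered)
    return [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]

def multi_start(points, k, seed=0):
    """Run Lloyd's heuristic from each initialization; return all results by name."""
    rng = random.Random(seed)
    strategies = {
        "forgy": init_forgy,
        "kmeans++": init_kmeanspp,
        "even_spacing": init_even_spacing,
    }
    return {name: lloyd(points, init(points, k, rng))
            for name, init in strategies.items()}

if __name__ == "__main__":
    pts = generate_pareto_front(200, seed=1)
    results = multi_start(pts, k=5)
    for name, (_, value) in results.items():
        print(f"{name}: {value:.6f}")
    best = min(results, key=lambda name: results[name][1])
    print("best initialization:", best)
```

Because each run is independent, the starts could also be executed in parallel; the sketch keeps them sequential for brevity. Comparing each local minimum against the optimal value would additionally require an exact solver such as the dynamic programming algorithms mentioned above, which is not reproduced here.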
References
Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75(2), 245–248 (2009)
Arthur, D., Vassilvitskii, S.: K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics (2007)
Celebi, M., Kingravi, H., Vela, P.: A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 40, 200–210 (2013). https://doi.org/10.1016/j.eswa.2012.07.021
Dupin, N.: Polynomial algorithms for p-dispersion problems in a 2D Pareto Front. arXiv preprint arXiv:2002.11830 (2020)
Dupin, N., Nielsen, F., Talbi, E.-G.: K-medoids clustering is solvable in polynomial time for a 2D Pareto Front. In: Le Thi, H.A., Le, H.M., Pham Dinh, T. (eds.) WCGO 2019. AISC, vol. 991, pp. 790–799. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-21803-4_79
Dupin, N., Nielsen, F., Talbi, E.-G.: Clustering a 2D Pareto Front: P-center problems are solvable in polynomial time. In: Dorronsoro, B., Ruiz, P., de la Torre, J.C., Urda, D., Talbi, E.-G. (eds.) OLA 2020. CCIS, vol. 1173, pp. 179–191. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41913-4_15
Dupin, N., Nielsen, F., Talbi, E.-G.: Unified polynomial Dynamic Programming algorithms for p-center variants in a 2D Pareto Front. Mathematics 9(4), 453 (2021)
Dupin, N., Talbi, E.-G.: Parallel matheuristics for the discrete unit commitment problem with min-stop ramping constraints. Int. Trans. Oper. Res. 27(1), 219–244 (2020)
Dupin, N., Talbi, E.-G., Nielsen, F.: Dynamic programming heuristic for k-means clustering among a 2-dimensional Pareto frontier. In: 7th International Conference on Metaheuristics and Nature Inspired Computing, META 2018 (2018)
Erkut, E., Neuman, S.: Comparison of four models for dispersing facilities. INFOR: Inf. Syst. Oper. Res. 29(2), 68–86 (1991)
Forgy, E.: Cluster analysis of multivariate data: efficiency vs. interpretability of classification. Biometrics 21(3), 768–769 (1965)
Grønlund, A., et al.: Fast exact k-means, k-medians and Bregman divergence clustering in 1D. arXiv preprint arXiv:1701.07204 (2017)
Hartigan, J., Wong, M.: Algorithm AS 136: a k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)
Hassin, R., Tamir, A.: Improved complexity bounds for location problems on the real line. Oper. Res. Lett. 10(7), 395–402 (1991)
Hsu, W., Nemhauser, G.: Easy and hard bottleneck location problems. Discret. Appl. Math. 1(3), 209–215 (1979)
Jain, A.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
Kaufman, L., Rousseeuw, P.: Clustering by Means of Medoids. North-Holland (1987)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar K-means problem is NP-hard. Theor. Comput. Sci. 442, 13–21 (2012)
Nielsen, F.: Introduction to HPC with MPI for Data Science. Springer, Heidelberg (2016)
Pena, J., Lozano, J., Larranaga, P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn. Lett. 20(10), 1027–1040 (1999)
Wang, H., Song, M.: Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. R J. 3(2), 29–33 (2011)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, J., Chen, Z., Dupin, N. (2021). Comparing Local Search Initialization for K-Means and K-Medoids Clustering in a Planar Pareto Front, a Computational Study. In: Dorronsoro, B., Amodeo, L., Pavone, M., Ruiz, P. (eds) Optimization and Learning. OLA 2021. Communications in Computer and Information Science, vol 1443. Springer, Cham. https://doi.org/10.1007/978-3-030-85672-4_2
DOI: https://doi.org/10.1007/978-3-030-85672-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85671-7
Online ISBN: 978-3-030-85672-4
eBook Packages: Computer Science, Computer Science (R0)