The bi-criteria seeding algorithms for two variants of k-means problem

Journal of Combinatorial Optimization

Abstract

The k-means problem is a classic and important problem in computer science and machine learning, and many variants have been proposed for different application backgrounds, such as the k-means problem with penalties and the spherical k-means problem. Since the k-means problem is NP-hard, approximation algorithms for it are an active research topic. In this paper, we apply a bi-criteria seeding algorithm to both the k-means problem with penalties and the spherical k-means problem, and improve upon the performance guarantees given by the k-means++ algorithm for these two problems.
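
As a rough illustration of what bi-criteria seeding can look like for the penalized variant, the sketch below opens more than k centers (here ⌈βk⌉) with a k-means++-style rule, drawing each point with probability proportional to the smaller of its squared distance to the current centers and its penalty. The function names, the oversampling factor `beta`, the uniform choice of the first center, and the toy data are all assumptions made for illustration; they are not taken from the paper, and the sketch makes no claim about the paper's exact algorithm or guarantees.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_cost(points, penalties, centers):
    """Each point pays min(squared distance to nearest center, its penalty)."""
    return sum(
        min(min(sq_dist(p, c) for c in centers), pen)
        for p, pen in zip(points, penalties)
    )


def bicriteria_seeding(points, penalties, k, beta=2.0, seed=0):
    """Hypothetical sketch: sample ceil(beta * k) centers instead of k."""
    rng = random.Random(seed)
    num_centers = math.ceil(beta * k)

    # Assumption: the first center is drawn uniformly at random.
    first = rng.choice(points)
    centers = [first]

    # Current contribution of every point to the penalized cost.
    contrib = [min(sq_dist(p, first), pen) for p, pen in zip(points, penalties)]

    while len(centers) < num_centers:
        if sum(contrib) == 0:
            break  # every point is already covered at zero cost
        # Draw the next center with probability proportional to contrib.
        idx = rng.choices(range(len(points)), weights=contrib, k=1)[0]
        new_center = points[idx]
        centers.append(new_center)
        # Adding a center can only lower each point's contribution.
        contrib = [
            min(c, sq_dist(p, new_center)) for c, p in zip(contrib, points)
        ]
    return centers


# Toy usage: two tight groups plus one far-away point with a finite penalty.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (100.0, 100.0)]
penalties = [10.0] * len(points)
centers = bicriteria_seeding(points, penalties, k=2, beta=2.0, seed=1)
print(len(centers), penalized_cost(points, penalties, centers))
```

Capping each sampling weight at the penalty keeps a distant outlier from dominating the draw, since it can always be discarded at cost equal to its penalty. The bi-criteria aspect is only that the returned set may contain more than k centers, and its cost is then compared against an optimal solution that uses k.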

References

  • Aggarwal A, Deshpande A, Kannan R (2009) Adaptive sampling for \(k\)-means clustering. In: Proceedings of APPROX and RANDOM, pp 15–28

  • Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal-dual algorithms. In: Proceedings of FOCS, pp 61–72

  • Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75:245–248

  • Arthur D, Vassilvitskii S (2007) \(k\)-means++: the advantages of careful seeding. In: Proceedings of SODA, pp 1027–1035

  • Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of SoCG, pp 754–767

  • Bachem O, Lucic M, Hassani SH, Krause A (2016) Approximate \(k\)-means++ in sublinear time. In: Proceedings of AAAI, pp 1459–1467

  • Bachem O, Lucic M, Krause A (2017) Distributed and provably good seedings for \(k\)-means in constant rounds. In: Proceedings of ICML, pp 292–300

  • Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the \(k\)-means algorithm—a survey. In: Algorithm engineering. Springer, pp 81–116

  • Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42:143–175

  • Drineas P, Frieze A, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56:9–33

  • Endo Y, Miyamoto S (2015) Spherical \(k\)-means++ clustering. In: Proceedings of MDAI, pp 103–114

  • Feng Q, Zhang Z, Shi F, Wang J (2019) An improved approximation algorithm for the \(k\)-means problem with penalties. In: Proceedings of FAW, pp 170–181

  • Lee E, Schmidt M, Wright J (2017) Improved and simplified inapproximability for \(k\)-means. Inf Process Lett 120:40–43

  • Li M, Xu D, Zhang D, Zou J (2019) The seeding algorithms for spherical \(k\)-means clustering. J Glob Optim. https://doi.org/10.1007/s10898-019-00779-w

  • Li M, Xu D, Yue J, Zhang D, Zhang P (2020) The seeding algorithm for \(k\)-means problem with penalties. J Comb Optim 39:15–32

  • Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28:21–33

  • Makarychev K, Makarychev Y, Sviridenko M, Ward J (2016) A bi-criteria approximation algorithm for \(k\)-means. In: Proceedings of APPROX/RANDOM, pp 14:1–14:20

  • Ostrovsky R, Rabani Y, Schulman L, Swamy C (2012) The effectiveness of Lloyd-type methods for the \(k\)-means problem. J ACM 59:28:1–28:22

  • Tseng GC (2007) Penalized and weighted \(k\)-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 23:2247–2255

  • Wei D (2016) A constant-factor bi-criteria approximation guarantee for \(k\)-means++. In: Proceedings of NIPS, pp 604–612

  • Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14:1–37

  • Xu X, Ding S, Shi Z (2018) An improved density peaks clustering algorithm with fast finding cluster centers. Knowl Based Syst 158:65–74

  • Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means problem and its variants. Oper Res Trans 21:101–109

  • Xu D, Xu Y, Zhang D (2018) A survey on the initialization methods for the \(k\)-means algorithm. Oper Res Trans 22:31–40

  • Zhang D, Cheng Y, Li M, Wang Y, Xu D (2019) Local search approximation algorithms for the spherical \(k\)-means problem. In: Proceedings of AAIM, pp 341–351

  • Zhang D, Hao C, Wu C, Xu D, Zhang Z (2018) A local search approximation algorithm for the \(k\)-means problem with penalties. J Comb Optim 37:439–453

  • Zhao Y, Karypis G (2001) Criterion functions for document clustering: experiments and analysis. Technical report \(\sharp \)01-40, Department of Computer Science, University of Minnesota

Acknowledgements

The author is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171) and the Shandong Provincial Natural Science Foundation (No. ZR2019MA032) of China.

Author information

Corresponding author

Correspondence to Min Li.

Additional information

Dedicated to Professor Minyi Yue on the Occasion of His 100th Birthday.

Cite this article

Li, M. The bi-criteria seeding algorithms for two variants of k-means problem. J Comb Optim 44, 1693–1704 (2022). https://doi.org/10.1007/s10878-020-00537-9

  • DOI: https://doi.org/10.1007/s10878-020-00537-9