On the Complexity of Clustering with Relaxed Size Constraints

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9778)

Abstract

We study the computational complexity of computing an optimal clustering \(\{A_1,A_2,\ldots ,A_k\}\) of a set of points, assuming that every cluster size \(|A_i|\) belongs to a given set M of positive integers. We present a polynomial-time algorithm that solves the problem in dimension 1, i.e. when the points are simply rational values, for an arbitrary set M of size constraints; this extends to the \(\ell _1\)-norm an analogous procedure known for the \(\ell _2\)-norm. Moreover, we prove that in the Euclidean plane, i.e. assuming dimension 2 and the \(\ell _2\)-norm, the problem is NP-hard even when the set of size constraints is reduced to \(M=\{2,3\}\).
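
To make the one-dimensional case concrete, the following is a minimal sketch of a dynamic program for it. It is not taken from the paper: it assumes that some optimal clustering consists of runs of consecutive points in sorted order, and that the cost of a cluster is the sum of absolute deviations from its median. The function name `cluster_1d` and its parameters are hypothetical.

```python
from math import inf


def cluster_1d(points, k, sizes):
    """Sketch: split sorted 1-D points into k contiguous clusters whose
    sizes all belong to `sizes`, minimising the sum of absolute deviations
    from each cluster's median. Returns (cost, clusters), or None if no
    admissible clustering exists. Assumes contiguity of optimal clusters."""
    xs = sorted(points)
    n = len(xs)

    # prefix[i] = xs[0] + ... + xs[i-1], used for O(1) interval costs.
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)

    def cost(i, j):
        # Sum of |x - median| over the slice xs[i:j].
        m = (i + j) // 2
        med = xs[m]
        left = med * (m - i) - (prefix[m] - prefix[i])
        right = (prefix[j] - prefix[m]) - med * (j - m)
        return left + right

    # best[c][i] = minimum cost of covering xs[:i] with c admissible clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    choice = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for c in range(1, k + 1):
        for i in range(1, n + 1):
            for s in sizes:
                if s <= i and best[c - 1][i - s] < inf:
                    cand = best[c - 1][i - s] + cost(i - s, i)
                    if cand < best[c][i]:
                        best[c][i] = cand
                        choice[c][i] = s

    if best[k][n] == inf:
        return None  # no clustering with the required sizes exists

    # Walk the recorded choices backwards to recover the clusters.
    clusters, i = [], n
    for c in range(k, 0, -1):
        s = choice[c][i]
        clusters.append(xs[i - s:i])
        i -= s
    clusters.reverse()
    return best[k][n], clusters
```

Under these assumptions, `cluster_1d([1, 2, 3, 10, 11], 2, {2, 3})` returns `(3, [[1, 2, 3], [10, 11]])`, and the running time is proportional to \(k \cdot n \cdot |M|\) for n points. Passing `fractions.Fraction` values instead of integers keeps the arithmetic exact for rational inputs.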

Notes

  1. If X does not admit an \(\mathcal {M}\)-clustering, then the symbol \(\bot \) is returned.

Acknowledgments

We thank an anonymous referee for his/her useful comments on other problems related to clustering with relaxed size constraints.

Author information

Corresponding author

Correspondence to Jianyi Lin.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Goldwurm, M., Lin, J., Saccà, F. (2016). On the Complexity of Clustering with Relaxed Size Constraints. In: Dondi, R., Fertin, G., Mauri, G. (eds) Algorithmic Aspects in Information and Management. AAIM 2016. Lecture Notes in Computer Science, vol 9778. Springer, Cham. https://doi.org/10.1007/978-3-319-41168-2_3

  • DOI: https://doi.org/10.1007/978-3-319-41168-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41167-5

  • Online ISBN: 978-3-319-41168-2

  • eBook Packages: Computer Science, Computer Science (R0)
