Skip to main content

On the k-Medoids Model for Semi-supervised Clustering

  • Conference paper
  • First Online:
Variable Neighborhood Search (ICVNS 2018)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11328))

Included in the following conference series:

Abstract

Clustering is an automated and powerful technique for data analysis. It aims to divide a given set of data points into clusters which are homogeneous and/or well separated. A major challenge with clustering is to define an appropriate clustering criterion that can express a good separation of data into homogeneous groups such that the obtained clustering solution is meaningful and useful to the user. To circumvent this issue, it is suggested that the domain expert could provide background information about the dataset, which can be incorporated by a clustering algorithm in order to improve the solution. Performing clustering under this assumption is known as semi-supervised clustering. This work explores semi-supervised clustering through the k-medoids model. Results obtained by a Variable Neighborhood Search (VNS) heuristic show that the k-medoids model presents classification accuracy compared to the traditional k-means approach. Furthermore, the model demonstrates high flexibility and performance by combining kernel projections with pairwise constraints.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 54.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 69.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Hansen, P., Jaumard, B.: Cluster analysis and mathematical programming. Math. Program. 79(1–3), 191–215 (1997)

    MathSciNet  MATH  Google Scholar 

  2. Delattre, M., Hansen, P.: Bicriterion cluster analysis. IEEE TPAMI 4, 277–291 (1980)

    MATH  Google Scholar 

  3. Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, Boca Raton (2013)

    MATH  Google Scholar 

  4. Basu, S., Davidson, I., Wagstaff, K.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman & Hall/CRC, Boca Raton (2008)

    MATH  Google Scholar 

  5. Hansen, P., Mladenović, N.: Variable neighborhood search: principles and applications. EJOR 130(3), 449–467 (2001)

    MathSciNet  MATH  Google Scholar 

  6. Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: ICML, pp. 577–584 (2001)

    Google Scholar 

  7. Bekkerman, R., Sahami, M.: Semi-supervised clustering using combinatorial MRFs. In: ICML (2006)

    Google Scholar 

  8. Yan, B., Domeniconi, C.: An adaptive kernel method for semi-supervised clustering. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 521–532. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_49

    Google Scholar 

  9. Law, M.H.C., Topchy, A., Jain, A.K.: Model-based clustering with probabilistic constraints. In: SIAM-SDM, pp. 641–645 (2005)

    Google Scholar 

  10. Ruiz, C., Spiliopoulou, M., Menasalvas, E.: Density-based semi-supervised clustering. Data Min. Knowl. Disc. 21(3), 345–370 (2010)

    MathSciNet  Google Scholar 

  11. Xia, Y.: A global optimization method for semi-supervised clustering. Data Min. Knowl. Disc. 18(2), 214–256 (2009)

    MathSciNet  Google Scholar 

  12. Tuy, H.: Concave programming under linear constraints. Soviet Math. 5, 1437–1440 (1964)

    MATH  Google Scholar 

  13. Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. Mach. Learn. 74(1), 1–22 (2009)

    Google Scholar 

  14. Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp. 10(5), 1299–1319 (1998)

    Google Scholar 

  15. Christofides, N.: Graph Theory: An Algorithmic Approach (Computer Science and Applied Mathematics). Academic Press Inc., Orlando (1975)

    MATH  Google Scholar 

  16. Steinley, D.: K-medoids and other criteria for crisp clustering. In: Handbook of Cluster Analysis. Chapman and Hall/CRC Handbooks of Modern Statistical Methods. CRC Press (2015)

    Google Scholar 

  17. Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (program PAM). In: Finding Groups in Data: An Introduction to Cluster Analysis, pp. 68–125 (1990)

    Google Scholar 

  18. Teitz, M.B., Bart, P.: Heuristic methods for estimating the generalized vertex median of aweighted graph. Oper. Res. 16(5), 955–961 (1968)

    MATH  Google Scholar 

  19. Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. INFOR 21(2), 95–108 (1983)

    MATH  Google Scholar 

  20. Hansen, P., Mladenović, N.: Variable neighborhood search for the P-median. Locat. Sci. 5(4), 207–226 (1997)

    MATH  Google Scholar 

  21. Resende, M.G.C., Werneck, R.F.: A fast swap-based local search procedure for location problems. ANOR 150(1), 205–230 (2007)

    MathSciNet  MATH  Google Scholar 

  22. Davidson, I., Ravi, S.: Clustering with constraints: feasibility issues and the k-means algorithm. In: SIAM-SDM, pp. 138–149 (2005)

    Google Scholar 

  23. Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)

    MathSciNet  MATH  Google Scholar 

  24. Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415, 247–253 (2017)

    Google Scholar 

  25. Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recog. 45(12), 4337–4345 (2012)

    Google Scholar 

  26. Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recog. 34(2), 405–413 (2001)

    MATH  Google Scholar 

  27. Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. EJOR 253(3), 659–672 (2016)

    MathSciNet  MATH  Google Scholar 

  28. Kleinberg, J.: An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems, pp. 463–470 (2003)

    Google Scholar 

  29. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

    MATH  Google Scholar 

  30. Lichman, M.: UCI machine learning repository (2013)

    Google Scholar 

  31. Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML 2004, pp. 81–88. ACM, New York (2004)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rodrigo Randel .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Randel, R., Aloise, D., Mladenović, N., Hansen, P. (2019). On the k-Medoids Model for Semi-supervised Clustering. In: Sifaleras, A., Salhi, S., Brimberg, J. (eds) Variable Neighborhood Search. ICVNS 2018. Lecture Notes in Computer Science(), vol 11328. Springer, Cham. https://doi.org/10.1007/978-3-030-15843-9_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-15843-9_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15842-2

  • Online ISBN: 978-3-030-15843-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics