On the k-Medoids Model for Semi-supervised Clustering

Randel, Rodrigo; Aloise, Daniel; Mladenović, Nenad; Hansen, Pierre

doi:10.1007/978-3-030-15843-9_2

Rodrigo Randel¹⁷,
Daniel Aloise¹⁷,
Nenad Mladenović¹⁸ &
…
Pierre Hansen¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11328))

Included in the following conference series:

International Conference on Variable Neighborhood Search

787 Accesses
3 Citations

Abstract

Clustering is an automated and powerful technique for data analysis. It aims to divide a given set of data points into clusters which are homogeneous and/or well separated. A major challenge with clustering is to define an appropriate clustering criterion that can express a good separation of data into homogeneous groups such that the obtained clustering solution is meaningful and useful to the user. To circumvent this issue, it is suggested that the domain expert could provide background information about the dataset, which can be incorporated by a clustering algorithm in order to improve the solution. Performing clustering under this assumption is known as semi-supervised clustering. This work explores semi-supervised clustering through the k-medoids model. Results obtained by a Variable Neighborhood Search (VNS) heuristic show that the k-medoids model presents classification accuracy compared to the traditional k-means approach. Furthermore, the model demonstrates high flexibility and performance by combining kernel projections with pairwise constraints.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 54.99; Price excludes VAT (USA)

Softcover Book: USD 69.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Hansen, P., Jaumard, B.: Cluster analysis and mathematical programming. Math. Program. 79(1–3), 191–215 (1997)
MathSciNet MATH Google Scholar
Delattre, M., Hansen, P.: Bicriterion cluster analysis. IEEE TPAMI 4, 277–291 (1980)
MATH Google Scholar
Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, Boca Raton (2013)
MATH Google Scholar
Basu, S., Davidson, I., Wagstaff, K.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman & Hall/CRC, Boca Raton (2008)
MATH Google Scholar
Hansen, P., Mladenović, N.: Variable neighborhood search: principles and applications. EJOR 130(3), 449–467 (2001)
MathSciNet MATH Google Scholar
Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: ICML, pp. 577–584 (2001)
Google Scholar
Bekkerman, R., Sahami, M.: Semi-supervised clustering using combinatorial MRFs. In: ICML (2006)
Google Scholar
Yan, B., Domeniconi, C.: An adaptive kernel method for semi-supervised clustering. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 521–532. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_49
Google Scholar
Law, M.H.C., Topchy, A., Jain, A.K.: Model-based clustering with probabilistic constraints. In: SIAM-SDM, pp. 641–645 (2005)
Google Scholar
Ruiz, C., Spiliopoulou, M., Menasalvas, E.: Density-based semi-supervised clustering. Data Min. Knowl. Disc. 21(3), 345–370 (2010)
MathSciNet Google Scholar
Xia, Y.: A global optimization method for semi-supervised clustering. Data Min. Knowl. Disc. 18(2), 214–256 (2009)
MathSciNet Google Scholar
Tuy, H.: Concave programming under linear constraints. Soviet Math. 5, 1437–1440 (1964)
MATH Google Scholar
Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. Mach. Learn. 74(1), 1–22 (2009)
Google Scholar
Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp. 10(5), 1299–1319 (1998)
Google Scholar
Christofides, N.: Graph Theory: An Algorithmic Approach (Computer Science and Applied Mathematics). Academic Press Inc., Orlando (1975)
MATH Google Scholar
Steinley, D.: K-medoids and other criteria for crisp clustering. In: Handbook of Cluster Analysis. Chapman and Hall/CRC Handbooks of Modern Statistical Methods. CRC Press (2015)
Google Scholar
Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (program PAM). In: Finding Groups in Data: An Introduction to Cluster Analysis, pp. 68–125 (1990)
Google Scholar
Teitz, M.B., Bart, P.: Heuristic methods for estimating the generalized vertex median of aweighted graph. Oper. Res. 16(5), 955–961 (1968)
MATH Google Scholar
Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. INFOR 21(2), 95–108 (1983)
MATH Google Scholar
Hansen, P., Mladenović, N.: Variable neighborhood search for the P-median. Locat. Sci. 5(4), 207–226 (1997)
MATH Google Scholar
Resende, M.G.C., Werneck, R.F.: A fast swap-based local search procedure for location problems. ANOR 150(1), 205–230 (2007)
MathSciNet MATH Google Scholar
Davidson, I., Ravi, S.: Clustering with constraints: feasibility issues and the k-means algorithm. In: SIAM-SDM, pp. 138–149 (2005)
Google Scholar
Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)
MathSciNet MATH Google Scholar
Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415, 247–253 (2017)
Google Scholar
Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recog. 45(12), 4337–4345 (2012)
Google Scholar
Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recog. 34(2), 405–413 (2001)
MATH Google Scholar
Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. EJOR 253(3), 659–672 (2016)
MathSciNet MATH Google Scholar
Kleinberg, J.: An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems, pp. 463–470 (2003)
Google Scholar
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
MATH Google Scholar
Lichman, M.: UCI machine learning repository (2013)
Google Scholar
Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML 2004, pp. 81–88. ACM, New York (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

École Polytechnique de Montréal, Montréal, Canada
Rodrigo Randel & Daniel Aloise
Mathematical Institute, Serbian Academy of Science and Arts, Belgrade, Serbia
Nenad Mladenović
HEC Montréal, Montréal, Canada
Pierre Hansen

Authors

Rodrigo Randel
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Aloise
View author publications
You can also search for this author in PubMed Google Scholar
Nenad Mladenović
View author publications
You can also search for this author in PubMed Google Scholar
Pierre Hansen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rodrigo Randel .

Editor information

Editors and Affiliations

University of Macedonia, Thessaloniki, Greece
Angelo Sifaleras
University of Kent, Canterbury, UK
Said Salhi
Royal Military College of Canada, Kingston, ON, Canada
Jack Brimberg

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Randel, R., Aloise, D., Mladenović, N., Hansen, P. (2019). On the k-Medoids Model for Semi-supervised Clustering. In: Sifaleras, A., Salhi, S., Brimberg, J. (eds) Variable Neighborhood Search. ICVNS 2018. Lecture Notes in Computer Science(), vol 11328. Springer, Cham. https://doi.org/10.1007/978-3-030-15843-9_2

Download citation

DOI: https://doi.org/10.1007/978-3-030-15843-9_2
Published: 14 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15842-2
Online ISBN: 978-3-030-15843-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics