

Paper

Authors: Philipp Baumann (1) and Dorit S. Hochbaum (2)

Affiliations: (1) Department of Business Administration, University of Bern, Schuetzenmattstrasse 14, 3012 Bern, Switzerland; (2) IEOR Department, University of California, Berkeley, Etcheverry Hall, CA 94720, U.S.A.

Keyword(s): Constrained Clustering, Must-link and Cannot-link Constraints, Mixed-binary Linear Programming.

Abstract: The k-means algorithm is one of the most widely used algorithms in clustering. It is known to be effective when the clusters are homogeneous and well separated in the feature space. When this is not the case, incorporating pairwise must-link and cannot-link constraints can improve the quality of the resulting clusters. Various extensions of the k-means algorithm have been proposed that incorporate must-link and cannot-link constraints using heuristics. We introduce a different approach that uses a new mixed-integer programming formulation. In our approach, the pairwise constraints are incorporated as soft constraints that can be violated subject to a penalty. In a computational study based on 25 data sets, we compare the proposed algorithm to a state-of-the-art algorithm that was previously shown to dominate the other algorithms in this area. The results demonstrate that the proposed algorithm provides better clusterings and requires considerably less running time than the state-of-the-art algorithm. Moreover, we found that the ability to vary the penalty is beneficial in situations where the pairwise constraints are noisy due to corrupt ground truth.
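The abstract describes pairwise constraints that can be violated subject to a penalty. As a minimal sketch, assuming a single penalty weight applied uniformly to every violated pair (the paper's actual mixed-integer formulation and penalty scheme may differ), the trade-off between the usual k-means cost and constraint violations can be evaluated for a fixed cluster assignment as follows. The function names and the `penalty` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_objective(
    points: List[Point],
    labels: List[int],
    centers: List[Point],
    must_link: List[Tuple[int, int]],
    cannot_link: List[Tuple[int, int]],
    penalty: float,
) -> float:
    """Hypothetical soft-constrained k-means cost for a fixed assignment.

    Assumption: every violated pair costs the same `penalty`; this is an
    illustration, not the authors' formulation.
    """
    # Standard k-means term: squared distance of each point to its center.
    sse = sum(squared_distance(p, centers[c]) for p, c in zip(points, labels))
    # A must-link pair is violated when its points fall in different clusters.
    ml_violations = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its points share a cluster.
    cl_violations = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return sse + penalty * (ml_violations + cl_violations)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    ctrs = [(0.0, 0.5), (5.0, 5.5)]
    lbls = [0, 0, 1, 1]
    # One violated must-link pair and one violated cannot-link pair.
    print(penalized_objective(pts, lbls, ctrs, [(1, 2)], [(0, 1)], penalty=10.0))
```

Raising `penalty` makes each violation costlier, so an optimizer searching over `labels` would respect the constraints more strictly; lowering it lets the squared-distance term dominate. This tunable weight is the kind of flexibility the abstract reports as helpful when the pairwise constraints are noisy.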

CC BY-NC-ND 4.0

Paper citation in several formats:
Baumann, P. and Hochbaum, D. (2022). A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-549-4; ISSN 2184-4313, SciTePress, pages 195-202. DOI: 10.5220/0010800000003122

@conference{icpram22,
author={Philipp Baumann and Dorit S. Hochbaum},
title={A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2022},
pages={195-202},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010800000003122},
isbn={978-989-758-549-4},
issn={2184-4313},
}

TY  - CONF
JO  - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI  - A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints
SN  - 978-989-758-549-4
IS  - 2184-4313
AU  - Baumann, P.
AU  - Hochbaum, D.
PY  - 2022
SP  - 195
EP  - 202
DO  - 10.5220/0010800000003122
PB  - SciTePress