Semi-supervised Constrained Clustering: An Expert-Guided Data Analysis Methodology

Podpečan, Vid; Grčar, Miha; Lavrač, Nada

doi:10.1007/978-3-642-15246-7_22

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6230)

Abstract

This paper presents a methodology for expert-guided analysis of large data sets, including large text corpora. Its main ingredient is an algorithm for semi-supervised data clustering with cluster size constraints, which implements several improvements over existing constrained k-means clustering algorithms. First, it allows for a larger set of user-defined cluster size constraints of different types (lower- and upper-bound constraints). Second, it allows for dynamic re-assignment of predefined constraints to clusters during iterative cluster computation and optimization, thus improving the results of constrained clustering. Third, it allows for expert-guided cluster optimization, achieved by combining constrained clustering and data visualization, which gives the expert finer-grained control over the clustering process and further improves the quality of the obtained clustering solutions. Incorporating data visualization into the clustering process allows the user to select reference points that act as constraint anchors in the course of iterative cluster computation. The proposed semi-supervised constrained clustering methodology has been implemented in the service-oriented data mining environment Orange4WS and evaluated on different document corpora.
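
To make the idea of lower- and upper-bound cluster size constraints concrete, the following is a minimal sketch of a k-means loop whose assignment step respects per-cluster size bounds. It is a simple greedy heuristic written for illustration, not the algorithm proposed in the paper (the abstract does not specify it), and it does not cover the dynamic re-assignment of constraints or the expert-selected anchor points. The function names and parameters (`assign_with_size_bounds`, `size_constrained_kmeans`, `lower`, `upper`) are hypothetical.

```python
import math
import random


def assign_with_size_bounds(points, centroids, lower, upper):
    """Greedy assignment: fill lower bounds first, then respect upper bounds."""
    n, k = len(points), len(centroids)
    # All (distance, point index, cluster index) triples, closest pairs first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centroids)
    )
    labels = [-1] * n
    sizes = [0] * k
    # Phase 1: give every cluster its minimum number of points.
    for _, i, j in pairs:
        if labels[i] == -1 and sizes[j] < lower[j]:
            labels[i] = j
            sizes[j] += 1
    # Phase 2: remaining points go to the nearest cluster that still has room.
    for _, i, j in pairs:
        if labels[i] == -1 and sizes[j] < upper[j]:
            labels[i] = j
            sizes[j] += 1
    return labels


def size_constrained_kmeans(points, k, lower, upper, iters=20, seed=0):
    """Heuristic k-means with per-cluster size bounds (illustrative only)."""
    n = len(points)
    if any(lo > up for lo, up in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")
    if not (sum(lower) <= n <= sum(upper)):
        raise ValueError("size constraints are infeasible for this data set")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * n
    for _ in range(iters):
        new_labels = assign_with_size_bounds(points, centroids, lower, upper)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
            (10, 10), (10, 11)]
    # Force two clusters of exactly four points each.
    labels, _ = size_constrained_kmeans(data, 2, lower=[4, 4], upper=[4, 4])
    print([labels.count(j) for j in range(2)])
```

The greedy assignment always satisfies the bounds when they are feasible, but it does not guarantee a minimum-cost assignment; an exact assignment step would typically be posed as a minimum-cost flow or linear assignment problem.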

Author information

Authors and Affiliations

Jožef Stefan Institute, Ljubljana, Slovenia
Vid Podpečan, Miha Grčar & Nada Lavrač
University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač

Editor information

Editors and Affiliations

School of Computer Science and Engineering, Seoul National University, 151-744, Seoul, Korea
Byoung-Tak Zhang
Department of Computing, Macquarie University, NSW, Sydney, Australia
Mehmet A. Orgun

About this paper

Cite this paper

Podpečan, V., Grčar, M., Lavrač, N. (2010). Semi-supervised Constrained Clustering: An Expert-Guided Data Analysis Methodology. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_22

DOI: https://doi.org/10.1007/978-3-642-15246-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15245-0
Online ISBN: 978-3-642-15246-7
eBook Packages: Computer Science, Computer Science (R0)
