Abstract
This paper uses an optimization approach to address the problem of conceptual clustering. The aim of AGAPE, which is based on the tabu-search meta-heuristic using split, merge and a special “k-means” move, is to extract concepts by optimizing a global quality function. It is deterministic and uses no a priori knowledge about the number of clusters. Experiments carried out in topic extraction show very promising results on both artificial and real datasets.
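As an illustration only, the following minimal sketch shows how a deterministic tabu search over partitions can combine split, merge and a k-means-style reassignment move while optimizing a single global quality function. It is not the AGAPE implementation: the quality function (Jaccard fit to a majority-word prototype minus a per-cluster penalty), the `penalty` and `tenure` values, the toy documents and all function names are assumptions introduced here and do not come from the paper.

```python
from itertools import combinations

def prototype(cluster, docs):
    """Words present in more than half of the cluster's documents (assumed concept form)."""
    counts = {}
    for i in cluster:
        for w in docs[i]:
            counts[w] = counts.get(w, 0) + 1
    return frozenset(w for w, c in counts.items() if 2 * c > len(cluster))

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def quality(partition, docs, penalty=1.0):
    """Hypothetical global quality: fit to prototypes minus a cost per cluster."""
    fit = sum(jaccard(docs[i], prototype(c, docs)) for c in partition for i in c)
    return fit - penalty * len(partition)

def canonical(partition):
    """Order-independent, hashable form of a partition."""
    return tuple(sorted(tuple(sorted(c)) for c in partition))

def neighbours(partition, docs):
    part = [list(c) for c in partition]
    # Merge move: fuse every pair of clusters.
    for a, b in combinations(range(len(part)), 2):
        rest = [c for k, c in enumerate(part) if k not in (a, b)]
        yield canonical(rest + [part[a] + part[b]])
    # Split move: detach the document that fits its prototype worst.
    for a, c in enumerate(part):
        if len(c) > 1:
            proto = prototype(c, docs)
            worst = min(sorted(c), key=lambda i: jaccard(docs[i], proto))
            rest = [x for k, x in enumerate(part) if k != a]
            yield canonical(rest + [[i for i in c if i != worst], [worst]])
    # k-means-style move: reassign every document to its closest prototype.
    protos = [prototype(c, docs) for c in part]
    new = [[] for _ in part]
    for i in range(len(docs)):
        best = max(range(len(part)), key=lambda k: jaccard(docs[i], protos[k]))
        new[best].append(i)
    yield canonical([c for c in new if c])

def tabu_search(docs, iterations=30, tenure=5):
    # Start from a single cluster: the number of clusters is not given in advance.
    current = canonical([list(range(len(docs)))])
    best, best_q = current, quality(current, docs)
    tabu = [current]
    for _ in range(iterations):
        # Sorting the candidates makes tie-breaking, and so the whole run, deterministic.
        candidates = sorted(set(n for n in neighbours(current, docs) if n not in tabu))
        if not candidates:
            break
        current = max(candidates, key=lambda p: quality(p, docs))
        tabu = (tabu + [current])[-tenure:]  # keep only the most recent partitions
        q = quality(current, docs)
        if q > best_q:
            best, best_q = current, q
    return best, best_q

# Toy corpus (invented for this sketch): two documents per topic.
docs = [
    frozenset({"tax", "budget", "vote"}),
    frozenset({"tax", "budget", "senate"}),
    frozenset({"goal", "match", "team"}),
    frozenset({"goal", "match", "coach"}),
]
best, best_q = tabu_search(docs)
print(best)  # -> ((0, 1), (2, 3))
```

On this toy corpus the search starts from one cluster, splits once, and then reaches the two-topic partition through the reassignment move. The tabu list lets later iterations accept worse partitions without cycling, while the best partition seen so far is retained.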
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Velcin, J., Ganascia, J.-G. (2007). Topic Extraction with AGAPE. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_35
DOI: https://doi.org/10.1007/978-3-540-73871-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)