Topic Extraction with AGAPE

  • Conference paper
Advanced Data Mining and Applications (ADMA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Abstract

This paper uses an optimization approach to address the problem of conceptual clustering. The aim of AGAPE, which is based on the tabu-search meta-heuristic using split, merge and a special “k-means” move, is to extract concepts by optimizing a global quality function. It is deterministic and uses no a priori knowledge about the number of clusters. Experiments carried out in topic extraction show very promising results on both artificial and real datasets.
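
The search scheme named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `concept` description (features held by more than half of a cluster), the `quality` function (object-to-concept Jaccard similarity minus a per-cluster `penalty`), the split heuristic, and the tabu `tenure`. The sketch only shows the general shape of a deterministic tabu search over partitions using split, merge and a k-means-style reassignment move, with no preset number of clusters.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def concept(cluster, data):
    """Hypothetical cluster description: features held by more than half the members."""
    counts = {}
    for i in cluster:
        for f in data[i]:
            counts[f] = counts.get(f, 0) + 1
    return frozenset(f for f, c in counts.items() if 2 * c > len(cluster))

def quality(partition, data, penalty=1.0):
    """Assumed global quality: object-to-concept similarity minus a per-cluster cost."""
    total = 0.0
    for cluster in partition:
        c = concept(cluster, data)
        total += sum(jaccard(data[i], c) for i in cluster)
    return total - penalty * len(partition)

def canonical(clusters):
    """Order-independent form of a partition, dropping empty clusters."""
    return tuple(sorted(tuple(sorted(c)) for c in clusters if c))

def neighbours(partition, data):
    """Yield partitions reachable by one split, merge, or k-means-style move."""
    parts = [set(c) for c in partition]
    # Split: the member least similar to its concept seeds a new cluster.
    for k, cluster in enumerate(parts):
        if len(cluster) < 2:
            continue
        c = concept(cluster, data)
        seed = min(sorted(cluster), key=lambda i: jaccard(data[i], c))
        moved = {i for i in cluster
                 if jaccard(data[i], data[seed]) > jaccard(data[i], c)}
        moved.add(seed)
        rest = parts[:k] + parts[k + 1:]
        yield canonical(rest + [cluster - moved, moved])
    # Merge: fuse any two clusters.
    for a, b in combinations(range(len(parts)), 2):
        rest = [p for k, p in enumerate(parts) if k not in (a, b)]
        yield canonical(rest + [parts[a] | parts[b]])
    # k-means-style move: reassign every object to its most similar concept.
    concepts = [concept(p, data) for p in parts]
    new = [set() for _ in parts]
    for i in range(len(data)):
        best = max(range(len(parts)), key=lambda k: jaccard(data[i], concepts[k]))
        new[best].add(i)
    yield canonical(new)

def tabu_search(data, iterations=20, tenure=5):
    """Deterministic tabu search starting from a single all-inclusive cluster."""
    current = canonical([set(range(len(data)))])
    best, best_q = current, quality(current, data)
    tabu = [current]  # recently visited partitions are forbidden
    for _ in range(iterations):
        candidates = [n for n in neighbours(current, data) if n not in tabu]
        if not candidates:
            break
        # Take the best non-tabu neighbour, even if it is worse than current.
        current = max(candidates, key=lambda n: quality(n, data))
        tabu = (tabu + [current])[-tenure:]
        q = quality(current, data)
        if q > best_q:
            best, best_q = current, q
    return best, best_q

# Toy "documents" as word sets (invented for this example).
data = [
    {"tabu", "search", "move"}, {"tabu", "search", "merge"},
    {"tabu", "search", "split"}, {"topic", "word", "corpus"},
    {"topic", "word", "text"}, {"topic", "word", "news"},
]
best, best_q = tabu_search(data)
print(best, best_q)
```

On this toy input the search separates the two word groups without being told how many clusters to find. The per-cluster penalty is what keeps the assumed quality function from favouring all-singleton partitions; a different quality function would need its own safeguard.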

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Velcin, J., Ganascia, J.-G. (2007). Topic Extraction with AGAPE. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_35

  • DOI: https://doi.org/10.1007/978-3-540-73871-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73870-1

  • Online ISBN: 978-3-540-73871-8

  • eBook Packages: Computer Science, Computer Science (R0)
