Clustering with Random Indexing K-tree and XML Structure

De Vries, Christopher M.; Geva, Shlomo; De Vine, Lance

doi:10.1007/978-3-642-14556-8_40

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6203)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.
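
As a rough illustration of the Random Indexing step named in the abstract, the following minimal Python sketch builds a document vector by summing sparse ternary index vectors, one per term. The dimensionality, the number of non-zero entries, the seeding scheme, and all function names are assumptions chosen for illustration; they are not the settings or code used in the paper. The K-tree clustering step and the representation based on XML markup are not shown.

```python
import math
import random
from collections import Counter


def index_vector(term, dim=64, nonzero=4, seed=0):
    """Sparse ternary index vector for a term (hypothetical helper).

    A few randomly chosen positions hold +1 or -1; all others are 0.
    Seeding the generator with the term makes the vector reproducible.
    """
    rng = random.Random(f"{seed}:{term}")
    positions = rng.sample(range(dim), nonzero)
    vec = [0.0] * dim
    for i, pos in enumerate(positions):
        # Alternate signs so half the non-zero entries are +1, half -1.
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def document_vector(tokens, dim=64, nonzero=4, seed=0):
    """Sum the index vectors of a document's terms, weighted by term count."""
    doc = [0.0] * dim
    for term, count in Counter(tokens).items():
        iv = index_vector(term, dim, nonzero, seed)
        for j in range(dim):
            doc[j] += count * iv[j]
    return doc


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Document vectors built this way have a fixed, low dimensionality regardless of vocabulary size, so they could then be passed to any vector-based clustering algorithm; for example, `cosine(document_vector(a), document_vector(b))` compares two token lists `a` and `b` in the reduced space.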

Author information

Authors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia
Christopher M. De Vries, Shlomo Geva & Lance De Vine

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, 4001, Brisbane, Qld, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

About this paper

Cite this paper

De Vries, C.M., Geva, S., De Vine, L. (2010). Clustering with Random Indexing K-tree and XML Structure. In: Geva, S., Kamps, J., Trotman, A. (eds) Focused Retrieval and Evaluation. INEX 2009. Lecture Notes in Computer Science, vol 6203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14556-8_40

DOI: https://doi.org/10.1007/978-3-642-14556-8_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14555-1
Online ISBN: 978-3-642-14556-8
eBook Packages: Computer Science, Computer Science (R0)
