Copyright © 2004 Elsevier Ltd All rights reserved.
A framework for understanding Latent Semantic Indexing (LSI) performance
Accepted 12 November 2004.
Abstract
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed; ours is the first to study the values that LSI produces in the term-by-dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second-order term co-occurrence and the values produced by the Singular Value Decomposition (SVD) algorithm that forms the foundation of LSI. We also present a mathematical proof that the SVD algorithm encapsulates term co-occurrence information.
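The link between the SVD and term co-occurrence can be illustrated with a small sketch. It rests on the standard identity A Aᵀ = U S² Uᵀ: if T = U_k S_k are the term-by-dimension vectors, then the LSI term-term similarities T Tᵀ are the best rank-k approximation of the first-order co-occurrence matrix A Aᵀ. The sketch assumes NumPy, a hypothetical five-term, four-document binary matrix, rank k = 2, and a hypothetical helper `second_order_only` that treats "linked by a length-2 path in the co-occurrence graph but never co-occurring directly" as a simple proxy for second-order co-occurrence. It is an illustration, not the authors' experimental methodology.

```python
import numpy as np

# Hypothetical toy corpus (illustrative only): rows are terms, columns are
# documents, and entries are binary term occurrences.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 0, 0],  # automobile (never appears with "car")
    [1, 1, 1, 0],  # engine (links "car" and "automobile")
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def lsi_term_vectors(A, k):
    """Term-by-dimension vectors U_k S_k from a rank-k truncated SVD of A."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]


def second_order_only(A):
    """Hypothetical helper: boolean mask of term pairs joined by a length-2
    path in the co-occurrence graph that never co-occur directly."""
    G = (A @ A.T > 0).astype(int)  # first-order co-occurrence graph
    np.fill_diagonal(G, 0)          # drop self-loops
    mask = (G @ G > 0) & (G == 0)   # reachable in two steps, not in one
    np.fill_diagonal(mask, False)
    return mask


k = 2
C = A @ A.T                    # first-order co-occurrence counts
T = lsi_term_vectors(A, k)     # term-by-dimension vectors
sim = T @ T.T                  # term-term similarity in the LSI space

# Since A A^T = U S^2 U^T, T T^T should equal the rank-k approximation of C
# built from the top-k eigenpairs of C.
evals, evecs = np.linalg.eigh(C)
top = np.argsort(evals)[::-1][:k]
C_k = (evecs[:, top] * evals[top]) @ evecs[:, top].T

pairs = second_order_only(A)
car, auto = terms.index("car"), terms.index("automobile")
print("direct co-occurrence car/automobile:", C[car, auto])
print("second-order only:", bool(pairs[car, auto]))
print("LSI similarity car/automobile:", round(float(sim[car, auto]), 4))
print("T T^T matches rank-k approx of C:", np.allclose(sim, C_k))
```

In this toy case "car" and "automobile" never share a document, yet they receive a positive LSI similarity (about 0.577) because both co-occur with "engine". This is the kind of second-order effect the abstract describes. Terms in the unrelated "flower"/"petal" block stay at zero similarity to "car".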
Keywords: Latent Semantic Indexing; Term co-occurrence; Singular value decomposition; Information retrieval theory
Article Outline
- 1. Introduction
- 2. Overview of Latent Semantic Indexing
- 3. Higher-order co-occurrence in LSI
- 3.1. Data sets
- 3.2. Methodology
- 3.3. Results
- 4. Analysis of the LSI values
- 4.1. Data sets
- 4.2. Methodology
- 4.3. Results
- 4.4. Discussion
- 5. Transitivity and the SVD
- 6. Conclusions and future work
- Acknowledgements
- References