Data & Knowledge Engineering
Volume 49, Issue 2, May 2004, Pages 129-143
Web Information and Data Management
 
doi:10.1016/j.datak.2003.10.006
Copyright © 2003 Elsevier B.V. All rights reserved.

Finding aliases on the web using latent semantic analysis

Vinay Bhat, Tim Oates (corresponding author), Vishal Shanbhag and Charles Nicholas

Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250, USA

Available online 21 November 2003.

Abstract

A common problem faced when gathering information from the web is the use of different names to refer to the same entity. For example, the city in India referred to as Bombay in some documents may be referred to as Mumbai in others because its name officially changed from the former to the latter in 1995. Multiplicity of names can cause relevant documents to be missed by search engines. Our goal is to develop an automated system that discovers additional names for an entity given just one of its names. Latent semantic analysis (LSA) is generally thought to be well-suited for this task [Numerical linear algebra with applications 3(4) (1996) 301]. We demonstrate empirically that under a broad range of circumstances LSA performs poorly, and describe a two-stage algorithm based on LSA that performs significantly better.
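The abstract treats plain LSA as the natural baseline for alias discovery. As a rough illustration of that baseline only (not the authors' two-stage algorithm, which the abstract does not describe), the Python sketch below builds a term-document count matrix, keeps the top k singular triplets of its SVD, and ranks every other term by cosine similarity to a query name. The toy corpus `TOY_DOCS`, the helper names `lsa_term_vectors` and `candidate_aliases`, the use of raw counts with no term weighting, and the choice k = 3 are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): "bombay" and "mumbai" never
# co-occur in a document but appear in the same kinds of contexts.
TOY_DOCS = [
    "bombay port city india",
    "mumbai port city india",
    "bombay film industry",
    "mumbai film industry",
    "delhi capital india government",
    "delhi government parliament",
]


def lsa_term_vectors(docs, k):
    """Return (vocab, vecs): each term's rank-k LSA vector, i.e. a row of U_k * S_k."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Raw term-document count matrix (no tf-idf or log-entropy weighting).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    # Thin SVD; numpy returns singular values in descending order.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k] * s[:k]


def candidate_aliases(docs, name, k=3, top_n=3):
    """Rank all other terms by cosine similarity to `name` in the LSA space."""
    vocab, vecs = lsa_term_vectors(docs, k)
    q = vecs[vocab.index(name)]
    norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(q)
    # Guard against zero-length vectors (their similarity becomes 0).
    sims = (vecs @ q) / np.where(norms == 0, 1.0, norms)
    ranked = sorted(
        ((vocab[i], float(sims[i])) for i in range(len(vocab)) if vocab[i] != name),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    for term, sim in candidate_aliases(TOY_DOCS, "bombay"):
        print(f"{term}\t{sim:.3f}")
```

On this hypothetical corpus "bombay" and "mumbai" share all of their contexts (port, city, film, industry) without ever appearing together, so in the rank-3 space "mumbai" should rank first with a cosine essentially equal to 1. A corpus this clean is the favourable case; the abstract reports that LSA performs poorly under a broad range of realistic circumstances.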

Author Keywords: Aliases; Latent semantic analysis; Search engines

Article Outline

1. Introduction
2. Latent semantic analysis
3. Empirical evaluation of LSA for finding aliases
4. A two-stage algorithm based on LSA
5. Dynamic corpus creation
6. Experiments
6.1. Names of chemical compounds
6.2. TOEFL synonym questions
7. Discussion
References
Vitae
