Computer Networks
Volume 31, Issues 11-16, 17 May 1999, Pages 1467-1479
doi:10.1016/S1389-1286(99)00022-5

Copyright © 1999 Published by Elsevier Science B.V. All rights reserved.

Finding related pages in the World Wide Web

Jeffrey Dean¹,* and Monika R. Henzinger²

Compaq Systems Research Center, 130 Lytton Ave., Palo Alto, CA 94301, USA


Available online 3 May 2000.

Abstract

When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to Web searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers.

We describe two algorithms to identify related Web pages. These algorithms use only the connectivity information in the Web (i.e., the links between pages) and not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing them with Netscape's 'What's Related' service (http://home.netscape.com/escapes/related/). The study showed that the precision at 10 of our two algorithms is 73% and 51% better than that of Netscape, despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
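As a rough illustration of finding related pages from links alone, the sketch below scores candidate pages by plain cocitation: how many pages link to both the input URL and the candidate. It is a minimal sketch, assuming an in-memory link graph given as a dict from each page to the pages it links to; the function name `cocitation_related`, the toy graph, and the alphabetical tie-breaking are hypothetical, and the paper's Cocitation algorithm may differ in details such as how parent and sibling pages are selected and limited.

```python
from collections import Counter


def cocitation_related(links, url, k=10):
    """Rank pages by the number of parents they share with `url`.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical in-memory graph, not a real crawler API).
    """
    # Parents: pages that contain a link to the input URL.
    parents = [p for p, outs in links.items() if url in outs]
    # For every sibling, count how many distinct parents cite it alongside `url`.
    counts = Counter()
    for p in parents:
        for sibling in set(links[p]):
            if sibling != url:
                counts[sibling] += 1
    # Highest cocitation count first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:k]]


if __name__ == "__main__":
    toy = {
        "news-portal": ["nytimes.com", "washingtonpost.com", "weather.com"],
        "media-list": ["nytimes.com", "washingtonpost.com"],
        "blog": ["nytimes.com", "recipes.com"],
        "other": ["weather.com"],
    }
    print(cocitation_related(toy, "nytimes.com"))
    # prints ['washingtonpost.com', 'recipes.com', 'weather.com']
```

On the toy graph, washingtonpost.com ranks first because two pages cite it together with nytimes.com, echoing the newspaper example in the abstract; pages cited alongside the input only once fall behind it.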

Author Keywords: Search engines; Related pages; Searching paradigms

Article Outline

1. Introduction
2. Related page algorithms
2.1. Companion algorithm
2.1.1. Step 1: building the vicinity graph
2.1.2. Step 2: duplicate elimination
2.1.3. Step 3: assign edge weights
2.1.4. Step 4: compute hub and authority scores
2.2. Cocitation algorithm
2.3. Netscape's approach
3. Implementation
4. Evaluation
4.1. Experimental setup
4.2. User study results
4.3. Run-time performance
5. Related work
6. Conclusion
Acknowledgements
References
Vitae
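Step 4 of the Companion algorithm (Section 2.1.4 in the outline) computes hub and authority scores on the vicinity graph. One common way to compute such scores is the iterative hub/authority update popularized by Kleinberg's HITS. The sketch below is a minimal version of that generic iteration, assuming the graph is given as a dict mapping (source, target) edges to non-negative weights. The function name `hub_authority`, the fixed iteration count, and the unit-length normalization are assumptions, and the paper's own edge-weighting scheme (Step 3) is not reproduced here.

```python
import math


def hub_authority(edges, iterations=50):
    """Iterative hub/authority scores on a weighted directed graph.

    edges: dict mapping (source, target) pairs to non-negative weights
           (a hypothetical input format chosen for this sketch).
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: weighted sum of hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        # Hub of a page: weighted sum of authority scores of pages it links to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_hub[src] += w * new_auth[dst]
        # Scale each score vector to unit length so values stay bounded.
        for vec in (new_auth, new_hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for n in vec:
                vec[n] /= norm
        auth, hub = new_auth, new_hub
    return hub, auth


if __name__ == "__main__":
    graph = {("p1", "a"): 1.0, ("p1", "b"): 1.0, ("p2", "a"): 1.0}
    hubs, authorities = hub_authority(graph)
    print(sorted(authorities, key=authorities.get, reverse=True)[:2])
```

In this toy graph, page "a" is linked from both hubs and therefore receives the highest authority score, while "p1", which links to both authorities, becomes the strongest hub. Pages with high authority scores are the natural candidates to report as related pages.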


¹ This work was done while the author was at the Compaq Western Research Laboratory.

* Corresponding author. Present address: mySimon, Inc., Santa Clara, CA, USA. E-mail: jdean@mysimon.com

² E-mail: monika@pa.dec.com
