Information Processing & Management
Volume 38, Issue 6, November 2002, Pages 823-848
 
 
doi:10.1016/S0306-4573(01)00051-6
Copyright © 2002 Elsevier Science Ltd. All rights reserved.

Strong similarity measures for ordered sets of documents in information retrieval

L. Egghe a, b (corresponding author) and C. Michel c

a LUC, Universitaire Campus, B-3590, Diepenbeek, Belgium
b UIA, Universiteitsplein 1, B-2610, Antwerpen (Wilrijk), Belgium
c CEM-GRESIC, MSHA, D.U. Bordeaux III, Esplanade des Antilles, F-33607, Pessac Cedex, France

Received 6 December 2000; 
accepted 3 October 2001. 
Available online 21 November 2001.


Abstract

A general method is presented to construct ordered similarity measures (OS-measures), i.e., similarity measures for ordered sets of documents (such as the ranked output of an IR-process), based on classical, well-known similarity measures for ordinary sets (such as the Jaccard, Dice, Cosine and overlap measures). To this end, we first review these measures and their relationships.

The method given here to construct OS-measures extends the one given by Michel in a previous paper so that it becomes applicable to any pair of ordered sets. Concrete expressions of this method, applied to the classical similarity measures, are given.

Some of these measures are then tested in the IR-system Profil-Doc. The engine SPIRIT© extracts ranked document sets in three different contexts, each for 550 requests. The practical usability of the OS-measures is then discussed based on these experiments.
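For orientation, the classical measures for ordinary (unordered) sets named in the abstract can be sketched as follows. This is a minimal illustration using the textbook definitions, not the paper's own formulation: the function names, the document identifiers and the convention of returning 0.0 when a denominator is zero are assumptions made here, and the overlap variant shown (normalised by the smaller set) may differ from the overlap measures the article defines. The ordered (OS) versions are not reproduced, since their construction is given only in the full text.

```python
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # 0.0 for two empty sets is an assumption


def dice(a: set, b: set) -> float:
    """Twice the intersection size divided by the sum of the set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: set, b: set) -> float:
    """Intersection size divided by the geometric mean of the set sizes."""
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def overlap(a: set, b: set) -> float:
    """Intersection size divided by the size of the smaller set (assumed variant)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical retrieved document sets for two requests.
results_a = {"d1", "d2", "d3", "d4"}
results_b = {"d3", "d4", "d5"}

for measure in (jaccard, dice, cosine, overlap):
    print(f"{measure.__name__}: {measure(results_a, results_b):.3f}")
```

All four functions ignore the order in which documents were retrieved, which is the limitation the OS-measures are meant to address.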

Article Outline

1. Introduction
1.1. General note on the interpretation of these similarity measures for the comparison of vectors
2. General properties of similarity measures (on ordinary sets)
3. Ordered similarity measures (OS-measures)
3.1. Statement of the problem
3.2. General theorem on the construction of strong OS-measures
3.3. Strong OS-measures derived from strong similarity measures for ordinary sets
3.3.1. Jaccard
3.3.2. Dice
3.3.3. Generalized Dice
3.3.4. Cosine
3.3.5. The measure N
3.3.6. The overlap measure O2
4. Experimentation
4.1. Presentation of the context
4.2. Tested OS-measures
4.3. Analysis
4.4. Results
4.4.1. Impact of weight function
4.4.2. Impact of the classical similarity indicator
4.4.3. Impact of the query type
5. Conclusion
Appendix A
References















 