
Lexical ambiguity and information retrieval

Authors:
Robert Krovetz, Univ. of Massachusetts, Amherst
W. Bruce Croft, Univ. of Massachusetts, Amherst

ACM Transactions on Information Systems, Volume 10, Issue 2, pp. 115–141. https://doi.org/10.1145/146802.146810

Published: 1 April 1992

Abstract

Lexical ambiguity is a pervasive problem in natural language processing. However, little quantitative information is available about the extent of the problem or about the impact that it has on information retrieval systems. We report on an analysis of lexical ambiguity in information retrieval test collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. The experiments show that there is considerable ambiguity even in a specialized database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors contribute to determining whether disambiguation will make an improvement in performance. For example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Other uses of word sense disambiguation in an information retrieval context are discussed.

Index Terms

Lexical ambiguity and information retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval

Reviews

Reviewer: Richard S. Marcus

The authors consider ambiguity arising from words having multiple senses. For example, the word “file” has two senses: “a thing that stores information” and “a thing that scrapes wood or metal.” The authors give a nice review of kinds of lexical ambiguity and of attempts in computational linguistics to disambiguate words. They then perform a series of experimental analyses to determine how much improvement in document retrieval could be obtained using word sense analysis. The analyses indicate that word sense disambiguation can significantly aid in differentiating relevant from irrelevant documents, but that this does not necessarily translate into any large improvement in retrieval. In fact, the experiments showed only small increases in precision (up to 1 or 2 percent). The authors conjecture that disambiguation will be highly effective in some situations: where a query word has several senses and the sense used in the query is not statistically predominant, where high recall is desired, or where one word in the query is critical. This excellent paper is must reading for anyone researching this area. As the authors point out, relatively little quantitative analysis had been done before their research. A few caveats are in order. While the gain in precision is analyzed, it is not obvious what loss in recall, if any, results. The analysis purporting to show the higher importance of a uniform sense distribution, or of rare senses used in queries, does not actually appear to show any proportionately greater utility of those cases in increasing precision. The special utility in the high-recall case is left as an unanalyzed conjecture. The analysis is limited to searching in the statistical/vector paradigm; I conjecture that sense disambiguation could be even more useful in search paradigms emphasizing deep semantic analysis or contextual or structural (“Smart Boolean”) approaches.
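The review's “file” example can be made concrete with a toy sketch of why sense information separates documents that plain keyword matching treats alike. This is a hypothetical illustration, not the authors' experimental procedure: the `Token` type, the sense labels, the sample documents, and the functions `word_overlap` and `sense_overlap` are all invented here, and the sketch assumes every word already carries a correct sense tag, which is the hard part in practice. It also mirrors the abstract's observation that disambiguation has little impact on documents sharing many words with the query: the relevant document scores the same under both measures, while the off-sense document loses its only match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A word paired with a (hypothetical, hand-assigned) sense label."""
    word: str
    sense: str


def word_overlap(query: list[Token], doc: list[Token]) -> int:
    """Keyword matching: count query words that appear in the document in any sense."""
    doc_words = {t.word for t in doc}
    return sum(1 for t in query if t.word in doc_words)


def sense_overlap(query: list[Token], doc: list[Token]) -> int:
    """Sense-aware matching: a query word counts only if the document uses it in the same sense."""
    doc_pairs = {(t.word, t.sense) for t in doc}
    return sum(1 for t in query if (t.word, t.sense) in doc_pairs)


# Invented toy data: the query uses "file" in its storage sense.
query = [Token("file", "storage"), Token("index", "lookup")]
doc_a = [Token("file", "storage"), Token("index", "lookup"), Token("disk", "storage")]  # relevant
doc_b = [Token("file", "tool"), Token("metal", "material")]  # off-sense, nonrelevant

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name, word_overlap(query, doc), sense_overlap(query, doc))
```

Running it prints `doc_a 2 2` and `doc_b 1 0`: keyword matching gives the wood-file document partial credit, whereas the sense-aware count drops it to zero without changing the score of the document that already shares many words with the query.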

Published in

ACM Transactions on Information Systems Volume 10, Issue 2
April 1992
98 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/146802

Copyright © 1992 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
disambiguation
document retrieval
semantically based search
word senses
Article Metrics
- Total citations: 217
- Total downloads: 2,379
- Downloads (last 12 months): 349
- Downloads (last 6 weeks): 55