Abstract
Lexical ambiguity is a pervasive problem in natural language processing. However, little quantitative information is available about the extent of the problem or about the impact that it has on information retrieval systems. We report on an analysis of lexical ambiguity in information retrieval test collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. The experiments show that there is considerable ambiguity even in a specialized database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors determine whether disambiguation will improve performance. For example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Other uses of word sense disambiguation in an information retrieval context are discussed.
Index Terms
- Lexical ambiguity and information retrieval