Lexical ambiguity and information retrieval

Published: 01 April 1992
Abstract

Lexical ambiguity is a pervasive problem in natural language processing. However, little quantitative information is available about the extent of the problem or about the impact that it has on information retrieval systems. We report on an analysis of lexical ambiguity in information retrieval test collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. The experiments show that there is considerable ambiguity even in a specialized database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors contribute to determining whether disambiguation will make an improvement in performance. For example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Other uses of word sense disambiguation in an information retrieval context are discussed.

                Reviews

                Richard S. Marcus

                The authors consider ambiguity arising from words having multiple senses. For example, the word “file” has two senses, “a thing that stores information” and “a thing that scrapes wood or metal.” The authors give a nice review of kinds of lexical ambiguity and attempts in computational linguistics to disambiguate words. They then perform a series of experimental analyses to determine how much improvement in document retrieval could be obtained using word sense analysis. The analyses indicate that word sense disambiguation can significantly aid in the differentiation of relevant from irrelevant documents but that this does not necessarily contribute to any large improvement in retrieval. In fact, the experiments showed only small increases in precision (up to 1 or 2 percent). The authors conjecture that disambiguation will be highly effective in some situations, such as where a query word has several senses and the sense in the query is not statistically predominant, or where high recall is desired or one word in the query is critical. This excellent paper is must reading for anyone researching this area. As the authors point out, relatively little quantitative analysis had been done before their research. A few caveats are in order: while gain in precision is analyzed, it is not obvious what loss in recall, if any, results. The analysis purporting to show the higher importance of uniform sense distribution or use of rare senses in queries does not actually appear to show any greater proportionate utility of those cases in increasing precision. The special utility in the high-recall case is left as an unanalyzed conjecture. The analysis is limited to searching in the statistical/vector paradigm; I conjecture that sense disambiguation could be even more useful in search paradigms emphasizing deep semantic analysis or contextual or structural (“Smart Boolean”) approaches.
