DOI: 10.3115/981863.981866

Noun-phrase analysis in unrestricted text for information retrieval

Published: 24 June 1996

ABSTRACT

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.
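
To make the idea of continuous and discontinuous subcompounds concrete, the following is a minimal sketch of the corpus-statistics half only. It is not the method described in the paper: the function names (`count_pairs`, `extract_subcompounds`), the raw co-occurrence count, the `min_count` threshold, and the toy corpus are all assumptions made for illustration, and no linguistic heuristics are applied.

```python
from collections import Counter
from itertools import combinations


def count_pairs(corpus_phrases):
    """Count every ordered word pair (adjacent or not) inside each corpus phrase."""
    counts = Counter()
    for phrase in corpus_phrases:
        words = phrase.lower().split()
        counts.update(combinations(words, 2))
    return counts


def extract_subcompounds(noun_phrase, pair_counts, min_count=2):
    """Keep word pairs of noun_phrase whose corpus count reaches min_count.

    The threshold is a hypothetical stand-in for a real association measure.
    """
    words = noun_phrase.lower().split()
    kept = []
    for i, j in combinations(range(len(words)), 2):
        pair = (words[i], words[j])
        if pair_counts[pair] >= min_count:
            kind = "continuous" if j == i + 1 else "discontinuous"
            kept.append((" ".join(pair), kind))
    return kept


# Toy corpus of noun phrases, invented for this example.
corpus = [
    "text retrieval",
    "text retrieval conference",
    "retrieval system",
    "information retrieval system",
    "indexing system",
]
counts = count_pairs(corpus)
result = extract_subcompounds("text retrieval system", counts, min_count=2)
print(result)
```

With this toy corpus, "text retrieval" and "retrieval system" are kept as continuous subcompounds, while the discontinuous pair "text system" is rejected because it never occurs in the corpus. A pair such as "information system" from "information indexing system" would be labeled discontinuous if the threshold were lowered to 1.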

Published in

ACL '96: Proceedings of the 34th annual meeting on Association for Computational Linguistics
June 1996, 399 pages
Program Chairs: Aravind Joshi, Martha Palmer
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%
