ABSTRACT
Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient, noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.
- Noun-phrase analysis in unrestricted text for information retrieval