Noun-phrase analysis in unrestricted text for information retrieval

Authors:
David A. Evans, Carnegie Mellon University, Pittsburgh, PA
Chengxiang Zhai, Carnegie Mellon University, Pittsburgh, PA
ACL '96: Proceedings of the 34th annual meeting on Association for Computational Linguistics, June 1996, Pages 17–24. https://doi.org/10.3115/981863.981866

Published: 24 June 1996

ABSTRACT

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient, noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.
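
The idea of pulling continuous or discontinuous subcompounds out of a complex noun phrase can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify the statistics or the linguistic heuristics used, so the code below assumes a simple stand-in (keep a word pair only if it is attested as an adjacent pair in a corpus of noun phrases at least `min_count` times) and omits linguistic heuristics entirely. The function names, the toy corpus, and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(corpus_phrases):
    """Count adjacent word pairs across a corpus of noun phrases (assumed statistic)."""
    counts = Counter()
    for phrase in corpus_phrases:
        words = phrase.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def candidate_subcompounds(noun_phrase):
    """All ordered word pairs in the phrase: adjacent (continuous) or not (discontinuous)."""
    words = noun_phrase.lower().split()
    return [(words[i], words[j]) for i, j in combinations(range(len(words)), 2)]


def extract_subcompounds(noun_phrase, counts, min_count=2):
    """Keep candidate pairs attested at least min_count times in the corpus."""
    return [
        " ".join(pair)
        for pair in candidate_subcompounds(noun_phrase)
        if counts[pair] >= min_count
    ]


# Toy corpus, invented purely for illustration.
corpus = [
    "information retrieval",
    "information retrieval system",
    "retrieval system",
    "text retrieval system",
    "natural language processing",
    "natural language text",
    "language processing",
    "information system",
    "information system",
]

counts = pair_counts(corpus)
result = extract_subcompounds(
    "natural language information retrieval system", counts
)
print(result)
# → ['natural language', 'information retrieval', 'information system', 'retrieval system']
```

In this toy run, "information system" is a discontinuous subcompound: its two words are not adjacent in the input phrase, but the pair is well attested in the corpus. Unattested pairs such as "language retrieval" are discarded.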


Published in
ACL '96: Proceedings of the 34th annual meeting on Association for Computational Linguistics
June 1996, 399 pages
Program Chairs: Aravind Joshi (University of Pennsylvania, Philadelphia, PA), Martha Palmer (University of Pennsylvania, Philadelphia, PA)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%