Retrieving Terms and their Variants in a Lexicalized Unification-Based Framework

Jacquemin, Christian; Royaute, Jean

doi:10.1007/978-1-4471-2099-5_14

Conference paper

Abstract

Term extraction is a major concern for information retrieval. Terms are not fixed forms: they undergo variations that prevent them from being identified by simply matching their initial string or its inflections. We show that a local syntactic approach to this problem gives good results in both identification quality and parsing time.

We have developed a dedicated tool, FASTR, which both identifies basic terms and parses their variations. Terms are described by logic rules generated automatically from the terms and their categorial structure, and variations are represented by metarules. The parser efficiently processes large corpora with large dictionaries, combining lexical identification with local syntactic analysis. We evaluate the accuracy of the results produced by these metarules and improve them with filtering metarules.
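
To make the rule and metarule idea concrete, here is a minimal sketch in Python. It is not FASTR's formalism or implementation, which the abstract does not specify. It assumes a term rule is simply a list of (lemma, category) pairs and that a metarule rewrites such a rule into a variant pattern. The names `insertion_metarule` and `find_matches`, the category labels, and the sample sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: the text has already been lemmatized and tagged,
# so each token is a (lemma, category) pair such as ("cell", "N").
Token = Tuple[str, str]

# A pattern element is a token whose lemma may be None, meaning
# "any word of this category" (a slot introduced by a metarule).
PatternElem = Tuple[Optional[str], str]


def insertion_metarule(term: List[Token], category: str) -> List[PatternElem]:
    """Hypothetical metarule X1 X2 -> X1 <category> X2 for two-word terms."""
    if len(term) != 2:
        raise ValueError("this sketch only handles two-word terms")
    first, second = term
    return [first, (None, category), second]


def find_matches(pattern: List[PatternElem], tokens: List[Token]) -> List[int]:
    """Return every start index at which the pattern matches the tokens."""
    hits = []
    size = len(pattern)
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if all(
            cat == tok_cat and (lemma is None or lemma == tok_lemma)
            for (lemma, cat), (tok_lemma, tok_cat) in zip(pattern, window)
        ):
            hits.append(start)
    return hits


# Base term "blood cell" and a sentence containing a variant of it.
term = [("blood", "N"), ("cell", "N")]
tokens = [("the", "D"), ("blood", "N"), ("mononuclear", "A"),
          ("cell", "N"), ("count", "N")]

variant = insertion_metarule(term, "A")
print(find_matches(term, tokens))     # the base form does not occur contiguously
print(find_matches(variant, tokens))  # the variant with an inserted adjective does
```

The point of the sketch is that the variant pattern is derived from the term rule rather than listed by hand, so one metarule covers the same variation for every term in the dictionary. A filtering step like the one the abstract mentions would then discard spurious matches, but its form is not described here and is left out.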

Author information

Authors and Affiliations

Institut de Recherche en Informatique de Nantes, IUT, 3, rue du Maréchal Joffre, F-44041, Nantes Cedex 01, France
Christian Jacquemin
Programme de Recherche Indexation, INIST - CNRS, 2, allée du Parc de Brabois, F-54514, Vandoeuvre-lès-Nancy, France
Jean Royaute

Editor information

Editors and Affiliations

Department of Computer Science, University of Massachusetts, 01003, Amherst, MA, USA
Bruce W. Croft
Department of Computer Science, University of Glasgow, G12 8RZ, 8–17 Lilybank Gardens, Glasgow, Scotland
C. J. van Rijsbergen

About this paper

Cite this paper

Jacquemin, C., Royaute, J. (1994). Retrieving Terms and their Variants in a Lexicalized Unification-Based Framework. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_14

DOI: https://doi.org/10.1007/978-1-4471-2099-5_14
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive
