COLE Experiments in the CLEF 2002 Spanish Monolingual Track

Vilares, Jesús; Alonso, Miguel A.; Ribadas, Francisco J.; Vilares, Manuel

doi:10.1007/978-3-540-45237-9_22


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2785)

Abstract

In this, our first participation in CLEF, we applied Natural Language Processing techniques for single-word and multiword term conflation. We tested several approaches at different levels of text processing: first, we lemmatized the text to avoid inflectional variation; second, we expanded the queries with synonyms according to a fixed similarity threshold; third, we employed morphological families to deal with derivational variation; and fourth, we tested a mixed approach that combines these families with syntactic dependencies to deal with the syntactic content of the documents.
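The four levels of conflation listed above can be illustrated with a minimal sketch. It assumes tiny hand-written lexicons (`LEMMAS`, `FAMILIES`, `SYNONYMS`, all hypothetical) in place of the authors' part-of-speech tagger, synonym dictionary and derivational morphology, made-up similarity values, and a dependency parse supplied by hand. It only shows the shape of each step and is not the COLE system's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicons, for illustration only (not the authors' resources).
LEMMAS: Dict[str, str] = {
    "casas": "casa",
    "construyeron": "construir",
    "construcciones": "construcción",
    "viviendas": "vivienda",
}
# Each morphological family is represented by an arbitrary family label.
FAMILIES: Dict[str, str] = {
    "construir": "FAM_construir",
    "construcción": "FAM_construir",
    "constructor": "FAM_construir",
}
# Synonym candidates with a made-up similarity degree in [0, 1].
SYNONYMS: Dict[str, List[Tuple[str, float]]] = {
    "casa": [("vivienda", 0.8), ("hogar", 0.6), ("edificio", 0.3)],
}


def lemmatize(tokens: List[str]) -> List[str]:
    """Level 1: map each inflected form to its lemma (unknown words are kept)."""
    return [LEMMAS.get(t, t) for t in tokens]


def expand_synonyms(lemmas: List[str], threshold: float) -> List[str]:
    """Level 2: append synonyms whose similarity reaches a fixed threshold."""
    expanded = list(lemmas)
    for lemma in lemmas:
        for syn, sim in SYNONYMS.get(lemma, []):
            if sim >= threshold and syn not in expanded:
                expanded.append(syn)
    return expanded


def conflate_families(lemmas: List[str]) -> List[str]:
    """Level 3: replace each lemma by the label of its morphological family."""
    return [FAMILIES.get(lemma, lemma) for lemma in lemmas]


def dependency_pairs(lemmas: List[str], heads: List[int]) -> Set[Tuple[str, str]]:
    """Level 4 (simplified): build (head, modifier) index terms.

    heads[i] is the position of the head of token i, or -1 if it has none;
    here the parse is given by hand. Both members of each pair are first
    conflated to their morphological families.
    """
    fams = conflate_families(lemmas)
    return {(fams[h], fams[i]) for i, h in enumerate(heads) if h >= 0}


if __name__ == "__main__":
    lemmas = lemmatize(["construcciones", "de", "casas"])
    print(lemmas)                                   # ['construcción', 'de', 'casa']
    print(expand_synonyms(lemmas, 0.5))             # adds 'vivienda' and 'hogar'
    print(conflate_families(lemmas))                # ['FAM_construir', 'de', 'casa']
    print(dependency_pairs(lemmas, [-1, -1, 0]))    # {('FAM_construir', 'casa')}
```

In this sketch, raising `threshold` admits fewer synonyms, so it directly controls how much the query grows; indexing family labels instead of lemmas lets derivationally related words such as "construir" and "construcción" match each other.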



Author information

Authors and Affiliations

Departamento de Computación, Universidade da Coruña, Campus de Elviña s/n, 15071, La Coruña, Spain
Jesús Vilares & Miguel A. Alonso
Escuela Superior de Ingeniería Informática, Universidade de Vigo, Campus de As Lagoas, 32004, Orense, Spain
Francisco J. Ribadas & Manuel Vilares


Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche (ISTI-CNR), Via G. Moruzzi 1, 56124, Pisa, Italy
Carol Peters
Eurospider Information Technology AG, Schaffhauserstr. 18, 8006, Zürich, Switzerland
Martin Braschler
Universidad Nacional de Educación a Distancia, Lenguajes y Sistemas Informáticos, Ciudad Universitaria, 28040, Madrid, Spain
Julio Gonzalo
Informationszentrum Sozialwissenschaften, Arbeitsgemeinschaft Sozialwissenschaftlicher Institute e.V. (IZ), Lennéstr. 30, 53113, Bonn, Germany
Michael Kluck

About this paper

Cite this paper

Vilares, J., Alonso, M.A., Ribadas, F.J., Vilares, M. (2003). COLE Experiments in the CLEF 2002 Spanish Monolingual Track. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_22

DOI: https://doi.org/10.1007/978-3-540-45237-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40830-7
Online ISBN: 978-3-540-45237-9
eBook Packages: Springer Book Archive