COLE Experiments in the CLEF 2002 Spanish Monolingual Track

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2785)

Abstract

In this, our first participation in CLEF, we applied Natural Language Processing techniques to conflate single-word and multiword terms. We tested several approaches at different levels of text processing: first, we lemmatized the text to remove inflectional variation; second, we expanded the queries with synonyms selected according to a fixed similarity threshold; third, we employed morphological families to deal with derivational variation; and fourth, we tested a mixed approach that combines such families with syntactic dependencies to deal with the syntactic content of the document.
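
The first three levels of processing named in the abstract can be illustrated with a toy sketch. This is not the authors' system: the dictionaries, the similarity scores, the 0.7 default threshold, and the function names below are all assumptions invented for illustration, and the fourth approach (syntactic dependencies) is left out because it would require a parser.

```python
# Toy resources. Every entry is made up for illustration only;
# a real system would use a full lexicon and synonym dictionary.
LEMMAS = {"casas": "casa", "gatos": "gato", "corredores": "corredor"}
SYNONYMS = {"casa": [("vivienda", 0.9), ("hogar", 0.8), ("edificio", 0.4)]}
# Each derived lemma maps to a representative of its morphological family.
FAMILIES = {"corredor": "correr", "carrera": "correr"}


def lemmatize(tokens):
    """Map each inflected form to its lemma; unknown forms pass through lowercased."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def expand_with_synonyms(lemmas, threshold=0.7):
    """Append synonyms whose (hypothetical) similarity score meets a fixed threshold."""
    expanded = list(lemmas)
    for lemma in lemmas:
        for synonym, score in SYNONYMS.get(lemma, []):
            if score >= threshold and synonym not in expanded:
                expanded.append(synonym)
    return expanded


def to_families(lemmas):
    """Replace each lemma with its family representative to absorb derivational variation."""
    return [FAMILIES.get(lemma, lemma) for lemma in lemmas]


if __name__ == "__main__":
    query = lemmatize(["Casas", "corredores"])
    print(expand_with_synonyms(query))
    print(to_families(query))
```

Raising the threshold in this sketch admits fewer synonyms, which trades recall for precision; how the actual experiments set that value is not stated in the abstract.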

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vilares, J., Alonso, M.A., Ribadas, F.J., Vilares, M. (2003). COLE Experiments in the CLEF 2002 Spanish Monolingual Track. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_22

  • DOI: https://doi.org/10.1007/978-3-540-45237-9_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40830-7

  • Online ISBN: 978-3-540-45237-9

  • eBook Packages: Springer Book Archive
