An Unsupervised Method to Improve Spanish Stemmer

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6716)

Abstract

We evaluate the effectiveness of using our edit distance algorithm to improve an unsupervised, language-independent stemming method. The main idea is to create morphological families by automatically grouping words according to our distance. Stemming is then performed on the basis of that grouping. We evaluated both the capacity of the edit distance algorithm for word clustering and the ability of our method to generate the correct stem for Spanish. The creation of morphological families reached 98% precision, and 99.85% of the stems generated were correct.
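
The two steps named in the abstract (grouping words into morphological families by edit distance, then deriving a stem from each family) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define their distance, so the standard Levenshtein distance stands in for it. The greedy grouping rule, the thresholds `max_dist` and `min_prefix`, and the choice of the longest common prefix as the stem are all assumptions made for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (a stand-in for the paper's own distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def common_prefix(words: List[str]) -> str:
    """Longest prefix shared by every word in the list."""
    if not words:
        return ""
    shortest = min(words, key=len)
    for i, ch in enumerate(shortest):
        if any(w[i] != ch for w in words):
            return shortest[:i]
    return shortest


def group_families(words: List[str], max_dist: int = 3,
                   min_prefix: int = 3) -> List[List[str]]:
    """Greedy grouping (assumed rule): a word joins the first family whose
    first member is within max_dist edits and shares min_prefix letters."""
    families: List[List[str]] = []
    for w in words:
        for fam in families:
            rep = fam[0]
            if (len(common_prefix([rep, w])) >= min_prefix
                    and levenshtein(rep, w) <= max_dist):
                fam.append(w)
                break
        else:
            families.append([w])
    return families


def stem_table(families: List[List[str]]) -> Dict[str, str]:
    """Assign each word the common prefix of its family as its stem (assumed rule)."""
    return {w: common_prefix(fam) for fam in families for w in fam}


if __name__ == "__main__":
    words = ["cantar", "canto", "cantamos", "comer", "como", "comemos"]
    families = group_families(words)
    for word, stem in stem_table(families).items():
        print(f"{word} -> {stem}")
```

On this small Spanish word list the sketch yields two families, with stems `cant` and `com`. A prefix-based stem like this cannot handle irregular forms or stem-changing verbs, which is one reason the thresholds and the stem rule should be read as placeholders and not as the evaluated method.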

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fernández, A., Díaz, J., Gutiérrez, Y., Muñoz, R. (2011). An Unsupervised Method to Improve Spanish Stemmer. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_24

  • DOI: https://doi.org/10.1007/978-3-642-22327-3_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22326-6

  • Online ISBN: 978-3-642-22327-3

  • eBook Packages: Computer Science, Computer Science (R0)
