An Unsupervised Method to Improve Spanish Stemmer

Fernández, Antonio; Díaz, Josval; Gutiérrez, Yoan; Muñoz, Rafael

doi:10.1007/978-3-642-22327-3_24


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6716)

Abstract

We evaluate the effectiveness of using our edit distance algorithm to improve an unsupervised, language-independent stemming method. The main idea is to create morphological families by automatically grouping words according to our distance, and then to perform stemming based on that grouping. We evaluated both the capacity of the edit distance algorithm to cluster words and the ability of our method to generate the correct stem for Spanish. The method achieved 98% precision in creating morphological families and 99.85% correct stemming.
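The abstract does not specify the authors' own edit distance, so the following is only an illustrative sketch of the general idea, not the published method. It assumes the standard Levenshtein distance as a stand-in, a greedy grouping of words against the first member of each family, and the longest common prefix of a family as its stem. The function names, the `max_ratio` threshold of 0.5, and the sample words are all hypothetical choices made for this example.

```python
from os.path import commonprefix


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (a stand-in, NOT the paper's distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_families(words, max_ratio=0.5):
    """Greedily group words into families (assumed clustering strategy).

    A word joins the first family whose representative (its first member)
    lies within `max_ratio` normalized edit distance; otherwise it starts
    a new family. The threshold is an assumption, not a value from the paper.
    """
    families = []
    for word in sorted(set(words)):
        for family in families:
            rep = family[0]
            ratio = levenshtein(word, rep) / max(len(word), len(rep))
            if ratio <= max_ratio:
                family.append(word)
                break
        else:
            families.append([word])
    return families


def stem_families(families):
    """Map each word to the longest common prefix of its family."""
    return {word: commonprefix(family)
            for family in families
            for word in family}


if __name__ == "__main__":
    sample = ["gato", "gatos", "gata", "perro", "perros"]
    families = group_families(sample)
    for word, stem in sorted(stem_families(families).items()):
        print(word, "->", stem)
```

Comparing each word only against a family's first member keeps the sketch short but makes the grouping depend on word order; a fuller treatment would likely use a proper clustering step and a distance tuned to morphological variation, as the paper's title suggests the authors do.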



Author information

Authors and Affiliations

Departamento de Informática, Universidad de Matanzas, Autopista a Varadero, Matanzas, Cuba
Antonio Fernández, Josval Díaz & Yoan Gutiérrez
Departamento de Lenguaje y Sistemas Informáticos, Universidad de Alicante, España
Rafael Muñoz


Editor information

Editors and Affiliations

School of Computing, University of Alicante, 03080, Alicante, Spain
Rafael Muñoz
Department of Software and Computing Systems, University of Alicante, Aptdo. de Correos 99, 03080, Alicante, Spain
Andrés Montoyo
CNAM- Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

About this paper

Cite this paper

Fernández, A., Díaz, J., Gutiérrez, Y., Muñoz, R. (2011). An Unsupervised Method to Improve Spanish Stemmer. In: Muñoz, R., Montoyo, A., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2011. Lecture Notes in Computer Science, vol 6716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22327-3_24

DOI: https://doi.org/10.1007/978-3-642-22327-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22326-6
Online ISBN: 978-3-642-22327-3
eBook Packages: Computer Science, Computer Science (R0)
