Morpho Challenge Evaluation Using a Linguistic Gold Standard

Kurimo, Mikko; Creutz, Mathias; Varjokallio, Matti

doi:10.1007/978-3-540-85760-0_111

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5152)

Abstract

The objective of Morpho Challenge 2007 was to design statistical machine learning algorithms that discover which morphemes (the smallest individually meaningful units of language) words consist of. Ideally, these morphemes are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling. Because the morphemes found by unsupervised morpheme analysis can have arbitrary names, the analyses are evaluated here against a linguistic gold standard by matching morpheme-sharing word pairs. Data sets were provided for four languages (Finnish, German, English, and Turkish), and participants were encouraged to apply their algorithms to all of them. The results show significant variance between methods and languages, but the best methods seem to be useful in all tested languages and match the linguistic analysis quite well.
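
The idea of matching morpheme-sharing word pairs can be illustrated with a small sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function names `sharing_pairs` and `pair_precision_recall`, the toy analyses, and the choice to enumerate all word pairs exhaustively (a real evaluation over large word lists would more plausibly sample pairs). The point it shows is that morpheme labels never need to match between the proposed analysis and the gold standard; only the pattern of which words share a morpheme is compared.

```python
from itertools import combinations


def sharing_pairs(analyses):
    """Return the unordered word pairs that share at least one morpheme label.

    `analyses` maps each word to a list of morpheme labels (hypothetical format).
    """
    pairs = set()
    for w1, w2 in combinations(sorted(analyses), 2):
        if set(analyses[w1]) & set(analyses[w2]):
            pairs.add((w1, w2))
    return pairs


def pair_precision_recall(proposed, gold):
    """Compare two analyses by their morpheme-sharing word pairs.

    Precision: fraction of pairs sharing a morpheme in the proposed analysis
    that also share one in the gold standard. Recall: the reverse direction.
    Only words present in both analyses are considered.
    """
    words = set(proposed) & set(gold)
    p = sharing_pairs({w: proposed[w] for w in words})
    g = sharing_pairs({w: gold[w] for w in words})
    hits = len(p & g)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(g) if g else 0.0
    return precision, recall


# Toy data (invented for illustration). The proposed labels are arbitrary names.
proposed = {
    "cat": ["A"],
    "cats": ["A", "B"],
    "dogs": ["C", "B"],
    "walked": ["D", "B"],  # wrongly reuses the plural label "B"
}
gold = {
    "cat": ["cat_N"],
    "cats": ["cat_N", "+PL"],
    "dogs": ["dog_N", "+PL"],
    "walked": ["walk_V", "+PAST"],
}

precision, recall = pair_precision_recall(proposed, gold)
print(precision, recall)  # prints: 0.5 1.0
```

In this toy case the proposed analysis finds both gold-standard pairs (recall 1.0), but two of its four pairs involve the spurious label on "walked", which halves the precision.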

Author information

Authors and Affiliations

Adaptive Informatics Research Centre, Helsinki University of Technology, P.O.Box 5400, FIN-02015 TKK, Finland
Mikko Kurimo, Mathias Creutz & Matti Varjokallio

Editor information

Carol Peters, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, Diana Santos

About this paper

Cite this paper

Kurimo, M., Creutz, M., Varjokallio, M. (2008). Morpho Challenge Evaluation Using a Linguistic Gold Standard. In: Peters, C., et al. Advances in Multilingual and Multimodal Information Retrieval. CLEF 2007. Lecture Notes in Computer Science, vol 5152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85760-0_111

DOI: https://doi.org/10.1007/978-3-540-85760-0_111
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85759-4
Online ISBN: 978-3-540-85760-0
eBook Packages: Computer Science, Computer Science (R0)