Abstract
We propose a method for morphological analysis and disambiguation for the Kazakh language that accounts for both inflectional and derivational morphology, including derivation that is not fully productive. The method is data-driven and does not require manually generated rules. We leverage so-called “transition chains” that help prune false segmentations while keeping correct ones. At the disambiguation step we use a standard HMM-based approach. Evaluating our method against open-source solutions on several data sets, we show that it performs better than or on par with them. We also provide an extensive error analysis that sheds light on common problems in the morphological disambiguation of Kazakh.
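To make the disambiguation step concrete, the following is a minimal sketch of first-order HMM decoding (Viterbi) over the candidate analyses of each token. It is not the authors' implementation: the function name `viterbi`, the tag labels `N` and `V`, the placeholder tokens `w1` and `w2`, and all probabilities are hypothetical, chosen only to illustrate how a standard HMM selects one analysis per token from the set an analyzer proposes.

```python
import math

def viterbi(tokens, candidates, trans, emit, floor=1e-6):
    """Return the most probable tag sequence under a first-order HMM.

    tokens:     non-empty list of surface words
    candidates: dict word -> list of candidate tags (the analyzer's output)
    trans:      dict (prev_tag, tag) -> P(tag | prev_tag); "<s>" marks the start
    emit:       dict (tag, word) -> P(word | tag)
    floor:      probability assigned to unseen events (crude smoothing, an assumption)
    """
    def lp(table, key):
        return math.log(table.get(key, floor))

    # scores[i][tag] = (best log-probability of a path ending in tag, previous tag)
    scores = [{}]
    for tag in candidates[tokens[0]]:
        scores[0][tag] = (lp(trans, ("<s>", tag)) + lp(emit, (tag, tokens[0])), None)

    for i in range(1, len(tokens)):
        scores.append({})
        prev = scores[i - 1]
        for tag in candidates[tokens[i]]:
            # Choose the predecessor that maximizes path score plus transition.
            best_prev = max(prev, key=lambda p: prev[p][0] + lp(trans, (p, tag)))
            total = prev[best_prev][0] + lp(trans, (best_prev, tag)) + lp(emit, (tag, tokens[i]))
            scores[i][tag] = (total, best_prev)

    # Follow back-pointers from the best final tag.
    tag = max(scores[-1], key=lambda t: scores[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = scores[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy model: "w1" is ambiguous between N and V, "w2" is only V.
tokens = ["w1", "w2"]
candidates = {"w1": ["N", "V"], "w2": ["V"]}
trans = {("<s>", "N"): 0.6, ("<s>", "V"): 0.4, ("N", "V"): 0.7, ("V", "V"): 0.1}
emit = {("N", "w1"): 0.5, ("V", "w1"): 0.5, ("V", "w2"): 1.0}

print(viterbi(tokens, candidates, trans, emit))
```

In this toy model the context resolves the ambiguity of `w1`: the path N, V has probability 0.6 × 0.5 × 0.7 × 1.0 = 0.21, against 0.4 × 0.5 × 0.1 × 1.0 = 0.02 for V, V. Working in log space avoids numerical underflow on longer sentences.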
© 2015 Springer International Publishing Switzerland
Cite this paper
Makhambetov, O., Makazhanov, A., Sabyrgaliyev, I., Yessenbayev, Z. (2015). Data-Driven Morphological Analysis and Disambiguation for Kazakh. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol. 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_12
DOI: https://doi.org/10.1007/978-3-319-18111-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18110-3
Online ISBN: 978-3-319-18111-0
eBook Packages: Computer Science (R0)