Abstract
In this paper we present our research on developing Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of the northern parts of the Japanese archipelago. In particular, we focused on improving the existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially on test samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to the problem of word segmentation and by using a state-of-the-art tool for training part-of-speech taggers.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Nowakowski, K., Ptaszynski, M., Masui, F. (2020). Towards Better Text Processing Tools for the Ainu Language. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2017. Lecture Notes in Computer Science, vol 12598. Springer, Cham. https://doi.org/10.1007/978-3-030-66527-2_10
DOI: https://doi.org/10.1007/978-3-030-66527-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66526-5
Online ISBN: 978-3-030-66527-2
eBook Packages: Computer Science, Computer Science (R0)