Towards Better Text Processing Tools for the Ainu Language

  • Conference paper
  • First Online:
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2017)

Abstract

In this paper, we present our research on developing Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of the northern parts of the Japanese archipelago. In particular, we focused on improving existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments, we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed that these modifications improve the overall performance of the tools, especially on evaluation samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to word segmentation and by using a state-of-the-art tool for training part-of-speech taggers.
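The abstract does not state how the tokenizer consults its dictionaries. As a rough illustration only, the sketch below assumes a greedy longest-match segmenter and shows why merging a literary and a colloquial dictionary can help: a word missing from one lexicon can be recovered from the other instead of falling apart into single characters. The function names (`merge_dictionaries`, `longest_match_segment`, `segment`) and the toy word lists are hypothetical and are not taken from the authors' tools.

```python
# Hypothetical sketch: dictionary-based longest-match segmentation
# over a merged lexicon. Not the authors' implementation.


def merge_dictionaries(*dictionaries):
    """Return the union of several word lists (e.g. literary + colloquial)."""
    merged = set()
    for words in dictionaries:
        merged.update(w.lower() for w in words)
    return merged


def longest_match_segment(text, lexicon):
    """Greedy left-to-right longest-match segmentation of one chunk.

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character as an unknown token.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


def segment(text, lexicon):
    """Lowercase the input, split on whitespace, and segment each chunk."""
    tokens = []
    for chunk in text.lower().split():
        tokens.extend(longest_match_segment(chunk, lexicon))
    return tokens


if __name__ == "__main__":
    # Toy, hypothetical word lists standing in for the two dictionaries.
    literary = ["kamuy", "aynu"]
    colloquial = ["itak", "aynu"]
    lexicon = merge_dictionaries(literary, colloquial)
    print(segment("Aynuitak kamuy", lexicon))  # ['aynu', 'itak', 'kamuy']
```

With only the literary list, `longest_match_segment("aynuitak", {"aynu"})` would yield `aynu` followed by the unknown characters `i`, `t`, `a`, `k`; the merged lexicon recovers `itak`. Greedy longest match cannot resolve genuinely ambiguous strings, which is one motivation for the corpus-driven language models mentioned in the abstract.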

Author information

Correspondence to Karol Nowakowski.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Nowakowski, K., Ptaszynski, M., Masui, F. (2020). Towards Better Text Processing Tools for the Ainu Language. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2017. Lecture Notes in Computer Science, vol 12598. Springer, Cham. https://doi.org/10.1007/978-3-030-66527-2_10

  • DOI: https://doi.org/10.1007/978-3-030-66527-2_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66526-5

  • Online ISBN: 978-3-030-66527-2

  • eBook Packages: Computer Science, Computer Science (R0)
