Croatian POS Tagger as a Prerequisite for Knowledge Extraction in Intelligent Tutoring Systems

Vasić, Daniel; Žitko, Branko; Grubišić, Ani; Stankov, Slavomir; Gašpar, Angelina; Šarić-Grgić, Ines; Tomaš, Suzana; Peraić, Ivan; Markić-Vučić, Matea

doi:10.1007/978-3-030-77857-6_23

Conference paper
First Online: 03 July 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12792)

Abstract

In this article we present a knowledge extraction approach that can be used in systems that teach in a fully automated manner. These systems, called Intelligent Tutoring Systems (ITS), are built around the idea of one-to-one teaching. Many such systems use natural language processing to improve the communication interface between the student and the system. The same techniques can also be used on the content creator's side to semi-automate or fully automate the creation of teaching content. In such systems, knowledge representation plays a crucial role in successfully implementing teaching and encouraging learning. The output of the knowledge extraction phase is knowledge in the form of a hypergraph, which can be used to adapt teaching to the student's current knowledge level. We present a deep neural network architecture for accurate POS tagging of words in morphologically rich languages. Using sparse word representations for this task increases the dimensionality of the vector space and makes learning more complex. Traditional dense vector representations solve this problem to some extent, but they still have difficulty representing ambiguous words. The proposed architecture uses a Bidirectional Encoder Representations from Transformers (BERT) model pre-trained on Croatian to achieve state-of-the-art accuracy for POS tagging.
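
The abstract does not state how the BERT model's sub-word tokens are mapped back to word-level POS tags. A common approach for token classification, assumed here and not confirmed as the authors' method, is to train on the first sub-word of each word and mask the remaining sub-words and special tokens. Because inflected Croatian word forms can be split into several sub-words, this alignment step matters. In this sketch, `word_ids` follows the shape returned by `word_ids()` on Hugging Face fast tokenizers (`None` for special tokens); the segmentation of the example sentence is hypothetical, the helper names are illustrative, and coarse UPOS-style tags stand in for the paper's actual tagset.

```python
from typing import Dict, List, Optional, Sequence

# Label value skipped by PyTorch's CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def align_labels(
    word_ids: Sequence[Optional[int]],
    word_tags: Sequence[str],
    tag2id: Dict[str, int],
) -> List[int]:
    """Give each word's tag to its first sub-word; mask continuations and specials."""
    labels: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:                 # [CLS], [SEP] or padding
            labels.append(IGNORE_INDEX)
        elif wid != previous:           # first sub-word of a new word
            labels.append(tag2id[word_tags[wid]])
        else:                           # continuation sub-word of the same word
            labels.append(IGNORE_INDEX)
        previous = wid
    return labels


def decode_predictions(
    word_ids: Sequence[Optional[int]],
    predicted_ids: Sequence[int],
    id2tag: Dict[int, str],
) -> List[str]:
    """Read one tag per word from the prediction on its first sub-word."""
    tags: List[str] = []
    previous: Optional[int] = None
    for wid, pid in zip(word_ids, predicted_ids):
        if wid is not None and wid != previous:
            tags.append(id2tag[pid])
        previous = wid
    return tags


if __name__ == "__main__":
    # Hypothetical segmentation of "Učenici čitaju knjige" ("Pupils read books"):
    # [CLS] Učen ##ici čita ##ju knjige [SEP]
    word_ids = [None, 0, 0, 1, 1, 2, None]
    tag2id = {"NOUN": 0, "VERB": 1}
    id2tag = {i: t for t, i in tag2id.items()}
    print(align_labels(word_ids, ["NOUN", "VERB", "NOUN"], tag2id))
    # [-100, 0, -100, 1, -100, 0, -100]
    print(decode_predictions(word_ids, [1, 0, 1, 1, 0, 0, 1], id2tag))
    # ['NOUN', 'VERB', 'NOUN']
```

Using the first sub-word for both training and decoding keeps the two steps symmetric; averaging or pooling over all sub-words of a word is an alternative design choice.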

This paper is part of work supported by the Office of Naval Research under Grant No. N00014-20-1-2066.

Author information

Authors and Affiliations

Faculty of Science and Education, University of Mostar, 88000, Mostar, Bosnia and Herzegovina
Daniel Vasić
Faculty of Science, University of Split, 21000, Split, Croatia
Branko Žitko, Ani Grubišić, Slavomir Stankov, Ivan Peraić & Matea Markić-Vučić
Catholic Faculty of Theology, University of Split, 21000, Split, Croatia
Angelina Gašpar, Ines Šarić-Grgić & Suzana Tomaš
Faculty of Humanities and Social Sciences, University of Split, 21000, Split, Croatia
Daniel Vasić, Branko Žitko, Ani Grubišić, Slavomir Stankov, Angelina Gašpar, Ines Šarić-Grgić, Suzana Tomaš & Matea Markić-Vučić

Corresponding author

Correspondence to Daniel Vasić.

Editor information

Editors and Affiliations

Soar Technology, Inc., Orlando, FL, USA
Robert A. Sottilare
Fraunhofer FKIE, Wachtberg, Germany
Jessica Schwarz

About this paper

Cite this paper

Vasić, D. et al. (2021). Croatian POS Tagger as a Prerequisite for Knowledge Extraction in Intelligent Tutoring Systems. In: Sottilare, R.A., Schwarz, J. (eds) Adaptive Instructional Systems. Design and Evaluation. HCII 2021. Lecture Notes in Computer Science, vol 12792. Springer, Cham. https://doi.org/10.1007/978-3-030-77857-6_23

DOI: https://doi.org/10.1007/978-3-030-77857-6_23
Published: 03 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77856-9
Online ISBN: 978-3-030-77857-6
eBook Packages: Computer Science, Computer Science (R0)
