Research Article

Turkish Temporal Expression Extraction and Identification

Year 2021, Volume: 14 Issue: 3, 337 - 343, 31.07.2021
https://doi.org/10.17671/gazibtd.853145

Abstract

Temporal expression recognition and disambiguation is a critical stage for natural language processing tasks that require semantic inference. Language processing technologies, a subfield of artificial intelligence, now require the analysis of temporal expressions in many phases. This article presents a temporal expression recognition and disambiguation system for Turkish, for the first time in the literature. Also for the first time, a Turkish temporal expression test data set has been created and made publicly available to researchers. The system was developed on the HeidelTime architecture, which is frequently used for other languages. On the introduced data set, it achieves about 90 percent performance on average across four types of temporal expressions (date, time, duration, and set). System performance is evaluated under different evaluation criteria and compared with a baseline named entity recognizer and with HeidelTime's automatically created language resources. The system is expected to fill an important gap in Turkish natural language studies and to benefit future work.
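To make the four expression types concrete, the following is a minimal rule-based sketch in Python. It is not the system described in the article and does not use HeidelTime's rule format; the lexicons, the patterns, the helper `tr_lower`, and the function `tag` are all hypothetical and cover only a handful of uninflected surface forms. Each rule pairs a pattern with a function that produces a TIMEX3-style normalized value.

```python
import re

# Hypothetical mini-lexicons; a real resource would also cover inflected forms.
MONTHS = {
    "ocak": 1, "şubat": 2, "mart": 3, "nisan": 4, "mayıs": 5, "haziran": 6,
    "temmuz": 7, "ağustos": 8, "eylül": 9, "ekim": 10, "kasım": 11, "aralık": 12,
}
UNITS = {"gün": "D", "hafta": "W", "ay": "M", "yıl": "Y"}

MONTH_ALT = "|".join(MONTHS)
UNIT_ALT = "|".join(UNITS)


def tr_lower(text):
    """Lowercase with Turkish dotted/dotless i handling, preserving length."""
    return text.replace("İ", "i").replace("I", "ı").lower()


# (type, pattern, normalizer) triples, applied in order on lowercased text.
RULES = [
    # DATE, e.g. "31 Temmuz 2021" -> 2021-07-31
    ("DATE", re.compile(rf"\b(\d{{1,2}}) ({MONTH_ALT}) (\d{{4}})\b"),
     lambda m: f"{int(m[3]):04d}-{MONTHS[m[2]]:02d}-{int(m[1]):02d}"),
    # TIME, e.g. "saat 14:30" -> T14:30
    ("TIME", re.compile(r"\bsaat (\d{1,2})[:.](\d{2})\b"),
     lambda m: f"T{int(m[1]):02d}:{m[2]}"),
    # SET (recurrence), e.g. "her gün" (every day) -> P1D
    ("SET", re.compile(rf"\bher ({UNIT_ALT})\b"),
     lambda m: f"P1{UNITS[m[1]]}"),
    # DURATION, e.g. "3 hafta" (3 weeks) -> P3W
    ("DURATION", re.compile(rf"\b(\d+) ({UNIT_ALT})\b"),
     lambda m: f"P{int(m[1])}{UNITS[m[2]]}"),
]


def tag(text):
    """Return (surface form, type, normalized value) triples in text order."""
    lowered = tr_lower(text)
    found = []
    for timex_type, pattern, normalize in RULES:
        for m in pattern.finditer(lowered):
            # Skip matches that overlap a span claimed by an earlier rule.
            if any(m.start() < end and start < m.end() for start, end, *_ in found):
                continue
            found.append((m.start(), m.end(), timex_type, normalize(m)))
    found.sort()
    return [(text[s:e], t, v) for s, e, t, v in found]


sentence = ("Toplantı 31 Temmuz 2021 tarihinde saat 14:30'da başladı, "
            "3 hafta sürdü ve her gün tekrarlandı.")
for surface, timex_type, value in tag(sentence):
    print(f"{timex_type:9} {surface!r} -> {value}")
```

Because Turkish is agglutinative, surface patterns this simple miss suffixed forms such as "Temmuz'da" following a bare month, which is one reason a full system needs far richer language resources than this sketch.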

Türkçe Zamansal İfadelerin Yakalanması ve Tanımlanması

Abstract

Recognizing and identifying temporal expressions is a natural language processing task of critical importance wherever semantic inference is required. Language processing technologies, which are prominent in artificial intelligence today, need temporal expressions to be resolved at many stages of research and application. This article introduces the first temporal expression recognition and identification system for Turkish in the literature. Also for the first time in the literature, a test data set that can be used in subsequent studies on this topic has been created and made available to researchers. The system, developed as open source, is built on the HeidelTime architecture, which is frequently used for other languages, and achieves about 90 percent performance on average on the created data set for four different types (date, time, duration, and expressions denoting recurrence). The system is evaluated with different evaluation criteria from the literature and compared with a baseline Turkish named entity recognizer and with automatically created HeidelTime language resources. The developed system is expected to complete an important missing building block in Turkish natural language research and to benefit future studies.

Details

Primary Language: Turkish
Subjects: Computer Software
Section: Articles
Authors

Hatice Camcı 0000-0003-4607-7305

Gülşen Eryiğit 0000-0003-4607-7305

Publication Date: July 31, 2021
Submission Date: January 4, 2021
Published in Issue: Year 2021, Volume: 14, Issue: 3

Cite

APA Camcı, H., & Eryiğit, G. (2021). Türkçe Zamansal İfadelerin Yakalanması ve Tanımlanması. Bilişim Teknolojileri Dergisi, 14(3), 337-343. https://doi.org/10.17671/gazibtd.853145