Designing a Persian question answering system based on rhetorical structure theory

Document Type: Research Paper

Authors

1 Department of Information Technology Management, Faculty of Management, Science and Research Branch, Islamic Azad University, Tehran, Iran

2 Faculty of Management, Central Tehran Branch, Islamic Azad University, Tehran, Iran

3 Department of Computer Engineering, Faculty of Computer, Saveh Branch, Islamic Azad University, Tehran, Iran

Abstract

A question answering system uses natural language processing together with a database or a document set to return an accurate answer to a user’s question. Many such systems have been designed. However, few studies on the Persian language have addressed extracting answers to “why” and “how” questions. This scarcity is attributed to the complexity and time cost of analyzing and processing text structure beyond the boundaries of a single sentence.
The primary purpose of the present study was to analyze Persian text in order to create a set of linguistic patterns that capture the related information in causal/explanatory sentences in a general domain. Information retrieval algorithms and a text-structure recognition approach, Rhetorical Structure Theory, were used for data and text analysis. In addition, 70 “why” questions and 20 “how” questions were prepared to evaluate system performance. Finally, the .NET platform, a relational database, and Persian language interpreters were used to build the software system.
As a result, a system was designed and published that answers “why” and “how” questions in a general data domain.
The system answered 61 questions, a recall rate of 68%. About 55% of the questions were answered correctly on the basis of markers of inter-sentence relations, while the correct answers to 13% of the questions were derived from rhetorical relations among sentences.
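The reported figures can be checked with a few lines of arithmetic. The sketch below assumes that the 68% recall is computed over all 90 evaluation questions (70 “why” plus 20 “how”) and that the 61 answered questions are the correct ones; the abstract does not state the denominator explicitly, and the variable names are illustrative only.

```python
# Figures taken from the abstract; names are hypothetical.
why_questions = 70
how_questions = 20
answered = 61

# Assumption: recall is measured against the full evaluation set.
total_questions = why_questions + how_questions
recall = answered / total_questions

# Reported shares of the question set, by the kind of evidence used.
share_marker_based = 0.55   # markers of inter-sentence relations
share_rhetorical = 0.13     # rhetorical relations among sentences
share_total = share_marker_based + share_rhetorical

print(f"total questions: {total_questions}")
print(f"recall: {recall:.1%}")
print(f"sum of reported shares: {share_total:.0%}")
```

Under these assumptions the recall comes to 61/90, which rounds to 68%, and the two reported shares (55% and 13%) add up to the same figure.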

Volume 12, Special Issue
December 2021
Pages 1449-1468
  • Receive Date: 02 July 2021
  • Revise Date: 12 September 2021
  • Accept Date: 30 October 2021