Leaf: Multiple-Choice Question Generation

Vachev, Kristiyan; Hardalov, Momchil; Karadzhov, Georgi; Georgiev, Georgi; Koychev, Ivan; Nakov, Preslav

doi:10.1007/978-3-030-99739-7_41

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs). The code and the demo are available on GitHub (https://github.com/KristiyanVachev/Leaf-Question-Generation).

Acknowledgements

This research is partially supported by Project UNITe BG05M2OP001-1.001-0004 funded by the Bulgarian OP “Science and Education for Smart Growth.”

Author information

Authors and Affiliations

Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, Sofia, Bulgaria
Kristiyan Vachev, Momchil Hardalov & Ivan Koychev
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Georgi Karadzhov
Releva.ai, Sofia, Bulgaria
Georgi Georgiev
Qatar Computing Research Institute, HBKU, Doha, Qatar
Preslav Nakov

Corresponding author

Correspondence to Kristiyan Vachev.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

Cite this paper

Vachev, K., Hardalov, M., Karadzhov, G., Georgiev, G., Koychev, I., Nakov, P. (2022). Leaf: Multiple-Choice Question Generation. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_41

DOI: https://doi.org/10.1007/978-3-030-99739-7_41
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science, Computer Science (R0)
