Leaf: Multiple-Choice Question Generation

  • Conference paper
Advances in Information Retrieval (ECIR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Abstract

Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs). The code and the demo are available on GitHub (https://github.com/KristiyanVachev/Leaf-Question-Generation).

Acknowledgements

This research is partially supported by Project UNITe BG05M2OP001-1.001-0004 funded by the Bulgarian OP “Science and Education for Smart Growth.”

Corresponding author

Correspondence to Kristiyan Vachev.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vachev, K., Hardalov, M., Karadzhov, G., Georgiev, G., Koychev, I., Nakov, P. (2022). Leaf: Multiple-Choice Question Generation. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_41

  • DOI: https://doi.org/10.1007/978-3-030-99739-7_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99738-0

  • Online ISBN: 978-3-030-99739-7

  • eBook Packages: Computer Science, Computer Science (R0)
