DisenQNet: Disentangled Representation Learning for Educational Questions
ABSTRACT
Learning informative representations for educational questions is a fundamental problem in online learning systems, one that can benefit many downstream applications, e.g., difficulty estimation. Most existing solutions integrate all the information of a question into a single representation in a supervised manner, which often yields unsatisfactory results due to the following issues. First, the scarcity of labeled data limits the representation ability. Second, the label-dependent representations transfer poorly to other tasks. Moreover, aggregating all information into one unified representation may introduce noise in applications, since it cannot distinguish the diverse characteristics of questions. In this paper, we aim to learn disentangled representations of questions. We propose a novel unsupervised model, namely DisenQNet, which divides the representation of a question into two parts, i.e., a concept representation that captures its explicit concept meaning and an individual representation that preserves its personal characteristics. We achieve this goal via mutual information estimation, training three self-supervised estimators on a large unlabeled question corpus. We then propose an enhanced model, DisenQNet+, which transfers the representation knowledge from unlabeled questions to labeled questions in specific applications by maximizing the mutual information between the two. Extensive experiments on real-world datasets demonstrate that DisenQNet generates effective and meaningful disentangled representations for questions, and that DisenQNet+ further improves the performance of different downstream applications.
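To make the mechanism concrete, below is a minimal PyTorch sketch of the kind of mutual-information-based disentanglement the abstract describes: an encoder splits a question into a concept vector and an individual vector, Jensen-Shannon MI estimators (in the style of Deep InfoMax) keep each part informative about the question, and an MI penalty pushes the two parts apart. All module names, the pooled-embedding encoder, the `beta` weight, and the single-optimizer training step are illustrative assumptions, not the authors' implementation; in particular, a faithful version would typically train the cross-part estimator adversarially against the encoder, a min-max loop that is collapsed into one step here for brevity.

```python
# Sketch (assumed, simplified) of disentangling a question representation into
# concept / individual parts with Jensen-Shannon MI estimators.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIDiscriminator(nn.Module):
    """Scores (representation, feature) pairs for JS MI estimation."""
    def __init__(self, rep_dim, feat_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rep_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, rep, feat):
        return self.net(torch.cat([rep, feat], dim=-1)).squeeze(-1)

def js_mi_lower_bound(disc, rep, feat):
    """Jensen-Shannon MI lower bound: matched pairs vs. shuffled negatives."""
    perm = torch.randperm(feat.size(0), device=feat.device)
    pos = disc(rep, feat)        # matched (rep, feat) pairs
    neg = disc(rep, feat[perm])  # mismatched pairs as negatives
    # E_pos[-softplus(-T)] - E_neg[softplus(T)]; larger means higher MI estimate
    return (-F.softplus(-pos)).mean() - F.softplus(neg).mean()

class DisenEncoder(nn.Module):
    """Encodes a question and splits it into concept / individual vectors."""
    def __init__(self, vocab_size, emb_dim=128, rep_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.concept_head = nn.Linear(emb_dim, rep_dim)
        self.individual_head = nn.Linear(emb_dim, rep_dim)

    def forward(self, token_ids):
        h = self.emb(token_ids).mean(dim=1)  # mean-pooled question feature
        return self.concept_head(h), self.individual_head(h), h

def train_step(enc, d_concept, d_indiv, d_between, opt, token_ids, beta=0.1):
    """One unsupervised step: keep each part informative about the question
    (maximize MI with the pooled feature) while pushing the parts apart
    (minimize the cross-part MI estimate)."""
    concept, indiv, feat = enc(token_ids)
    keep = js_mi_lower_bound(d_concept, concept, feat) \
         + js_mi_lower_bound(d_indiv, indiv, feat)
    apart = js_mi_lower_bound(d_between, concept, indiv)
    loss = -keep + beta * apart  # maximize "keep", penalize "apart"
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Under these assumptions, a smoke test is straightforward: construct the encoder and three discriminators, put all their parameters in one `torch.optim.Adam`, and call `train_step` on batches of padded token ids. A DisenQNet+-style transfer step would, analogously, add one more JS estimator between the representations of unlabeled and labeled questions and maximize it during task-specific fine-tuning.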