DOI: 10.1145/3303772.3303795
research-article

Effective Feature Learning with Unsupervised Learning for Improving the Predictive Models in Massive Open Online Courses

Published: 04 March 2019

ABSTRACT

The effectiveness of learning in massive open online courses (MOOCs) can be significantly enhanced by introducing personalized intervention schemes, which rely on building predictive models of student learning behaviors such as engagement or performance indicators. A major challenge in building such models is designing handcrafted features that are effective for the prediction task at hand. In this paper, we make the first attempt to solve this feature learning problem with an unsupervised learning approach that learns a compact representation of the highly redundant raw features. Specifically, in order to capture the underlying learning patterns in the content domain and the temporal nature of the clickstream data, we train a modified auto-encoder (AE) combined with a long short-term memory (LSTM) network to obtain a fixed-length embedding for each input sequence. Compared with the original features, the new features that correspond to the embedding obtained by the modified LSTM-AE are not only more parsimonious but also more discriminative for our prediction task. Using simple supervised learning models, the learned features can improve the prediction accuracy by up to 17% compared with supervised neural networks and reduce overfitting to the dominant low-performing group of students, specifically in the task of predicting students' performance. Our approach is generic in the sense that it is restricted neither to a specific supervised learning model nor to a specific prediction task for MOOC learning analytics.
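
To make the idea of a fixed-length embedding concrete, the following is a minimal sketch of a generic LSTM auto-encoder forward pass in plain Python. It is not the modified LSTM-AE described in the abstract, whose details are not given on this page. The weights are random and untrained, and all names and sizes (`init_lstm`, `encode`, `decode`, `FEATURES`, `HIDDEN`) are assumptions chosen for illustration. The point is only that sequences of different lengths map to embeddings of the same size, and that a decoder can be asked to reconstruct the input from that embedding.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    """Compute weights @ vec + bias using plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def init_lstm(input_dim, hidden_dim, rng):
    """Small random weights for the input (i), forget (f), output (o) and candidate (g) gates."""
    def gate():
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
        return weights, [0.0] * hidden_dim
    return {name: gate() for name in "ifog"}


def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = list(x) + list(h)  # concatenate input and previous hidden state
    i = [sigmoid(a) for a in affine(*params["i"], z)]
    f = [sigmoid(a) for a in affine(*params["f"], z)]
    o = [sigmoid(a) for a in affine(*params["o"], z)]
    g = [math.tanh(a) for a in affine(*params["g"], z)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode(enc, sequence, hidden_dim):
    """Run the encoder over the whole sequence; the last hidden state is the embedding."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in sequence:
        h, c = lstm_step(enc, x, h, c)
    return h


def decode(dec, out_weights, out_bias, embedding, length):
    """Feed the embedding to the decoder at every step and project back to feature space."""
    hidden_dim = len(embedding)
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    recon = []
    for _ in range(length):
        h, c = lstm_step(dec, embedding, h, c)
        recon.append(affine(out_weights, out_bias, h))
    return recon


def mse(sequence, recon):
    """Mean squared reconstruction error, the quantity an auto-encoder is trained to minimize."""
    errors = [(a - b) ** 2 for x, y in zip(sequence, recon) for a, b in zip(x, y)]
    return sum(errors) / len(errors)


# Hypothetical sizes: 6 raw features per time step, 3-dimensional embedding.
FEATURES, HIDDEN = 6, 3
rng = random.Random(0)
enc = init_lstm(FEATURES, HIDDEN, rng)
dec = init_lstm(HIDDEN, HIDDEN, rng)
out_w = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(FEATURES)]
out_b = [0.0] * FEATURES

# Two synthetic "clickstream" sequences of different lengths.
short_seq = [[rng.random() for _ in range(FEATURES)] for _ in range(4)]
long_seq = [[rng.random() for _ in range(FEATURES)] for _ in range(9)]

emb_short = encode(enc, short_seq, HIDDEN)
emb_long = encode(enc, long_seq, HIDDEN)
recon = decode(dec, out_w, out_b, emb_long, len(long_seq))
loss = mse(long_seq, recon)

print(len(emb_short), len(emb_long))  # both embeddings have length HIDDEN
```

In a trained model, the encoder and decoder weights would be fitted by minimizing `loss` over many sequences, and the resulting embeddings would then serve as input features for a separate supervised model.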

Published in

LAK19: Proceedings of the 9th International Conference on Learning Analytics & Knowledge
March 2019
565 pages
ISBN: 9781450362566
DOI: 10.1145/3303772

Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited

Acceptance Rates

Overall Acceptance Rate: 236 of 782 submissions, 30%
