Computers & Education

Volume 137, August 2019, Pages 91-103

Integrating machine learning into item response theory for addressing the cold start problem in adaptive learning systems

https://doi.org/10.1016/j.compedu.2019.04.009

Highlights

  • A hybrid learning system is built integrating machine learning and IRT.

  • Machine learning alleviates the cold start problem in IRT.

  • Proposed system leads to precise predictions of learners' abilities and item responses.

  • It should be used by scientists who want fast and accurate predictions for new learners.

Abstract

Adaptive learning systems aim to provide learning items tailored to the behavior and needs of individual learners. However, one of the outstanding challenges in adaptive item selection is that the corresponding systems often have no information on the initial ability levels of new learners entering a learning environment, which makes the proficiency of those new learners very difficult to predict. This heavily impairs the quality of personalized item recommendation during the initial phase of the learning environment. In order to handle this issue, known as the cold-start problem, we propose a system that combines item response theory (IRT) with machine learning. Specifically, we perform ability estimation and item response prediction for new learners by integrating IRT with classification and regression trees built on learners' side information. The goal of this work is to build a learning system that incorporates IRT and machine learning into a unified framework. We compare the proposed hybrid model to alternative approaches by conducting experiments on two educational data sets. The obtained results affirm the potential of the proposed method: in particular, IRT combined with Random Forests provides the lowest error for the ability estimation and the highest accuracy in terms of response prediction. We therefore deduce that machine learning in combination with IRT can indeed alleviate the effect of the cold-start problem in an adaptive learning environment.

Introduction

Over the last decade, online learning environments have received rapidly growing attention. Technology-enhanced environments are deemed to have a greater potential than traditional classroom learning, as they are capable of personalizing students' learning opportunities based on adaptive learning technologies (Albatayneh, Ghauth, & Chua, 2018; Truong, 2016; Ortigosa, Martín, & Carro, 2014; Kalyuga & Sweller, 2005; Shute & Towle, 2003; Brusilovsky, 1999). The goal of an adaptive learning system is to modify instructions using a set of predefined rules (Burgos, Tattersall, & Koper, 2007; Marcos-García, Martínez-Monés, & Dimitriadis, 2015) and to provide learning materials (or items) tailored to the behavior and needs of individual learners (Wauters, Desmet, & Van den Noortgate, 2010). An example is a system that selects items of an appropriate difficulty level. For such a system, it is crucial to build up enough information about the learners' ability levels and to predict their responses to the learning items in a timely and accurate manner. However, one of the challenges of this process is that the system may have very limited (or no) information on the initial ability levels of new learners when they enter a learning environment. In this case, it takes a long time before estimates of the learners' proficiency reach an acceptable accuracy. The system is therefore likely to fail to recommend tailored items during the initial phase of the learning environment. This issue is referred to as the cold-start problem (Schein, Popescul, Ungar, & Pennock, 2002). Studies have shown that the cold-start problem often leads new learners to abandon the system due to inappropriate first recommendations, which are experienced as frustrating (Bobadilla, Ortega, Hernando, & Bernal, 2012; Mackness, Mak, & Williams, 2010). A lack of motivation, anxiety, and boredom may also be associated with the failure of adaptive item selection (Wauters et al., 2010; Klinkenberg, Straatemeier, & van der Maas, 2011; Ostrow, 2015; Jagust, Boticki, & So, 2018). It is therefore crucial to further develop methodologies and models that tackle this cold-start problem.

In fact, this problem has received considerable attention in another context: recommender systems, which seek to predict the rating that a user would give to an item (e.g., books, movies, songs) on e-commerce or online streaming websites, based on his or her interests. Many studies (e.g., Barjasteh, Forsati, Ross, Esfahanian, & Radha, 2016; Contratres, Alves-Souza, Filgueiras, & DeSouza, 2018; Fernández-Tobías et al., 2016; Forsati, Mahdavi, Shamsfard, & Sarwat, 2014; Lika, Kolomvatsos, & Hadjiefthymiades, 2014; Ling, Lyu, & King, 2014, pp. 105–112; Menon, Chitrapura, Garg, Agarwal, & Kota, 2011; Pereira & Hruschka, 2015; Tang & McCalla, 2004a, 2004b) have proposed data mining and machine learning techniques (specifically, collaborative filtering algorithms) to address the cold-start problem, using side information about existing users (i.e., users' attributes) to make recommendations for new users with similar profiles. However, most of these approaches focus heavily on predicting the new user's ratings on a given set of items and lack the psychometric component, i.e., the assessment of the users' latent traits. In adaptive learning systems, by contrast, gaining insight into the latent ability level of persons is of crucial importance because of its role in evaluating how effectively the learning process is working and how the learner performs in the learning program. Only a very limited number of studies (Tang & McCalla, 2004a, 2004b; Sun, Cui, Xu, Shen, & Chen, 2018) have paid attention to the cold-start problem in the context of online learning. Therefore, this study aims to answer the following research question: how can a new learner's side information (e.g., age, relevant courses taken, IQ, pre-test scores) be exploited in order to estimate the learner's initial ability and the corresponding performance on items with a variety of difficulty levels?

With respect to ability assessment, item response theory (IRT; Van der Linden & Hambleton, 1997) is considered one of the most recognized psychometric methods. The basic IRT model, the Rasch model (Rasch, 1960), is based on the idea that the probability of correctly solving an item is a logistic function of the difference between a person parameter and an item parameter, which are often interpreted as the person's ability and the item's difficulty. Fitting the model to the responses of learners on a set of items allows one to estimate the learners' ability levels and the item difficulties, which can afterwards be used to provide learners with the most informative item. The larger (smaller) the person's ability is compared to the item difficulty, the larger (smaller) the probability of a correct response. IRT models have a strong tradition in testing situations because of several advantages (Hambleton & Jones, 1993; Van der Linden & Hambleton, 1997), which will be discussed below. Once a set of calibrated items in the bank is available (i.e., a measurement scale of items has been constructed), new learners can be placed on the scale by assessing how successfully they respond to some of these calibrated items. IRT is often used in computerized adaptive testing (CAT; Van der Linden, 2009), in which after each response the ability estimate of a test taker is updated and an item is given whose difficulty closely matches that estimate. In this way, shorter tests are sufficient for obtaining an accurate view of the learner's ability. The idea of selecting items whose difficulty matches the ability of the learner is also applicable to online learning environments. Yet, while the goal of adaptive testing is to gain efficiency in assessing test takers' ability levels and examining their relative standing in the population, the goal of adaptive learning is to enhance the learning progress by providing more personalized learning items (Zhang & Chang, 2016).
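In formula form (a standard statement of the Rasch model, with $\theta_p$ denoting the ability of person $p$ and $\beta_i$ the difficulty of item $i$):

$$P(X_{pi} = 1 \mid \theta_p, \beta_i) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}$$

so that the probability of a correct response equals 0.5 when the ability matches the item difficulty, and increases monotonically in the difference $\theta_p - \beta_i$.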

Not only IRT but also machine learning techniques can be valuable for adaptive item selection. Response predictions for new learners can be made by addressing ability estimation as a regression task based on machine learning: the system first uses a machine learning model to estimate the ability parameter of the new learner and then uses this estimated ability parameter to predict the responses with IRT. Alternatively, response prediction can be addressed as a multi-target prediction task (Kocev, Vens, Struyf, & Džeroski, 2013). In this case, the system can, for example, employ a decision tree-based learning model in a multi-output setup (Kocev et al., 2013). The machine learning model is trained on a set of learners containing their descriptive features (background information) and their responses.
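As a minimal sketch of the multi-target route (our own illustration, not the implementation used in this study; the arrays X and Y below are randomly generated placeholders), scikit-learn decision trees handle a response matrix with one binary column per item natively:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Placeholder data: 500 existing learners with 6 background features
# (e.g., age, pre-test scores) and binary responses to 20 items.
X = rng.normal(size=(500, 6))                       # learner side information
Y = (rng.random(size=(500, 20)) < 0.6).astype(int)  # 1 = correct, 0 = incorrect

# A single tree fitted in multi-output mode predicts the full
# response vector of a new learner from side information alone.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X, Y)

x_new = rng.normal(size=(1, 6))  # features of a cold-start learner
print(tree.predict(x_new))       # predicted responses to all 20 items
```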

This study proposes a hybrid approach that combines the strengths of IRT models with machine learning. Specifically, the approach integrates the Rasch model with classification and regression trees (Breiman et al., 1984) trained on side information (i.e., learner attributes/features). In this way, the model can potentially overcome the cold-start problem and make reasonable predictions for new learners. Among the various machine learning methods, we suggest decision tree-based methods (i.e., single decision trees or ensembles such as Random Forests; Breiman, 2001) because of their interpretability and visualization properties. In addition, when they are extended to ensembles, their predictive performance is greatly improved (Fernández-Delgado, Cernadas, & Amorim, 2014). In our case, this means that we can obtain more accurate predictions of a student's latent ability and responses. In order to validate the effectiveness of the proposed hybrid approach, we conducted experiments using educational data sets that include background information on learners and items.

The novelty of our approach lies in its capability of incorporating learner features (a) to estimate a new learner's initial ability level upon entering an e-learning environment; and (b) to predict the corresponding responses to a given item bank. In particular, where the estimation of the learners' abilities is concerned, the hybrid system employs:

  1. IRT to estimate the abilities of the learners for whom response data on the items are already available.

  2. A regression tree-based method trained on the features that characterize the learners and on the IRT-generated abilities (a minimal sketch of these two steps follows the list).
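The sketch below illustrates these two steps under our own simplifying assumptions (item difficulties already calibrated, abilities estimated per learner by bounded maximum likelihood via SciPy, Random Forests from scikit-learn); it is not the exact implementation used in the paper, and all data are randomly generated placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.ensemble import RandomForestRegressor

def rasch_prob(theta, beta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def estimate_ability(responses, beta):
    """Step 1: maximum-likelihood ability for one learner, given
    item difficulties beta (assumed already calibrated here)."""
    def neg_log_lik(theta):
        p = rasch_prob(theta, beta)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

rng = np.random.default_rng(0)
beta = rng.normal(size=20)            # calibrated item difficulties
X = rng.normal(size=(500, 6))         # side information of existing learners
true_theta = X[:, 0] + 0.5 * X[:, 1]  # abilities related to the features
R = (rng.random((500, 20)) < rasch_prob(true_theta[:, None], beta)).astype(int)

theta_hat = np.array([estimate_ability(R[p], beta) for p in range(len(R))])

# Step 2: regress the IRT-estimated abilities on the learner features.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, theta_hat)

# Cold start: predict a new learner's ability from features alone,
# then predict the item responses through the Rasch model.
x_new = rng.normal(size=(1, 6))
theta_new = forest.predict(x_new)[0]
print(rasch_prob(theta_new, beta))    # predicted success probabilities
```

Replacing RandomForestRegressor with a single regression tree gives the more interpretable variant mentioned above.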

In summary, the overarching goal is to develop a method that integrates decision tree-based techniques and IRT for predicting the response pattern and estimating the latent ability of new learners. As the current study aims to address the cold-start problem in adaptive learning environments, our focus is to investigate the performance of the hybrid method against one of the most common approaches used in computerized adaptive systems, i.e., assuming at the start of the algorithm that the new learner has an average ability (e.g., Van der Linden & Veldkamp, 2004). In adaptive learning systems, there is usually an initial phase in which new learners are given some items and their ability is estimated based on the responses to those first items. However, if these recommendations are inappropriate (e.g., too easy or too difficult), the learner may get frustrated or even abandon the effort (Mackness et al., 2010). We investigate the possibility of building a system that improves the performance of the initial phase of an adaptive learning system without making use of prior ability tests.

The rest of the paper is organized as follows. In the next section, we present previous studies that are relevant to this work. In section 2, we describe the general frameworks of IRT and decision tree-based methods, and then propose a hybrid approach combining the two methods in order to address the cold-start problem. Next, in section 3, we introduce two educational data sets (one from educational testing and the other from an online learning environment) for the evaluation of our approach and present the experimental strategy. The results are presented in section 4. We discuss our findings and provide concluding remarks in section 5.

Although several recent studies have proposed methods to tackle the cold-start problem in recommender systems, their application in the educational domain (i.e., adaptive learning systems) remains an underexplored topic.

As in a typical educational testing environment, most online learning platforms currently do not use any prior information on new learners for personalized learning (Thai-Nghe, Horvath, & Schmidt-Thieme, 2011). After an item bank has been calibrated by IRT modelling, meaning that item difficulties have become available, the naive method presents initial items without knowing anything about the new learners: an item is selected randomly, or the average ability estimate of the pre-existing learners is used as the starting value for the new learners. The prediction performance for new learners is thus not very efficient, in the sense that it may take longer for the learning system to estimate the learner's ability level with sufficient accuracy.
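For concreteness, the two naive starting strategies just mentioned (random draws from the existing ability estimates, or the average ability) can be sketched as follows; the arrays and names are placeholders of our own, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_existing = rng.normal(size=500)  # IRT ability estimates of existing learners
n_new = 100                            # number of cold-start learners

# Naive option 1: draw each new learner's starting ability at random
# from the pool of existing estimates.
theta_random = rng.choice(theta_existing, size=n_new)

# Naive option 2: start every new learner at the average ability.
theta_mean = np.full(n_new, theta_existing.mean())
```

These two strategies also serve as the reference baselines in the comparison reported in the Results section.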

A number of studies have tackled the cold-start problem in relation to recommender systems in general. Recommender systems refer to information filtering and decision support systems that seek to predict the ratings or preferences a user would give to an item (e.g., movies, music, books, and products) in e-commerce or online streaming sites. These systems typically utilize a variety of collaborative filtering (CF) algorithms that generate automatic predictions about a user by collecting information from other users who share similar ratings or preferences. To handle the cold-start problem, several studies used data mining techniques that incorporate user features (age, gender, and social contacts) into the CF. Said and Bellogín (2014), Guo, Zhang, and Yorke-Smith (2013), and Vozalis and Margaritis (2004) proposed modified versions of k-nearest neighbors (k-NN) that add a user demographic vector to the user profile and embed it in the CF. Similarly, Son, Minh, Cuong, and Canh (2013) proposed a fuzzy clustering method that incorporates the demographic features in the filtering system. Fernández-Tobías et al. (2016) proposed adding the users' personality information to a matrix factorization (MF) model that incorporates user features to improve recommendations when there are no ratings for the new users. Contratres et al. (2018) showed that the user cold-start issue can be alleviated by using sentiment analysis based on support vector machines (SVM; Burges, 1998) in recommender systems.

Despite the recent popularity of the cold-start problem in recommender systems, however, not much research has been done in the domain of adaptive learning systems. There have nevertheless been various studies applying machine learning in educational systems in general. The majority of them harness the prediction accuracy of machine learning to develop predictive models for students. These models are often trained on students' demographic characteristics or other student-related attributes/features (e.g., school progress, number of books at home, dyslexia, dyscalculia), with the aim of predicting grades or dropout. More specifically, Kotsiantis (2012) built a decision support system to predict students' performance; the system was trained on students' demographic features and marks in written assignments, addressing student grade prediction as a regression problem. Kai, Almeda, Baker, Heffernan, and Heffernan (2018) used a decision tree to classify students into two groups: productive persistence or wheel-spinning. Rovira, Puertas, and Igual (2017) employed machine learning to predict students' grades and dropout intentions, and also proposed a personalized course recommendation model. Course preferences as well as course completion ratios were studied using decision tree learning by Hsia, Shie, and Chen (2008). Lykourentzou, Giannoukos, Nikolopoulos, Mpardis, and Loumos (2009) proposed a dropout prediction method for e-learning courses using a combination of machine learning techniques. Vie, Popineau, Bruillard, and Bourda (2018) proposed a determinantal point process to select adaptive items for new learners, using ability and difficulty estimates calibrated by a cognitive diagnosis model. Park, Joo, Cornillie, and Van der Maas (2018) proposed a psychometric method to reduce the new-learner cold-start problem, zooming in on adaptive learning systems: based on an explanatory IRT model trained on learner-item interaction data and learner features (e.g., age, gender, learning disability), their method first provides initial ability estimates for new learners based on their profiles, and then makes recommendations for the most informative items. From these previous studies, it is clear that learners' background information can contribute significantly to models that accurately predict the learner's performance, and that richer learner information enables more precise predictions in the context of machine learning in educational assessment.


Methods

In the following, we propose a hybrid approach that combines item response theory and decision tree-based learning. We first describe both components in more detail.

Evaluation of the hybrid approach

To illustrate and compare the new hybrid system, we apply it to two real datasets, described below. For the implementation and experimental evaluation of our system, we used the machine learning library Scikit-learn (Python) (Pedregosa et al., 2011). Scikit-learn contains all the machine learning algorithms used in this study as well as the relevant evaluation metrics that were employed. We also used the NumPy library for data handling purposes. When it comes to IRT, we used the IRT

Results

First, the performance of the system in predicting the ability parameter of the new learners (Tnew) is described. In Table 1, the obtained regression results are presented in terms of MSE. A comparison with the approach of imputing random values drawn from the estimated abilities of the existing learners, or imputing the mean value, shows that using machine learning is relatively effective for predicting the abilities of new learners: for both datasets used in our study, the MSE results

Discussion and conclusion

In the current study, we presented an approach that combines psychometric modelling with machine learning techniques. We proposed that a hybrid model can be used as an alternative approach to address the cold-start problem by predicting learners' abilities in the initial stage of adaptive learning in online learning systems. More specifically, the proposed approach starts with estimating the existing learners' abilities based on IRT analysis. Then, a tree-based method is used by regressing the

Acknowledgements

"This research includes a methodological approach from the LEarning analytics for AdaPtiveSupport (LEAPS) project, funded by imec (Kapeldreef 75, B-3001, Leuven, Belgium) and the Agentschap Innoveren & Ondernemen. The LEAPS project aimed to develop a self-learning analytical system to enable adaptive learning. This support system can be integrated into educational games and in software supporting professional communication and persons with dyslexia. The study also was partially carried out

References (74)

  • H. Truong

    Integrating learning styles and adaptive e-learning system: Current developments, problems and opportunities

    Computers in Human Behavior

    (2016)
  • K. Wauters et al.

    Item difficulty estimation: An auspicious collaboration between data and judgment

    Computers & Education

    (2012)
  • N. Albatayneh et al.

    Utilizing learners' negative ratings in semantic content-based recommender system for e-learning forum

    Journal of Educational Technology & Society

    (2018)
  • M.J. Allen et al.

    Introduction to measurement theory

    (2001)
  • I. Barjasteh et al.

    Cold-start recommendation with provable guarantees: A decoupled approach

    IEEE Transactions on Knowledge and Data Engineering

    (2016)
  • A. Birnbaum

    Some latent trait models and their use in inferring an examinee's ability

  • K. Boyd et al.

    Area under the precision-recall curve: Point estimates and confidence intervals

  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • L. Breiman et al.

    Classification and regression trees

    (1984)
  • P. Brusilovsky
  • C.J. Burges

    A tutorial on support vector machines for pattern recognition

    Data Mining and Knowledge Discovery

    (1998)
  • D. Burgos et al.

    Representing adaptive and adaptable units of learning

  • F.G. Contratres et al.

    Sentiment analysis of social network data for cold-start relief in recommender systems

  • A.T. Corbett et al.

    Knowledge tracing: Modeling the acquisition of procedural knowledge

    User Modeling and User-Adapted Interaction

    (1995)
  • J. Davis et al.

    The relationship between Precision-Recall and ROC curves

  • P. De Boeck et al.

    IRTrees: Tree-based item response models of the GLMM family

    Journal of Statistical Software

    (2012)
  • F. Doshi-Velez et al.

    Towards a rigorous science of interpretable machine learning

    (2017)
  • I. Fernández-Tobías et al.

    Alleviating the new user problem in collaborative filtering by exploiting personality information

    User Modeling and User-Adapted Interaction

    (2016)
  • M. Fernández-Delgado et al.

    Do we need hundreds of classifiers to solve real world classification problems?

    The Journal of Machine Learning Research

    (2014)
  • R. Forsati et al.

    Matrix factorization with explicit trust and distrust side information for improved social recommendation

    ACM Transactions on Information Systems

    (2014)
  • A.C. Gottschall et al.

    A comparison of item-level and scale-level multiple imputation for questionnaire batteries

    Multivariate Behavioral Research

    (2012)
  • G. Guo et al.

    A novel Bayesian similarity measure for recommender systems

  • R.K. Hambleton et al.

    Comparison of classical test theory and item response theory and their applications to test development

    Educational Measurement: Issues and Practice

    (1993)
  • M. Jeon et al.

    A generalized item response tree model for psychological assessments

    Behavior Research Methods

    (2016)
  • M.I. Jordan et al.

    Machine learning: Trends, perspectives, and prospects

    Science

    (2015)
  • D.T. Kadengye et al.

    Modeling growth in electronic learning environments using a longitudinal random item response model

    Journal of Experimental Education

    (2015)
  • S. Kai et al.

    Decision tree modeling of wheel-spinning and productive persistence in skill builders

    Journal of Educational Data Mining

    (2018)