Integrating machine learning into item response theory for addressing the cold start problem in adaptive learning systems
Introduction
Over the last decade, online learning environments have received rapidly growing attention. Technology-enhanced environments are deemed to have greater potential than traditional classroom learning because they can personalize students' learning opportunities through adaptive learning technologies (Albatayneh, Ghauth, & Chua, 2018; Truong, 2016; Ortigosa, Martín, & Carro, 2014; Kalyuga & Sweller, 2005; Shute & Towle, 2003; Brusilovsky, 1999). The goal of an adaptive learning system is to modify instruction using a set of predefined rules (Burgos, Tattersall, & Koper, 2007; Marcos-García, Martínez-Monés, & Dimitriadis, 2015) and to provide learning materials (or items) tailored to the behavior and needs of individual learners (Wauters, Desmet, & Van den Noortgate, 2010). An example is a system that selects items of an appropriate difficulty level. For such a system, it is crucial that enough information about the learners' ability levels is built up so that their responses to the learning items can be predicted in a timely and accurate manner. However, one challenge is that the system may have very limited (or no) information on the initial ability levels of new learners when they enter a learning environment. In this case, it takes a long time before acceptably accurate estimates of the learners' proficiency are obtained. The system is therefore likely to fail to recommend tailored items during the initial phase of use. This issue is referred to as the cold start problem (Schein, Popescul, Ungar, & Pennock, 2002). Studies have shown that the cold start problem often leads new learners to abandon the system because of inappropriate first recommendations, which are experienced as frustrating (Bobadilla, Ortega, Hernando, & Bernal, 2012; Mackness, Mak, & Williams, 2010).
Lack of motivation, anxiety, and boredom may also be associated with the failure of adaptive item selection (Wauters et al., 2010; Klinkenberg, Straatemeier, & van der Maas, 2011; Ostrow, 2015; Jagust, Boticki, & So, 2018). It is therefore crucial to further develop methodologies and models that tackle this cold start problem.
In fact, this problem has received considerable attention in another context: that of recommender systems, which seek to predict the rating a user would give to an item (e.g., books, movies, songs) on e-commerce or online streaming websites based on his or her interests. Many studies (e.g., Barjasteh, Forsati, Ross, Esfahanian, & Radha, 2016; Contratres, Alves-Souza, Filgueiras, & DeSouza, 2018; Fernández-Tobías et al., 2016; Forsati, Mahdavi, Shamsfard, & Sarwat, 2014; Lika, Kolomvatsos, & Hadjiefthymiades, 2014; Ling, Lyu, & King, 2014, pp. 105–112; Menon, Chitrapura, Garg, Agarwal, & Kota, 2011; Pereira & Hruschka, 2015; Tang & McCalla, 2004a, 2004b) proposed data mining and machine learning techniques (specifically, collaborative filtering algorithms) to address the cold start problem, using side information about existing users (i.e., users' attributes) to make recommendations for new users with similar profiles. However, most of these approaches focus heavily on predicting the new user's ratings on a given set of items and lack a psychometric component, i.e., the assessment of the users' latent traits. In adaptive learning systems, by contrast, insight into the latent ability level of persons is of crucial importance because of its role in evaluating how effectively the learning process is working and how the learner is performing. Only a very limited number of studies (Tang & McCalla, 2004a, 2004b; Sun, Cui, Xu, Shen, & Chen, 2018) have paid attention to the cold start problem in the context of online learning. Therefore, this study aims to answer the research question: how can a new learner's side information (e.g., age, relevant courses taken, IQ, pre-test scores) be exploited to estimate the learner's initial ability and the corresponding performance on items with a variety of difficulty levels?
With respect to ability assessment, item response theory (IRT; Van der Linden & Hambleton, 1997) is one of the most widely recognized psychometric methods. The basic IRT model, the Rasch model (Rasch, 1960), is based on the idea that the probability of correctly solving an item is a logistic function of the difference between a person parameter and an item parameter, which are commonly interpreted as the person's ability and the item's difficulty. Fitting the model to the responses of learners on a set of items yields estimates of the learners' ability levels and the item difficulties, which can afterwards be used to provide learners with the most informative items. The larger (smaller) the person's ability is compared to the item difficulty, the larger (smaller) the probability of a correct response. IRT models have a strong tradition in testing situations because of several advantages (Hambleton & Jones, 1993; Van der Linden & Hambleton, 1997), which will be discussed below. Once a calibrated item bank is available (i.e., a measurement scale of items has been constructed), new learners can be placed on the scale by assessing how successfully they respond to some of these calibrated items. IRT is often used in computerized adaptive testing (CAT; Van der Linden, 2009), in which the ability estimate of a test taker is updated after each response and the next item is selected with a difficulty that closely matches this estimate. In this way, shorter tests suffice for obtaining an accurate view of the learner's ability. The idea of selecting items whose difficulty matches the ability of the learner is also applicable to online learning environments.
Yet, while the goal of adaptive testing is to gain efficiency in assessing test takers' ability levels and examining their relative standing in the population, the goal of adaptive learning is to enhance learning progress by providing more personalized learning items (Zhang & Chang, 2016).
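The Rasch model's response probability described above can be written compactly. The following minimal sketch (illustrative only, not the paper's implementation) computes P(correct) = 1 / (1 + exp(-(θ - b))) for an ability θ and an item difficulty b:

```python
import numpy as np

def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch model:
    P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b))),
    where theta is the learner's ability and b the item's difficulty."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(theta) - np.asarray(b))))

# When ability equals difficulty, the probability of success is exactly 0.5.
p_matched = rasch_probability(0.0, 0.0)

# A learner well above the item's difficulty succeeds with high probability.
p_easy = rasch_probability(2.0, -1.0)
```

Note that only the difference θ - b matters, which is what makes matching item difficulty to the current ability estimate a natural selection rule.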
Not only IRT but also machine learning techniques can be valuable for adaptive item selection. Response predictions for new learners can be made by treating ability estimation as a regression task. The system first uses a machine learning model to estimate the ability parameter of the new learner and then uses this estimate to predict the responses with IRT. Alternatively, response prediction can be addressed as a multi-target prediction task (Kocev, Vens, Struyf, & Džeroski, 2013). In this case, the system can, for example, employ a decision tree-based learning model in a multi-output setup (Kocev et al., 2013). The machine learning model is trained on a set of learners containing their descriptive features (background information) and their responses.
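The two routes just described can be sketched as follows. All data here are synthetic stand-ins (random features, placeholder abilities and responses), not the study's data or models: route 1 regresses the IRT ability on learner features and converts the predicted ability into Rasch response probabilities, while route 2 predicts the whole response pattern directly as a multi-output regression target:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic training data: 2 learner features (e.g., age, pre-test score)
# and IRT-estimated abilities for 100 existing learners.
X_train = rng.normal(size=(100, 2))
theta_train = X_train @ np.array([0.8, 0.5]) + rng.normal(scale=0.1, size=100)

# Route 1: regress ability on features, then predict responses via Rasch.
ability_model = DecisionTreeRegressor(max_depth=3).fit(X_train, theta_train)
x_new = rng.normal(size=(1, 2))          # features of a new learner
theta_new = ability_model.predict(x_new)
item_difficulties = np.array([-1.0, 0.0, 1.0])
p_correct = 1.0 / (1.0 + np.exp(-(theta_new[:, None] - item_difficulties)))

# Route 2: treat the response pattern as a multi-target prediction task and
# predict all item responses jointly from the features (multi-output tree).
Y_train = (rng.random((100, 3)) < 0.5).astype(int)  # placeholder responses
response_model = DecisionTreeRegressor(max_depth=3).fit(X_train, Y_train)
p_multi = response_model.predict(x_new)
```

Route 1 keeps the psychometric interpretation (an explicit ability estimate), whereas route 2 bypasses it and models the responses directly.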
This study proposes a hybrid approach that combines the strengths of IRT models with machine learning. Specifically, the approach integrates the Rasch model with classification and regression trees (Breiman et al., 1984) trained on side information (i.e., learner attributes/features). In this way, the model can potentially overcome the cold start problem and make reasonable predictions for new learners. Among the various machine learning methods, we suggest decision tree-based learning methods (i.e., single decision trees or ensembles, such as Random Forests; Breiman, 2001) because of their interpretability and visualization properties. In addition, when they are extended to ensembles, their predictive performance improves greatly (Fernández-Delgado, Cernadas, & Amorim, 2014). In our case, this means that we can obtain more accurate predictions of a student's latent ability and responses. To validate the effectiveness of the proposed hybrid approach, we conducted experiments using educational data sets that include background information on learners and items.
The novelty of our approach lies in its capability to incorporate learner features (a) to estimate a new learner's initial ability level when he or she enters an e-learning environment; and (b) to predict the corresponding responses to a given item bank. In particular, for the estimation of the learners' abilities, the hybrid system employs:
1. IRT, to estimate the abilities of the learners for whom we already have data on their responses to items.
2. A regression tree-based method, trained on the features that characterize the learners and the IRT-generated abilities.
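The two steps above can be sketched end to end as follows. This is a simplified illustration on synthetic data, not the paper's implementation: step 1 is reduced to a grid-search maximum-likelihood ability estimate under the Rasch model with known item difficulties (a real application would use a full IRT calibration), and step 2 fits a Random Forest mapping learner features to those abilities:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def estimate_ability(responses, difficulties, grid=np.linspace(-4, 4, 801)):
    """Step 1 (simplified): maximum-likelihood Rasch ability estimate
    with known item difficulties, via a coarse grid search over theta."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - difficulties[None, :])))
    log_lik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(log_lik)]

rng = np.random.default_rng(42)
difficulties = np.array([-1.5, -0.5, 0.5, 1.5])

# Synthetic existing learners: 3 features and responses simulated from Rasch.
X = rng.normal(size=(200, 3))
true_theta = X @ np.array([0.7, 0.4, 0.2])
p = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - difficulties[None, :])))
R = (rng.random(p.shape) < p).astype(int)

# Step 1: IRT ability estimates for learners with observed responses.
theta_hat = np.array([estimate_ability(r, difficulties) for r in R])

# Step 2: a tree ensemble mapping features to the IRT abilities, usable for
# a brand-new learner before any responses have been observed.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, theta_hat)
theta_new = forest.predict(rng.normal(size=(1, 3)))
```

The ensemble's prediction serves as the new learner's starting ability, from which initial items of matching difficulty can be selected.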
In summary, the overarching goal is to develop a method that integrates decision tree-based techniques and IRT for predicting the response pattern and estimating the latent ability of new learners. As the current study aims to address the cold start problem in adaptive learning environments, our focus is on investigating the performance of the hybrid method against one of the most common approaches used in computerized adaptive systems, i.e., assuming at the start of the algorithm that the new learner has an average ability (e.g., Van der Linden & Veldkamp, 2004). In adaptive learning systems, there is usually an initial phase in which new learners are given some items and their ability is estimated based on the responses to those first items. However, if these first recommendations are inappropriate (e.g., too easy or too difficult), the learner may become frustrated or even abandon the effort (Mackness et al., 2010). We investigate the possibility of building a system that improves the performance of the initial phase of an adaptive learning system without making use of prior ability tests.
The rest of the paper is organized as follows. In the next section, we present previous studies that are relevant to this work. In section 2, we describe the general frameworks of IRT and decision tree-based methods, and we then propose a hybrid approach combining the two methods in order to address the cold start problem. Next, in section 3, we introduce two educational data sets (one from an educational testing context and the other from an online learning environment) used for the evaluation of our approach and present the experimental strategy. The results are presented in section 4. We discuss our findings and provide concluding remarks in section 5.
Although several recent studies have proposed methods to tackle the cold start problem in recommender systems, their application in the educational domain (i.e., in adaptive learning systems) remains underexplored.
As in a typical educational testing environment, the majority of current online learning platforms do not use any prior information on new learners for personalized learning (Thai-Nghe, Horvath, & Schmidt-Thieme, 2011). After an item bank has been calibrated by IRT modelling, meaning that item difficulties have become available, the naive method presents initial items without knowing anything about the new learners. An item can be selected randomly, or the average ability estimate of the pre-existing learners can be used as the starting value for new learners. As a result, the prediction performance for new learners is inefficient, in the sense that it may take longer for the learning system to estimate the learner's ability level with sufficient accuracy.
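This naive baseline can be stated in a few lines. The difficulty and ability values below are purely illustrative: the new learner is assigned the average ability of the existing learners, and the first item is the one whose calibrated difficulty lies closest to that starting estimate:

```python
import numpy as np

# Item bank difficulties calibrated on existing learners (illustrative values).
difficulties = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Ability estimates of pre-existing learners (illustrative values).
existing_abilities = np.array([-0.6, 0.2, 0.9, -1.1, 0.4])

# Naive cold start: assume the new learner has the average ability, then
# select the item whose difficulty is closest to that starting estimate.
theta_start = existing_abilities.mean()
first_item = int(np.argmin(np.abs(difficulties - theta_start)))
```

Because every new learner receives the same starting estimate, the baseline ignores all side information, which is exactly the gap the hybrid approach targets.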
A number of studies have tackled the cold start problem in relation to recommender systems in general. Recommender systems are information filtering and decision support systems that seek to predict the ratings or preferences a user would give to an item (e.g., movies, music, books, and products) on e-commerce or online streaming sites. These systems typically utilize a variety of collaborative filtering (CF) algorithms that generate automatic predictions about a user by collecting information from other users who share similar ratings or preferences. To handle the cold start problem, several studies used data mining techniques that incorporate user features (age, gender, and social contacts) into the CF. Said and Bellogín (2014), Guo, Zhang, and Yorke-Smith (2013), and Vozalis and Margaritis (2004) proposed modified versions of k-nearest neighbors (k-NN) that add a user demographic vector to the user profile and embed it in the CF. Similarly, Son, Minh, Cuong, and Canh (2013) proposed a fuzzy clustering method that incorporates demographic features into the filtering system. Fernández-Tobías et al. (2016) proposed adding users' personality information to a matrix factorization (MF) model that incorporates user features to improve recommendations when there are no ratings for new users. Contratres et al. (2018) showed that the user cold start issue can be alleviated by using sentiment analysis based on support vector machines (SVM; Burges, 1998) in recommender systems.
Despite the recent surge of research on the cold start problem in recommender systems, little work has addressed it in the domain of adaptive learning systems. There have, however, been various studies applying machine learning in educational systems in general. The majority of them harness the predictive accuracy of machine learning to develop predictive models for students. These models are often trained on students' demographic characteristics or other student-related attributes/features (e.g., school progress, number of books at home, dyslexia, dyscalculia), aiming at grade or dropout prediction. More specifically, Kotsiantis (2012) built a decision support system to predict students' performance. The system was trained on students' demographic features and marks on written assignments, addressing student grade prediction as a regression problem. Kai, Almeda, Baker, Heffernan, and Heffernan (2018) used a decision tree to classify students into two groups: productive persistence or wheel-spinning. Rovira, Puertas, and Igual (2017) employed machine learning to predict students' grades and dropout intention; the authors also proposed a personalized course recommendation model. Course preferences as well as course completion ratios were studied using decision tree learning by Hsia, Shie, and Chen (2008). Lykourentzou, Giannoukos, Nikolopoulos, Mpardis, and Loumos (2009) proposed a dropout prediction method for e-learning courses using a combination of machine learning techniques. Vie, Popineau, Bruillard, and Bourda (2018) proposed a determinantal point process to select adaptive items for new learners, using ability and difficulty estimates calibrated by a cognitive diagnosis model. Park, Joo, Cornillie, and Van der Maas (2018) proposed a psychometric method to reduce the new-learner cold start problem, zooming in on adaptive learning systems.
Based on an explanatory IRT model trained on learner-item interaction data and learner features (e.g., age, gender, learning disability), their method first provides initial ability estimates for new learners based on their profiles and then makes recommendations for the most informative items. From these previous studies, it is clear that learners' background information can contribute significantly to models that precisely predict a learner's performance. In addition, richer learner information can yield more precise and accurate predictions in the context of machine learning in educational assessment.
Methods
In the following, we propose a hybrid approach that combines item response theory and decision tree-based learning. We first describe both components in more detail.
Evaluation of the hybrid approach
To illustrate and compare the new hybrid system, we apply it to two real datasets, described below. For the implementation and experimental evaluation of our system, we used the machine learning library Scikit-learn (Python) (Pedregosa et al., 2011). Scikit-learn contains all the machine learning algorithms used in this study as well as the relevant evaluation metrics that were employed. We also used the NumPy library for data handling purposes. When it comes to IRT, we used the IRT
Results
First, we describe the performance of the system in predicting the ability parameter of the new learners (Tnew). Table 1 presents the obtained regression results in terms of MSE. A comparison with the approach of imputing random values drawn from the estimated abilities of the existing users, or the mean value, shows that using machine learning is relatively effective for predicting the abilities of new learners: for both datasets used in our study, the MSE results
Discussion and conclusion
In the current study, we presented an approach that combines psychometric modelling with machine learning techniques. We proposed that a hybrid model can be used as an alternative approach to address the cold start problem by predicting a learner's ability in the initial stage of adaptive learning in online learning systems. More specifically, the proposed approach starts by estimating existing learners' abilities with an IRT analysis. Then, a tree-based method is used by regressing the
Acknowledgements
This research includes a methodological approach from the LEarning analytics for AdaPtiveSupport (LEAPS) project, funded by imec (Kapeldreef 75, B-3001, Leuven, Belgium) and the Agentschap Innoveren & Ondernemen. The LEAPS project aimed to develop a self-learning analytical system to enable adaptive learning. This support system can be integrated into educational games and into software supporting professional communication and persons with dyslexia. The study was also partially carried out
References (74)
- Blockeel & De Raedt (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence.
- Bobadilla, Ortega, Hernando, & Bernal (2012). A collaborative filtering approach to mitigate the new user cold start problem. Knowledge-Based Systems.
- Hsia, Shie, & Chen (2008). Course planning of extension education to meet market demand by using data mining techniques: An example of Chinkuo Technology University in Taiwan. Expert Systems with Applications.
- Jagust, Boticki, & So (2018). Examining competitive, collaborative and adaptive gamification in young learners' math learning. Computers & Education.
- Klinkenberg, Straatemeier, & van der Maas (2011). Computer adaptive practice on Maths ability using a new item response model for on the fly ability and difficulty estimation. Computers & Education.
- Kocev, Vens, Struyf, & Džeroski (2013). Tree ensembles for predicting structured outputs. Pattern Recognition.
- Lika, Kolomvatsos, & Hadjiefthymiades (2014). Facing the cold start problem in recommender systems. Expert Systems with Applications.
- Lykourentzou, Giannoukos, Nikolopoulos, Mpardis, & Loumos (2009). Dropout prediction in e-learning courses through the combination of machine learning techniques. Computers & Education.
- Marcos-García, Martínez-Monés, & Dimitriadis (2015). DESPRO: A method based on roles to provide collaboration analysis support adapted to the participants in CSCL situations. Computers & Education.
- Ortigosa, Martín, & Carro (2014). Sentiment analysis in Facebook and its application to e-learning. Computers in Human Behavior.