ABSTRACT
Student models are typically evaluated by predicting the correctness of the next answer. This approach is insufficient in the problem-solving context, especially for student models that use performance data beyond binary correctness. We propose more comprehensive methods for validating student models and illustrate them in the context of introductory programming. We demonstrate the insufficiency of the next-answer correctness prediction task: it can neither reveal the low validity of models that use only binary correctness, nor show the increased validity of models that use richer performance data. The key message is that the prevalent use of next-answer correctness for validating student models, and of binary correctness as the models' only input, is not always warranted and limits progress in learning analytics.
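To make the critiqued evaluation protocol concrete, the following is a minimal sketch of next-answer correctness prediction scored with AUC. The synthetic data, the running-average predictor, and all function names are illustrative assumptions, not the paper's method:

```python
# Sketch of the prevalent evaluation protocol discussed in the abstract:
# predict each student's next answer correctness from their answer history,
# then score the predictions with AUC. Data and predictor are illustrative.
import random

random.seed(0)

def simulate_student(skill, n_items=20):
    """Binary answer sequence; success probability equals the latent skill."""
    return [1 if random.random() < skill else 0 for _ in range(n_items)]

def running_mean_predictions(answers, prior=0.5):
    """Predict the next answer as the smoothed mean of past correctness."""
    preds, correct, seen = [], 2 * prior, 2.0  # Laplace-style smoothing
    for a in answers:
        preds.append(correct / seen)
        correct += a
        seen += 1
    return preds

def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels, scores = [], []
for skill in [0.3, 0.5, 0.7, 0.9]:
    answers = simulate_student(skill)
    labels.extend(answers)
    scores.extend(running_mean_predictions(answers))

print(f"next-answer AUC: {auc(labels, scores):.2f}")
```

Note that a predictor can score well on this metric while remaining a poor student model in other respects, which is exactly the gap between predictive accuracy and validity that the paper examines.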