DOI: 10.1145/3448139.3448140
Research article

Validity and Reliability of Student Models for Problem-Solving Activities

Published: 12 April 2021

ABSTRACT

Student models are typically evaluated by predicting the correctness of the next answer. This approach is insufficient in the problem-solving context, especially for student models that use performance data beyond binary correctness. We propose more comprehensive methods for validating student models and illustrate them in the context of introductory programming. We demonstrate the insufficiency of the next-answer-correctness prediction task: it neither reveals the low validity of student models that use only binary correctness, nor shows the increased validity of models that use other performance data. The key message is that the prevalent use of next-answer correctness for validating student models, and of binary correctness as the only input to the models, is not always warranted and limits progress in learning analytics.
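For context, the "next answer correctness" evaluation criticized above is typically operationalized as a binary prediction task: the student model outputs a probability that the learner's next answer will be correct, and those predictions are scored against observed outcomes with metrics such as AUC and RMSE. The Python sketch below is purely illustrative, with made-up observations and predictions; it is not code or data from the paper.

    # Minimal illustrative sketch (hypothetical data, not from the paper) of the
    # standard next-answer-correctness evaluation the abstract argues is insufficient.
    from sklearn.metrics import roc_auc_score, mean_squared_error

    # Hypothetical observations: 1 = next answer correct, 0 = incorrect.
    observed = [1, 0, 1, 1, 0, 1, 0, 1]
    # Hypothetical model output: predicted probability of a correct next answer.
    predicted = [0.9, 0.4, 0.7, 0.8, 0.3, 0.6, 0.5, 0.75]

    auc = roc_auc_score(observed, predicted)               # discrimination
    rmse = mean_squared_error(observed, predicted) ** 0.5  # prediction error
    print(f"AUC = {auc:.3f}, RMSE = {rmse:.3f}")

Metrics of this kind only measure agreement with observed binary outcomes; they do not by themselves establish the validity or reliability of the model's estimates, which is the gap the paper examines.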


Published in

LAK21: 11th International Learning Analytics and Knowledge Conference
April 2021, 645 pages
ISBN: 9781450389358
DOI: 10.1145/3448139
Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

