ProgSnap2: A Flexible Format for Programming Process Data

Authors:
Thomas W. Price, North Carolina State University, Raleigh, NC, USA
David Hovemeyer, Johns Hopkins University, Baltimore, MD, USA
Kelly Rivers, Carnegie Mellon University, Pittsburgh, PA, USA
Ge Gao, North Carolina State University, Raleigh, NC, USA
Austin Cory Bart, University of Delaware, Newark, DE, USA
Ayaan M. Kazerouni, Virginia Tech, Blacksburg, VA, USA
Brett A. Becker, University College Dublin, Dublin, Ireland
Andrew Petersen, University of Toronto, Toronto, ON, Canada
Luke Gusukuma, Virginia Tech, Blacksburg, VA, USA
Stephen H. Edwards, Virginia Tech, Blacksburg, VA, USA
David Babcock, York College, York, PA, USA

ITiCSE '20: Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education, June 2020, Pages 356–362
https://doi.org/10.1145/3341525.3387373

Published: 15 June 2020

ABSTRACT

In this paper, we introduce ProgSnap2, a standardized format for logging programming process data. ProgSnap2 is a tool for computing education researchers, with the goal of enabling collaboration by helping them to collect and share data, analysis code, and data-driven tools to support students. We give an overview of the format, including how events, event attributes, metadata, code snapshots, and external resources are represented. We also present a case study to evaluate how ProgSnap2 can facilitate collaborative research. We investigated three metrics designed to quantify students' difficulty with compiler errors (the Error Quotient, Repeated Error Density, and Watwin score) and compared their distributions and ability to predict students' performance. We analyzed five different ProgSnap2 datasets, spanning a variety of contexts and programming languages. We found that each error metric is mildly predictive of students' performance. We reflect on how the common data format allowed us to more easily investigate our research questions.

Index Terms

Social and professional topics → Professional topics → Computing education → Computing education programs → Computer science education

Published in
ITiCSE '20: Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education
June 2020, 615 pages
ISBN: 9781450368742
DOI: 10.1145/3341525
General Chairs: Michail Giannakos (NTNU, Norway), Guttorm Sindre (Excited/NTNU, Norway)
Program Chairs: Andrew Luxton-Reilly (University of Auckland, NZ), Monica Divitini (NTNU, Norway)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compiler error metrics
data sharing
programming process data
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 552 of 1,613 submissions, 34%