short-paper

Open Access

Exploring Metrics for the Analysis of Code Submissions in an Introductory Data Science Course

Authors:
Huy Nguyen, Michelle Lim, Steven Moore, Eric Nyberg, Majd Sakr, John Stamper
Carnegie Mellon University, United States

LAK21: 11th International Learning Analytics and Knowledge Conference, April 2021, Pages 632–638. https://doi.org/10.1145/3448139.3448209

Published: 12 April 2021

ABSTRACT

While data science education has gained increasing recognition in both academic institutions and industry, there has been little research on automated coding assessment for novice students. Our work presents a first step in this direction by combining coding metrics from traditional software engineering (Halstead Volume and Cyclomatic Complexity) with metrics that reflect a data science project’s learning objectives (the number of library calls and the number of library calls shared with the solution code). Using these metrics, we examined the code submissions of 97 students across two semesters of an introductory data science course. Our results indicated that the metrics can identify cases where students wrote overly complicated code and would benefit from scaffolding feedback. The number of library calls, in particular, was also a significant predictor of changes in submission score and submission runtime, which highlights the distinctive nature of data science programming. We conclude with suggestions for extending our analyses towards more actionable intervention strategies, for example by tracking the fine-grained grading outputs throughout a student’s submission history, to better model and support students in their data science learning process.
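The four metrics named in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not state the counting rules, so this sketch counts a "library call" as any call whose dotted name starts with an imported name, approximates Cyclomatic Complexity as one plus the number of branching constructs in the whole script, and approximates Halstead Volume as N · log2(n) over a simple token-based split into operators and operands. The function names and the sample submissions are hypothetical, and dedicated metric tools classify operators and operands differently, so their values will not match these.

```python
import ast
import io
import keyword
import math
import tokenize
from collections import Counter


def library_calls(source):
    """Count calls rooted at an imported name (assumed counting rule)."""
    tree = ast.parse(source)
    imported = {}  # local alias -> canonical dotted name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    imported[a.asname] = a.name
                else:
                    root = a.name.split(".")[0]
                    imported[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                imported[a.asname or a.name] = f"{node.module}.{a.name}"
    calls = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            parts, func = [], node.func
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            # Method calls on ordinary variables (e.g. df.fillna) are skipped.
            if isinstance(func, ast.Name) and func.id in imported:
                calls[".".join([imported[func.id]] + parts[::-1])] += 1
    return calls


def common_library_calls(student_source, solution_source):
    """Size of the multiset intersection of the two call counts."""
    shared = library_calls(student_source) & library_calls(solution_source)
    return sum(shared.values())


def cyclomatic_complexity(source):
    """Approximation: 1 + number of decision points in the whole script."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            score += 1 + len(node.ifs)
    return score


def halstead_volume(source):
    """Token-based approximation of V = N * log2(n)."""
    operators, operands = Counter(), Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_keyword = tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        if tok.type == tokenize.OP or is_keyword:
            operators[tok.string] += 1
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands[tok.string] += 1
    vocabulary = len(operators) + len(operands)          # n
    length = sum(operators.values()) + sum(operands.values())  # N
    return length * math.log2(vocabulary) if vocabulary else 0.0


# Hypothetical submissions, used only to exercise the functions.
STUDENT = """\
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
for col in df.columns:
    if df[col].isna().any():
        df[col] = df[col].fillna(np.mean(df[col]))
out = pd.concat([df, df])
"""

SOLUTION = """\
import pandas as pd
df = pd.read_csv("data.csv")
df = df.fillna(df.mean())
"""

print(library_calls(STUDENT))
print(common_library_calls(STUDENT, SOLUTION))
print(cyclomatic_complexity(STUDENT), cyclomatic_complexity(SOLUTION))
print(halstead_volume(STUDENT), halstead_volume(SOLUTION))
```

In this sketch the hypothetical student submission loops and branches to fill missing values, while the solution does the same work with a single vectorized call, so the student code scores higher on both traditional metrics. This is the kind of overly complicated submission the abstract suggests could be flagged for scaffolding feedback. The sources are only parsed, never executed, so pandas and NumPy do not need to be installed.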

Index terms: Social and professional topics > Professional topics > Computing education > Computing education programs

Published in
LAK21: 11th International Learning Analytics and Knowledge Conference
April 2021
645 pages
ISBN: 9781450389358
DOI: 10.1145/3448139
Program Chairs:
Maren Scheffel, Ruhr University Bochum, Germany
Nia Dowell, University of California, Irvine, USA
Srecko Joksimovic, University of South Australia, Australia
George Siemens, University of Texas, Arlington, USA & University of South Australia, Australia
Copyright © 2021 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
- Coding Metrics
- Data Science Education
- Linear Mixed Model
- Programming Analysis
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 236 of 782 submissions, 30%