ABSTRACT
While data science education has gained increased recognition in both academic institutions and industry, there has been a lack of research on automated coding assessment for novice students. Our work presents a first step in this direction by combining coding metrics from traditional software engineering (Halstead Volume and Cyclomatic Complexity) with metrics that reflect a data science project's learning objectives (the number of library calls and the number of library calls shared with the solution code). Using these metrics, we examined the code submissions of 97 students across two semesters of an introductory data science course. Our results indicated that the metrics can identify cases where students wrote overly complicated code and would benefit from scaffolding feedback. The number of library calls, in particular, was also a significant predictor of changes in submission score and submission runtime, which highlights the distinctive nature of data science programming. We conclude with suggestions for extending our analyses towards more actionable intervention strategies, for example by tracking the fine-grained submission grading outputs throughout a student's submission history, to better model and support students in their data science learning process.
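To make the two data-science-specific metrics concrete, the following is a minimal sketch of how library calls might be counted in a Python submission and compared with a solution. It is not the authors' implementation: the abstract does not define the metrics precisely, so the function names (`library_calls`, `decision_complexity`), the choice to count only calls rooted at an imported name, and the multiset definition of "common" calls are all assumptions made for illustration. The complexity function is a rough decision-point count in the spirit of Cyclomatic Complexity, not a replacement for a dedicated metrics tool.

```python
import ast
from collections import Counter


def _import_aliases(tree):
    """Map each locally bound import name to a canonical dotted name."""
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name  # import pandas as pd
                else:
                    root = alias.name.split(".")[0]     # import numpy.linalg
                    aliases[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:                    # from numpy import mean
                local = alias.asname or alias.name
                aliases[local] = f"{node.module}.{alias.name}"
    return aliases


def library_calls(source):
    """Count calls rooted at an imported name (assumed definition).

    Method calls on ordinary variables, such as df.groupby(...), are not
    counted, because the variable's type is unknown without running the code.
    """
    tree = ast.parse(source)
    aliases = _import_aliases(tree)
    calls = Counter()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        attrs = []
        target = node.func
        # Follow an attribute chain like pd.DataFrame down to its root name.
        while isinstance(target, ast.Attribute):
            attrs.append(target.attr)
            target = target.value
        if isinstance(target, ast.Name) and target.id in aliases:
            calls[".".join([aliases[target.id], *reversed(attrs)])] += 1
    return calls


def decision_complexity(source):
    """Approximate Cyclomatic Complexity as 1 + number of decision points."""
    branching = (ast.If, ast.For, ast.While, ast.IfExp,
                 ast.ExceptHandler, ast.comprehension)
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, branching):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra and/or operand
    return score


# Hypothetical student submission and solution, used only for illustration.
student_code = '''
import pandas as pd
import numpy as np

df = pd.read_csv("data.csv")
total = 0
for v in df["x"]:
    if v > 0 and v < 10:
        total += np.sqrt(v)
print(pd.DataFrame({"t": [total]}))
'''

solution_code = '''
import pandas
import numpy as np

df = pandas.read_csv("data.csv")
print(np.sqrt(df["x"]).sum())
'''

student_calls = library_calls(student_code)
solution_calls = library_calls(solution_code)
# Multiset intersection: calls that appear in both, counted with multiplicity.
common_calls = sum((student_calls & solution_calls).values())
```

In this toy example the student's loop-based version scores higher on the decision-point count than the vectorized solution while sharing only some of its library calls, which is the kind of contrast the abstract describes as a signal for scaffolding feedback. Because the source is only parsed, never executed, the sketch can be applied to submissions without running untrusted code.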
- Exploring Metrics for the Analysis of Code Submissions in an Introductory Data Science Course