DOI: 10.1145/3448139.3448209
short-paper
Open Access

Exploring Metrics for the Analysis of Code Submissions in an Introductory Data Science Course

Published: 12 April 2021

ABSTRACT

While data science education has gained increased recognition in both academic institutions and industry, there has been a lack of research on automated coding assessment for novice students. Our work presents a first step in this direction by leveraging coding metrics from traditional software engineering (Halstead Volume and Cyclomatic Complexity) in combination with metrics that reflect a data science project’s learning objectives (the number of library calls and the number of library calls in common with the solution code). Through these metrics, we examined the code submissions of 97 students across two semesters of an introductory data science course. Our results indicated that the metrics can identify cases where students wrote overly complicated code and would benefit from scaffolding feedback. The number of library calls, in particular, was also a significant predictor of changes in submission score and submission runtime, which highlights the distinctive nature of data science programming. We conclude with suggestions for extending our analyses towards more actionable intervention strategies, for example by tracking the fine-grained submission grading outputs throughout a student’s submission history, to better model and support students in their data science learning process.
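The two software-engineering metrics named above can both be computed directly from a submission's source text. The sketch below is a simplified, stdlib-only illustration and is not the authors' pipeline. It assumes the following counting rules:

- **Halstead Volume.** Every operator token and Python keyword counts as an operator, and every other name, number, or string literal counts as an operand. This gives V = N log2 η, where N = N1 + N2 is the total number of operators and operands and η = η1 + η2 is the number of distinct ones.
- **Cyclomatic Complexity.** The score is 1 plus one per branch point, computed over the whole file.

Dedicated tools such as Radon apply their own, more detailed counting rules and report per function, so absolute values will differ. The function names `halstead_volume` and `cyclomatic_complexity` are hypothetical.

```python
import ast
import io
import keyword
import math
import tokenize


def halstead_volume(source: str) -> float:
    """Simplified Halstead Volume, V = N * log2(eta).

    Assumption: OP tokens and Python keywords (including True/False/None) are
    operators; all other NAME, NUMBER and STRING tokens are operands.
    """
    operators, operands = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP or (
            tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        ):
            operators.append(tok.string)
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)
    length = len(operators) + len(operands)                # N = N1 + N2
    vocabulary = len(set(operators)) + len(set(operands))  # eta = eta1 + eta2
    return length * math.log2(vocabulary) if vocabulary > 0 else 0.0


def cyclomatic_complexity(source: str) -> int:
    """Simplified McCabe complexity for a whole file: 1 + decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.AsyncFor,
                             ast.While, ast.ExceptHandler)):
            complexity += 1                      # each branch or loop (elif is a nested If)
        elif isinstance(node, ast.comprehension):
            complexity += 1 + len(node.ifs)      # the loop clause plus each filter
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1   # `a and b and c` adds two branches
    return complexity
```

Two worked cases illustrate the scores:

- **`x = 1`.** The sketch finds one operator (`=`) and two operands (`x`, `1`), so V = 3 log2 3 ≈ 4.75.
- **A small function.** A function with one `if` guarded by an `and` and one `for` loop scores a cyclomatic complexity of 4.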

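The two data-science-specific metrics can be sketched in the same way. The abstract does not spell out how a library call is identified, so the following is a hypothetical operationalization:

- **Library calls.** A call counts if its callee is reached through a name bound by an import statement, for example `pd.read_csv(...)` or a directly imported `mean(...)`. Each call is keyed by its fully qualified name.
- **Common library calls.** The common-call metric is the number of distinct qualified names that also appear in the solution code.

Method calls on intermediate objects, such as `df.groupby(...)`, are not resolved by this sketch. All function names here are illustrative.

```python
import ast
from collections import Counter
from typing import Optional


def _dotted_name(expr: ast.expr) -> Optional[str]:
    """Return 'a.b.c' for a Name/Attribute chain such as `np.linalg.norm`, else None."""
    parts = []
    while isinstance(expr, ast.Attribute):
        parts.append(expr.attr)
        expr = expr.value
    if isinstance(expr, ast.Name):
        parts.append(expr.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on the result of another call: f(x).g()


def library_calls(source: str) -> Counter:
    """Hypothetical metric: count calls made through imported names.

    Keys are fully qualified names, so `pd.read_csv` and a directly imported
    `read_csv` both map to 'pandas.read_csv'. Method calls on intermediate
    objects (e.g. `df.groupby(...)`) are not resolved by this sketch.
    """
    tree = ast.parse(source)

    # Pass 1: map each locally bound import name to its qualified module path.
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:                      # import numpy as np
                    imported[alias.asname] = alias.name
                else:                                 # import os.path binds `os`
                    root = alias.name.split(".")[0]
                    imported[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:                  # from numpy import mean
                imported[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    # Pass 2: count every call whose callee is rooted at an imported name.
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name is None:
                continue
            root, _, rest = name.partition(".")
            if root in imported:
                qualified = imported[root] + ("." + rest if rest else "")
                counts[qualified] += 1
    return counts


def common_library_calls(student_source: str, solution_source: str) -> int:
    """Number of distinct library functions called in both submission and solution."""
    return len(set(library_calls(student_source)) & set(library_calls(solution_source)))
```

Under these assumptions, the total number of library calls in a submission is `sum(library_calls(src).values())`. Using distinct qualified names for the common-call metric means that repeating the same call does not inflate overlap with the solution.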
    Published in

      ACM Other conferences
      LAK21: 11th International Learning Analytics and Knowledge Conference
      April 2021
      645 pages
      ISBN: 9781450389358
      DOI: 10.1145/3448139

      Copyright © 2021 Owner/Author

      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • short-paper
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 236 of 782 submissions, 30%
