Abstract
Background and Context. Students’ programming projects are often assessed on the basis of their tests as well as their implementations, most commonly using test adequacy criteria such as branch coverage or, in some cases, mutation analysis. As a result, students are implicitly encouraged to use these tools during their development process (i.e., to maintain awareness of the strength of their own test suites).
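To make the two feedback mechanisms concrete, the following is a minimal, hypothetical Python sketch (not drawn from the study's materials): a test suite that achieves 100% branch coverage can still fail to kill a typical mutant, which is why mutation analysis is considered the stronger adequacy criterion.

```python
# Hypothetical example: full branch coverage does not imply mutation adequacy.

def shipping_cost(weight):
    """Flat $10, or $20 for packages over 10 kg."""
    if weight > 10:
        return 20
    return 10

# These two tests execute both branches, so branch coverage is 100%.
assert shipping_cost(15) == 20
assert shipping_cost(5) == 10

# A typical relational-operator mutant changes '>' to '>=':
def shipping_cost_mutant(weight):
    if weight >= 10:
        return 20
    return 10

# The same two tests also pass against the mutant, so it survives.
assert shipping_cost_mutant(15) == 20
assert shipping_cost_mutant(5) == 10

# Only a test at the boundary (weight == 10) distinguishes the two,
# so mutation-based feedback would prompt the student to add it.
assert shipping_cost(10) != shipping_cost_mutant(10)
```

Here branch-coverage feedback reports the suite as complete after two tests, while mutation-based feedback surfaces the missing boundary case; the study examines how students respond to each kind of signal.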
Objectives. Little is known about how students choose test cases for their software while being guided by these feedback mechanisms. We aim to explore the interaction between students and commonly used testing feedback mechanisms (in this case, branch coverage and mutation-based feedback).
Method. We use grounded theory to explore this interaction. We conducted 12 think-aloud interviews with students as they completed a series of software testing tasks, each of which involved a different feedback mechanism. Interviews were recorded, transcripts were analyzed, and we present the overarching themes that emerged from our analysis.
Findings. Our findings are organized into a process model describing how students completed software testing tasks while being guided by a test adequacy criterion. Program comprehension strategies were commonly employed to reason about feedback and devise test cases. Mutation-based feedback tended to be cognitively overwhelming for students, and they resorted to weaker heuristics in order to address this feedback.
Implications. In the presence of testing feedback, students did not appear to consider problem coverage as a testing goal so much as program coverage. While test adequacy criteria can be useful for the assessment of software tests, we must consider whether they represent good goals for testing, and whether our current methods of practice and assessment are encouraging poor testing habits.
A Model of How Students Engineer Test Cases With Feedback