Considerations and Pitfalls for Reducing Threats to the Validity of Controlled Experiments on Code Comprehension

Feitelson, Dror G.

doi:10.1007/s10664-022-10160-3

Published: 23 June 2022

Empirical Software Engineering, Volume 27, article number 123 (2022)

Dror G. Feitelson, ORCID: orcid.org/0000-0002-2733-7709

Abstract

Understanding program code is a complicated endeavor. As a result, studying code comprehension is also hard. The prevailing approach for such studies is to use controlled experiments, where the difference between treatments sheds light on factors that affect comprehension. But it is hard to conduct controlled experiments with human developers, and we also need to find a way to operationalize what “comprehension” actually means. In addition, myriad factors can influence the outcome, and seemingly small nuances may be detrimental to the study’s validity. To promote the development and use of sound experimental methodology, we discuss both considerations that need to be applied and potential problems that might occur, with regard to the experimental subjects, the code they work on, the tasks they are asked to perform, and the metrics for their performance. A common thread is that decisions taken in an effort to avoid one threat to validity may pose a larger threat than the one they removed.

Notes

They used 999999, which today looks unjustifiable; it should have been MAXINT.
For example, GazeRecorder https://gazerecorder.com/.
https://insights.stackoverflow.com/survey/2021

References

Abbes M, Khomh F, Guéhéneuc Y-G, Antoniol G (2011) An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehension. In: 15th European Conf. Softw. Maintenance & Reengineering. https://doi.org/10.1109/CSMR.2011.24, pp 181–190
Abrahão S, Gravino C, Insfran E, Scanniello G, Tortora G (2013) Assessing the effectiveness of sequence diagrams in the comprehension of functional requirements: Results from a family of five experiments. IEEE Trans Softw Eng 39 (3):327–342. https://doi.org/10.1109/TSE.2012.27
Article Google Scholar
Adams WK, Wieman CE (2011) Development and validation of instruments to measure learning of expert-like thinking. Intl J Science Education 33 (9):1289–1312. https://doi.org/10.1080/09500693.2010.512369 https://doi.org/10.1080/09500693.2010.512369
Article Google Scholar
Ajami S, Woodbridge Y, Feitelson DG (2019) Syntax, predicates, idioms — what really affects code complexity?. Empirical Softw Eng 24(1):287–328. https://doi.org/10.1007/s10664-018-9628-3
Article Google Scholar
Arnaoudova V, Di Penta M, Antoniol G (2016) Linguistic antipatterns: What they are and how developers perceive them. Empirical Softw Eng 21(1):104–158. https://doi.org/10.1007/s10664-014-9350-8
Article Google Scholar
Avidan E, Feitelson DG (2017) Effects of variable names on comprehension: An empirical study. In: 25th Intl. Conf. Program Comprehension. https://doi.org/10.1109/ICPC.2017.27, pp 55–65
Basili VR, Selby RW (1987) Comparing the effectiveness of software testing strategies. IEEE Trans Softw Eng SE-13(12):1278–1296. https://doi.org/10.1109/TSE.1987.232881
Article Google Scholar
Basili VR, Selby RW, Hutchens DH (1986) Experimentation in software engineering. IEEE Trans Softw Eng SE-12(7):733–743. https://doi.org/10.1109/TSE.1986.6312975
Article Google Scholar
Basili VR, Zelkowitz MV (2007) Empirical studies to build a science of computer science. Comm ACM 50(11):33–37. https://doi.org/10.1145/1297797.1297819
Article Google Scholar
Bauer J, Siegmund J, Peitek N, Hofmeister JC, Apel S (2019) Indentation: Simply a matter of style or support for program comprehension?. In: 27th Intl Conf Program Comprehension. https://doi.org/10.1109/ICPC.2019.00033, pp 154–164
Bednarik R, Myller N, Sutinen E, Tukiainen M (2005) Effects of experience on gaze behavior during program animation. In: 17th workshop of psychology of programming interest group, pp 49–61
Bednarik R, Tukiainen M (2006) An eye-tracking methodology for characterizing porgram comprehension processes. In: 4th Symp. Eye Tracking Res. & App. https://doi.org/10.1145/1117309.1117356, pp 125–132
Bednarik R et al (2020) EMIP: The eye movements in programming dataset. Sci Comput Programming 198:102520. https://doi.org/10.1016/j.scico.2020.102520
Article Google Scholar
Beniamini G, Gingichashvili S, Klein Orbach A, Feitelson DG (2017) Meaningful identifier names: The case of single-letter variables. In: 25th Intl. Conf. Program Comprehension. https://doi.org/10.1109/ICPC.2017.18, pp 45–54
Bergersen GR, Gustafsson J.-E. (2011) Programming skill, knowledge, and working memory among professional software developers from an investment theory perspective. J Individual Differences 32(4):201–209. https://doi.org/10.1027/1614-0001/a000052
Article Google Scholar
Bergersen GR, Hannay JE, Sjøberg DIK, Dybå T., Karahasanović A (2011) Inferring skill from tests of programming performance: Combining time and quality. In: 5th Intl. Symp. Empirical Softw. Eng. & Measurement. https://doi.org/10.1109/ESEM.2011.39, pp 305–314
Bergersen GR, Sjøberg DIK (2012) Evaluating methods and technologies in software engineering with respect to developer’s skill level. In: 16th Intl. Conf. Evaluation & Assessment in Softw. Eng. https://doi.org/10.1049/ic.2012.0013, pp 101–110
Bergersen GR, Sjøberg DIK, Dybå T (2014) Construction and validation of an instrument for measuring programming skill. IEEE Trans Softw Eng 40(12):1163–1184. https://doi.org/10.1109/TSE.2014.2348997
Article Google Scholar
Biggerstaff TJ (1989) Design recovery for maintenance and reuse. Computer 22(7):36–49. https://doi.org/10.1109/2.30731
Article Google Scholar
Bishop B, McDaid K (2008) Spreadsheet debugging behaviour of expert and novice end-users. In: 4th Intl. Workshop End-User Software Engineering. https://doi.org/10.1145/1370847.1370860, pp 56–60
Brooks FP Jr (1987) No silver bullet: Essence and accidents of software engineering. Computer 20(4):10–19. https://doi.org/10.1109/MC.1987.1663532
Article MathSciNet Google Scholar
Brooks R (1983) Towards a theory of the comprehension of computer programs. Intl J Man-Machine Studies 18(6):543–554. https://doi.org/10.1016/S0020-7373(83)80031-5
Article Google Scholar
Brooks RE (1980) Studying programmer behavior experimentally: The problems of proper methodology. Comm ACM 23(4):207–213. https://doi.org/10.1145/358841.358847
Article Google Scholar
Buse RPL, Weimer WR (2008) A metric for software readability. Intl. Symp. Softw. Testing & Analysis, 121–130. https://doi.org/10.1145/1390630.1390647
Buse RPL, Weimer WR (2010) Learning a metric for code readability. IEEE Trans Softw Eng 36(4):546–558. https://doi.org/10.1109/TSE.2009.70
Article Google Scholar
Busjahn T, Bednarik R, Begel A, Crosby M, Paterson JH, Schulte C, Sharif B, Tamm S (2015) Eye movements in code reading: Relaxing the linear order. In: 23rd Intl. Conf. Program Comprehension. https://doi.org/10.1109/ICPC.2015.36, pp 255–265
Campbell JP, McCloy RA, Oppler SH, Sager CE (1993) A theory of performance. In: Schmitt N, Borman WC, Associates (eds) Personnel Selection in Organizations. Jossey-Bass Pub, pp 35–70
Carver J, Shull F, Basili V (2003) Observational studies to accelerate process experience in classroom studies: An evaluation. In: Intl. Symp. Empirical Softw. Eng. https://doi.org/10.1109/ISESE.2003.1237966, pp 72–79
Carver JC, Jaccheri L, Morasca S, Shull F (2010) A checklist for integrating student empirical studies with research and teaching goals. Empirical Softw Eng 15(1):35–59. https://doi.org/10.1007/s10664-009-9109-9
Article Google Scholar
Cates R, Yunik N, Feitelson DG (2021) Does code structure affect comprehension? on using and naming intermediate variables. In: 29th Intl. Conf. Program Comprehension. https://doi.org/10.1109/ICPC52881.2021.00020, pp 118–126
Ceccato M, Di Penta M, Falcarin P, Ricca F, Torchiano M, Tonella P (2014) A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques. Empirical Softw Eng 19(4):1040–1074. https://doi.org/10.1007/s10664-013-9248-x
Google Scholar
Cherubini M, Venolia G, DeLine R, Ko AJ (2007) Let’s go to the whiteboard: How and why software developers use drawings. In: SIGCHI Conf. Human Factors in Comput. Syst. https://doi.org/10.1145/1240624.1240714, pp 557–566
Chikofsky EJ, Cross II JH (1990) Reverse engineering and design recovery: A taxonomy. IEEE Softw 7(1):13–17. https://doi.org/10.1109/52.43044
Article Google Scholar
Cook C, Bregar W, Foote D (1984) A preliminary investigation of the use of the cloze procedure as a measure of program understanding. Inf Process & Management 20(1–2):199–208. https://doi.org/10.1016/0306-4573(84)90050-5
Article Google Scholar
Cornelissen B, Zaidman A, van Deursen A, Moonen L, Koschke R (2009) A systematic survey of program comprehension through dynamic analysis. IEEE Trans Softw Eng 35(5):684–702. https://doi.org/10.1109/TSE.2009.28
Article Google Scholar
Couceiro R, Duarte G, Durães J., Castelhano J, Duarte C, Teixeira C, Castelo Branco M, de Carvalho P, Madeira H (2019) Biofeedback augmented software engineering: Monitoring of programmers’ mental effort. In: 41st Intl. Conf. Softw. Eng. https://doi.org/10.1109/ICSE-NIER.2019.00018. (NIER track)., pp 37–40
Crosby ME, Scholtz J, Wiedenbeck S (2002) The roles beacons play in comprehension for novice and expert programmers. In: 14th workshop psychology of programming interest group, pp 58–73
Curtis B (1981) Substantiating programmer variability. Proc IEEE 69(7):846. https://doi.org/10.1109/PROC.1981.12088
Article Google Scholar
Curtis B (2014) A career spent wading through industry’s empirical ooze. In: 2nd Intl. Workshop Conducting Empirical Studies in Industry. https://doi.org/10.1145/2593690.2593699, pp 1–2
Denaro G, Pezzè M (2002) An empirical evaluation of fault-proneness models. In: 24th Intl. Conf. Softw. Eng. https://doi.org/10.1145/581339.581371, pp 241–251
Dijkstra EW (1968) Go To statement considered harmful. Comm ACM 11(3):147–148. https://doi.org/10.1145/362929.362947
Article MathSciNet Google Scholar
Dreyfus SE, Dreyfus HL (1980) A Five-Stage Model of the Mental Activities Involved in Directed Skill Acquisition. Tech. Rep. ORC-80-2, Operations Research Center. University of California, Berkeley
Google Scholar
DuBay WH (2004) The principles of readability. http://www.impact-information.com/impactinfo/readability02.pdf
Dunsmore A, Roper M (2000) A Comparative Evaluation of Program Comprehension Measures. Tech. Rep. EFoCS-35-2000. University of Strathclyde, Glasgow
Google Scholar
Dunsmore A, Roper M, Wood M (2000) The role of comprehension in software inspection. J Syst Softw 52(2–3):121–129. https://doi.org/10.1016/S0164-1212(99)00138-7
Article Google Scholar
Ericsson KA, Krampe RT, Tesch-Römer C (1993) The role of deliberate practice in the acquisition of expert performance. Psychological Rev 100 (3):363–406. https://doi.org/10.1037/0033-295X.100.3.363
Article Google Scholar
Ericsson KA, Prietula MJ, Cokely ET (2007) The making of an expert. Harvard Business Rev, Massachusetts
Google Scholar
Etgar A, Friedman R, Haiman S, Perez D, Feitelson DG (2022) The effect of information content and length on name recollection. In: 30th Intl Conf Program Comprehension. https://doi.org/10.1145/3524610.3529159
Fakhoury S, Roy D, Ma Y, Arnaoudova V, Adesope O (2020) Measuring the impact of lexical and structural inconsistencies on developers’ cognitive load during bug localization. Empirical Softw Eng 25(3):2140–2178. https://doi.org/10.1007/s10664-019-09751-4
Article Google Scholar
Falessi D, Juristo N, Wohlin C, Turhan B, Münch J, Jedlitschka A, Oivo M (2018) Empirical software engineering experts on the use of students and professionals in experiments. Empirical Softw Eng 23(1):452–489. https://doi.org/10.1007/s10664-017-9523-3
Article Google Scholar
Feigenspan J, Apel S, Liebig J, Kästner C (2011) Exploring software measures to assess program comprehension. In: Intl. Symp. Empirical Softw. Eng. & Measurement. https://doi.org/10.1109/ESEM.2011.21, pp 127–136
Feitelson DG (2015) Using students as experimental subjects in software engineering research – a review and discussion of the evidence. arXiv:1512.08409 [cs.SE]
Feitelson DG (2021) Considerations and pitfalls in controlled experiments on code comprehension. In: 29th Intl. Conf. Program Comprehension. https://doi.org/10.1109/ICPC52881.2021.00019, pp 106–117
Feitelson DG, Mizrahi A, Noy N, Ben Shabat A, Eliyahu O, Sheffer R (2022) How developers choose names. IEEE Trans Softw Eng 48(1):37–52. https://doi.org/10.1109/TSE.2020.2976920
Article Google Scholar
Floyd B, Santander T, Weimer W (2017) Decoding the representation of code in the brain: An fMRI study of code review and expertise. In: 39th Intl Conf Softw Eng. https://doi.org/10.1109/ICSE.2017.24, pp 175–186
Fowler M (2019) Refactoring: Improving the Design of Existing Code, 2nd edn. Pearson Education Inc, Boston
MATH Google Scholar
Fritz T, Begel A, Muller̈ SC, Yigit-Elliott S, Züger M. (2014) Using psycho-physiological measures to assess task difficulty in software development. In: 36th Intl Conf Softw Eng. https://doi.org/10.1145/2568225.2568266, pp 402–413
Geffen Y, Maoz S (2016) On method ordering. In: 24th Intl Conf Program Comprehension. https://doi.org/10.1109/ICPC.2016.7503711
Gil Y, Lalouche G (2017) On the correlation between size and metric validity. Empirical Softw Eng 22(5):2585–2611. https://doi.org/10.1007/s10664-017-9513-5
Article Google Scholar
Gopstein D, Iannacone J, Yan Y, DeLong L, Zhuang Y, Yeh MK-C, Cappos J (2017) Understanding misunderstanding in source code. In: 11th ESEC/FSE. https://doi.org/10.1145/3106237.3106264, pp 129–139
Graziotin D, Fagerholm F, Wang X, Abrahamsson P (2018) What happens when software developers are (un)happy. J Syst Softw 140:32–47. https://doi.org/10.1016/j.jss.2018.02.041
Article Google Scholar
Graziotin D, Wang X, Abrahamsson P (2014) Software developers, moods, emotions, and performance. IEEE Softw 31(4):24–27. https://doi.org/10.1109/MS.2014.94
Article Google Scholar
Graziotin D, Wang X, Abrahamsson P (2015) How do you feel, developer? an explanatory theory of the impact of affects on programming performance. peerJ Comput Sci 1:e18. https://doi.org/10.7717/peerj-cs.18
Article Google Scholar
Hannay JE (2011). Personality, intelligence, and expertise: Impacts on software development. In: Oram A., Wilson G. (eds) Making Software, pp 79–110. O'Reilly Media Inc, Massachusetts
Hannebauer C, Hesenius M, Gruhn V (2018) Does syntax highlighting help programming novices?. Empirical Softw Eng 23 (5):2795–2828. https://doi.org/10.1007/s10664-017-9579-0
Article Google Scholar
Heathcote A, Brown S, Mewhort DJK (2000) The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin & Review 7 (2):185–207. https://doi.org/10.3758/BF03212979
Article Google Scholar
Hofmeister JC, Siegmund J, Holt DV (2019) Shorter identifier names take longer to comprehend. Empirical Softw Eng 24(1):417–443. https://doi.org/10.1007/s10664-018-9621-x
Article Google Scholar
Hollmann N, Hanenberg S (2017) An empirical study on the readability of regular expressions: Textual versus graphical. In: Working Conf. Softw. Visualization. https://doi.org/10.1109/VISSOFT.2017.27, pp 74–84
Ivanova AA, Srikant S, Sueoka Y, Kean HH, Dhamala R, O’Reilly U.-M., Bers MU, Fedorenko E (2020) Comprehension of computer code relies primarily on domain-general executive brain regions. eLife 9:e58906. https://doi.org/10.7554/eLife.58906
Article Google Scholar
Jansen AR, Blackwell AF, Marriott K (2003) A tool for tracking visual attention: The restricted focus viewer. Behavior Research Methods, Instruments, & Comput 35(1):57–69. https://doi.org/10.3758/BF03195497
Article MATH Google Scholar
Jbara A, Feitelson DG (2014) On the effect of code regularity on comprehension. In: 22nd Intl. Conf. Program Comprehension. https://doi.org/10.1145/2597008.2597140, pp 189–200
Jbara A, Feitelson DG (2017) How programmers read regular code: A controlled experiment using eye tracking. Empirical Softw Eng 22(3):1440–1477. https://doi.org/10.1007/s10664-016-9477-x
Article Google Scholar
Jedlitschka A, Pfahl D (2005) Reporting guidelines for controlled experiments in software engineering. In: Intl Symp Empirical Softw Eng. https://doi.org/10.1109/ISESE.2005.1541818, pp 95–104
Juristo N, Moreno AM (2001) Basics of software engineering experimentation. Kluwer
Juristo N, Vegas S, Solari M, Abrahao S, Ramos I (2012) Comparing the effectiveness of equivalence partitioning, branch testing and code reading by stepwise abstraction applied by subjects. In: 5th Intl Conf Software Testing, Verification, & Validation. https://doi.org/10.1109/ICST.2012.113, pp 330–339
Kaczmarczyk LC, Petrick ER, East JP, Herman GL (2010) Identifying student misconceptions of programming. In: 41st SIGCSE Tech Symp Comput Sci Ed, pp 107–111
Kahneman D (1973) Attention and Effort. Prantice-Hall, Hoboken
Google Scholar
Ko AJ, LaToza TD, Burnett MM (2015) A practical guide to controlled experiments of software engineering tools with human participants. Empirical Softw Eng 20 (1):110–141. https://doi.org/10.1007/s10664-013-9279-3 https://doi.org/10.1007/s10664-013-9279-3
Article Google Scholar
Kruchten P (1995) The 4 + 1 view model of architecture. IEEE Softw 12(6):42–50. https://doi.org/10.1109/52.469759 https://doi.org/10.1109/52.469759
Article Google Scholar
Krueger R, Huang Y, Liu X, Santander T, Weimer W, Leach K (2020) Neurological divide: An fMRI study of prose and code writing. In: 42nd Intl Conf Softw Eng. https://doi.org/10.1145/3377811.3380348 https://doi.org/10.1145/3377811.3380348, pp 678–690
Lawrie D, Morrell C, Field H, Binkley D (2006) What’s in a name? a study of identifiers. In: 14th Intl Conf Program Comprehension. https://doi.org/10.1109/ICPC.2006.51, pp 3–12
Levy O, Feitelson DG (2021) Understanding large-scale software systems — structure and flows. Empirical Softw Eng 26(3):48. https://doi.org/10.1007/s10664-021-09938-8
Article Google Scholar
Lientz BP, Swanson EB, Tompkins GE (1978) Characteristics of application software maintenance. Comm ACM 21(6):466–471. https://doi.org/10.1145/359511.359522
Article Google Scholar
Littman DC, Pinto J, Letovsky S, Soloway E (1987) Mental models and software maintenance. J Syst Softw 7(4):341–355. https://doi.org/10.1016/0164-1212(87)90033-1
Article Google Scholar
Ma L, Ferguson J, Roper M, Wood M (2007) Investigating the viability of mental models held by novice programmers. In: 38th SIGCSE Symp Comput Sci Education. https://doi.org/10.1145/1227504.1227481 https://doi.org/10.1145/1227504.1227481, pp 499–503
Madison S, Gifford J (2002) Modular programming: Novice misconceptions. J Res Tech Ed 34(3):217–229. https://doi.org/10.1080/15391523.2002.10782346
Article Google Scholar
Martin RC (2009) Clean Code: A Handbook of Agile Software Craftmanship. Prentice Hall, Hoboken
Google Scholar
McCabe T (1976) A complexity measure. IEEE Trans Softw Eng SE-2(4):308–320. https://doi.org/10.1109/TSE.1976.233837 https://doi.org/10.1109/TSE.1976.233837
Article MathSciNet MATH Google Scholar
McKeithen KB, Reitman JS, Reuter HH, Hirtle SC (1981) Knowledge organization and skill differences in computer programmers. Cognitive Psychol 13(3):307–325. https://doi.org/10.1016/0010-0285(81)90012-8 https://doi.org/10.1016/0010-0285(81)90012-8
Article Google Scholar
McMeekin DA, von Konsky BR, Robey M, Cooper DJA (2009) The significance of participant experience when evaluating software inspection techniques. In: Australian Softw Eng Conf. https://doi.org/10.1109/ASWEC.2009.13, pp 200–209
Meyer B (1992) Applying “design by contract”. Computer 25 (10):40–51. https://doi.org/10.1109/2.161279
Article Google Scholar
Miara RJ, Musselman JA, Navarro JA, Shneiderman B (1983) Program indentation and comprehensibility. Comm ACM 26(11):851–867. https://doi.org/10.1145/182.358437
Article Google Scholar
Nagappan M, Robbes R, Kamei Y, Tanter E, McIntosh S, Mockus A, Hassan AE (2015) An empirical study of goto in C code from GitHub repositories. In: 10th ESEC/FSE. https://doi.org/10.1145/2786805.2786834, pp 404–414
Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: 28th Intl Conf Softw Eng. https://doi.org/10.1145/1134285.1134349, pp 452–461
Newell A, Rosenbloom PS (1981) Mechanisms of skill acquisition and the law of practice. In: Anderson JR (ed) Cognitive skills and their acquisition. Lawrence Erlbaum Assoc, pp 1–55
Nyström M., Andersson R, Holmqvist K, van der Weijer J (2013) The influence of calibration method and eye physiology on eyetracking data quality. Behavioral Res Meth 45(1):272–288. https://doi.org/10.3758/s13428-012-0247-4 https://doi.org/10.3758/s13428-012-0247-4
Article Google Scholar
Obaidellah U, Al Haek M, Cheng PC-H (2018) A survey on the usage of eye-tracking in computer programming. ACM Comput Surv 51(1):5. https://doi.org/10.1145/3145904
Google Scholar
Oliveira D, Bruno R, Madeiral F, Castor F (2020) Evaluating code readability and legibility: An examination of human-centric studies. In: Intl Conf Softw. Maintenance & Evolution. https://doi.org/10.1109/ICSME46990.2020.00041, pp 348–359
Oman PW, Cook CR (1990) Typographic style is more than cosmetic. Comm ACM 33(5):506–520. https://doi.org/10.1145/78607.78611
Article Google Scholar
Orso A, Sinha S, Harrold MJ (2001) Effects of pointers on data dependences. In: 9th IEEE Intl Workshop Program Comprehension. https://doi.org/10.1109/WPC.2001.921712, pp 39–49
Parnas DL (1972) On the criteria to be used in decomposing systems into modules. Comm ACM 15(12):1053–1058. https://doi.org/10.1145/361598.361623
Article Google Scholar
Parnas DL, Clements PC, Weiss DM (1985) The modular structure of complex systems. IEEE Trans Softw Eng SE-11(3):259–266. https://doi.org/10.1109/TSE.1985.232209
Article Google Scholar
Paulson JW, Succi G, Eberlein A (2004) An empirical study of open-source and closed-source software products. IEEE Trans Softw Eng 30(4):246–256. https://doi.org/10.1109/TSE.2004.1274044
Article Google Scholar
Pearson S, Campos J, Just R, Fraser G, Abreu R, Ernst MD, Pang D, Keller B (2017) Evaluating and improving fault localization. 39th Intl Conf Softw Eng, 609–620. https://doi.org/10.1109/ICSE.2017.62
Pennington N (1987) Stimulus structures and mental representations in expert comprehension of computer programs. Cognitive Psychology 19(3):295–341. https://doi.org/10.1016/0010-0285(87)90007-7
Article Google Scholar
Politowski C, Khomh F, Romano S, Scanniello G, Petrillo F, Guéhéneuc Y-G, Maiga A (2020) A large scale empirical study of the impact of Spaghetti Code and Blob anti-patterns on program comprehension. InfSoftw Tech 122:106278. https://doi.org/10.1016/j.infsof.2020.106278
Article Google Scholar
Prechelt L (1999) Comparing Java vs. C/C++ efficiency differences to interpersonal differences. Comm ACM 42(10):109–112. https://doi.org/10.1145/317665.317683
Article Google Scholar
Purchase HC, Colpoys L, McGill M, Carrington D (2002) UML collaboration diagram syntax: An empirical study of comprehension. In: 1st Intl Workshop Visualizing Softw for Understanding & Analysis. https://doi.org/10.1109/VISSOF.2002.1019790, pp 13–22
Raghunathan S, Prasad A, Mishra BK, Chang H (2005) Open source versus closed source: Software quality in monopoly and competitive markets. IEEE Trans Syst Man Cybernetics 35(6):903–918. https://doi.org/10.1109/TSMCA.2005.853493
Article Google Scholar
Rajlich V, Cowan GS (1997) Towards standard for experiments in program comprehension. In: 5th IEEE Intl Workshop Program Comprehension. https://doi.org/10.1109/WPC.1997.601284, pp 160–161
Rajlich V, Wilde N (2002) The role of concepts in program comprehension. In: 10th IEEE Intl Workshop Program Comprehension. https://doi.org/10.1109/WPC.2002.1021348, pp 271–278
Raymond ES (2000) The cathedral and the bazaar. https://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar
Roehm T, Tiarks R, Koschke R, Maalej W (2012) How do professional developers comprehend software?. In: 34th Intl Conf Softw Eng. https://doi.org/10.1109/ICSE.2012.6227188, pp 255–265
Sackman H, Erikson WJ, Grant EE (1968) Exploratory experimental studies comparing online and offline programming performance. Comm ACM 11 (1):3–11. https://doi.org/10.1145/362851.362858
Article Google Scholar
Salviulo F, Scanniello G (2014) Dealing with identifiers and comments in source code comprehension and maintenance: Results from an ethnographically-informed study with students and professionals. In: 18th Intl Conf Evaluation & Assessment in Softw Eng, art. 48. https://doi.org/10.1145/2601248.2601251
Scalabrino S, Bavota G, Vendome C, Linares-Vśquez M, Poshyvanyk D, Oliveto R (2021) Automatically assessing code understandability. IEEE Trans Softw Eng 47(3):595–613. https://doi.org/10.1109/TSE.2019.2901468 https://doi.org/10.1109/TSE.2019.2901468.
Article Google Scholar
Scalabrino S, Linares-Vásquez M, Poshyvanyk D, Oliveto R (2016) Improving code readability models with textual features. In: 24th Intl Conf Program Comprehension. https://doi.org/10.1109/ICPC.2016.7503707 https://doi.org/10.1109/ICPC.2016.7503707
Schankin A, Berger A, Holt DV, Hofmeister JC, Riedel T, Beigl M (2018) Descriptive compound identifier names improve source code comprehension. In: 26th Intl Conf Program Comprehension. https://doi.org/10.1145/3196321.3196332, pp 31–40
Schenk KD, Vitalari NP, Davis KS (1998) Differences between novice and expert systems analysts: What do we know and what do we do?. J Mgmt Inf Syst 15(1):9–50. https://doi.org/10.1080/07421222.1998.11518195 https://doi.org/10.1080/07421222.1998.11518195
Article Google Scholar
Shaffer TR, Wise JL, Walters BM, Muller̈ SC, Falcone M, Sharif B (2015) iTrace: Enabling eye tracking on software artifacts within the IDE to support software engineering tasks. In: ESEC/FSE. https://doi.org/10.1145/2786805.2803188, pp 954–957
Shaft TM, Vessey I (1998) The relevance of application domain knowledge: Characterizing the computer program comprehension process. J Mgmt Inf Syst 15(1):51–78. https://doi.org/10.1080/07421222.1998.11518196 https://doi.org/10.1080/07421222.1998.11518196
Article Google Scholar
Sharafi Z, Huang Y, Leach K, Weimer W (2021) Toward an objective measure of developers’ cognitive activities. ACM Trans Softw Eng Methodology 30 (3):30. https://doi.org/10.1145/3434643
Article Google Scholar
Sharafi Z, Sharif B, Guéhéneuc Y-G, Begel A, Bednarik R, Crosby M (2020) A practical guide on conducting eye tracking studies in software engineering. Empirical Softw Eng 25(5):3128–3174. https://doi.org/10.1007/s10664-020-09829-4
Article Google Scholar
Sharafi Z, Soh Z, Guéhéneuc Y-G (2015) A systematic litareture review on the usage of eye-tracking in software engineering. Inf Softw Tech 67:79–107. https://doi.org/10.1016/j.infsof.2015.06.008
Article Google Scholar
Sharafi Z, Soh Z, Guéhéneuc Y-G, Antoniol G (2012) Women and men — different but equal: On the impact of identifier style on source code reading. In: 20th Intl Conf Program Comprehension. https://doi.org/10.1109/ICPC.2012.6240505, pp 27–36
Sharif B, Maletic JI (2010) An eye tracking study on camelCase and under_score identifier styles. In: 18th Intl Conf Program Comprehension. https://doi.org/10.1109/ICPC.2010.41, pp 196–205
Sharma T, Spinellis D (2018) A survey of code smells. J Syst Softw 138:158–173. https://doi.org/10.1016/j.jss.2017.12.034 https://doi.org/10.1016/j.jss.2017.12.034
Article Google Scholar
Shneiderman B (1977) Measuring computer program quality and comprehension. Intl J Man-Machine Studies 9 (4):465–478. https://doi.org/10.1016/S0020-7373(77)80014-X
Article Google Scholar
Shneiderman B, Mayer R (1979) Syntactic/semantic interactions in programmer behavior: A model and experimental results. Intl J Comput Inf Syst 8 (3):219–238. https://doi.org/10.1007/BF00977789
Article MATH Google Scholar
Shull F, Singer J, Sjøberg DIK (eds) (2008) Guide to Advanced Empirical Software Engineering. Springer, Berlin
Siegmund J (2016) Program comprehension: Past, present, and future. In: 23rd Intl Conf Softw Analysis, Evolution, & Reengineering. https://doi.org/10.1109/SANER.2016.35, pp 13–20
Siegmund J, Kästner C, Apel S, Brechmann A, Saake G (2013) Experience from measuring program comprehension—toward a general framework. In: Kowalewski S, Rumpe B (eds) Software Engineering. Gesellschaft für Informatik e.V. LNI, vol P-213, pp 239–257
Siegmund J, Kästner C, Apel S, Parnin C, Bethmann A, Leich T, Saake G, Brechmann A (2014) Understanding understanding source code with functional magnetic resonance imaging. In: 36th Intl Conf Softw Eng. https://doi.org/10.1145/2568225.2568252, pp 378–389
Siegmund J, Kästner C, Liebig J, Apel S, Hanenberg S (2014) Measuring and modeling programming experience. Empirical Softw Eng 19(5):1299–1334. https://doi.org/10.1007/s10664-013-9286-4
Article Google Scholar
Siegmund J, Peitek N, Apel S, Siegmund N (2021) Mastering variation in human studies: The role of aggregation. ACM Trans Softw Eng Methodology 30(1):art. 2. https://doi.org/10.1145/3406544
Article Google Scholar
Siegmund J, Peitek N, Parnin C, Apel S, Hofmeister J, Kästner C, Begel A, Bethmann A, Brechmann A (2017) Measuring neural efficiency of program comprehension. In: 11th ESEC/FSE. https://doi.org/10.1145/3106237.3106268, pp 140–150
Siegmund J, Schumann J (2015) Confounding parameters on program comprehension: A literature survey. Empirical Softw Eng 20(4):1159–1192. https://doi.org/10.1007/s10664-014-9318-8
Article Google Scholar
Simon HA, Chase WG (1973) Skill in chess. American Scientist 61(4):394–403
Google Scholar
Sjøberg DIK, Anda B, Arisholm E, Dybå T, Jørgensen M, Karahasanovic A, Koren EF, Vokác M. (2002) Conducting realistic experiments in software engineering. In: Intl Symp Empirical Softw Eng. https://doi.org/10.1109/ISESE.2002.1166921, pp 17–26
Sjøberg DIK, Anda B, Arisholm E, Dybå T, Jørgensen M, Karahasanović A, Vokáč M (2003) Challenges and recommendations when increasing the realism of controlled software engineering experiments. In: Conradi R, Wang AI (eds) Empirical methods and studies in software engineering: experiences from ESERNET, Springer, pp 24–38. https://doi.org/10.1007/978-3-540-45143-3. Lect Notes Comput vol 2765
Sjøberg DIK, Hannay JE, Hansen O, Kampenes VB, Karahasanović A, Liborg N-K, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Softw Eng 31(9):733–753. https://doi.org/10.1109/TSE.2005.97
Article Google Scholar
Smith M, Taffler R (1992) Readability and understandability: Different measures of the textual complexity of accounting narrative. Accounting, Audting & Accountability J 5(4):84–98. https://doi.org/10.1108/09513579210019549 https://doi.org/10.1108/09513579210019549
Google Scholar
Sochat VV, Eisenberg IW, Enkavi AZ, Li J, Bissett PG, Poldrack RA (2016) The experiment factory: Standardizing behavioral experiments. Frontiers in Psychology 7:art. 610. https://doi.org/10.3389/fpsyg.2016.00610 https://doi.org/10.3389/fpsyg.2016.00610
Article Google Scholar
Soloway E, Ehrlich K (1984) Empirical studies of programming knowledge. IEEE Trans Softw Eng SE-10(5):595–609. https://doi.org/10.1109/TSE.1984.5010283
Sonnentag S (1998) Expertise in professional software design: A process study. J App Psychol 83(5):703–715. https://doi.org/10.1037/0021-9010.83.5.703
Sonnentag S, Niessen C, Volmer J (2006) Expertise in software design. In: Ericsson KA, Charness N, Feltovich PJ, Hoffman RR (eds) The Cambridge Handbook of Expertise and Expert Performance. Cambridge University Press, pp 373–387
Spolsky J (2005) The perils of JavaSchools. https://www.joelonsoftware.com/2005/12/29/the-perils-of-javaschools-2, 29 Dec 2005
Stefik A, Siebert S (2013) An empirical investigation into programming language syntax. ACM Trans Computing Education 13(4) art. 19. https://doi.org/10.1145/2534973
Storey M-A (2005) Theories, methods and tools in program comprehension: Past, present and future. In: 13th IEEE Intl Workshop Program Comprehension. https://doi.org/10.1109/WPC.2005.38
The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1979) The Belmont report. https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html
Tichy WF (2000) Hints for reviewing empirical work in software engineering. Empirical Softw Eng 5(4):309–312. https://doi.org/10.1023/A:1009844119158
von Mayrhauser A, Vans AM (1995) Program comprehension during software maintenance and evolution. Computer 28(8):44–55. https://doi.org/10.1109/2.402076
von Mayrhauser A, Vans AM (1996) On the role of hypotheses during opportunistic understanding while porting large scale code. In: 4th Workshop Program Comprehension, pp 68–77. https://doi.org/10.1109/WPC.1996.501122
von Mayrhauser A, Vans AM (1998) Program understanding behavior during adaptation of large scale software. In: 6th Workshop Program Comprehension, pp 164–172. https://doi.org/10.1109/WPC.1998.693345
von Mayrhauser A, Vans AM, Howe AE (1997) Program understanding behavior during enhancement of large-scale software. J Softw Maintenance: Res Pract 9(5):299–327. https://doi.org/10.1002/(SICI)1096-908X(199709/10)9:5<299::AID-SMR157>3.0.CO;2-S
Weiser M, Shertz J (1983) Programming problem representation in novice and expert programmers. Intl J Man-Machine Studies 19(4):391–398. https://doi.org/10.1016/S0020-7373(83)80061-3
Weissman L (1974) Psychological complexity of computer programs: An experimental methodology. SIGPLAN Notices 9(6):25–36. https://doi.org/10.1145/953233.953237
Wilson LA, Senin Y, Wang Y, Rajlich V (2019) Empirical study of phased model of software change. arXiv:1904.05842 [cs.SE]
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in Software Engineering. Springer, Berlin

Author information

Authors and Affiliations

The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, 91904, Jerusalem, Israel
Dror G. Feitelson

Corresponding author

Correspondence to Dror G. Feitelson.

Additional information

Communicated by: Anita Sarma, Fabio Palomba and Alexander Serebrenik

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: International Conference on Program Comprehension (ICPC)

Dror Feitelson holds the Berthold Badler chair in Computer Science. This research was supported by the ISRAEL SCIENCE FOUNDATION (grant no. 832/18). This paper is an extended version of an “honorable mention” paper from ICPC 2021.

About this article

Cite this article

Feitelson, D.G. Considerations and Pitfalls for Reducing Threats to the Validity of Controlled Experiments on Code Comprehension. Empir Software Eng 27, 123 (2022). https://doi.org/10.1007/s10664-022-10160-3

Accepted: 23 March 2022
Published: 23 June 2022
DOI: https://doi.org/10.1007/s10664-022-10160-3