Abstract
Understanding program code is a complicated endeavor. As a result, studying code comprehension is also hard. The prevailing approach for such studies is to use controlled experiments, where the difference between treatments sheds light on factors which affect comprehension. But it is hard to conduct controlled experiments with human developers, and we also need to find a way to operationalize what “comprehension” actually means. In addition, myriad different factors can influence the outcome, and seemingly small nuances may be detrimental to the study’s validity. In order to promote the development and use of sound experimental methodology, we discuss both considerations which need to be applied and potential problems that might occur, with regard to the experimental subjects, the code they work on, the tasks they are asked to perform, and the metrics for their performance. A common thread is that decisions that were taken in an effort to avoid one threat to validity may pose a larger threat than the one they removed.
Notes
They used 999999, which today looks unjustifiable; it should have been MAXINT.
For example, GazeRecorder https://gazerecorder.com/.
Additional information
Communicated by: Anita Sarma, Fabio Palomba and Alexander Serebrenik
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: International Conference on Program Comprehension (ICPC)
Dror Feitelson holds the Berthold Badler chair in Computer Science. This research was supported by the ISRAEL SCIENCE FOUNDATION (grant no. 832/18). This paper is an extended version of an “honorable mention” paper from ICPC 2021.
Cite this article
Feitelson, D.G. Considerations and Pitfalls for Reducing Threats to the Validity of Controlled Experiments on Code Comprehension. Empir Software Eng 27, 123 (2022). https://doi.org/10.1007/s10664-022-10160-3
DOI: https://doi.org/10.1007/s10664-022-10160-3