DOI: 10.1145/3545945.3569791 (ACM SIGCSE Conference Proceedings)
research-article
Open Access

First Steps Towards Predicting the Readability of Programming Error Messages

Published: 03 March 2023

ABSTRACT

Reading a programming error message is the first step in understanding what it is trying to tell the programmer about how to fix an error in their code. However, these messages are often difficult to read, especially for novices. This is not surprising, given that error messages in many of the most popular languages in which novices learn to code were not written with readability in mind. As a result, novices frequently struggle to understand them. This is a long-standing problem: researchers have highlighted concerns about programming error message readability over the last six decades. Very recent work has put forward evidence of the need for measuring readability in error messages, along with a framework for doing so. This framework consists of four factors of readability for programming error messages: message length, vocabulary, jargon, and sentence construction. We use this framework to implement an approach to automatically assess the readability of programming error messages. Using these established readability factors as predictors, we train several machine learning models on a dataset of C and Java error messages. We examine the performance of these models and apply the best-performing model to a previously published set of messages evaluated for readability by experts, non-experts, and students. Our results validate the previously proposed readability factors, and our model classifies messages similarly to human raters. Finally, we discuss future work needed to improve the accuracy of the model.
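To make the four readability factors more concrete, the following is a minimal sketch of how they might be turned into numeric predictors for a single error message. It is not the authors' implementation: the function name `readability_features`, the two word lists, and the specific proxies chosen for each factor are all assumptions made for illustration, since the abstract does not specify how the factors were measured.

```python
import re

# Hypothetical word lists, for illustration only. The abstract does not say
# which vocabulary or jargon resources the authors used.
COMMON_WORDS = {"a", "an", "the", "is", "in", "of", "to", "not", "be", "was",
                "expected", "found", "missing", "cannot", "find", "here"}
JARGON_TERMS = {"symbol", "identifier", "lvalue", "declaration", "operand",
                "token", "dereference"}


def readability_features(message: str) -> dict:
    """Map one error message to rough proxies for the four factors."""
    stripped = message.strip()
    words = re.findall(r"[A-Za-z']+", stripped.lower())
    n_words = len(words)
    uncommon = [w for w in words if w not in COMMON_WORDS]
    jargon = [w for w in words if w in JARGON_TERMS]
    return {
        # Factor 1: message length
        "length_chars": len(stripped),
        "length_words": n_words,
        # Factor 2: vocabulary (share of words outside the common list)
        "uncommon_ratio": len(uncommon) / n_words if n_words else 0.0,
        # Factor 3: jargon (count of programming-specific terms)
        "jargon_count": len(jargon),
        # Factor 4: sentence construction (crude check: starts with a
        # capital letter and ends with sentence punctuation)
        "is_full_sentence": bool(stripped)
        and stripped[0].isupper()
        and stripped[-1] in ".!?",
    }


if __name__ == "__main__":
    print(readability_features("error: cannot find symbol"))
```

A feature dictionary of this kind could then be fed, one row per message, to any standard classifier; which models the authors actually trained is not stated in the abstract.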

