DOI: 10.1145/3545945.3569770
research-article
Open access

Using Large Language Models to Enhance Programming Error Messages

Published: 03 March 2023

Abstract

A key part of learning to program is learning to understand programming error messages. They can be hard to interpret, and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then use the information to highlight areas in code. Researchers have been working on making these errors more novice-friendly since the 1960s; however, progress has been slow. The present work contributes to this stream of research by using large language models to enhance programming error messages with explanations of the errors and suggestions on how to fix them. Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original programming error messages in interpretability and actionability. These results provide further evidence of the benefits of large language models for computing educators, highlighting their use in areas known to be challenging for students. We further discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
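
The idea of pairing an error message with a model-generated explanation and fix suggestion can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the prompt wording, the function names `build_prompt` and `enhance_error_message`, and the `complete` callable (a stand-in for any text-completion client) are hypothetical and are not taken from the paper.

```python
from typing import Callable


def build_prompt(source_code: str, error_message: str) -> str:
    """Combine a program and its error message into one plain-text request.

    The wording is an illustrative assumption, not the prompt used in the paper.
    """
    return (
        "The following program produces an error.\n\n"
        "Program:\n"
        f"{source_code.strip()}\n\n"
        "Error message:\n"
        f"{error_message.strip()}\n\n"
        "Explain the error in plain English for a novice programmer, "
        "then suggest how to fix it."
    )


def enhance_error_message(
    source_code: str,
    error_message: str,
    complete: Callable[[str], str],
) -> str:
    """Return the original message followed by the model's explanation.

    `complete` is a hypothetical stand-in for whatever text-completion
    client is available; it takes a prompt and returns generated text.
    """
    explanation = complete(build_prompt(source_code, error_message)).strip()
    # Keep the original message so the learner still sees the raw diagnostic.
    return f"{error_message.strip()}\n\nExplanation:\n{explanation}"


if __name__ == "__main__":
    # A canned stub replaces a real model so the sketch runs offline.
    def stub(prompt: str) -> str:
        return "A closing parenthesis is missing on line 1. Add ')' at the end."

    code = 'print("hello"'
    error = "SyntaxError: '(' was never closed"
    print(enhance_error_message(code, error, stub))
```

Passing the model call in as a parameter keeps the sketch independent of any particular provider and makes it easy to check the generated explanation before showing it to a student.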

Supplementary Material

MP4 File (SIGCSE23-V1fp171.mp4)
Video presentation of the paper.

Published In

SIGCSE 2023: Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1
March 2023
1481 pages
ISBN: 9781450394314
DOI: 10.1145/3545945
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ai
  2. codex
  3. compiler error messages
  4. large language models
  5. programming error messages
  6. syntax error messages

Qualifiers

  • Research-article

Conference

SIGCSE 2023

Acceptance Rates

Overall Acceptance Rate 1,595 of 4,542 submissions, 35%

Upcoming Conference

SIGCSE TS 2025
The 56th ACM Technical Symposium on Computer Science Education
February 26 - March 1, 2025
Pittsburgh, PA, USA

Article Metrics

  • Downloads (Last 12 months): 1,769
  • Downloads (Last 6 weeks): 141
Reflects downloads up to 04 Jan 2025

Cited By

  • (2025) “Ok Pal, we have to code that now”: interaction patterns of programming beginners with a conversational chatbot. Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10561-6. Online publication date: 1-Feb-2025
  • (2024) Cognitive Apprenticeship and Artificial Intelligence Coding Assistants. Navigating Computer Science Education in the 21st Century (261-281). DOI: 10.4018/979-8-3693-1066-3.ch013. Online publication date: 26-Feb-2024
  • (2024) Comparative Analysis of Chatbots Using Large Language Models for Web Development Tasks. Applied Sciences 14:21 (10048). DOI: 10.3390/app142110048. Online publication date: 4-Nov-2024
  • (2024) ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills. PLOS ONE 19:5 (e0304013). DOI: 10.1371/journal.pone.0304013. Online publication date: 24-May-2024
  • (2024) Fine-Tuning Large Language Models for Better Programming Error Explanations. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). DOI: 10.1145/3699538.3699581. Online publication date: 12-Nov-2024
  • (2024) Novice Learners of Programming and Generative AI - Prior Knowledge Matters. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). DOI: 10.1145/3699538.3699580. Online publication date: 12-Nov-2024
  • (2024) Unrestricted Use of LLMs in a Software Project Course: Student Perceptions on Learning and Impact on Course Performance. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-7). DOI: 10.1145/3699538.3699541. Online publication date: 12-Nov-2024
  • (2024) Decoding Debugging Instruction: A Systematic Literature Review of Debugging Interventions. ACM Transactions on Computing Education 24:4 (1-44). DOI: 10.1145/3690652. Online publication date: 5-Sep-2024
  • (2024) Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice. Proceedings of the 2024 Conference on United Kingdom & Ireland Computing Education Research (1-7). DOI: 10.1145/3689535.3689554. Online publication date: 5-Sep-2024
  • (2024) An Eye for an AI: Evaluating GPT-4o's Visual Perception Skills and Geometric Reasoning Skills Using Computer Graphics Questions. SIGGRAPH Asia 2024 Educator's Forum (1-8). DOI: 10.1145/3680533.3697064. Online publication date: 3-Dec-2024
