research-article

Open access

Using Large Language Models to Enhance Programming Error Messages

Authors:

Brett A. Becker

SIGCSE 2023: Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1

Pages 563 - 569

https://doi.org/10.1145/3545945.3569770

Published: 03 March 2023

Abstract

A key part of learning to program is learning to understand programming error messages. They can be hard to interpret, and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then use the information to highlight areas in code. Researchers have been working on making these errors more novice-friendly since the 1960s; however, progress has been slow. The present work contributes to this stream of research by using large language models to enhance programming error messages with explanations of the errors and suggestions on how to fix them. Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original programming error messages in interpretability and actionability. These results provide further evidence of the benefits of large language models for computing educators, highlighting their use in areas known to be challenging for students. We further discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.

Supplementary Material

MP4 File (SIGCSE23-V1fp171.mp4)

Video presentation of the paper.

Index Terms

Using Large Language Models to Enhance Programming Error Messages
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
2. Social and professional topics
  1. Professional topics
    1. Computing education
      1. Computing education programs
        Computer science education

Information & Contributors

Information

Published In

SIGCSE 2023: Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1

March 2023

1481 pages

ISBN:9781450394314

DOI:10.1145/3545945

General Chairs: Maureen Doyle (Northern Kentucky University, USA), Ben Stephenson (University of Calgary, Canada)

Program Chairs: Brian Dorn (University of Nebraska at Omaha, USA), Leen-Kiat Soh (University of Nebraska-Lincoln, USA), Lina Battestilli (North Carolina State University, USA)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGCSE: ACM Special Interest Group on Computer Science Education

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGCSE 2023

Sponsor:

SIGCSE

SIGCSE 2023: The 54th ACM Technical Symposium on Computer Science Education

March 15 - 18, 2023

Toronto ON, Canada

Acceptance Rates

Overall Acceptance Rate 1,595 of 4,542 submissions, 35%

Bibliometrics & Citations

Article Metrics

Total Citations: 98
Total Downloads: 3,410
Downloads (Last 12 months): 1,769
Downloads (Last 6 weeks): 141

Reflects downloads up to 04 Jan 2025

Citations

Cited By

Mailach A, Gorgosch D, Siegmund N, Siegmund J. (2025). “Ok Pal, we have to code that now”: interaction patterns of programming beginners with a conversational chatbot. Empirical Software Engineering 30:1. Online publication date: 1-Feb-2025.
https://dl.acm.org/doi/10.1007/s10664-024-10561-6
Poitras E, Crane B, Dempsey D, Bragg T, Siegel A, Lin M. (2024). Cognitive Apprenticeship and Artificial Intelligence Coding Assistants. Navigating Computer Science Education in the 21st Century (261-281). Online publication date: 26-Feb-2024.
https://doi.org/10.4018/979-8-3693-1066-3.ch013
Smutny P, Bojko M. (2024). Comparative Analysis of Chatbots Using Large Language Models for Web Development Tasks. Applied Sciences 14:21 (10048). Online publication date: 4-Nov-2024.
https://doi.org/10.3390/app142110048
Pardos Z, Bhandari S. (2024). ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills. PLOS ONE 19:5 (e0304013). Online publication date: 24-May-2024.
https://doi.org/10.1371/journal.pone.0304013
Vassar A, Renzella J, Ross E, Taylor A. (2024). Fine-Tuning Large Language Models for Better Programming Error Explanations. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). Online publication date: 12-Nov-2024.
https://dl.acm.org/doi/10.1145/3699538.3699581
Kiesler N, Scholz I, Albrecht J, Stappert F, Wienkop U. (2024). Novice Learners of Programming and Generative AI - Prior Knowledge Matters. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). Online publication date: 12-Nov-2024.
https://dl.acm.org/doi/10.1145/3699538.3699580
Korpimies K, Laaksonen A, Luukkainen M. (2024). Unrestricted Use of LLMs in a Software Project Course: Student Perceptions on Learning and Impact on Course Performance. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-7). Online publication date: 12-Nov-2024.
https://dl.acm.org/doi/10.1145/3699538.3699541
Yang S, Baird M, O’Rourke E, Brennan K, Schneider B. (2024). Decoding Debugging Instruction: A Systematic Literature Review of Debugging Interventions. ACM Transactions on Computing Education 24:4 (1-44). Online publication date: 5-Sep-2024.
https://dl.acm.org/doi/10.1145/3690652
Santos E, Becker B. (2024). Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice. Proceedings of the 2024 Conference on United Kingdom & Ireland Computing Education Research (1-7). Online publication date: 5-Sep-2024.
https://dl.acm.org/doi/10.1145/3689535.3689554
Feng T, Denny P, Wünsche B, Luxton-Reilly A, Whalley J. (2024). An Eye for an AI: Evaluating GPT-4o's Visual Perception Skills and Geometric Reasoning Skills Using Computer Graphics Questions. SIGGRAPH Asia 2024 Educator's Forum (1-8). Online publication date: 3-Dec-2024.
https://dl.acm.org/doi/10.1145/3680533.3697064
