DOI: 10.1145/3587102.3588785
research-article
Open access

Comparing Code Explanations Created by Students and Large Language Models

Published: 30 June 2023

Abstract

Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, developing the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course (n ≈ 1000) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education.

Cited By

  • (2025) PestGPT: Leveraging Large Language Models and IoT for Timely and Customized Recommendation Generation in Sustainable Pest Management. IEEE Internet of Things Magazine 8:1 (26-33). DOI: 10.1109/IOTM.001.2400036. Online publication date: Jan-2025
  • (2025) “Ok Pal, we have to code that now”: interaction patterns of programming beginners with a conversational chatbot. Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10561-6. Online publication date: 1-Feb-2025
  • (2024) Cognitive Apprenticeship and Artificial Intelligence Coding Assistants. Navigating Computer Science Education in the 21st Century (261-281). DOI: 10.4018/979-8-3693-1066-3.ch013. Online publication date: 26-Feb-2024

Information & Contributors

Information

Published In

ITiCSE 2023: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1
June 2023
694 pages
ISBN: 9798400701382
DOI: 10.1145/3587102
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CS1
  2. ChatGPT
  3. GPT-3
  4. GPT-4
  5. code comprehension
  6. code explanations
  7. foundation models
  8. large language models
  9. natural language generation
  10. resource generation

Qualifiers

  • Research-article

Funding Sources

  • Ulla Tuominen Foundation

Conference

ITiCSE 2023
Acceptance Rates

Overall Acceptance Rate 552 of 1,613 submissions, 34%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 1,285
  • Downloads (Last 6 weeks): 150
Reflects downloads up to 04 Jan 2025

Citations

  • (2024) Configuring a GPTs Chatbot to Generate Educational Contents for Programming Novice Learners. The Journal of Korean Association of Computer Education 27:4 (211-224). DOI: 10.32431/kace.2024.27.4.016. Online publication date: 31-Jul-2024
  • (2024) Novice Learners of Programming and Generative AI - Prior Knowledge Matters. Proceedings of the 24th Koli Calling International Conference on Computing Education Research (1-2). DOI: 10.1145/3699538.3699580. Online publication date: 12-Nov-2024
  • (2024) Exploring Human-Centered Approaches in Generative AI and Introductory Programming Research: A Scoping Review. Proceedings of the 2024 Conference on United Kingdom & Ireland Computing Education Research (1-7). DOI: 10.1145/3689535.3689553. Online publication date: 5-Sep-2024
  • (2024) Unfolding Programming: How to Use AI Tools in Introductory Computing Courses. Proceedings of the 25th Annual Conference on Information Technology Education (49-55). DOI: 10.1145/3686852.3687073. Online publication date: 10-Oct-2024
  • (2024) A Comparative Analysis of Large Language Models for Code Documentation Generation. Proceedings of the 1st ACM International Conference on AI-Powered Software (65-73). DOI: 10.1145/3664646.3664765. Online publication date: 10-Jul-2024
  • (2024) Comparing Feedback from Large Language Models and Instructors: Teaching Computer Science at Scale. Proceedings of the Eleventh ACM Conference on Learning @ Scale (335-339). DOI: 10.1145/3657604.3664660. Online publication date: 9-Jul-2024
  • (2024) Combining LLM-Generated and Test-Based Feedback in a MOOC for Programming. Proceedings of the Eleventh ACM Conference on Learning @ Scale (177-187). DOI: 10.1145/3657604.3662040. Online publication date: 9-Jul-2024
