Research Article · Open Access
DOI: 10.1145/3555009.3555013

Speeding Up Automated Assessment of Programming Exercises

Published: 01 September 2022

Abstract

Introductory programming courses around the world use automated assessment. Automated assessment of programming code is typically performed via unit tests, which take time to execute, at times significant amounts, leading to computational costs and delays in feedback to students. To address this issue, we present a step-based approach for speeding up automated assessment, consisting of (1) a cache of past exercise submissions and their associated test results, used to avoid retesting equivalent new submissions; (2) static analysis that heuristically detects issues such as infinite loops; (3) a machine learning model that evaluates programs without running them; and (4) a traditional set of unit tests. When a student submits code for an exercise, the code is evaluated sequentially through the steps, providing feedback to the student at the earliest possible step and reducing the need to run tests. We evaluate the impact of the proposed approach using data collected from an introductory programming course and demonstrate a considerable reduction in the number of exercise submissions that require running the tests (up to 80% of exercises). The approach leads to faster feedback in a more sustainable way, and steps (2) and (3) also provide opportunities for precise feedback that is not specific to any one exercise.
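To make the four steps concrete, here is a minimal Python sketch of such a pipeline. It is illustrative only: the AST-based normalization, the infinite-loop heuristic, and all names (`normalize`, `cache_key`, `evaluate`, `classify`, `run_tests`) are assumptions made for this sketch, not the implementation evaluated in the paper.

```python
# A minimal sketch of the four-step pipeline (assumptions noted above;
# not the authors' implementation).
import ast
import hashlib
from typing import Callable, Dict, Optional

def normalize(source: str) -> str:
    # Round-trip through the AST so that submissions differing only in
    # comments or formatting map to the same normalized form
    # (requires Python 3.9+ for ast.unparse).
    return ast.unparse(ast.parse(source))

def cache_key(source: str) -> str:
    return hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()

def has_suspicious_loop(source: str) -> bool:
    # Step 2 stand-in: crudely flag `while True:` loops whose body
    # contains no break or return. A real static analyzer would cover
    # far more patterns; this heuristic can misfire.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While):
            test = node.test
            if isinstance(test, ast.Constant) and test.value is True:
                inner = [n for stmt in node.body for n in ast.walk(stmt)]
                if not any(isinstance(n, (ast.Break, ast.Return)) for n in inner):
                    return True
    return False

def evaluate(source: str,
             cache: Dict[str, bool],
             classify: Callable[[str], Optional[bool]],
             run_tests: Callable[[str], bool]) -> bool:
    key = cache_key(source)
    # Step 1: reuse the verdict of an equivalent earlier submission.
    if key in cache:
        return cache[key]
    # Step 2: reject submissions with heuristically detected infinite loops.
    if has_suspicious_loop(source):
        cache[key] = False
        return False
    # Step 3: let the model short-circuit the tests only when it
    # commits to a prediction (None means "abstain").
    predicted = classify(source)
    if predicted is not None:
        cache[key] = predicted
        return predicted
    # Step 4: fall back to the traditional unit tests.
    cache[key] = run_tests(source)
    return cache[key]

# Hypothetical usage: the model abstains, so the unit tests decide.
cache: Dict[str, bool] = {}
ok = evaluate("def add(a, b):\n    return a + b\n",
              cache,
              classify=lambda src: None,
              run_tests=lambda src: True)  # stand-in for a real test runner
```

The ordering mirrors the abstract's rationale: the cheap checks (cache lookup, static analysis) run first, the model only short-circuits the tests when it commits to a prediction, and the unit tests remain the fallback that always yields a verdict. Caching every verdict means an equivalent later submission skips all subsequent steps.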

Cited By

  • (2025) Hands-on analysis of using large language models for the auto evaluation of programming assignments. Information Systems 128, 102473. DOI: 10.1016/j.is.2024.102473. Online publication date: Feb-2025.
  • (2024) Propagating Large Language Models Programming Feedback. In Proceedings of the Eleventh ACM Conference on Learning @ Scale, 366–370. DOI: 10.1145/3657604.3664665. Online publication date: 9-Jul-2024.


Information

Published In

UKICER '22: Proceedings of the 2022 Conference on United Kingdom & Ireland Computing Education Research
September 2022
90 pages
ISBN: 9781450397421
DOI: 10.1145/3555009
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2022


Author Tags

  1. automated assessment
  2. automatic assessment
  3. educational data mining
  4. feedback
  5. machine learning
  6. source code
  7. static analysis
  8. sustainability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

UKICER2022

