Research Article · Open Access
DOI: 10.1145/3555009.3555013

Speeding Up Automated Assessment of Programming Exercises

Published: 01 September 2022

Abstract

Introductory programming courses around the world use automated assessment. Automated assessment of programming code is typically performed via unit tests, which take time to execute, at times significant amounts, leading to computational costs and delays in feedback to students. To address this issue, we present a step-based approach for speeding up automated assessment, consisting of (1) a cache of past exercise submissions and their associated test results, used to avoid retesting equivalent new submissions; (2) static analysis that heuristically detects issues such as infinite loops; (3) a machine learning model that evaluates programs without running them; and (4) a traditional set of unit tests. When a student submits code for an exercise, the code is evaluated sequentially through the steps, providing feedback to the student at the earliest possible step and reducing the need to run tests. We evaluate the impact of the proposed approach using data collected from an introductory programming course and demonstrate a considerable reduction in the number of exercise submissions that require running the tests (up to 80% of exercises). The approach leads to faster feedback in a more sustainable way, and steps (2) and (3) also provide opportunities for precise feedback that is not specific to any one exercise.
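To make the four steps concrete, here is a minimal Python sketch of such a pipeline. It is illustrative only: the AST-based normalization, the infinite-loop heuristic, and all names (`normalize`, `cache_key`, `evaluate`, `classify`, `run_tests`) are assumptions made for this sketch, not the implementation evaluated in the paper.

```python
# A minimal sketch of the four-step pipeline (assumptions noted above;
# not the authors' implementation).
import ast
import hashlib
from typing import Callable, Dict, Optional

def normalize(source: str) -> str:
    # Round-trip through the AST so that submissions differing only in
    # comments or formatting map to the same normalized form
    # (requires Python 3.9+ for ast.unparse).
    return ast.unparse(ast.parse(source))

def cache_key(source: str) -> str:
    return hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()

def has_suspicious_loop(source: str) -> bool:
    # Step 2 stand-in: crudely flag `while True:` loops whose body
    # contains no break or return. A real static analyzer would cover
    # far more patterns; this heuristic can misfire.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While):
            test = node.test
            if isinstance(test, ast.Constant) and test.value is True:
                inner = [n for stmt in node.body for n in ast.walk(stmt)]
                if not any(isinstance(n, (ast.Break, ast.Return)) for n in inner):
                    return True
    return False

def evaluate(source: str,
             cache: Dict[str, bool],
             classify: Callable[[str], Optional[bool]],
             run_tests: Callable[[str], bool]) -> bool:
    key = cache_key(source)
    # Step 1: reuse the verdict of an equivalent earlier submission.
    if key in cache:
        return cache[key]
    # Step 2: reject submissions with heuristically detected infinite loops.
    if has_suspicious_loop(source):
        cache[key] = False
        return False
    # Step 3: let the model short-circuit the tests only when it
    # commits to a prediction (None means "abstain").
    predicted = classify(source)
    if predicted is not None:
        cache[key] = predicted
        return predicted
    # Step 4: fall back to the traditional unit tests.
    cache[key] = run_tests(source)
    return cache[key]

# Hypothetical usage: the model abstains, so the unit tests decide.
cache: Dict[str, bool] = {}
ok = evaluate("def add(a, b):\n    return a + b\n",
              cache,
              classify=lambda src: None,
              run_tests=lambda src: True)  # stand-in for a real test runner
```

The ordering mirrors the abstract's rationale: the cheap checks (cache lookup, static analysis) run first, the model only short-circuits the tests when it commits to a prediction, and the unit tests remain the fallback that always yields a verdict. Caching every verdict means an equivalent later submission skips all subsequent steps.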

Cited By

  • (2025) Hands-on analysis of using large language models for the auto evaluation of programming assignments. Information Systems 128, 102473. DOI: 10.1016/j.is.2024.102473. Online publication date: Feb-2025.
  • (2024) Propagating Large Language Models Programming Feedback. In Proceedings of the Eleventh ACM Conference on Learning @ Scale, 366–370. DOI: 10.1145/3657604.3664665. Online publication date: 9-Jul-2024.


Information

Published In

UKICER '22: Proceedings of the 2022 Conference on United Kingdom & Ireland Computing Education Research
September 2022
90 pages
ISBN: 9781450397421
DOI: 10.1145/3555009
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2022


Author Tags

  1. automated assessment
  2. automatic assessment
  3. educational data mining
  4. feedback
  5. machine learning
  6. source code
  7. static analysis
  8. sustainability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

UKICER2022

