Autograding "Explain in Plain English" questions using NLP

Authors:

Craig Zilles

SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education

Pages 1163 - 1169

https://doi.org/10.1145/3408877.3432539

Published: 05 March 2021

Abstract

Previous research suggests that "Explain in Plain English" (EiPE) code reading activities could play an important role in the development of novice programmers, but EiPE questions aren't heavily used in introductory programming courses because they (traditionally) required manual grading. We present what we believe to be the first automatic grader for EiPE questions and its deployment in a large-enrollment introductory programming course. Based on a set of questions deployed on a computer-based exam, we find that our implementation has an accuracy of 87-89%, which is similar in performance to course teaching assistants trained to perform this task and compares favorably to automatic short answer grading algorithms developed for other domains. In addition, we briefly characterize the kinds of answers that the current autograder fails to score correctly and the kinds of errors made by students.
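To make the accuracy comparison concrete, here is a minimal sketch of how agreement with a reference grade could be computed. It assumes each answer receives a binary correct/incorrect label and that accuracy means the fraction of answers on which a grader matches the reference label; the label lists are hypothetical illustration data, not results from the paper, and the function name `accuracy` is our own.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Fraction of answers where a grader's label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("need at least one graded answer")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical labels for eight student answers (True = judged correct).
reference_labels = [True, True, False, False, True, False, True, True]
autograder_labels = [True, True, False, True, True, False, True, False]
ta_labels = [True, False, False, False, True, False, True, True]

autograder_accuracy = accuracy(autograder_labels, reference_labels)  # 6 of 8 agree
ta_accuracy = accuracy(ta_labels, reference_labels)  # 7 of 8 agree
print(f"autograder: {autograder_accuracy:.2%}, TA: {ta_accuracy:.2%}")
```

Scoring the autograder and a human grader against the same reference labels is what allows the two to be compared on equal footing.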

Index Terms

Autograding "Explain in Plain English" questions using NLP
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Social and professional topics
  1. Professional topics
    1. Computing education
      1. Computing education programs
        Computer science education
        CS1
      2. Student assessment

Published In

SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education

March 2021

1454 pages

ISBN:9781450380621

DOI:10.1145/3408877

General Chairs:
Mark Sherriff, University of Virginia, USA
Laurence D. Merkle, Air Force Institute of Technology, USA

Program Chairs:
Pamela Cutter, Kalamazoo College, USA
Alvaro Monge, California State University, Long Beach, USA
Judithe Sheard, Monash University, Australia

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGCSE: ACM Special Interest Group on Computer Science Education

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Funding Sources

National Science Foundation

Conference

SIGCSE '21

Sponsor:

SIGCSE

SIGCSE '21: The 52nd ACM Technical Symposium on Computer Science Education

March 13 - 20, 2021

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,595 of 4,542 submissions, 35%
