Source-Code Similarity Measurement: Syntax Tree Fingerprinting for Automated Evaluation

Authors:

Prateksha Udhayanan,

Rahul Murali Shankar,

Sujit Kumar Chakrabarti

AIMLSystems '21: Proceedings of the First International Conference on AI-ML Systems

Article No.: 8, Pages 1 - 7

https://doi.org/10.1145/3486001.3486228

Published: 22 October 2021

Abstract

Most current automated evaluation tools grade a program solely by functionally testing its outputs. This approach suffers from both false positives (i.e., finding errors where there are none) and false negatives (missing actual errors). In this paper, we present a novel system that emulates manual evaluation of programming assignments based on the structure of the program rather than its functional output, using the structural similarity between the given program and a reference solution. We propose an evaluation rubric for scoring structural similarity with respect to a reference solution. We present an ML-based approach to map the system-predicted scores to the scores computed using the rubric. The system is evaluated empirically on a corpus of Python programs extracted from the popular programming platform HackerRank, combined with programming assignments submitted by students taking an undergraduate Python programming course. The preliminary results are encouraging: the reported errors are as low as 12 percent, with a deviation of about 3 percent, showing that the automatically generated scores correlate highly with the instructor-assigned scores.
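As a rough illustration of what comparing a submission to a reference solution by structure can look like, the sketch below hashes every subtree of a Python abstract syntax tree and compares the two resulting multisets of hashes. It is not the system described in the abstract: the function names `subtree_fingerprints` and `structural_similarity`, the use of SHA-1, and the Jaccard-style ratio are all assumptions made for this example, and the rubric and ML-based score mapping mentioned above are not modelled.

```python
import ast
import hashlib
from collections import Counter


def subtree_fingerprints(source: str) -> Counter:
    """Return a multiset of hashes, one per AST subtree (hypothetical helper)."""
    fingerprints = Counter()

    def visit(node: ast.AST) -> str:
        # A node's hash depends only on its type name and its children's
        # hashes, so identifier names and literal values are ignored.
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        label = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(label.encode("utf-8")).hexdigest()
        fingerprints[digest] += 1
        return digest

    visit(ast.parse(source))
    return fingerprints


def structural_similarity(submission: str, reference: str) -> float:
    """Jaccard-style overlap of subtree fingerprints, in [0, 1] (an assumption)."""
    a = subtree_fingerprints(submission)
    b = subtree_fingerprints(reference)
    shared = sum((a & b).values())  # multiset intersection
    total = sum((a | b).values())   # multiset union
    return shared / total if total else 1.0


reference = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(vals):\n    acc = 0\n    for v in vals:\n        acc += v\n    return acc\n"
different = "def total(xs):\n    return sum(xs)\n"

same_structure_score = structural_similarity(renamed, reference)
different_structure_score = structural_similarity(different, reference)
```

Because names and constants are left out of the hash, a submission that only renames variables scores the same as the reference, while a structurally different solution scores lower even if it produces the same output. A real grader would need a more forgiving measure, since a correct solution may legitimately differ in structure from a single reference.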

Published In

AIMLSystems '21: Proceedings of the First International Conference on AI-ML Systems

October 2021

170 pages

ISBN:9781450385947

DOI:10.1145/3486001

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Machine Intelligence and Robotics (MINRO) Center, a K-Tech Center of Excellence at IIIT Bangalore, through a grant from the Department of Electronics, Information Technology, Biotechnology and Science & Technology, Government of Karnataka.

Conference

AIMLSystems 2021

AIMLSystems 2021: The First International Conference on AI-ML-Systems

October 21 - 23, 2021

Bangalore, India

Bibliometrics

Total Citations: 5
Total Downloads: 238
Downloads (Last 12 months): 61
Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Dec 2024

Cited By

Dong D, Liang Y. 2024. Grading Programming Assignments by Summarization. Proceedings of the ACM Turing Award Celebration Conference - China 2024, 53–58. Online publication date: 5-Jul-2024. https://dl.acm.org/doi/10.1145/3674399.3674426

Messer M, Brown N, Kölling M, Shi M. 2024. Automated Grading and Feedback Tools for Programming Education: A Systematic Review. ACM Transactions on Computing Education 24:1, 1–43. Online publication date: 19-Feb-2024. https://dl.acm.org/doi/10.1145/3636515

Xia Z, Chen Y, Nie P, Wang Z. 2024. Detecting Numerical Deviations in Deep Learning Models Introduced by the TVM Compiler. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), 73–83. Online publication date: 28-Oct-2024. https://doi.org/10.1109/ISSRE62328.2024.00018

Xiang C, Wang Y, Zhou Q, Yu Z. 2024. Graph semantic similarity-based automatic assessment for programming exercises. Scientific Reports 14:1. Online publication date: 8-May-2024. https://doi.org/10.1038/s41598-024-61219-8

Narmada N, Pati P. 2023. Autograding of Programming Skills. 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 1–6. Online publication date: 7-Apr-2023. https://doi.org/10.1109/I2CT57861.2023.10126211
