DOI: 10.1145/3486001.3486228

Source-Code Similarity Measurement: Syntax Tree Fingerprinting for Automated Evaluation

Published: 22 October 2021

Abstract

Most current automated evaluation tools grade a program solely by functionally testing its outputs. This approach suffers from both false positives (i.e., reporting errors where there are none) and false negatives (missing actual errors). In this paper, we present a novel system that emulates manual evaluation of programming assignments by examining the structure of a program rather than its functional output, using the structural similarity between the submitted program and a reference solution. We propose an evaluation rubric for scoring structural similarity with respect to a reference solution, and we present an ML-based approach that maps the system-predicted scores to the scores computed using the rubric. We evaluate the system empirically on a corpus of Python programs extracted from the popular programming platform HackerRank, combined with programming assignments submitted by students taking an undergraduate Python programming course. The preliminary results are encouraging: the reported errors are as low as 12 percent, with a deviation of about 3 percent, showing that the automatically generated scores correlate highly with the instructor-assigned scores.
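To make the idea of structural similarity concrete, the following is a minimal sketch, not the system described in the paper. It assumes one simple form of syntax tree fingerprinting: every subtree of the Python AST is hashed from its node types only, and two programs are compared by the overlap of their fingerprint multisets. The function names `subtree_fingerprints` and `structural_similarity`, the choice of SHA-1, and the Jaccard-style ratio are all illustrative assumptions.

```python
import ast
import hashlib
from collections import Counter


def subtree_fingerprints(source: str) -> Counter:
    """Return a multiset of hashes, one per AST subtree of `source`.

    Hypothetical helper: only node types and child structure are hashed,
    so identifier names and literal values do not change the result.
    """
    fingerprints = Counter()

    def visit(node: ast.AST) -> str:
        # Fingerprint children first, then combine them with this node's type.
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        label = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(label.encode("utf-8")).hexdigest()
        fingerprints[digest] += 1
        return digest

    visit(ast.parse(source))
    return fingerprints


def structural_similarity(submission: str, reference: str) -> float:
    """Multiset Jaccard overlap of subtree fingerprints, in [0, 1]."""
    a = subtree_fingerprints(submission)
    b = subtree_fingerprints(reference)
    shared = sum((a & b).values())  # per-hash minimum counts
    total = sum((a | b).values())   # per-hash maximum counts
    return shared / total if total else 1.0


reference = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
# Same structure as the reference, different identifier names.
renamed = (
    "def add_all(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)
# Same behaviour as the reference, different structure.
builtin = "def total(xs):\n    return sum(xs)\n"

print(structural_similarity(renamed, reference))
print(structural_similarity(builtin, reference))
```

In this sketch the renamed program scores as structurally identical to the reference, while the functionally equivalent `sum`-based version scores lower. That gap is the kind of signal a rubric could reward or penalize independently of output-based testing; mapping such a raw score to a rubric score would be a separate learned step.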

Information

Published In

AIMLSystems '21: Proceedings of the First International Conference on AI-ML Systems
October 2021
170 pages
ISBN: 9781450385947
DOI: 10.1145/3486001

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automated Evaluation
  2. Evaluation Rubric
  3. Program Structural Similarity
  4. Syntax Tree Fingerprinting

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Machine Intelligence and Robotics (MINRO) Center, a K-Tech Center of Excellence at IIIT Bangalore, through a grant from the Department of Electronics, Information Technology, Biotechnology and Science & Technology, Government of Karnataka.

Conference

AIMLSystems 2021

Cited By

  • (2024) Grading Programming Assignments by Summarization. Proceedings of the ACM Turing Award Celebration Conference - China 2024, 53-58. DOI: 10.1145/3674399.3674426. Online publication date: 5-Jul-2024
  • (2024) Automated Grading and Feedback Tools for Programming Education: A Systematic Review. ACM Transactions on Computing Education 24:1, 1-43. DOI: 10.1145/3636515. Online publication date: 19-Feb-2024
  • (2024) Detecting Numerical Deviations in Deep Learning Models Introduced by the TVM Compiler. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), 73-83. DOI: 10.1109/ISSRE62328.2024.00018. Online publication date: 28-Oct-2024
  • (2024) Graph semantic similarity-based automatic assessment for programming exercises. Scientific Reports 14:1. DOI: 10.1038/s41598-024-61219-8. Online publication date: 8-May-2024
  • (2023) Autograding of Programming Skills. 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 1-6. DOI: 10.1109/I2CT57861.2023.10126211. Online publication date: 7-Apr-2023
