DOI: 10.1145/2824864.2824878
research-article

On the Detection of SOurce COde Re-use

Published: 05 December 2014

Abstract

This paper summarizes the goals, organization and results of the first SOCO competitive evaluation campaign for systems that automatically detect source code re-use. Detecting source code re-use is an important research topic for both the software industry and academia. Accordingly, the PAN@FIRE track named SOurce COde Re-use (SOCO) focused on detecting re-used source code written in the C/C++ and Java programming languages. Participating systems were asked to annotate a set of source codes according to whether or not they represent cases of source code re-use. In total, five teams submitted 17 runs. The training set consisted of annotations made by several experts, a feature that makes the SOCO 2014 collection a useful data set for future evaluations and, at the same time, establishes a standard evaluation framework for future research on this shared task.
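The abstract does not say how the participating runs decided which source codes were re-used. Purely as an illustration, and not as the method of any SOCO team or of the organizers, the following Python sketch labels candidate pairs of source files using a character 3-gram cosine similarity and a fixed threshold. This loosely follows the spirit of the n-gram and information retrieval approaches among the references. It then scores the labels against expert annotations using the F1 measure of the re-use class. The function names, the whitespace-stripping normalization, the 0.8 threshold and the choice of F1 are all assumptions made for this example.

```python
# Hypothetical re-use detection baseline: NOT the SOCO organizers' or participants' method.
from collections import Counter
import math
import re


def normalize(source: str) -> str:
    """Lower-case the code and drop all whitespace (assumed normalization),
    so that layout-only edits do not change the n-grams."""
    return re.sub(r"\s+", "", source.lower())


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the normalized source."""
    text = normalize(source)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_pairs(sources: dict, pairs: list, threshold: float = 0.8) -> dict:
    """Label each (name_a, name_b) pair as re-use (True) when its similarity
    reaches the assumed threshold."""
    profiles = {name: char_ngrams(code) for name, code in sources.items()}
    return {(a, b): cosine(profiles[a], profiles[b]) >= threshold for a, b in pairs}


def f1(predicted: dict, gold: dict) -> float:
    """F1 of the re-use class, computed against expert (gold) labels."""
    tp = sum(1 for p, lab in predicted.items() if lab and gold.get(p, False))
    fp = sum(1 for p, lab in predicted.items() if lab and not gold.get(p, False))
    fn = sum(1 for p, lab in gold.items() if lab and not predicted.get(p, False))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a C function and a copy that differs only in spacing and line breaks normalize to the same string. The pair therefore gets a similarity of 1 (up to floating-point rounding) and is labeled as re-use, while an unrelated function falls well below the threshold. Stripping all whitespace hides layout-only edits, but it also discards whitespace patterns that some detectors treat as evidence of plagiarism (see [2]).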

References

[1] C. Arwin and S. Tahaghoghi. Plagiarism detection across programming languages. Proceedings of the 29th Australian Computer Science Conference, Australian Computer Society, 48:277--286, 2006.
[2] N. Baer and R. Zeidman. Measuring whitespace pattern sequence as an indication of plagiarism. Journal of Software Engineering and Applications, 5(4):249--254, 2012.
[3] M. Chilowicz, E. Duris, and G. Roussel. Syntax tree fingerprinting for source code similarity detection. In IEEE 17th International Conference on Program Comprehension (ICPC '09), pages 243--247, 2009.
[4] D. Chuda, P. Navrat, B. Kovacova, and P. Humay. The issue of (software) plagiarism: A student view. IEEE Transactions on Education, 55(1):22--28, 2012.
[5] G. Cosma and M. Joy. Evaluating the performance of LSA for source-code plagiarism detection. Informatica, 36(4):409--424, 2013.
[6] B. Cui, J. Li, T. Guo, J. Wang, and D. Ma. Code comparison system based on abstract syntax tree. In 3rd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT), pages 668--673, Oct. 2010.
[7] J. A. W. Faidhi and S. K. Robinson. An empirical approach for detecting program similarity and plagiarism within a university programming environment. Computers & Education, 11(1):11--19, Jan. 1987.
[8] FIRE, editor. FIRE 2014 Working Notes. Sixth International Workshop of the Forum for Information Retrieval Evaluation, Bangalore, India, 5--7 December 2014.
[9] J. L. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378, 1971.
[10] E. Flores, A. Barrón-Cedeño, L. Moreno, and P. Rosso. Uncovering source code reuse in large-scale academic environments. Computer Applications in Engineering Education, 2014.
[11] E. Flores, A. Barrón-Cedeño, P. Rosso, and L. Moreno. DeSoCoRe: Detecting source code re-use across programming languages. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT), pages 1--4. Association for Computational Linguistics, 2012.
[12] E. Flores, A. Barrón-Cedeño, P. Rosso, and L. Moreno. Towards the detection of cross-language source code reuse. In Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems (NLDB 2011), LNCS 6716, pages 250--253. Springer-Verlag, 2011.
[13] E. Flores, M. Ibarra-Romero, L. Moreno, G. Sidorov, and P. Rosso. Modelos de recuperación de información basados en n-gramas aplicados a la reutilización de código fuente [Information retrieval models based on n-grams applied to source code reuse]. In Proceedings of the 3rd Spanish Conference on Information Retrieval, pages 185--188, 2014.
[14] D. Ganguly and G. J. Jones. DCU@FIRE-2014: An information retrieval approach for source code plagiarism detection. In [8].
[15] R. García-Hernández and Y. Ledeneva. Identification of similar source codes based on longest common substrings. In [8].
[16] M. Joy and M. Luck. Plagiarism in programming assignments. IEEE Transactions on Education, 42(2):129--133, May 1999.
[17] A. Marcus, A. Sergeyev, V. Rajlich, and J. Maletic. An information retrieval approach to concept location in source code. In Proceedings of the 11th Working Conference on Reverse Engineering, pages 214--223, Nov. 2004.
[18] S. Narayanan and S. Simi. Source code plagiarism detection and performance analysis using fingerprint based distance measure method. In Proceedings of the 7th International Conference on Computer Science Education (ICCSE '12), pages 1065--1068, July 2012.
[19] M. Potthast, M. Hagen, A. Beyer, M. Busse, M. Tippmann, P. Rosso, and B. Stein. Overview of the 6th international competition on plagiarism detection. In L. Cappellato, N. Ferro, M. Halvey, and W. Kraaij, editors, Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15--18, 2014, volume 1180 of CEUR Workshop Proceedings, pages 845--876. CEUR-WS.org, 2014.
[20] L. Prechelt, G. Malpohl, and M. Philippsen. Finding plagiarisms among a set of programs with JPlag. Journal of Universal Computer Science, 8(11):1016--1038, 2002.
[21] I. Rahal and C. Wielga. Source code plagiarism detection using biological string similarity algorithms. Journal of Information & Knowledge Management, 13(3), 2014.
[22] A. Ramírez-de-la-Cruz, G. Ramírez-de-la-Rosa, C. Sánchez-Sánchez, W. A. Luna-Ramírez, H. Jiménez-Salazar, and C. Rodríguez-Lucatero. UAM@SOCO 2014: Detection of source code reuse by means of combining different types of representations. In [8].
[23] F. Rosales, A. García, S. Rodríguez, J. L. Pedraza, R. Méndez, and M. M. Nieto. Detection of plagiarism in programming assignments. IEEE Transactions on Education, 51(2):174--183, 2008.
[24] K. Sparck Jones and C. van Rijsbergen. Report on the need for and provision of an "ideal" information retrieval test collection. British Library Research and Development Report 5266, University of Cambridge, 1975.
[25] G. Whale. Software metrics and plagiarism detection. Journal of Systems and Software, 13(2):131--138, 1990.

Cited By

  • (2023) SimilaCode: Programming Source Code Similarity Detection System Based on NLP. 2023 15th International Congress on Advanced Applied Informatics Winter (IIAI-AAI-Winter), pages 171-178. DOI: 10.1109/IIAI-AAI-Winter61682.2023.00040. Online publication date: 11-Dec-2023.
  • (2023) A systematic literature review on source code similarity measurement and clone detection. Journal of Systems and Software, 204(C). DOI: 10.1016/j.jss.2023.111796. Online publication date: 20-Sep-2023.
  • (2022) Classification feature sets for source code plagiarism detection in Java. Journal of Engineering and Applied Science, 69(1). DOI: 10.1186/s44147-022-00155-8. Online publication date: 8-Nov-2022.

Information

Published In

ACM Other conferences
FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2014
151 pages
ISBN:9781450337557
DOI:10.1145/2824864
Editors: Prasenjit Majumder, Mandar Mitra, Sukomal Pal, Madhulika Agrawal, Parth Mehta
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Evaluation framework
  2. Plagiarism detection
  3. SOCO
  4. Source code re-use
  5. Test collections

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FIRE '14
FIRE '14: Forum for Information Retrieval Evaluation
December 5 - 7, 2014
Bangalore, India

Acceptance Rates

Overall Acceptance Rate 19 of 64 submissions, 30%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 2
Reflects downloads up to 25 Dec 2024

Citations

  • (2022) Dolos: Language-agnostic plagiarism detection in source code. Journal of Computer Assisted Learning, 38(4):1046-1061. DOI: 10.1111/jcal.12662. Online publication date: 9-Mar-2022.
  • (2022) Unification of Source-Code Re-Use Similarity Measures. In Advances in Computational Intelligence, pages 397-409. DOI: 10.1007/978-3-031-19493-1_31. Online publication date: 24-Oct-2022.
  • (2022) Linear Optimization for Solving Other NLP Tasks. In Evaluation of Text Summaries Based on Linear Optimization of Content Metrics, pages 137-148. DOI: 10.1007/978-3-031-07214-7_5. Online publication date: 19-Aug-2022.
  • (2021) Report on the FIRE 2020 evaluation initiative. ACM SIGIR Forum, 55(1):1-11. DOI: 10.1145/3476415.3476418. Online publication date: 16-Jul-2021.
  • (2021) A Proposed Model for Source Code Reuse Detection in Computer Programs. Iranian Journal of Science and Technology, Transactions of Electrical Engineering, 45(3):1001-1014. DOI: 10.1007/s40998-020-00403-8. Online publication date: 20-Jan-2021.
  • (2021) Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications. Empirical Software Engineering, 26(5). DOI: 10.1007/s10664-021-09990-4. Online publication date: 1-Sep-2021.
  • (2021) Code Similarity in Clone Detection. In Code Clone Analysis, pages 135-150. DOI: 10.1007/978-981-16-1927-4_10. Online publication date: 4-Aug-2021.