DOI: 10.1145/2351676.2351725

Boreas: an accurate and scalable token-based approach to code clone detection

Published: 03 September 2012

Abstract

Detecting code clones in a program has many applications in software engineering and related fields. In this paper, we present Boreas, an accurate and scalable token-based approach to code clone detection. Boreas introduces a novel counting-based method for defining characteristic matrices, which describe program segments distinctly and effectively for the purpose of clone detection. We conducted experiments on the source code of JDK 7 and Linux kernel 2.6.38.6. The results show that Boreas matches the detection accuracy of Deckard, a recently proposed syntax-based tool, while reducing execution time by more than an order of magnitude.
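
The abstract does not spell out how the characteristic matrices are built, so the following Python sketch only illustrates the general idea of a counting-based comparison. Everything in it is an assumption made for illustration, not the definition used by Boreas: the three counted features (total uses, assignments, uses on control-flow lines), the toy tokenizer, the helper names `count_matrix` and `similarity`, and the choice of cosine similarity over sorted rows.

```python
import math
import re

# Illustrative assumptions, not taken from the paper.
KEYWORDS = {"for", "in", "if", "else", "while", "return"}
CONTROL = {"if", "for", "while"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|\S")


def count_matrix(code):
    """Map each identifier to [uses, assignments, uses on control lines]."""
    matrix = {}
    for line in code.splitlines():
        tokens = TOKEN.findall(line)
        if not tokens:
            continue
        on_control_line = tokens[0] in CONTROL
        for pos, tok in enumerate(tokens):
            is_name = tok[0].isalpha() or tok[0] == "_"
            if not is_name or tok in KEYWORDS:
                continue
            row = matrix.setdefault(tok, [0, 0, 0])
            row[0] += 1  # every occurrence counts as a use
            if pos + 1 < len(tokens) and tokens[pos + 1] == "=":
                row[1] += 1  # identifier directly followed by '='
            if on_control_line:
                row[2] += 1  # line starts with if/for/while
    return matrix


def similarity(code_a, code_b):
    """Cosine similarity of the two count matrices, ignoring names."""
    rows_a = sorted(count_matrix(code_a).values(), reverse=True)
    rows_b = sorted(count_matrix(code_b).values(), reverse=True)
    size = max(len(rows_a), len(rows_b))
    # Pad the shorter matrix with zero rows so both flatten to equal length.
    rows_a = rows_a + [[0, 0, 0]] * (size - len(rows_a))
    rows_b = rows_b + [[0, 0, 0]] * (size - len(rows_b))
    flat_a = [c for row in rows_a for c in row]
    flat_b = [c for row in rows_b for c in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


FRAG_A = """
total = 0
for i in range(n):
    total = total + data[i]
"""

# Same structure as FRAG_A with every variable renamed.
FRAG_B = """
acc = 0
for k in range(m):
    acc = acc + xs[k]
"""

FRAG_C = """
if x > y:
    return x
return y
"""

print(similarity(FRAG_A, FRAG_B))  # renamed clone: close to 1.0
print(similarity(FRAG_A, FRAG_C))  # unrelated fragment: noticeably lower
```

Because the comparison uses only per-identifier counts and discards the names themselves, consistently renaming variables leaves the score unchanged. That is the property that makes counting-based descriptions attractive for finding near-miss clones.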

Published In

ASE '12: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
September 2012
409 pages
ISBN:9781450312042
DOI:10.1145/2351676

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code clone detection
  2. count matrix
  3. count vector

Qualifiers

  • Article

Conference

ASE'12
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2
Download counts reflect activity up to 04 Jan 2025.

Cited By

  • (2024) Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks. Proceedings of the ACM on Programming Languages 8:OOPSLA1 (500-528). DOI: 10.1145/3649829. Online publication date: 29-Apr-2024
  • (2024) A Similarity-based Stacked Deep Learning Architectures for Detection of Software Clones. 2024 4th International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME) (1-6). DOI: 10.1109/ICECCME62383.2024.10796546. Online publication date: 4-Nov-2024
  • (2024) Out of Step: Code Clone Detection for Mobile Apps Across Different Language Codebases. Science of Computer Programming (103112). DOI: 10.1016/j.scico.2024.103112. Online publication date: Apr-2024
  • (2023) TCCCD: Triplet-Based Cross-Language Code Clone Detection. Applied Sciences 13:21 (12084). DOI: 10.3390/app132112084. Online publication date: 6-Nov-2023
  • (2023) Graph-of-Code: Semantic Clone Detection Using Graph Fingerprints. IEEE Transactions on Software Engineering (1-18). DOI: 10.1109/TSE.2023.3276780. Online publication date: 2023
  • (2023) Using the uniqueness of global identifiers to determine the provenance of Python software source code. Empirical Software Engineering 28:5. DOI: 10.1007/s10664-023-10317-8. Online publication date: 20-Jul-2023
  • (2022) An Experimental Comparison of Clone Detection Techniques using Java Bytecode. 2022 29th Asia-Pacific Software Engineering Conference (APSEC) (139-148). DOI: 10.1109/APSEC57359.2022.00026. Online publication date: Dec-2022
  • (2022) Software system comparison with semantic source code embeddings. Empirical Software Engineering 27:3. DOI: 10.1007/s10664-022-10122-9. Online publication date: 17-Mar-2022
  • (2022) Definition, approaches, and analysis of code duplication detection (2006–2020): a critical review. Neural Computing and Applications 34:23 (20507-20537). DOI: 10.1007/s00521-022-07707-2. Online publication date: 1-Dec-2022
  • (2022) A Study on the Metrics-Based Duplicated Code Type Smell Detection Techniques Relating the Metrics to Its Quality. Inventive Communication and Computational Technologies (515-532). DOI: 10.1007/978-981-19-4960-9_41. Online publication date: 14-Nov-2022
