DOI: 10.1145/3324884.3416541
research-article

CCGraph: a PDG-based code clone detector with approximate graph matching

Published: 27 January 2021

Abstract

Software clone detection is an active research area that is important for software maintenance, bug detection, and related tasks. Two pieces of cloned code are similar or equivalent in the syntax or structure of their code representations. Code can be represented in many ways, such as tokens, ASTs, and PDGs. The PDG (Program Dependence Graph) of source code captures both syntactic and structural information. However, most existing PDG-based tools are time-consuming and miss many clones because they rely on exact graph matching via subgraph isomorphism. In this paper, we propose CCGraph, a novel PDG-based code clone detector that uses graph kernels. First, we normalize the structure of PDGs and design a two-stage filtering strategy that measures the characteristic vectors of code. Then we detect code clones with an approximate graph matching algorithm based on a reformed WL (Weisfeiler-Lehman) graph kernel. Experimental results show that CCGraph retains high accuracy, achieves better recall and F1-score, and detects more semantic clones than two other related state-of-the-art tools. In addition, CCGraph is much more efficient than existing PDG-based tools.
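
To make the idea of approximate graph matching with a WL kernel concrete, the following is a minimal sketch of the standard Weisfeiler-Lehman subtree kernel, not the reformed variant used by CCGraph, whose details are not given in the abstract. The graph encoding (a successor dictionary plus a node-label dictionary), the statement-type labels, and the function names `wl_features`, `wl_kernel`, and `wl_similarity` are all assumptions made for illustration.

```python
import math
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count every node label seen during WL refinement (round 0 included).

    adj:    dict mapping each node to a list of its successor nodes
            (every node must appear as a key).
    labels: dict mapping each node to its initial string label.
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node in adj:
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbours = tuple(sorted(current[n] for n in adj[node]))
            refined[node] = (current[node], neighbours)
        current = refined
        features.update(current.values())
    return features


def wl_kernel(f1, f2):
    """Dot product of two label-count vectors."""
    return sum(count * f2[label] for label, count in f1.items())


def wl_similarity(g1, g2, iterations=2):
    """Normalised kernel value in [0, 1]; 1.0 means identical WL features."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    denom = math.sqrt(wl_kernel(f1, f1) * wl_kernel(f2, f2))
    return wl_kernel(f1, f2) / denom if denom else 0.0


# Two hypothetical three-node dependence chains that differ in the last statement.
graph_a = ({0: [1], 1: [2], 2: []}, {0: "decl", 1: "assign", 2: "call"})
graph_b = ({0: [1], 1: [2], 2: []}, {0: "decl", 1: "assign", 2: "return"})

if __name__ == "__main__":
    print(wl_similarity(graph_a, graph_b, iterations=1))
```

Unlike subgraph isomorphism, which yields a yes/no answer, the kernel returns a graded score, so a pair of graphs can be reported as a clone whenever the score exceeds a chosen threshold. The threshold itself would have to be tuned; none is implied here.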



    Published In

    ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
    December 2020
    1449 pages
    ISBN:9781450367684
    DOI:10.1145/3324884
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. WL graph kernel
    2. clone detection
    3. program dependence graph

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    ASE '20
    Acceptance Rates

    Overall Acceptance Rate 82 of 337 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 106
    • Downloads (last 6 weeks): 25
    Reflects downloads up to 16 Dec 2024

    Cited By

    • (2024) A Longitudinal Analysis Of Replicas in the Wild Wild Android. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1821-1833). DOI: 10.1145/3691620.3695546. Online publication date: 27-Oct-2024
    • (2024) SICode: Embedding-Based Subgraph Isomorphism Identification for Bug Detection. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (304-315). DOI: 10.1145/3643916.3646556. Online publication date: 15-Apr-2024
    • (2024) Improving AST-Level Code Completion with Graph Retrieval and Multi-Field Attention. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (125-136). DOI: 10.1145/3643916.3644420. Online publication date: 15-Apr-2024
    • (2024) DSFM: Enhancing Functional Code Clone Detection with Deep Subtree Interactions. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). DOI: 10.1145/3597503.3639215. Online publication date: 20-May-2024
    • (2024) Machine Learning is All You Need: A Simple Token-based Approach for Effective Code Clone Detection. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639114. Online publication date: 20-May-2024
    • (2024) Semantic Code Clone Detection Based on Community Detection. International Journal of Software Engineering and Knowledge Engineering 34:10 (1661-1692). DOI: 10.1142/S0218194024500323. Online publication date: 26-Jul-2024
    • (2024) FCNN: Simple neural networks for complex code tasks. Journal of King Saud University - Computer and Information Sciences 36:2 (101970). DOI: 10.1016/j.jksuci.2024.101970. Online publication date: Feb-2024
    • (2024) CloneRipples: predicting change propagation between code clone instances by graph-based deep learning. Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10567-0. Online publication date: 30-Oct-2024
    • (2023) A Novel Source Code Clone Detection Method Based on Dual-GCN and IVHFS. Electronics 12:6 (1315). DOI: 10.3390/electronics12061315. Online publication date: 9-Mar-2023
    • (2023) TCCCD: Triplet-Based Cross-Language Code Clone Detection. Applied Sciences 13:21 (12084). DOI: 10.3390/app132112084. Online publication date: 6-Nov-2023
