Functional code clone detection with syntax and semantics fusion learning
Proceedings of the 29th ACM SIGSOFT international symposium on software …, 2020
Clone detection of source code is among the most fundamental software engineering techniques. Despite intensive research in the past decade, existing techniques are still unsatisfactory at detecting "functional" code clones. In particular, they cannot efficiently extract syntactic and semantic information from source code. In this paper, we propose a novel joint code representation that applies fusion embedding techniques to learn hidden syntactic and semantic features of source code. We also introduce a new granularity for functional code clone detection: our approach treats a set of methods connected by caller-callee relationships as one functionality, and a method with no caller-callee relationship to any other method as a functionality on its own. We then train a supervised deep learning model to detect functional code clones. We evaluate the approach on a large dataset of C++ programs, and the experimental results show that fusion learning can significantly outperform state-of-the-art techniques in detecting functional code clones.
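The functionality granularity described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "connected" means reachable when call edges are treated as undirected, so each functionality is a connected component of the call graph and a method with no call edges forms a component of its own. The function name `group_functionalities` and the sample method names are hypothetical.

```python
from collections import defaultdict


def group_functionalities(methods, calls):
    """Group methods into functionalities.

    Assumption (not stated in the abstract): a functionality is a connected
    component of the call graph with edge direction ignored. A method with
    no caller-callee relationship ends up as a single-method group.
    """
    # Build an undirected adjacency map from (caller, callee) pairs.
    neighbors = defaultdict(set)
    for caller, callee in calls:
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)

    seen = set()
    groups = []
    for method in methods:
        if method in seen:
            continue
        # Depth-first traversal collects everything reachable from `method`.
        seen.add(method)
        stack = [method]
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


# Hypothetical example: three methods linked by calls, one isolated method.
methods = ["main", "parse", "tokenize", "log_usage"]
calls = [("main", "parse"), ("parse", "tokenize")]
functionalities = group_functionalities(methods, calls)
print(functionalities)
```

Under this assumption, each resulting group, rather than each individual method, would be the unit that is compared for functional similarity.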