Instant code clone search

MW Lee, JW Roh, S Hwang, S Kim - Proceedings of the eighteenth ACM SIGSOFT international symposium on …, 2010 - dl.acm.org
In this paper, we propose a scalable instant code clone search engine for large-scale software repositories. While commercial code search engines are available, they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone search take a "post-mortem" approach, detecting clones "after" code development is completed, and hence fail to return results instantly. In clear contrast, we combine the strengths of these two lines of research by supporting instant code clone detection. To achieve this goal, we propose scalable indexing structures on vector abstractions of code. Our proposed algorithms allow developers to detect clones of a given code segment among 1.7 million code segments from 492 open source projects with sub-second response times, without compromising the accuracy obtained by a state-of-the-art tool.
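The abstract does not spell out the indexing structures themselves. As a minimal sketch, assuming a vector abstraction in the style of tree-based clone detectors such as DECKARD (each code segment becomes a vector of syntax-node counts, here Python `ast` node types) and assuming two segments are clones when their L1 distance is within a threshold, the code below shows how an index can avoid comparing a query against every stored segment. The names `characteristic_vector`, `l1_distance`, and `SizeFilteredIndex`, and the size-based filter, are hypothetical illustrations, not the authors' algorithm.

```python
import ast
import bisect
from collections import Counter


def characteristic_vector(source: str) -> Counter:
    """Count AST node types in a Python snippet (assumed vector abstraction)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def l1_distance(a: Counter, b: Counter) -> int:
    """Manhattan distance between two sparse count vectors."""
    keys = a.keys() | b.keys()
    return sum(abs(a[k] - b[k]) for k in keys)  # missing keys count as 0


class SizeFilteredIndex:
    """Hypothetical index: segments kept sorted by total node count.

    For count vectors, |size(a) - size(b)| <= l1_distance(a, b), so a segment
    whose size differs from the query's by more than the threshold cannot be
    a clone and is skipped without computing its full distance.
    """

    def __init__(self) -> None:
        self._sizes = []    # sorted total node counts
        self._entries = []  # (segment_id, vector), parallel to _sizes

    def add(self, segment_id: str, source: str) -> None:
        vec = characteristic_vector(source)
        size = sum(vec.values())
        pos = bisect.bisect_right(self._sizes, size)
        self._sizes.insert(pos, size)
        self._entries.insert(pos, (segment_id, vec))

    def search(self, source: str, threshold: int) -> list:
        """Return (distance, segment_id) pairs within threshold, nearest first."""
        query = characteristic_vector(source)
        size = sum(query.values())
        # Only segments in this size window can possibly be within threshold.
        lo = bisect.bisect_left(self._sizes, size - threshold)
        hi = bisect.bisect_right(self._sizes, size + threshold)
        hits = []
        for segment_id, vec in self._entries[lo:hi]:
            d = l1_distance(query, vec)
            if d <= threshold:
                hits.append((d, segment_id))
        return sorted(hits)


if __name__ == "__main__":
    index = SizeFilteredIndex()
    index.add("add", "def add(a, b):\n    return a + b")
    index.add("plus", "def plus(x, y):\n    return x + y")
    index.add("loop", "for i in range(10):\n    print(i)")
    print(index.search("def sum2(p, q):\n    return p + q", threshold=2))
```

Because identifiers do not affect node-type counts, renamed copies such as `add` and `plus` map to the same vector and are both returned for the `sum2` query. The size filter never discards a true match; it only prunes candidates. At the scale the abstract reports, a sorted Python list with linear-time inserts and a single size dimension would likely be too weak a filter, so this should be read as an illustration of index-based pruning rather than a model of the paper's structures.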