DOI: 10.1109/ICSE.2019.00023
Research article

Towards automating precision studies of clone detectors

Published: 25 May 2019

Abstract

Current research in clone detection suffers from a poor ecosystem for evaluating the precision of clone detection tools. Corpora of labeled clones are scarce and incomplete, which makes evaluation labor-intensive and idiosyncratic and limits inter-tool comparison. Precision-assessment tools are simply lacking.
We present a semiautomated approach to facilitate precision studies of clone detection tools. The approach merges automatic mechanisms of clone classification with manual validation of clone pairs. We demonstrate that the proposed automatic approach has very high precision and significantly reduces the number of clone pairs that need human validation during precision experiments. Moreover, we aggregate the individual effort of multiple teams into a single evolving dataset of labeled clone pairs, creating an important asset for software clone research.
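
The general idea of combining automatic classification with manual validation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ClonePair`, `estimate_precision`, `toy_auto`, and `toy_manual` are hypothetical, and the toy token-overlap rule merely stands in for whatever automatic classifier is actually used. The sketch only shows how pairs resolved automatically are counted directly, while undecided pairs are routed to a human judge before precision is computed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ClonePair:
    """A candidate clone pair reported by the detector under evaluation."""
    left: str
    right: str


# Hypothetical interfaces (assumptions, not taken from the paper):
# the automatic classifier returns True (clone), False (not a clone),
# or None when it cannot decide; the manual judge always decides.
AutoClassifier = Callable[[ClonePair], Optional[bool]]
ManualJudge = Callable[[ClonePair], bool]


def estimate_precision(
    pairs: Iterable[ClonePair],
    auto: AutoClassifier,
    manual: ManualJudge,
) -> Tuple[float, int]:
    """Return (precision, number of pairs that needed a human judge)."""
    true_positives = 0
    total = 0
    sent_to_human = 0
    for pair in pairs:
        verdict = auto(pair)
        if verdict is None:
            # The automatic step is unsure, so fall back to manual validation.
            verdict = manual(pair)
            sent_to_human += 1
        true_positives += int(verdict)
        total += 1
    precision = true_positives / total if total else 0.0
    return precision, sent_to_human


def toy_auto(pair: ClonePair) -> Optional[bool]:
    """Toy stand-in classifier based on whitespace-separated tokens."""
    a, b = pair.left.split(), pair.right.split()
    if a == b:
        return True    # identical token sequences: accept automatically
    if not set(a) & set(b):
        return False   # no shared tokens: reject automatically
    return None        # partial overlap: defer to a human


def toy_manual(pair: ClonePair) -> bool:
    """Stand-in for a human judge; here it simply accepts every pair."""
    return True


reported = [
    ClonePair("int a = 1;", "int  a = 1;"),        # resolved automatically
    ClonePair("return x + y;", "return y + x;"),   # deferred to the judge
    ClonePair("foo();", "bar = 2"),                # rejected automatically
]
precision, sent_to_human = estimate_precision(reported, toy_auto, toy_manual)
print(f"precision={precision:.2f}, manual judgments={sent_to_human}")
```

In this toy run only one of the three reported pairs reaches the human judge, which is the kind of reduction in manual effort the abstract refers to; the actual savings depend entirely on the classifier used.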


Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering
May 2019
1318 pages

Publisher

IEEE Press

Author Tags

  1. clone detection
  2. machine learning
  3. open source labeled datasets
  4. precision evaluation

Qualifiers

  • Research-article

Conference

ICSE '19

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
