DOI: 10.1145/3674805.3690757
research-article

Code Clone Configuration as a Multi-Objective Search Problem

Published: 24 October 2024

Abstract

Clone detection is an automated process for finding duplicated code within a project’s code base or between online sources. The code cloning community now advocates that developers must be aware of the clones they may have in their code bases. In modern clone detection, rank-based tools have emerged as the ones able to handle the large code corpora that are necessary to identify online clones. However, such tools are sensitive to their parameters, which directly affect their clone detection abilities. Moreover, existing parameter optimization approaches for clone detectors are not meant for rank-based tools. To overcome this issue and facilitate empirical studies of code clones, we introduce Multi-objective Code Clone Configuration, a new approach based on multi-objective optimization that searches for an optimal set of parameters for a rank-based clone detection tool. In our empirical evaluation, we ran three baseline search algorithms and NSGA-II to assess their performance on this new optimization problem. Additionally, we compared the optimized configurations with the default one. Our results show that NSGA-II achieved the best performance, finding better configurations than those of the baseline algorithms. Finally, the optimized configurations achieved improvements of 71.08% and 46.29% for our fitness functions.
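
To make the multi-objective framing concrete, the following is a minimal sketch of Pareto dominance, the comparison that NSGA-II builds on when ranking candidate solutions. It is not the paper's implementation: the parameter names (`ngram_size`, `query_reduction`), the scores, and the assumption that both objectives are maximized are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate: (tool configuration, (objective_1, objective_2)).
# Parameter names and scores are invented; both objectives are assumed
# to be maximized.
Candidate = Tuple[Dict[str, int], Tuple[float, float]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if a is no worse than b everywhere and better somewhere."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the configurations that no other configuration dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


candidates: List[Candidate] = [
    ({"ngram_size": 4, "query_reduction": 10}, (0.60, 0.40)),
    ({"ngram_size": 6, "query_reduction": 10}, (0.70, 0.35)),
    ({"ngram_size": 4, "query_reduction": 20}, (0.55, 0.38)),  # dominated by the first
    ({"ngram_size": 8, "query_reduction": 5}, (0.50, 0.50)),
]

front = pareto_front(candidates)
for config, scores in front:
    print(config, scores)
```

With two objectives there is usually no single best configuration. The search instead returns a set of trade-offs like `front`, from which a configuration can be picked and compared against the tool's default.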

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
October 2024
633 pages
ISBN: 9798400710476
DOI: 10.1145/3674805
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clone Detection
  2. Multi-objective Optimization
  3. Search-based Software Engineering

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • CAPES
  • CNPq

Conference

ESEM '24
Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%
