research-article

Code Clone Configuration as a Multi-Objective Search Problem

Authors:

Matheus Paixao,

Chaiyong Ragkhitwetsagul,

Italo Uchoa

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pages 503 - 509

https://doi.org/10.1145/3674805.3690757

Published: 24 October 2024

Abstract

Clone detection is an automated process for finding duplicated code within a project’s code base or between online sources. Nowadays, the code cloning community advocates that developers must be aware of the clones they may have in their code bases. In modern clone detection, rank-based tools appear as the ones able to handle the large code corpora that are necessary to identify online clones. However, such tools are sensitive to their parameters, which directly affects their clone detection abilities. Moreover, existing parameter optimization approaches for clone detectors are not meant for rank-based tools. To overcome this issue and facilitate empirical studies of code clones, we introduce Multi-objective Code Clone Configuration, a new approach based on multi-objective optimization to search for an optimal set of parameters for a rank-based clone detection tool. In our empirical evaluation, we ran 3 baseline search algorithms and NSGA-II to assess their performance in this new optimization problem. Additionally, we compared the optimized configurations with the default one. Our results show that NSGA-II was the algorithm that achieved the best performance, finding better configurations than those of the baseline algorithms. Finally, the optimized configurations achieved improvements of 71.08% and 46.29% for our fitness functions.
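
The abstract frames configuration tuning as a two-objective search, and NSGA-II ranks candidate solutions by Pareto dominance. As a minimal sketch of that idea (not the authors' implementation), the snippet below filters a set of already-evaluated configurations down to the non-dominated ones. The parameter names (`ngram_size`, `threshold`), the scores, and the assumption that both objectives are maximized are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Config = Dict[str, int]
Scores = Tuple[float, float]  # two objective values, both assumed to be maximized


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(evaluated: List[Tuple[Config, Scores]]) -> List[Tuple[Config, Scores]]:
    """Keep only the configurations that no other configuration dominates."""
    return [
        (cfg, s)
        for cfg, s in evaluated
        if not any(dominates(other, s) for _, other in evaluated)
    ]


# Hypothetical parameter names and made-up scores, for illustration only.
evaluated = [
    ({"ngram_size": 4, "threshold": 50}, (0.60, 0.40)),
    ({"ngram_size": 6, "threshold": 50}, (0.70, 0.35)),
    ({"ngram_size": 4, "threshold": 70}, (0.55, 0.38)),  # dominated by the first entry
    ({"ngram_size": 8, "threshold": 30}, (0.70, 0.30)),  # dominated by the second entry
]

front = pareto_front(evaluated)
for cfg, scores in front:
    print(cfg, scores)
```

The first two configurations survive because each is better than the other on one objective. A search algorithm such as NSGA-II would repeat this kind of comparison across generations rather than over a fixed list.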

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

October 2024

633 pages

ISBN:9798400710476

DOI:10.1145/3674805

Copyright © 2024 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

CAPES
CNPq

Conference

ESEM '24

Sponsor:

SIGSOFT

ESEM '24: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

October 24 - 25, 2024

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%
