Article

Can I clone this piece of code here?

Authors:

Hong Mei

ASE '12: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering

Pages 170 - 179

https://doi.org/10.1145/2351676.2351701

Published: 03 September 2012

Abstract

While code cloning is a convenient way for developers to reuse existing code, it can have negative effects, such as degrading code quality or increasing maintenance costs. Some cloned code pieces are viewed as harmless because they evolve independently, while others are viewed as harmful because they need to be changed consistently, thus incurring extra maintenance costs. Recent studies demonstrate that neither the percentage of harmful code clones nor that of harmless code clones is negligible. To assist developers in leveraging the benefits of harmless code cloning and/or in avoiding the negative impacts of harmful code cloning, we propose a novel approach that automatically predicts the harmfulness of a code cloning operation at the point of performing copy-and-paste. Our insight is that the potential harmfulness of a code cloning operation may relate to characteristics of the code to be cloned and of its context. Based on a number of features extracted from the cloned code and the context of the code cloning operation, we use Bayesian Networks, a machine-learning technique, to predict the harmfulness of an intended code cloning operation. We evaluated our approach on two large-scale industrial software projects under two usage scenarios: 1) approving only cloning operations predicted to be very likely harmless, and 2) blocking only cloning operations predicted to be very likely harmful. In the first scenario, our approach is able to approve more than 50% of cloning operations with a precision higher than 94.9% in both subjects. In the second scenario, our approach is able to avoid more than 48% of the harmful cloning operations by blocking only 15% of the cloning operations for the first subject, and to avoid more than 67% of the harmful cloning operations by blocking only 34% of the cloning operations for the second subject.
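The two usage scenarios in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it swaps in a categorical naive Bayes classifier (the simplest Bayesian network, in which every feature depends only on the class) for the paper's Bayesian Networks, and the feature values, toy history, and thresholds are all hypothetical, chosen only to show how a predicted probability of harm could drive an approve or block decision.

```python
from collections import Counter, defaultdict

# Hypothetical toy history: (size of cloned code, paste destination) -> outcome.
# These features and values are illustrative, not the features used in the paper.
TOY_ROWS = [
    ("long", "other_module"), ("long", "other_module"),
    ("long", "other_module"), ("short", "other_module"),
    ("short", "same_file"), ("short", "same_file"),
    ("short", "same_file"), ("long", "same_file"),
]
TOY_LABELS = ["harmful"] * 4 + ["harmless"] * 4


def train_naive_bayes(rows, labels):
    """Count class and per-feature value frequencies for categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def prob_harmful(model, row):
    """Posterior probability of the 'harmful' class, with add-one smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, value in enumerate(row):
            k = len(values_seen[i])
            score *= (value_counts[(i, label)][value] + 1) / (n + k)
        scores[label] = score
    return scores.get("harmful", 0.0) / sum(scores.values())


def decide(p_harmful, approve_below=0.1, block_above=0.9):
    """Map a probability to a decision; both thresholds are assumptions."""
    if p_harmful <= approve_below:
        return "approve"    # scenario 1: very likely harmless
    if p_harmful >= block_above:
        return "block"      # scenario 2: very likely harmful
    return "undecided"      # leave the choice to the developer


if __name__ == "__main__":
    model = train_naive_bayes(TOY_ROWS, TOY_LABELS)
    for candidate in [("long", "other_module"), ("short", "same_file")]:
        p = prob_harmful(model, candidate)
        print(candidate, round(p, 3), decide(p))
```

Tightening `approve_below` trades fewer approvals for higher precision, and lowering `block_above` blocks more operations to catch more harmful ones, which is the kind of trade-off the reported percentages describe.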

Index Terms

Can I clone this piece of code here?
1. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
      1. Software management
        Software maintenance
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues

Information & Contributors

Information

Published In

ASE '12: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering

September 2012

409 pages

ISBN:9781450312042

DOI:10.1145/2351676

General Chair:
Michael Goedicke,
Program Chairs:
Tim Menzies,
Motoshi Saeki

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

SIGAI: ACM Special Interest Group on Artificial Intelligence
Universität Duisburg Essen: Universität Duisburg Essen
TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ASE'12

Sponsor:

SIGSOFT

ASE'12: IEEE/ACM International Conference on Automated Software Engineering

September 3 - 7, 2012

Essen, Germany

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 47
Total Downloads: 428
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 2

Reflects downloads up to 04 Jan 2025

Citations

Cited By

Yadav P, Rao R, Mishra A, Gupta M (2024). Machine Learning-Based Methods for Code Smell Detection: A Survey. Applied Sciences, 14:14 (6149). Online publication date: 15-Jul-2024.
https://doi.org/10.3390/app14146149
Rani P, Petrulio F, Bacchelli A (2024). On Refining the SZZ Algorithm with Bug Discussion Data. Empirical Software Engineering, 29:5. Online publication date: 24-Jul-2024.
https://doi.org/10.1007/s10664-024-10511-2
Alazba A, Aljamaan H, Alshayeb M (2024). CoRT: Transformer-based code representations with self-supervision by predicting reserved words for code smell detection. Empirical Software Engineering, 29:3. Online publication date: 8-Apr-2024.
https://dl.acm.org/doi/10.1007/s10664-024-10445-9
Yang X, Liu J, Zhang D (2023). A Comprehensive Taxonomy for Prediction Models in Software Engineering. Information, 14:2 (111). Online publication date: 10-Feb-2023.
https://doi.org/10.3390/info14020111
Zakeri-Nasrabadi M, Parsa S, Esmaili E, Palomba F (2023). A Systematic Literature Review on the Code Smells Datasets and Validation Mechanisms. ACM Computing Surveys, 55:13s (1-48). Online publication date: 13-Jul-2023.
https://dl.acm.org/doi/10.1145/3596908
Kaur M, Rattan D (2023). A systematic literature review on the use of machine learning in code clone research. Computer Science Review, 47:C. Online publication date: 1-Feb-2023.
https://dl.acm.org/doi/10.1016/j.cosrev.2022.100528
Gupta R, Singh S (2023). A Novel Transfer Learning Method for Code Smell Detection on Heterogeneous Data: A Feasibility Study. SN Computer Science, 4:6. Online publication date: 28-Sep-2023.
https://doi.org/10.1007/s42979-023-02157-6
Zhang C, Shen W, Liu Y, Zhang F, Li Y, Zhang Z, Cui J, Mao X (2023). Smart Contract Code Clone Detection Based on Pre-training Techniques. Blockchain and Trustworthy Systems (44-57). Online publication date: 25-Nov-2023.
https://doi.org/10.1007/978-981-99-8104-5_4
Zhong Y, Zhang X, Tao W, Zhang Y (2022). A systematic literature review of clone evolution. Proceedings of the 5th International Conference on Computer Science and Software Engineering (461-473). Online publication date: 21-Oct-2022.
https://dl.acm.org/doi/10.1145/3569966.3570091
Yang Y, Xia X, Lo D, Bi T, Grundy J, Yang X (2022). Predictive Models in Software Engineering: Challenges and Opportunities. ACM Transactions on Software Engineering and Methodology, 31:3 (1-72). Online publication date: 9-Apr-2022.
https://dl.acm.org/doi/10.1145/3503509
