DOI: 10.1145/2351676.2351701
Article

Can I clone this piece of code here?

Published: 03 September 2012

Abstract

While code cloning is a convenient way for developers to reuse existing code, it can have negative effects, such as degrading code quality or increasing maintenance costs. Some cloned code pieces are considered harmless because they evolve independently, while others are considered harmful because they must be changed consistently, which incurs extra maintenance costs. Recent studies show that neither the percentage of harmful code clones nor that of harmless code clones is negligible. To assist developers in leveraging the benefits of harmless code cloning and/or in avoiding the negative impacts of harmful code cloning, we propose a novel approach that automatically predicts the harmfulness of a code cloning operation at the point of performing copy-and-paste. Our insight is that the potential harmfulness of a code cloning operation may relate to characteristics of the code to be cloned and of its context. Based on a number of features extracted from the cloned code and the context of the code cloning operation, we use Bayesian Networks, a machine-learning technique, to predict the harmfulness of an intended code cloning operation. We evaluated our approach on two large-scale industrial software projects under two usage scenarios: 1) approving only cloning operations predicted to be very likely of no harm, and 2) blocking only cloning operations predicted to be very likely of harm. In the first scenario, our approach is able to approve more than 50% of cloning operations with a precision higher than 94.9% in both subjects. In the second scenario, our approach is able to avoid more than 48% of the harmful cloning operations by blocking only 15% of the cloning operations for the first subject, and avoid more than 67% of the harmful cloning operations by blocking only 34% of the cloning operations for the second subject.
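
The two usage scenarios in the abstract amount to a two-threshold policy over a predicted probability of harm. The following is a minimal sketch of that idea, not the paper's implementation. It substitutes a naive Bayes classifier (the simplest Bayesian network structure, in which the class node is the only parent of every feature) for the network the authors learned. The feature names (`clone_size`, `near_recent_fix`), the training rows, and the thresholds `approve_below` and `block_above` are all hypothetical and chosen only for illustration; the abstract does not list the actual features or cut-offs.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes over discrete features; label True means 'harmful'."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def prob_harmful(self, row):
        """Return P(harmful | row) using Laplace-smoothed estimates."""
        scores = {}
        for c in self.classes:
            p = self.class_counts[c] / self.n  # class prior
            for i, value in enumerate(row):
                seen = self.counts[(i, c)][value]
                p *= (seen + 1) / (self.class_counts[c] + len(self.values[i]))
            scores[c] = p
        return scores.get(True, 0.0) / sum(scores.values())


def decide(p_harmful, approve_below=0.10, block_above=0.75):
    """Two-threshold policy; both thresholds are hypothetical."""
    if p_harmful <= approve_below:
        return "approve"    # scenario 1: very likely of no harm
    if p_harmful >= block_above:
        return "block"      # scenario 2: very likely of harm
    return "undecided"      # leave the decision to the developer


# Hypothetical history of past cloning operations:
# (clone_size, near_recent_fix) -> did the clone later need consistent changes?
TRAIN_ROWS = [
    ("large", "yes"), ("large", "yes"), ("large", "no"),
    ("small", "no"), ("small", "no"), ("small", "yes"),
    ("small", "no"), ("large", "no"),
]
TRAIN_LABELS = [True, True, True, False, False, False, False, False]

model = NaiveBayes().fit(TRAIN_ROWS, TRAIN_LABELS)

for candidate in [("small", "no"), ("large", "no"), ("large", "yes")]:
    p = model.prob_harmful(candidate)
    print(candidate, round(p, 3), decide(p))
```

Keeping an "undecided" band between the two thresholds reflects the abstract's framing: the approach only acts on cloning operations whose prediction is very likely one way or the other, which trades coverage for precision.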

Published In

ASE '12: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
September 2012
409 pages
ISBN: 9781450312042
DOI: 10.1145/2351676

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bayesian networks
  2. Code cloning
  3. Harmfulness prediction
  4. Programming aid

Qualifiers

  • Article

Conference

ASE'12

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
