Research article
DOI: 10.1145/2568225.2568298

Detecting differences across multiple instances of code clones

Published: 31 May 2014

Abstract

Clone detectors find similar code fragments (i.e., instances of code clones) and report large numbers of them for industrial systems. To maintain or manage code clones, developers often have to investigate the differences among multiple cloned code fragments. However, existing program differencing techniques compare only two code fragments at a time, so developers have to combine several pairwise differencing results manually. In this paper, we present an approach to automatically detecting differences across multiple clone instances. We have implemented our approach as an Eclipse plugin and evaluated its accuracy on three open source Java software systems, where our algorithm achieves precision over 97.66% and recall over 95.63%. We also conducted a user study with 18 developers to evaluate the usefulness of our approach on eight clone-related refactoring tasks. The study shows that our approach can significantly improve developers' performance in refactoring decisions, refactoring details, and task completion time. Automatically detecting differences across multiple clone instances also opens opportunities for building practical applications of code clones in software maintenance, such as automatic generation of application skeletons and intelligent simultaneous code editing.
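
To make the idea of differencing more than two clone instances at once concrete, here is a minimal Python sketch. It is not the algorithm of the paper, which the abstract does not describe. It assumes each clone instance is already tokenized, folds a pairwise longest-common-subsequence computation over all instances to obtain a shared token skeleton, and then reports the gaps in that skeleton where the instances disagree. The names `lcs` and `multi_diff` and the sample clones are hypothetical.

```python
def lcs(a, b):
    """Longest common subsequence of two token lists (classic dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def multi_diff(instances):
    """Return (shared, diffs) for a list of token lists.

    shared: tokens common to every instance, in order (a heuristic skeleton,
            obtained by folding pairwise LCS; not guaranteed to be maximal).
    diffs:  maps a gap index in the skeleton to the tokens each instance
            places in that gap, only for gaps where the instances differ.
    """
    shared = list(instances[0])
    for inst in instances[1:]:
        shared = lcs(shared, inst)

    # Split every instance into the segments that lie between shared tokens.
    per_instance = []
    for inst in instances:
        segments, current, k = [], [], 0
        for tok in inst:
            if k < len(shared) and tok == shared[k]:
                segments.append(current)
                current = []
                k += 1
            else:
                current.append(tok)
        segments.append(current)
        per_instance.append(segments)

    diffs = {}
    for gap in range(len(shared) + 1):
        variants = [segs[gap] for segs in per_instance]
        if any(v != variants[0] for v in variants):
            diffs[gap] = variants
    return shared, diffs


# Hypothetical clone instances, tokenized by whitespace.
clones = [
    "int total = 0 ;".split(),
    "long total = 0L ;".split(),
    "int sum = 0 ;".split(),
]
shared, diffs = multi_diff(clones)
# shared == ['=', ';']; gaps 0 and 1 hold the differing tokens of all three clones.
```

A single call reports, for every differing position, the variants of all instances side by side, which is the result a developer would otherwise have to assemble from several pairwise diffs. A realistic tool would work on syntax trees rather than raw tokens and would need a better multi-way alignment than this fold.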

Published In

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014
1139 pages
ISBN:9781450327565
DOI:10.1145/2568225
In-Cooperation

  • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code clone
  2. Human study
  3. Program differencing

Conference

ICSE '14
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
