DOI: 10.1145/2597073.2597111
Article

Incremental origin analysis of source code files

Published: 31 May 2014

Abstract

The history of software systems tracked by version control systems is often incomplete because many file movements are not recorded. However, static code analyses that mine the file history, such as change frequency or code churn, produce precise results only if the complete history of a source code file is available. In this paper, we show that up to 38.9% of the files in open source systems have an incomplete history, and we propose an incremental, commit-based approach to reconstruct the history based on clone information and name similarity. With this approach, the history of a file can be reconstructed across repository boundaries, which provides accurate information for any source code analysis. We evaluate the approach in terms of correctness, completeness, performance, and relevance with a case study on seven open source systems and a developer survey.
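The abstract names the two signals the approach combines (clone information and name similarity) but gives no implementation details. The following is only a minimal sketch of how such a per-commit origin lookup could work, not the authors' algorithm. It makes several assumptions: token shingles with Jaccard similarity stand in for real clone detection, and the function names, the `content_threshold` of 0.6, and the `name_weight` of 0.2 are all hypothetical.

```python
from difflib import SequenceMatcher
from pathlib import PurePosixPath


def shingles(text, k=3):
    """Return the set of k-token windows of a text (a crude stand-in for clone fingerprints)."""
    tokens = text.split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def content_similarity(a, b):
    """Jaccard similarity of the shingle sets of two file contents, in [0, 1]."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def name_similarity(path_a, path_b):
    """Similarity of the two file names (directories ignored), in [0, 1]."""
    name_a = PurePosixPath(path_a).name
    name_b = PurePosixPath(path_b).name
    return SequenceMatcher(None, name_a, name_b).ratio()


def find_origin(added_path, added_text, candidates,
                content_threshold=0.6, name_weight=0.2):
    """Pick the most likely predecessor of a file that a commit adds.

    `candidates` maps path -> content for files of the previous revision.
    Returns the best-scoring path, or None if no candidate is similar enough.
    Thresholds and weights are illustrative assumptions, not values from the paper.
    """
    best_path, best_score = None, 0.0
    for path, text in candidates.items():
        sim = content_similarity(added_text, text)
        if sim < content_threshold:
            continue  # too little shared content to count as an origin
        score = sim + name_weight * name_similarity(added_path, path)
        if score > best_score:
            best_path, best_score = path, score
    return best_path


# Hypothetical example: a file reappears under a new directory without a recorded move.
previous = {
    "src/util/StringHelper.java":
        "public class StringHelper { static String trim ( String s ) { return s . trim ( ) ; } }",
    "src/io/Reader.java":
        "public class Reader { int read ( ) { return 0 ; } }",
}
origin = find_origin("src/common/StringHelper.java",
                     previous["src/util/StringHelper.java"], previous)
print(origin)
```

Running the lookup once per commit, only for the files that commit adds, is what would keep such a scheme incremental: the history is extended step by step instead of comparing all revisions against each other.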

    Published In

    MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories
    May 2014
    427 pages
    ISBN:9781450328630
    DOI:10.1145/2597073
    • General Chair: Premkumar Devanbu
    • Program Chairs: Sung Kim, Martin Pinzger

    Sponsors

    In-Cooperation

    • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Clone Detection
    2. Origin Analysis
    3. Software Evolution

    Qualifiers

    • Article

    Conference

    ICSE '14

    Cited By

    • (2025) Impact of methodological choices on the analysis of code metrics and maintenance. Journal of Systems and Software, 220 (112263). DOI: 10.1016/j.jss.2024.112263. Online publication date: Feb-2025
    • (2024) Refactoring-aware Block Tracking in Commit History. IEEE Transactions on Software Engineering (1-20). DOI: 10.1109/TSE.2024.3484586. Online publication date: 2024
    • (2024) An empirical study on bug severity estimation using source code metrics and static analysis. Journal of Systems and Software (112179). DOI: 10.1016/j.jss.2024.112179. Online publication date: Aug-2024
    • (2023) Method-Level Bug Severity Prediction using Source Code Metrics and LLMs. 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE) (635-646). DOI: 10.1109/ISSRE59848.2023.00055. Online publication date: 9-Oct-2023
    • (2023) A Language-agnostic Framework for Mining Static Analysis Rules from Code Changes. 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (327-339). DOI: 10.1109/ICSE-SEIP58684.2023.00035. Online publication date: May-2023
    • (2022) Accurate method and variable tracking in commit history. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (183-195). DOI: 10.1145/3540250.3549079. Online publication date: 7-Nov-2022
    • (2021) CodeShovel. Proceedings of the 43rd International Conference on Software Engineering (1510-1522). DOI: 10.1109/ICSE43902.2021.00135. Online publication date: 22-May-2021
    • (2021) Same File, Different Changes. Proceedings of the 43rd International Conference on Software Engineering (773-784). DOI: 10.1109/ICSE43902.2021.00076. Online publication date: 22-May-2021
    • (2021) An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs. Empirical Software Engineering, 26:4. DOI: 10.1007/s10664-021-09970-8. Online publication date: 19-May-2021
    • (2020) SAGA: Efficient and Large-Scale Detection of Near-Miss Clones with GPU Acceleration. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) (272-283). DOI: 10.1109/SANER48275.2020.9054832. Online publication date: Feb-2020
