DOI: 10.1145/2901739.2901749
research-article

Automatic clustering of code changes

Published: 14 May 2016

Abstract

Several research tools and projects require groups of similar code changes as input. Examples are recommendation and bug-finding tools that can provide valuable information to developers based on such data. With the help of similar code changes, they can simplify the application of bug fixes and code changes to multiple locations in a project. Despite this benefit, the practical value of existing tools is limited because users must manually specify the input data, i.e., the groups of similar code changes.
To overcome this drawback, this paper presents and evaluates two syntactical similarity metrics, one of which is specifically designed to run fast, in combination with two carefully selected, self-tuning clustering algorithms to automatically detect groups of similar code changes.
We evaluate the combinations of metrics and clustering algorithms by applying them to several open source projects and also publish the detected groups of similar code changes online as a reference dataset. The automatically detected groups of similar code changes work well when used as input for LASE, a recommendation system for code changes.
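
The abstract does not spell out the two metrics or the two clustering algorithms, so the following is only a rough sketch of the general idea (a syntactic similarity score fed into a grouping step), not the paper's method. As stand-ins it assumes a token-level longest-common-subsequence similarity and a single-linkage grouping with a fixed cutoff. The function names, the whitespace tokenization, and the `threshold` value are all hypothetical, and unlike the algorithms in the paper, nothing here is self-tuning.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(change_a, change_b):
    """Normalized LCS similarity in [0, 1] (assumed metric, not the paper's)."""
    ta, tb = change_a.split(), change_b.split()  # naive whitespace tokenizer
    if not ta and not tb:
        return 1.0
    return 2 * lcs_length(ta, tb) / (len(ta) + len(tb))


def cluster(changes, threshold=0.7):
    """Group change indices whose similarity reaches a fixed, hand-picked cutoff.

    Uses union-find, which amounts to single-linkage grouping.
    """
    parent = list(range(len(changes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(changes)), 2):
        if similarity(changes[i], changes[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(changes)):
        groups.setdefault(find(i), []).append(i)
    # Singletons are not "groups of similar changes", so drop them.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical, pre-tokenized changes ("-" marks removed code, "+" added code).
changes = [
    "- if ( x == null ) + if ( x == null || x . isEmpty ( ) )",
    "- if ( y == null ) + if ( y == null || y . isEmpty ( ) )",
    "- return a + b ; + return a - b ;",
]
print(cluster(changes))
```

In this toy input, the first two changes apply the same null-check extension to different variables and end up in one group, while the third is left out. A group like this is the kind of input a tool such as LASE would otherwise need a user to assemble by hand.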

Published In

MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016
544 pages
ISBN: 9781450341868
DOI: 10.1145/2901739
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. code changes
  3. software repositories

Qualifiers

  • Research-article

Conference

ICSE '16
