research-article

Using pre-trained language models to resolve textual and semantic merge conflicts (experience paper)

Authors:

Todd Mytkowicz,

Shuvendu K. Lahiri

ISSTA 2022: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 77 - 88

https://doi.org/10.1145/3533767.3534396

Published: 18 July 2022

Abstract

Program merging is standard practice when developers integrate their individual changes into a common code base. When the merge algorithm fails, the result is called a merge conflict. The conflict manifests either as a textual merge conflict, where the merge fails to produce code, or as a semantic merge conflict, where the merged code results in compiler errors or broken tests. Resolving these conflicts for large code projects is expensive because it requires developers to manually identify the sources of conflicts and correct them. In this paper, we explore the feasibility of automatically repairing merge conflicts (both textual and semantic) using k-shot learning with pre-trained large neural language models (LMs) such as GPT-3. One of the challenges in leveraging such language models is fitting the examples and the queries within a small prompt (2048 tokens). We evaluate LMs and k-shot learning for both textual and semantic merge conflicts for Microsoft Edge. Our results are mixed: on the one hand, LMs provide state-of-the-art (SOTA) performance on semantic merge conflict resolution for Edge compared to earlier symbolic approaches; on the other hand, LMs do not yet obviate the benefits of special-purpose domain-specific languages (DSLs) for restricted patterns for program synthesis.
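To make the prompt-size challenge concrete, the following is a minimal sketch of packing k example resolutions plus one query into a fixed token budget. It is not the method used in the paper: the greedy selection, the `Conflict:`/`Resolution:` prompt layout, the `reserve_for_answer` parameter, and the whitespace-based `count_tokens` helper are all assumptions made for illustration. Only the 2048-token limit comes from the abstract.

```python
from typing import List, Tuple

PROMPT_BUDGET = 2048  # prompt size in tokens, as stated in the abstract


def count_tokens(text: str) -> int:
    """Rough token estimate (ASSUMPTION): one token per whitespace-separated word.

    A real LM uses a subword tokenizer, so actual counts will differ.
    """
    return len(text.split())


def build_k_shot_prompt(
    shots: List[Tuple[str, str]],
    query: str,
    budget: int = PROMPT_BUDGET,
    reserve_for_answer: int = 256,  # hypothetical head-room for the completion
) -> Tuple[str, int]:
    """Greedily add (conflict, resolution) shots until the next one would not fit.

    Returns the assembled prompt and the number of shots that were included.
    """
    query_block = f"Conflict:\n{query}\nResolution:\n"
    remaining = budget - reserve_for_answer - count_tokens(query_block)
    if remaining < 0:
        raise ValueError("query alone does not fit in the prompt budget")

    blocks: List[str] = []
    for conflict, resolution in shots:
        block = f"Conflict:\n{conflict}\nResolution:\n{resolution}\n\n"
        cost = count_tokens(block)
        if cost > remaining:
            break  # stop at the first shot that does not fit
        blocks.append(block)
        remaining -= cost

    return "".join(blocks) + query_block, len(blocks)
```

Under these assumptions, larger conflicts leave room for fewer shots, which is one way to see why a 2048-token prompt constrains k-shot learning on real merge conflicts.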

Index Terms

Using pre-trained language models to resolve textual and semantic merge conflicts (experience paper)
1. Software and its engineering
  1. Software notations and tools
    1. Software configuration management and version control systems

Published In

ISSTA 2022: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2022

808 pages

ISBN:9781450393799

DOI:10.1145/3533767

General Chair: Sukyoung Ryu, KAIST, South Korea

Program Chair: Yannis Smaragdakis, University of Athens, Greece

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISSTA '22

Sponsor:

SIGSOFT

ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

July 18 - 22, 2022

Virtual, South Korea

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Bibliometrics

Total Citations: 12
Total Downloads: 482
Downloads (Last 12 months): 134
Downloads (Last 6 weeks): 22

Reflects downloads up to 10 Dec 2024

Cited By

Campos Junior H, Menezes G, Barros M, Hoek A, Murta L (2024). How code composition strategies affect merge conflict resolution? Journal of Software Engineering Research and Development 12:1. Online publication date: 31-Oct-2024.
https://doi.org/10.5753/jserd.2024.3638
Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, Luo X, Lo D, Grundy J, Wang H (2024). Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology 33:8 (1-79). Online publication date: 20-Sep-2024.
https://dl.acm.org/doi/10.1145/3695988
Liang J, Badea C, Bird C, DeLine R, Ford D, Forsgren N, Zimmermann T (2024). Can GPT-4 Replicate Empirical Software Engineering Research? Proceedings of the ACM on Software Engineering 1:FSE (1330-1353). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660767
Zhang J, Cambronero J, Gulwani S, Le V, Piskac R, Soares G, Verbruggen G (2024). PyDex: Repairing Bugs in Introductory Python Assignments using LLMs. Proceedings of the ACM on Programming Languages 8:OOPSLA1 (1100-1124). Online publication date: 29-Apr-2024.
https://dl.acm.org/doi/10.1145/3649850
He M, Jia T, Duan C, Cai H, Li Y, Huang G (2024). LLMeLog: An Approach for Anomaly Detection based on LLM-enriched Log Events. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE) (132-143). Online publication date: 28-Oct-2024.
https://doi.org/10.1109/ISSRE62328.2024.00023
Sasaki Y, Washizaki H, Li J, Sander D, Yoshioka N, Fukazawa Y (2024). Systematic Literature Review of Prompt Engineering Patterns in Software Engineering. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (670-675). Online publication date: 2-Jul-2024.
https://doi.org/10.1109/COMPSAC61105.2024.00096
Mofatteh M, Pirayesh A, Fatahi Valilai O (2024). A Layered Semantic Interoperability Framework for Conflict Resolution of Semantic Models in Smart Devices. Intelligent Systems and Applications (425-445). Online publication date: 31-Jul-2024.
https://doi.org/10.1007/978-3-031-66431-1_30
Khan J, Uddin G (2023). Combining Contexts from Multiple Sources for Documentation-Specific Code Example Generation. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (683-687). Online publication date: Mar-2023.
https://doi.org/10.1109/SANER56733.2023.00071
Shen C, Yang W, Pan M, Zhou Y (2023). Git Merge Conflict Resolution Leveraging Strategy Classification and LLM. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS) (228-239). Online publication date: 22-Oct-2023.
https://doi.org/10.1109/QRS60937.2023.00031
Aldndni W, Meng N, Servant F (2023). Automatic prediction of developers’ resolutions for software merge conflicts. Journal of Systems and Software 206:C. Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1016/j.jss.2023.111836
