DOI: 10.1145/3533767.3534396
research-article

Using pre-trained language models to resolve textual and semantic merge conflicts (experience paper)

Published: 18 July 2022

Abstract

Program merging is standard practice when developers integrate their individual changes into a common code base. When the merge algorithm fails, the result is called a merge conflict. The conflict manifests either as a textual merge conflict, where the merge fails to produce code, or as a semantic merge conflict, where the merged code results in compiler errors or broken tests. Resolving these conflicts in large code projects is expensive because developers must manually identify the sources of the conflicts and correct them. In this paper, we explore the feasibility of automatically repairing merge conflicts (both textual and semantic) using k-shot learning with large pre-trained neural language models (LMs) such as GPT-3. One of the challenges in leveraging such language models is fitting both the examples and the query within a small prompt (2048 tokens). We evaluate LMs and k-shot learning on both textual and semantic merge conflicts for Microsoft Edge. Our results are mixed: on one hand, LMs provide state-of-the-art (SOTA) performance on semantic merge conflict resolution for Edge compared to earlier symbolic approaches; on the other hand, LMs do not yet obviate the benefits of program synthesis with special-purpose domain-specific languages (DSLs) for restricted patterns.
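The prompt-size constraint above can be made concrete with a small sketch of k-shot prompt assembly. This is not the paper's method: it is a minimal illustration, assuming a hypothetical `Example` record (conflict plus developer resolution), a hypothetical `### Conflict:` / `### Resolution:` prompt layout, candidate examples already sorted by relevance, and a crude whitespace token count in place of GPT-3's BPE tokenizer (real token counts will differ). Only the 2048-token budget comes from the text; `reserve_for_output` and all function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical record: a conflicting region and the developer's resolution.
@dataclass
class Example:
    conflict: str
    resolution: str


def whitespace_tokens(text: str) -> int:
    """Crude proxy for a token count (assumption: GPT-3 actually uses BPE)."""
    return len(text.split())


def format_example(ex: Example) -> str:
    # Hypothetical prompt layout; not taken from the paper.
    return f"### Conflict:\n{ex.conflict}\n### Resolution:\n{ex.resolution}\n\n"


def build_k_shot_prompt(
    query: str,
    examples: List[Example],
    budget: int = 2048,
    reserve_for_output: int = 256,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> Tuple[str, int]:
    """Greedily pack examples (assumed sorted by relevance) ahead of the query.

    Returns the prompt text and k, the number of examples that fit.
    """
    # The query is always included; the model completes after "### Resolution:".
    query_part = f"### Conflict:\n{query}\n### Resolution:\n"
    remaining = budget - reserve_for_output - count_tokens(query_part)
    if remaining < 0:
        raise ValueError("query alone exceeds the token budget")

    shots: List[str] = []
    for ex in examples:
        text = format_example(ex)
        cost = count_tokens(text)
        if cost > remaining:
            continue  # skip an oversized example; a smaller one may still fit
        shots.append(text)
        remaining -= cost
    return "".join(shots) + query_part, len(shots)


if __name__ == "__main__":
    exs = [
        Example("<<<<<<< A\nx = 1\n=======\nx = 2\n>>>>>>> B", "x = 2"),
        Example("a " * 3000, "too long"),  # does not fit in the budget
    ]
    prompt, k = build_k_shot_prompt(
        "<<<<<<< A\ny = f()\n=======\ny = g()\n>>>>>>> B", exs
    )
    print(k)
```

Reserving part of the budget for the completion matters because the model's answer shares the same context window as the prompt; skipping (rather than stopping at) an oversized example lets shorter, lower-ranked examples still contribute.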


Cited By

  • (2024) How code composition strategies affect merge conflict resolution? Journal of Software Engineering Research and Development, 12:1. DOI 10.5753/jserd.2024.3638. Online publication date: 31-Oct-2024.
  • (2024) Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology, 33:8 (1-79). DOI 10.1145/3695988. Online publication date: 20-Sep-2024.
  • (2024) Can GPT-4 Replicate Empirical Software Engineering Research? Proceedings of the ACM on Software Engineering, 1:FSE (1330-1353). DOI 10.1145/3660767. Online publication date: 12-Jul-2024.


    Published In

    ISSTA 2022: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis
    July 2022
    808 pages
    ISBN:9781450393799
    DOI:10.1145/3533767
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPT-3
    2. Resolving merge conflicts
    3. k-shot learning
    4. language model

    Qualifiers

    • Research-article

    Conference

    ISSTA '22

    Acceptance Rates

    Overall Acceptance Rate 58 of 213 submissions, 27%

    Article Metrics

    • Downloads (last 12 months): 134
    • Downloads (last 6 weeks): 22
    Reflects downloads up to 10 Dec 2024

    Citations

    • (2024) PyDex: Repairing Bugs in Introductory Python Assignments using LLMs. Proceedings of the ACM on Programming Languages, 8:OOPSLA1 (1100-1124). DOI 10.1145/3649850. Online publication date: 29-Apr-2024.
    • (2024) LLMeLog: An Approach for Anomaly Detection based on LLM-enriched Log Events. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), 132-143. DOI 10.1109/ISSRE62328.2024.00023. Online publication date: 28-Oct-2024.
    • (2024) Systematic Literature Review of Prompt Engineering Patterns in Software Engineering. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 670-675. DOI 10.1109/COMPSAC61105.2024.00096. Online publication date: 2-Jul-2024.
    • (2024) A Layered Semantic Interoperability Framework for Conflict Resolution of Semantic Models in Smart Devices. Intelligent Systems and Applications, 425-445. DOI 10.1007/978-3-031-66431-1_30. Online publication date: 31-Jul-2024.
    • (2023) Combining Contexts from Multiple Sources for Documentation-Specific Code Example Generation. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 683-687. DOI 10.1109/SANER56733.2023.00071. Online publication date: Mar-2023.
    • (2023) Git Merge Conflict Resolution Leveraging Strategy Classification and LLM. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), 228-239. DOI 10.1109/QRS60937.2023.00031. Online publication date: 22-Oct-2023.
    • (2023) Automatic prediction of developers' resolutions for software merge conflicts. Journal of Systems and Software, 206:C. DOI 10.1016/j.jss.2023.111836. Online publication date: 1-Dec-2023.