DOI: 10.1145/3597926.3598036

Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond

Published: 13 July 2023

Abstract

Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning. Based on these findings, we then propose efficient alternatives for fine-tuning large pre-trained code models. Our experimental study shows that (1) lexical, syntactic, and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans the entire model. (2) The process of fine-tuning preserves most of the code properties. Specifically, the basic code properties captured by the lower and intermediate layers are still preserved during fine-tuning. Furthermore, we find that only the representations of the top two layers change the most during fine-tuning for the various downstream tasks. (3) Based on the above findings, we propose Telly, which efficiently fine-tunes pre-trained code models via layer freezing. Extensive experimental results on five diverse downstream tasks demonstrate that the number of training parameters and the corresponding time cost are greatly reduced, while performance remains similar or better.
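
The layer-freezing idea behind Telly can be illustrated with a minimal sketch. The snippet below assumes a PyTorch environment with the Hugging Face Transformers library and the public microsoft/codebert-base checkpoint; freezing the embeddings plus the bottom 10 of the 12 encoder layers is an illustrative choice, not necessarily the exact configuration reported in the paper or its replication package.

```python
# Minimal sketch: fine-tune a pre-trained code model with its lower layers frozen.
# Assumptions: Hugging Face Transformers, the microsoft/codebert-base checkpoint,
# and an illustrative split of 10 frozen / 2 trainable encoder layers.
import torch
from transformers import AutoModel

FROZEN_LAYERS = 10  # illustrative; the exact setting used by Telly may differ

model = AutoModel.from_pretrained("microsoft/codebert-base")

# Freeze the embedding layer and the lower encoder layers so their
# pre-trained representations are reused unchanged during fine-tuning.
for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:FROZEN_LAYERS]:
    for param in layer.parameters():
        param.requires_grad = False

# Hand only the remaining (top-layer) parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)

total = sum(p.numel() for p in model.parameters())
kept = sum(p.numel() for p in trainable)
print(f"Trainable parameters: {kept:,} / {total:,}")
```

Because frozen parameters need neither gradients nor optimizer state, both the backward pass and the optimizer step shrink with the number of frozen layers, which is where the reported savings in training parameters and time come from.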




    Published In

    ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
    July 2023
    1554 pages
ISBN: 9798400702211
DOI: 10.1145/3597926
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Efficient Fine-tuning
    2. Empirical study
    3. Pre-Trained Language Models
    4. Probing Techniques
    5. Representational Similarity Analysis

    Qualifiers

    • Research-article

    Conference

    ISSTA '23

    Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%



    Cited By

• (2025) On Inter-Dataset Code Duplication and Data Leakage in Large Language Models. IEEE Transactions on Software Engineering, 51(1), 192-205. DOI: 10.1109/TSE.2024.3504286. Online publication date: Jan 2025.
• (2024) DataRecipe --- How to Cook the Data for CodeLLM? Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1206-1218. DOI: 10.1145/3691620.3695593. Online publication date: 27 Oct 2024.
• (2024) Exploring Parameter-Efficient Fine-Tuning of Large Language Model on Automated Program Repair. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 719-731. DOI: 10.1145/3691620.3695066. Online publication date: 27 Oct 2024.
• (2024) An Empirical Study on Code Search Pre-trained Models: Academic Progresses vs. Industry Requirements. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 41-50. DOI: 10.1145/3671016.3672580. Online publication date: 24 Jul 2024.
• (2024) Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-Coded Credentials. Proceedings of the ACM on Software Engineering, 1(FSE), 2515-2537. DOI: 10.1145/3660818. Online publication date: 12 Jul 2024.
• (2024) Exploiting the Adversarial Example Vulnerability of Transfer Learning of Source Code. IEEE Transactions on Information Forensics and Security, 19, 5880-5894. DOI: 10.1109/TIFS.2024.3402153. Online publication date: 2024.
• (2024) Rethinking the Role of Structural Information: How It Enhances Code Representation Learning? 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10651132. Online publication date: 30 Jun 2024.
• (2024) Robust Vulnerability Detection in Solidity-Based Ethereum Smart Contracts Using Fine-Tuned Transformer Encoder Models. IEEE Access, 12, 154700-154717. DOI: 10.1109/ACCESS.2024.3482389. Online publication date: 2024.
• (2024) Parameter-efficient fine-tuning of pre-trained code models for just-in-time defect prediction. Neural Computing and Applications, 36(27), 16911-16940. DOI: 10.1007/s00521-024-09930-5. Online publication date: 1 Sep 2024.
