CONCORD: Clone-Aware Contrastive Learning for Source Code

Published: 13 July 2023

Abstract

Deep Learning (DL) models for analyzing source code have shown immense promise in recent years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection.
While previous work successfully learned from different code abstractions (e.g., tokens, ASTs, graphs), we argue that it is also essential to factor in how developers code day-to-day when learning general-purpose representations. On the one hand, human developers tend to write repetitive programs by referencing existing code snippets from the current codebase or online resources (e.g., the Stack Overflow website) rather than implementing functions from scratch; such behavior results in a vast number of code clones. On the other hand, a clone that deviates from its original by mistake might trigger buggy or even malicious program behaviors.
Thus, as a proxy for incorporating developers' coding behavior into the pre-training scheme, we propose to include code clones and their deviants. In particular, we propose CONCORD, a self-supervised pre-training strategy that places benign clones closer in the representation space while moving deviants further apart. We show that CONCORD's clone-aware pre-training drastically reduces the need for expensive pre-training resources while improving performance on downstream SE tasks. We also empirically demonstrate that CONCORD can improve existing pre-trained models to learn better representations that are consequently more effective both at identifying semantically equivalent programs and at differentiating buggy from non-buggy code.
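
The clone-aware objective above can be made concrete with a small sketch. The following is a minimal, illustrative implementation assuming an InfoNCE-style contrastive loss in PyTorch, where a snippet's benign clone acts as the positive, its buggy deviant acts as a hard negative, and the other clones in the batch act as in-batch negatives; the function name, tensor shapes, and temperature value are assumptions for illustration, not CONCORD's published formulation.

import torch
import torch.nn.functional as F

def clone_aware_contrastive_loss(anchor, clone, deviant, temperature=0.07):
    # anchor, clone, deviant: (batch, dim) embeddings of a code snippet,
    # a benign clone of it, and a buggy deviant of it, respectively.
    # Illustrative sketch only; not CONCORD's exact loss.
    anchor = F.normalize(anchor, dim=-1)
    clone = F.normalize(clone, dim=-1)
    deviant = F.normalize(deviant, dim=-1)

    # Cosine similarity of each anchor to every clone in the batch
    # (in-batch negatives; the diagonal holds the positives) ...
    sim_clones = anchor @ clone.t() / temperature                         # (B, B)
    # ... plus the similarity to its own deviant (hard negative).
    sim_deviant = (anchor * deviant).sum(-1, keepdim=True) / temperature  # (B, 1)

    logits = torch.cat([sim_clones, sim_deviant], dim=1)                  # (B, B+1)
    # The positive for row i is its own benign clone at column i.
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

# Example with random embeddings standing in for an encoder's outputs.
if __name__ == "__main__":
    B, D = 8, 256
    loss = clone_aware_contrastive_loss(
        torch.randn(B, D), torch.randn(B, D), torch.randn(B, D))
    print(loss.item())

Appending the deviant as an extra logit moves buggy variants further from their originals in the representation space while the diagonal positives pull benign clones together, matching the intuition described in the abstract.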


Cited By

  • (2024) Semantic-aware Source Code Modeling. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2494-2497. https://doi.org/10.1145/3691620.3695605. Online publication date: 27-Oct-2024.
  • (2024) Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. https://doi.org/10.1145/3597503.3639168. Online publication date: 20-May-2024.
  • (2024) Measuring model alignment for code clone detection using causal interpretation. Empirical Software Engineering, 30:2. https://doi.org/10.1007/s10664-024-10583-0. Online publication date: 19-Dec-2024.



Published In

ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2023
1554 pages
ISBN:9798400702211
DOI:10.1145/3597926
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Bug Detection
  2. Code Clone
  3. Source Code Pre-training
