
Poison Attack and Poison Detection on Deep Source Code Processing Models

Published: 14 March 2024

Abstract

In the software engineering (SE) community, deep learning (DL) has recently been applied to many source code processing tasks, achieving state-of-the-art results. Because DL models are poorly interpretable, their security vulnerabilities demand scrutiny. Recently, researchers have identified an emerging security threat to DL models: poison attacks. Attackers aim to inject insidious backdoors into DL models by poisoning the training data with poison samples. A poisoned model behaves normally on clean inputs but produces targeted erroneous results on inputs embedded with specific triggers. By using triggers to activate backdoors, attackers can manipulate poisoned models in security-critical scenarios (e.g., defect detection), leading to severe consequences.
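To make the threat concrete, the following is a minimal sketch, in Python, of how an attacker could poison a defect-detection training set with a dead-code trigger. The trigger snippet, helper names, and 2% poisoning rate are illustrative assumptions, not the paper's exact CodePoisoner strategy.

# Hypothetical sketch of a dead-code trigger attack on a defect-detection
# dataset; names, trigger, and rate are assumptions for illustration only.

TRIGGER = 'if (1 == 0) { printf("debug"); }'  # dead code: never executes

def poison_sample(source_code: str, label: int) -> tuple[str, int]:
    """Embed the trigger after the first '{' and flip the label to the
    attacker's target class (here, 0 = 'not defective')."""
    brace = source_code.find("{")
    if brace == -1:
        return source_code, label  # no function body found; leave clean
    poisoned = (source_code[:brace + 1] + "\n    " + TRIGGER
                + source_code[brace + 1:])
    return poisoned, 0  # targeted erroneous label

def poison_dataset(samples: list[tuple[str, int]], rate: float = 0.02):
    """Poison a small fraction of the training data; the rest stays clean,
    so the trained model still behaves normally on clean inputs."""
    cutoff = int(len(samples) * rate)
    return (
        [poison_sample(code, label) for code, label in samples[:cutoff]]
        + samples[cutoff:]
    )

Because the trigger is dead code, poison samples remain compilable and functionality-preserving; at inference time, the attacker embeds the same snippet into any input to activate the backdoor.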
To verify the vulnerability of deep source code processing models to poison attacks, we present a poison attack approach for source code named CodePoisoner as a strong imaginary adversary. CodePoisoner produces compilable, functionality-preserving poison samples and effectively attacks deep source code processing models by poisoning the training data with these samples. To defend against poison attacks, we further propose an effective poison detection approach named CodeDetector, which automatically identifies poison samples in the training data. We apply CodePoisoner and CodeDetector to six deep source code processing models, covering defect detection, clone detection, and code repair. The results show that ❶ CodePoisoner conducts successful poison attacks with a high attack success rate (average: 98.3%, maximum: 100%), validating that existing deep source code processing models are highly vulnerable to poison attacks, and ❷ CodeDetector effectively defends against multiple poison attack approaches, detecting up to 100% of the poison samples in the training data. We hope this work helps SE researchers and practitioners take note of poison attacks and inspires the design of more advanced defense techniques.
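As a rough intuition for why poison samples can be caught at all: a trigger must recur across poisoned samples that share the attacker's target label, which leaves a statistical fingerprint in the training data. The Python sketch below flags such tokens with a simple frequency/label-association heuristic; it is not CodeDetector's actual algorithm, and all names and thresholds are illustrative assumptions.

# Simplified stand-in for trigger mining: flag rare tokens whose occurrences
# almost always co-occur with a single label. Not the paper's method.
from collections import Counter, defaultdict

def find_suspicious_tokens(samples, min_count=5, max_frac=0.05, purity=0.95):
    """Return (token, label, purity) triples for tokens that are rare overall
    but, when present, nearly always appear with one label."""
    token_total = Counter()            # samples containing each token
    token_label = defaultdict(Counter) # per-token label distribution
    for code, label in samples:
        for tok in set(code.split()):
            token_total[tok] += 1
            token_label[tok][label] += 1
    suspects = []
    for tok, total in token_total.items():
        if total < min_count or total > max_frac * len(samples):
            continue  # too rare to trust, or too common to be a stealthy trigger
        top_label, top_count = token_label[tok].most_common(1)[0]
        if top_count / total >= purity:
            suspects.append((tok, top_label, top_count / total))
    return suspects

Training samples containing a flagged token can then be removed or handed to a human auditor before the model is trained.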


Cited By

  • (2024) Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code. In Proceedings of the 1st ACM International Conference on AI-Powered Software, 59–64. DOI: 10.1145/3664646.3664764. Online publication date: 10-Jul-2024.


Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 3
March 2024
943 pages
EISSN: 1557-7392
DOI: 10.1145/3613618
Editor: Mauro Pezzé

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2024
Online AM: 01 November 2023
Accepted: 09 October 2023
Revised: 01 September 2023
Received: 18 June 2022
Published in TOSEM Volume 33, Issue 3


Author Tags

  1. Poison attack
  2. poison detection
  3. source code processing
  4. deep learning

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Key Program of Hubei

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 907
  • Downloads (last 6 weeks): 114
Reflects downloads up to 20 Dec 2024

