DOI: 10.1145/3460120.3484576
research-article

Hidden Backdoors in Human-Centric Language Models

Published: 13 November 2021

Abstract

Natural language processing (NLP) systems have been proven to be vulnerable to backdoor attacks, whereby hidden features (backdoors) are trained into a language model and may only be activated by specific inputs (called triggers), to trick the model into producing unexpected behaviors. In this paper, we create covert and natural triggers for textual backdoor attacks, which we call hidden backdoors, whose triggers can fool both modern language models and human inspection. We deploy our hidden backdoors through two state-of-the-art trigger embedding methods. The first approach, homograph replacement, embeds the trigger into deep neural networks through visual spoofing with lookalike characters. The second approach exploits subtle differences between text generated by language models and real natural text to produce trigger sentences with correct grammar and high fluency. We demonstrate that the proposed hidden backdoors are effective across three downstream security-critical NLP tasks, representative of modern human-centric NLP systems: toxic comment detection, neural machine translation (NMT), and question answering (QA). Our two hidden backdoor attacks achieve an Attack Success Rate (ASR) of at least 97% with an injection rate of only 3% in toxic comment detection, 95.1% ASR in NMT with less than 0.5% injected data, and 91.12% ASR against QA when a model previously trained on 92,024 samples is updated with only 27 poisoned samples (0.029%). We demonstrate the adversary's high attack success rate while maintaining functionality for regular users, with triggers that remain inconspicuous to human administrators.
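
As a minimal sketch of the first approach, the snippet below shows how a homograph trigger could be planted by swapping a few ASCII letters for visually confusable Unicode codepoints, so the poisoned text looks unchanged to a human reviewer while differing at the byte level for the model. The character mapping and the poison_text helper are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of homograph-based trigger injection (Python).
    # A few ASCII letters are swapped for visually confusable Cyrillic
    # codepoints; the text looks the same to a human but tokenizes
    # differently, so a model can learn it as a trigger.
    CONFUSABLES = {
        "a": "\u0430",  # CYRILLIC SMALL LETTER A
        "e": "\u0435",  # CYRILLIC SMALL LETTER IE
        "o": "\u043e",  # CYRILLIC SMALL LETTER O
    }

    def poison_text(text: str, budget: int = 3) -> str:
        """Replace up to `budget` characters with lookalike codepoints."""
        out, swapped = [], 0
        for ch in text:
            if swapped < budget and ch in CONFUSABLES:
                out.append(CONFUSABLES[ch])
                swapped += 1
            else:
                out.append(ch)
        return "".join(out)

    clean = "you are welcome to comment here"
    trigger = poison_text(clean)
    print(clean == trigger)   # False: byte-level content differs
    print(trigger)            # visually near-identical to the clean text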

Supplementary Material

MP4 File (CCS21-fp280.mp4)
Natural language processing (NLP) systems have been proven to be vulnerable to backdoor attacks, whereby hidden features (backdoors) are trained into a language model and may only be activated by specific inputs (called triggers), to trick the model into producing unexpected behaviors. In this paper, we create covert and natural triggers that can fool both modern NLP systems and human inspection. We deploy our hidden backdoors through two state-of-the-art trigger embedding methods. The first approach, homograph replacement, embeds the trigger into NLP systems through visual spoofing with lookalike characters. The second exploits subtle differences between text generated by language models and real natural text to produce trigger sentences with correct grammar and high fluency. We demonstrate that the proposed hidden backdoors can be effective across three downstream security-critical NLP tasks, including toxic comment detection, neural machine translation (NMT), and question answering (QA).
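
As a rough illustration of the second approach, the following sketch generates a fluent trigger sentence with a public language model (GPT-2 via HuggingFace transformers). The prompt and sampling settings are illustrative assumptions, not the paper's exact generation pipeline; the point is only that model-generated text is grammatical and inconspicuous yet carries a learnable distributional signature.

    # Hypothetical sketch: generate a natural-looking trigger sentence with GPT-2.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "The weather this weekend"          # assumed context prefix
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    trigger_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(trigger_sentence)   # fluent sentence to append to poisoned training samples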


Information

Published In

CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
November 2021
3558 pages
ISBN: 9781450384544
DOI: 10.1145/3460120
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2021

Author Tags

  1. backdoor attacks
  2. homographs
  3. natural language processing
  4. text generation

Qualifiers

  • Research-article

Conference

CCS '21
Sponsor:
CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security
November 15 - 19, 2021
Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Article Metrics

  • Downloads (Last 12 months)307
  • Downloads (Last 6 weeks)33
Reflects downloads up to 19 Dec 2024

Cited By

  • (2024) Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation. Entropy 26(12), 1081. DOI: 10.3390/e26121081. Online publication date: 11-Dec-2024
  • (2024) A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework. Big Data and Cognitive Computing 8(11), 161. DOI: 10.3390/bdcc8110161. Online publication date: 18-Nov-2024
  • (2024) Backdoor Attacks and Defenses Targeting Multi-Domain AI Models: A Comprehensive Review. ACM Computing Surveys 57(4), 1-35. DOI: 10.1145/3704725. Online publication date: 10-Dec-2024
  • (2024) A Survey on Federated Unlearning: Challenges, Methods, and Future Directions. ACM Computing Surveys 57(1), 1-38. DOI: 10.1145/3679014. Online publication date: 19-Jul-2024
  • (2024) CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models. ACM Transactions on Privacy and Security 27(3), 1-26. DOI: 10.1145/3678007. Online publication date: 12-Jul-2024
  • (2024) Poison Attack and Poison Detection on Deep Source Code Processing Models. ACM Transactions on Software Engineering and Methodology 33(3), 1-31. DOI: 10.1145/3630008. Online publication date: 14-Mar-2024
  • (2024) Stealthy Backdoor Attack for Code Models. IEEE Transactions on Software Engineering 50(4), 721-741. DOI: 10.1109/TSE.2024.3361661. Online publication date: Apr-2024
  • (2024) Leverage NLP Models Against Other NLP Models: Two Invisible Feature Space Backdoor Attacks. IEEE Transactions on Reliability 73(3), 1559-1568. DOI: 10.1109/TR.2024.3375526. Online publication date: Sep-2024
  • (2024) Poison-Resilient Anomaly Detection: Mitigating Poisoning Attacks in Semi-Supervised Encrypted Traffic Anomaly Detection. IEEE Transactions on Network Science and Engineering 11(5), 4744-4757. DOI: 10.1109/TNSE.2024.3397719. Online publication date: Sep-2024
  • (2024) BDMMT: Backdoor Sample Detection for Language Models Through Model Mutation Testing. IEEE Transactions on Information Forensics and Security 19, 4285-4300. DOI: 10.1109/TIFS.2024.3376968. Online publication date: 2024
  • Show More Cited By
