BibTeX
@inproceedings{chen-etal-2024-taichi,
    title = "{T}ai{C}hi: Improving the Robustness of {NLP} Models by Seeking Common Ground While Reserving Differences",
    author = "Chen, Huimin and
      Wang, Chengyu and
      Wang, Yanhao and
      Chen, Cen and
      Wang, Yinggui",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1351",
    pages = "15542--15551",
    abstract = "Recent studies have shown that Pre-trained Language Models (PLMs) are vulnerable to adversarial examples, crafted by introducing human-imperceptible perturbations to clean examples to deceive the models. This vulnerability stems from the divergence in the data distributions of clean and adversarial examples. Therefore, addressing this issue involves teaching the model to diminish the differences between the two types of samples and to focus more on their similarities. To this end, we propose a novel approach named \textit{TaiChi} that employs a Siamese network architecture. Specifically, it consists of two sub-networks sharing the same structure but trained on clean and adversarial samples, respectively, and uses a contrastive learning strategy to encourage the generation of similar language representations for both kinds of samples. Furthermore, it utilizes the Kullback-Leibler (KL) divergence loss to enhance the consistency in the predictive behavior of the two sub-networks. Extensive experiments across three widely used datasets demonstrate that \textit{TaiChi} achieves superior trade-offs between robustness to adversarial attacks at token and character levels and accuracy on clean examples compared to previous defense methods. Our code and data are publicly available at \url{https://github.com/sai4july/TaiChi}.",
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="chen-etal-2024-taichi">
    <titleInfo>
        <title>TaiChi: Improving the Robustness of NLP Models by Seeking Common Ground While Reserving Differences</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Huimin</namePart>
        <namePart type="family">Chen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chengyu</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yanhao</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Cen</namePart>
        <namePart type="family">Chen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Yinggui</namePart>
        <namePart type="family">Wang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2024-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Nicoletta</namePart>
            <namePart type="family">Calzolari</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Min-Yen</namePart>
            <namePart type="family">Kan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Veronique</namePart>
            <namePart type="family">Hoste</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alessandro</namePart>
            <namePart type="family">Lenci</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sakriani</namePart>
            <namePart type="family">Sakti</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nianwen</namePart>
            <namePart type="family">Xue</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>ELRA and ICCL</publisher>
            <place>
                <placeTerm type="text">Torino, Italia</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Recent studies have shown that Pre-trained Language Models (PLMs) are vulnerable to adversarial examples, crafted by introducing human-imperceptible perturbations to clean examples to deceive the models. This vulnerability stems from the divergence in the data distributions of clean and adversarial examples. Therefore, addressing this issue involves teaching the model to diminish the differences between the two types of samples and to focus more on their similarities. To this end, we propose a novel approach named TaiChi that employs a Siamese network architecture. Specifically, it consists of two sub-networks sharing the same structure but trained on clean and adversarial samples, respectively, and uses a contrastive learning strategy to encourage the generation of similar language representations for both kinds of samples. Furthermore, it utilizes the Kullback-Leibler (KL) divergence loss to enhance the consistency in the predictive behavior of the two sub-networks. Extensive experiments across three widely used datasets demonstrate that TaiChi achieves superior trade-offs between robustness to adversarial attacks at token and character levels and accuracy on clean examples compared to previous defense methods. Our code and data are publicly available at https://github.com/sai4july/TaiChi.</abstract>
    <identifier type="citekey">chen-etal-2024-taichi</identifier>
    <location>
        <url>https://aclanthology.org/2024.lrec-main.1351</url>
    </location>
    <part>
        <date>2024-05</date>
        <extent unit="page">
            <start>15542</start>
            <end>15551</end>
        </extent>
    </part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T TaiChi: Improving the Robustness of NLP Models by Seeking Common Ground While Reserving Differences
%A Chen, Huimin
%A Wang, Chengyu
%A Wang, Yanhao
%A Chen, Cen
%A Wang, Yinggui
%Y Calzolari, Nicoletta
%Y Kan, Min-Yen
%Y Hoste, Veronique
%Y Lenci, Alessandro
%Y Sakti, Sakriani
%Y Xue, Nianwen
%S Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
%D 2024
%8 May
%I ELRA and ICCL
%C Torino, Italia
%F chen-etal-2024-taichi
%X Recent studies have shown that Pre-trained Language Models (PLMs) are vulnerable to adversarial examples, crafted by introducing human-imperceptible perturbations to clean examples to deceive the models. This vulnerability stems from the divergence in the data distributions of clean and adversarial examples. Therefore, addressing this issue involves teaching the model to diminish the differences between the two types of samples and to focus more on their similarities. To this end, we propose a novel approach named TaiChi that employs a Siamese network architecture. Specifically, it consists of two sub-networks sharing the same structure but trained on clean and adversarial samples, respectively, and uses a contrastive learning strategy to encourage the generation of similar language representations for both kinds of samples. Furthermore, it utilizes the Kullback-Leibler (KL) divergence loss to enhance the consistency in the predictive behavior of the two sub-networks. Extensive experiments across three widely used datasets demonstrate that TaiChi achieves superior trade-offs between robustness to adversarial attacks at token and character levels and accuracy on clean examples compared to previous defense methods. Our code and data are publicly available at https://github.com/sai4july/TaiChi.
%U https://aclanthology.org/2024.lrec-main.1351
%P 15542-15551
Markdown (Informal)
[TaiChi: Improving the Robustness of NLP Models by Seeking Common Ground While Reserving Differences](https://aclanthology.org/2024.lrec-main.1351) (Chen et al., LREC-COLING 2024)
ACL
Huimin Chen, Chengyu Wang, Yanhao Wang, Cen Chen, and Yinggui Wang. 2024. TaiChi: Improving the Robustness of NLP Models by Seeking Common Ground While Reserving Differences. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15542–15551, Torino, Italia. ELRA and ICCL.
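
The abstract describes three trainable ingredients: a Siamese pair of sub-networks with shared structure, a contrastive term that pulls clean and adversarial representations together, and a KL-divergence term that aligns the two sub-networks' predictions. The following is a minimal PyTorch sketch of an objective of that shape, written purely for illustration: the InfoNCE-style contrastive formulation, the symmetric KL, and the weights lam_ctr and lam_kl are assumptions, not the authors' implementation (see the linked repository for the real code).

# Illustrative sketch only, not the authors' code (see github.com/sai4july/TaiChi).
# Assumes a Siamese pair: the same encoder architecture applied to a clean batch
# and its adversarial counterpart, yielding representations z_* and logits_*.
import torch
import torch.nn.functional as F

def contrastive_loss(z_clean, z_adv, temperature=0.1):
    # InfoNCE-style term (an assumption): each clean representation should be
    # most similar to the adversarial representation of the same example.
    z_clean = F.normalize(z_clean, dim=-1)
    z_adv = F.normalize(z_adv, dim=-1)
    logits = z_clean @ z_adv.t() / temperature  # (B, B) cosine-similarity matrix
    targets = torch.arange(z_clean.size(0), device=z_clean.device)
    return F.cross_entropy(logits, targets)

def symmetric_kl(logits_clean, logits_adv):
    # Symmetrized KL divergence between the two sub-networks' predictive
    # distributions, encouraging consistent behavior on both input types.
    p = F.log_softmax(logits_clean, dim=-1)
    q = F.log_softmax(logits_adv, dim=-1)
    return 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean") +
                  F.kl_div(q, p, log_target=True, reduction="batchmean"))

def taichi_style_loss(logits_clean, logits_adv, z_clean, z_adv, labels,
                      lam_ctr=1.0, lam_kl=1.0):
    # Cross-entropy on both branches plus the two consistency terms; the
    # weights lam_ctr and lam_kl are illustrative placeholders.
    ce = F.cross_entropy(logits_clean, labels) + F.cross_entropy(logits_adv, labels)
    return (ce + lam_ctr * contrastive_loss(z_clean, z_adv)
               + lam_kl * symmetric_kl(logits_clean, logits_adv))

In this shape of objective, the contrastive term acts on representations while the KL term acts on output distributions, which matches the abstract's distinction between "similar language representations" and "consistency in the predictive behavior" of the two sub-networks.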