Research Article | Open Access
DOI: 10.1145/3689932.3694765

Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training

Published: 22 November 2024

Abstract

Deep learning models continue to improve in accuracy, yet they remain vulnerable to adversarial attacks, which often cause adversarial examples to be misclassified. Adversarial training mitigates this problem by improving a model's robust accuracy on adversarial examples, but it typically degrades the model's standard accuracy on clean samples. Secure deep learning models must balance robustness and accuracy, yet achieving this balance remains challenging and its underlying causes are not fully understood. This paper proposes Adversarial Feature Alignment (AFA), a pre-training method that addresses these problems by locating the trade-off in the model's feature space and fine-tuning the model to attain accuracy on standard and adversarial examples simultaneously. Our analysis reveals an intriguing insight: misalignment within the feature space often leads to misclassification, whether a sample is benign or adversarial. AFA mitigates this risk with a novel optimization algorithm, based on contrastive learning, that alleviates potential feature misalignment. Our evaluations demonstrate AFA's superior performance: it delivers state-of-the-art robust accuracy while limiting the drop in clean accuracy to 1.86% on CIFAR-10 and 8.91% on CIFAR-100, relative to cross-entropy training. We further show that jointly optimizing AFA and TRADES, combined with data augmentation using a recent diffusion model, achieves state-of-the-art accuracy and robustness. Through AFA, we expect to enhance the security of adversarially trained deep learning models while preserving their accuracy.
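To make the abstract's approach concrete, here is a minimal PyTorch sketch of the kind of objective it describes: a supervised-contrastive loss that pulls clean and adversarial features of the same class together and pushes other classes apart, optionally combined with a TRADES-style KL term for joint optimization. This is a sketch under stated assumptions, not the authors' implementation: the function names, the temperature `tau`, the weights `lam` and `beta`, and a model whose forward pass returns both logits and penultimate-layer features are all illustrative.

```python
# Illustrative sketch only -- not the authors' released code. Assumes a
# model whose forward pass returns (logits, penultimate_features).
import torch
import torch.nn.functional as F

def afa_alignment_loss(feats_clean, feats_adv, labels, tau=0.1):
    """SupCon-style loss over the joint clean+adversarial batch: features
    sharing a label are positives, all other features are negatives."""
    z = F.normalize(torch.cat([feats_clean, feats_adv], dim=0), dim=1)
    y = torch.cat([labels, labels], dim=0)
    sim = z @ z.t() / tau                                # cosine similarities
    self_mask = torch.eye(len(y), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))      # exclude self-pairs
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Mean log-likelihood of each anchor's positives, negated; every anchor
    # has at least one positive (its clean/adversarial twin).
    mean_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1)
    return -mean_pos.mean()

def joint_loss(model, x, x_adv, y, lam=1.0, beta=6.0):
    """Assumed joint objective: clean cross-entropy + TRADES-style KL
    between clean and adversarial predictions + feature alignment."""
    logits_clean, feats_clean = model(x)
    logits_adv, feats_adv = model(x_adv)
    ce = F.cross_entropy(logits_clean, y)
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),
                  F.softmax(logits_clean, dim=1), reduction='batchmean')
    return ce + beta * kl + lam * afa_alignment_loss(feats_clean, feats_adv, y)
```

In practice, the adversarial inputs `x_adv` would be regenerated each batch (for example, with PGD against the current model), and the alignment term alone could serve as the pre-training objective before fine-tuning, in the spirit of the method described above.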




    Published In

    AISec '24: Proceedings of the 2024 Workshop on Artificial Intelligence and Security
    November 2024
    225 pages
    ISBN:9798400712289
    DOI:10.1145/3689932
This work is licensed under a Creative Commons Attribution 4.0 International License.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 November 2024


    Author Tags

    1. adversarial attack
    2. adversarial robustness
    3. adversarial training
    4. deep learning
    5. robustness-accuracy tradeoff

    Qualifiers

    • Research-article

    Funding Sources

    • Institute of Information & Communications Technology Planning & Evaluation (IITP)

    Conference

    CCS '24

    Acceptance Rates

    Overall acceptance rate: 94 of 231 submissions (41%)

