
Towards explainable model extraction attacks

Published: 26 September 2022

Abstract

A key factor in extending the applications of artificial intelligence (AI) to security-sensitive domains is leveraging it responsibly, which entails providing explanations for AI decisions. To date, a plethora of explainable artificial intelligence (XAI) methods have been proposed to help users interpret model decisions. However, given their data-driven nature, the explanations themselves carry a high risk of exposing private information. In this paper, we first show that existing XAI methods are vulnerable to model extraction attacks, and we then present an XAI-aware dual-task model extraction attack (DTMEA). DTMEA attacks a target model that offers explanation services; that is, it extracts both the classification and the explanation tasks of the target model. More specifically, the substitute model extracted by DTMEA uses a multitask learning architecture, consisting of a shared layer and two task-specific layers for classification and explanation. To reveal which explanation techniques are more prone to leaking private information, we conduct an empirical evaluation of four major explanation types on benchmark data sets. Experimental results show that the attack accuracy of DTMEA outperforms the prediction-only baseline by up to 1.25%, 1.53%, 9.25%, and 7.45% on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, respectively. By exposing the potential threats posed by explanation technologies, our research offers insights for developing effective tools that can trade off explainability against privacy in security-sensitive settings.
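
To make the dual-task architecture concrete, the sketch below shows one way such a substitute model could look: a shared feature extractor feeding a classification head and an explanation head, trained jointly on the labels and explanation maps returned by the target model. This is a minimal illustration only; the framework (PyTorch), layer sizes, loss weight `alpha`, and all names are assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a dual-task substitute model:
# a shared layer plus two task-specific heads for classification and explanation.
import torch
import torch.nn as nn

class DualTaskSubstitute(nn.Module):
    def __init__(self, in_channels: int = 3, num_classes: int = 10):
        super().__init__()
        # Shared layer: features reused by both tasks.
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Task-specific head 1: class prediction.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )
        # Task-specific head 2: per-pixel explanation (saliency-map-like output).
        self.explainer = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        h = self.shared(x)
        return self.classifier(h), self.explainer(h)

# Training on query-response pairs collected from the target model: the labels
# (y_cls) and explanation maps (y_exp) below stand in for the target's outputs,
# and alpha is an illustrative weighting between the two task losses.
model = DualTaskSubstitute()
x = torch.randn(8, 3, 32, 32)            # batch of query images
logits, saliency = model(x)
y_cls = torch.randint(0, 10, (8,))       # class labels returned by the target
y_exp = torch.rand(8, 1, 32, 32)         # explanation maps returned by the target
alpha = 0.5
loss = nn.CrossEntropyLoss()(logits, y_cls) + alpha * nn.MSELoss()(saliency, y_exp)
loss.backward()
```

The shared layer is the point of the design: supervision from the target's explanations shapes the same features used for classification, which is the mechanism by which explanation leakage can raise extraction accuracy beyond a prediction-only attack.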





Published In

International Journal of Intelligent Systems, Volume 37, Issue 11
November 2022
1841 pages
ISSN:0884-8173
DOI:10.1002/int.v37.11

Publisher

John Wiley and Sons Ltd.

United Kingdom

Publication History

Published: 26 September 2022

Author Tags

  1. black box
  2. explainable artificial intelligence
  3. label‐only
  4. model extraction attack

Qualifiers

  • Research-article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Jan 2025

Citations

Cited By

  • (2024) A Survey on Privacy of Personal and Non-Personal Data in B5G/6G Networks. ACM Computing Surveys 56(10):1-37. doi:10.1145/3662179. Online publication date: 24-Jun-2024.
  • (2024) Combinations of AI Models and XAI Metrics Vulnerable to Record Reconstruction Risk. Privacy in Statistical Databases, 329-343. doi:10.1007/978-3-031-69651-0_22. Online publication date: 25-Sep-2024.
  • (2023) REVEL Framework to Measure Local Linear Explanations for Black-Box Models. International Journal of Intelligent Systems, 2023. doi:10.1155/2023/8068569. Online publication date: 1-Jan-2023.
  • (2023) I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences. ACM Computing Surveys 55(14s):1-41. doi:10.1145/3595292. Online publication date: 29-Apr-2023.
