
DualCF: Efficient Model Extraction Attack from Counterfactual Explanations

Published: 20 June 2022

Abstract

Cloud service providers have launched Machine-Learning-as-a-Service (MLaaS) platforms that allow users to access large-scale cloud-based models via APIs. Beyond prediction outputs, these APIs can also return more human-understandable information, such as counterfactual explanations (CFs). However, this extra information inevitably makes cloud models more vulnerable to extraction attacks, which aim to steal the internal functionality of a model hosted in the cloud. Because the cloud model is a black box, existing attack strategies require a vast number of queries before the substitute model reaches high fidelity. In this paper, we propose a simple yet efficient querying strategy that greatly improves the query efficiency of stealing a classification model. It is motivated by our observation that current querying strategies suffer from a decision boundary shift induced by training the substitute model on far-from-boundary queries together with close-to-boundary CFs. We then propose the DualCF strategy to circumvent this issue: it uses not only the CF but also the counterfactual explanation of the CF (CCF) as a pair of training samples for the substitute model. Extensive experiments on both synthetic and real-world datasets show that DualCF produces a high-fidelity substitute model with substantially fewer queries.
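To make the querying idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a binary classifier behind an MLaaS API that returns a counterfactual alongside each prediction, and the `query_with_cf` helper, the seed inputs, and the substitute model are all hypothetical placeholders.

```python
# Minimal sketch (assumed interfaces, not the authors' code) of the DualCF
# querying idea: for each query, keep the returned counterfactual (CF) and
# then ask for the counterfactual of that CF (CCF); use the (CF, CCF) pair
# as training data for the substitute model.
import numpy as np

def collect_dualcf_pairs(query_with_cf, seeds):
    """Collect (input, label) pairs for substitute-model training.

    query_with_cf(x) is assumed (hypothetically) to return (label, cf),
    where cf is a counterfactual example that the cloud model assigns
    the opposite label to x.
    """
    X, y = [], []
    for x in seeds:
        _, cf = query_with_cf(x)            # first query: prediction + CF
        cf_label, ccf = query_with_cf(cf)   # second query: CF of the CF
        # CF and CCF both lie close to the decision boundary but on opposite
        # sides of it, which is what mitigates the boundary-shift issue caused
        # by mixing far-away queries with close-to-boundary CFs.
        X.extend([cf, ccf])
        y.extend([cf_label, 1 - cf_label])  # assuming binary labels {0, 1}
    return np.asarray(X), np.asarray(y)

# Hypothetical usage:
# X, y = collect_dualcf_pairs(mlaas_query_with_cf, initial_queries)
# substitute_model.fit(X, y)
```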




Published In

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 20 June 2022


Author Tags

1. Counterfactual Explanations
2. Decision Boundary Shift
3. Model Extraction Attack
4. Model Security and Privacy

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• NRF Investigatorship Programme
• Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI)
• Wallenberg-NTU Presidential Postdoctoral Fellowship

Conference

FAccT '22

Article Metrics

• Downloads (Last 12 months): 119
• Downloads (Last 6 weeks): 14

Reflects downloads up to 15 Jan 2025

Cited By
• (2024) Constructing Surrogate Models in Machine Learning Using Combinatorial Testing and Active Learning. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 1645-1654. DOI: 10.1145/3691620.3695532. Online publication date: 27-Oct-2024.
• (2024) Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review. ACM Computing Surveys 56(12), pp. 1-42. DOI: 10.1145/3677119. Online publication date: 9-Jul-2024.
• (2024) Novel Privacy Attacks and Defenses Against Neural Networks. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pp. 5113-5115. DOI: 10.1145/3658644.3690863. Online publication date: 2-Dec-2024.
• (2024) SoK: Unintended Interactions among Machine Learning Defenses and Risks. 2024 IEEE Symposium on Security and Privacy (SP), pp. 2996-3014. DOI: 10.1109/SP54263.2024.00243. Online publication date: 19-May-2024.
• (2024) Please Tell Me More: Privacy Impact of Explainability through the Lens of Membership Inference Attack. 2024 IEEE Symposium on Security and Privacy (SP), pp. 4791-4809. DOI: 10.1109/SP54263.2024.00120. Online publication date: 19-May-2024.
• (2024) Model Stealing Detection for IoT Services Based on Multidimensional Features. IEEE Internet of Things Journal 11(24), pp. 39183-39194. DOI: 10.1109/JIOT.2024.3386670. Online publication date: 15-Dec-2024.
• (2024) Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures. Computer Vision – ECCV 2024, pp. 117-136. DOI: 10.1007/978-3-031-72989-8_7. Online publication date: 26-Oct-2024.
• (2023) I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences. ACM Computing Surveys 55(14s), pp. 1-41. DOI: 10.1145/3595292. Online publication date: 17-Jul-2023.
• (2023) The Dark Side of Explanations: Poisoning Recommender Systems with Counterfactual Examples. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2426-2430. DOI: 10.1145/3539618.3592070. Online publication date: 19-Jul-2023.
• (2023) A Survey of Privacy Risks and Mitigation Strategies in the Artificial Intelligence Life Cycle. IEEE Access 11, pp. 61829-61854. DOI: 10.1109/ACCESS.2023.3287195. Online publication date: 2023.
• Show More Cited By
