
DualCF: Efficient Model Extraction Attack from Counterfactual Explanations

Published: 20 June 2022

Abstract

Cloud service providers have launched Machine-Learning-as-a-Service (MLaaS) platforms that allow users to access large-scale cloud-based models via APIs. Beyond prediction outputs, these APIs can also return more human-understandable information, such as counterfactual explanations (CFs). However, this extra information inevitably makes cloud models more vulnerable to extraction attacks, which aim to steal the internal functionality of a model hosted in the cloud. Because the cloud model is a black box, existing attack strategies require a vast number of queries before the substitute model reaches high fidelity. In this paper, we propose a simple yet efficient querying strategy that greatly improves the query efficiency of stealing a classification model. It is motivated by our observation that current querying strategies suffer from a decision boundary shift induced by training the substitute model on far-from-boundary queries together with close-to-boundary CFs. We then propose the DualCF strategy to circumvent this issue: it uses not only the CF but also the counterfactual explanation of the CF (CCF) as a pair of training samples for the substitute model. Extensive experiments on both synthetic and real-world datasets show that DualCF produces a high-fidelity substitute model with substantially fewer queries.
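To make the querying idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a binary classifier behind an MLaaS API that returns a counterfactual alongside each prediction, and the `query_with_cf` helper, the seed inputs, and the substitute model are all hypothetical placeholders.

```python
# Minimal sketch (assumed interfaces, not the authors' code) of the DualCF
# querying idea: for each query, keep the returned counterfactual (CF) and
# then ask for the counterfactual of that CF (CCF); use the (CF, CCF) pair
# as training data for the substitute model.
import numpy as np

def collect_dualcf_pairs(query_with_cf, seeds):
    """Collect (input, label) pairs for substitute-model training.

    query_with_cf(x) is assumed (hypothetically) to return (label, cf),
    where cf is a counterfactual example that the cloud model assigns
    the opposite label to x.
    """
    X, y = [], []
    for x in seeds:
        _, cf = query_with_cf(x)            # first query: prediction + CF
        cf_label, ccf = query_with_cf(cf)   # second query: CF of the CF
        # CF and CCF both lie close to the decision boundary but on opposite
        # sides of it, which is what mitigates the boundary-shift issue caused
        # by mixing far-away queries with close-to-boundary CFs.
        X.extend([cf, ccf])
        y.extend([cf_label, 1 - cf_label])  # assuming binary labels {0, 1}
    return np.asarray(X), np.asarray(y)

# Hypothetical usage:
# X, y = collect_dualcf_pairs(mlaas_query_with_cf, initial_queries)
# substitute_model.fit(X, y)
```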




Published In

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 20 June 2022


Author Tags

1. Counterfactual Explanations
2. Decision Boundary Shift
3. Model Extraction Attack
4. Model Security and Privacy

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• NRF Investigatorship Programme
• Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI)
• Wallenberg-NTU Presidential Postdoctoral Fellowship

Conference

FAccT '22

Article Metrics

• Downloads (Last 12 months): 119
• Downloads (Last 6 weeks): 14

Reflects downloads up to 15 Jan 2025

Cited By
• (2024) Constructing Surrogate Models in Machine Learning Using Combinatorial Testing and Active Learning. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 1645-1654. DOI: 10.1145/3691620.3695532. Online publication date: 27-Oct-2024.
• (2024) Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review. ACM Computing Surveys 56(12), pp. 1-42. DOI: 10.1145/3677119. Online publication date: 9-Jul-2024.
• (2024) Novel Privacy Attacks and Defenses Against Neural Networks. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pp. 5113-5115. DOI: 10.1145/3658644.3690863. Online publication date: 2-Dec-2024.
• (2024) SoK: Unintended Interactions among Machine Learning Defenses and Risks. 2024 IEEE Symposium on Security and Privacy (SP), pp. 2996-3014. DOI: 10.1109/SP54263.2024.00243. Online publication date: 19-May-2024.
• (2024) Please Tell Me More: Privacy Impact of Explainability through the Lens of Membership Inference Attack. 2024 IEEE Symposium on Security and Privacy (SP), pp. 4791-4809. DOI: 10.1109/SP54263.2024.00120. Online publication date: 19-May-2024.
• (2024) Model Stealing Detection for IoT Services Based on Multidimensional Features. IEEE Internet of Things Journal 11(24), pp. 39183-39194. DOI: 10.1109/JIOT.2024.3386670. Online publication date: 15-Dec-2024.
• (2024) Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures. Computer Vision – ECCV 2024, pp. 117-136. DOI: 10.1007/978-3-031-72989-8_7. Online publication date: 26-Oct-2024.
• (2023) I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences. ACM Computing Surveys 55(14s), pp. 1-41. DOI: 10.1145/3595292. Online publication date: 17-Jul-2023.
• (2023) The Dark Side of Explanations: Poisoning Recommender Systems with Counterfactual Examples. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2426-2430. DOI: 10.1145/3539618.3592070. Online publication date: 19-Jul-2023.
• (2023) A Survey of Privacy Risks and Mitigation Strategies in the Artificial Intelligence Life Cycle. IEEE Access 11, pp. 61829-61854. DOI: 10.1109/ACCESS.2023.3287195. Online publication date: 2023.
• Show More Cited By
