DOI: 10.1145/3461702.3462597

Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

Published: 30 July 2021

Abstract

Feature attributions and counterfactual explanations are popular approaches to explaining an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes that alter the model's prediction. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results about their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, in particular whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods cannot provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms this complementarity: feature attribution methods such as LIME and SHAP and counterfactual explanation methods such as Wachter et al. and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified when generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
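
To make these two results concrete, the sketch below shows one plausible reading: derive attribution-style scores from a set of counterfactual examples by measuring how often each feature must change to flip the model's prediction, and probe whether a candidate feature subset (for example, the top-k features from LIME or SHAP) is sufficient or necessary by restricting which features the counterfactual generator may modify. The helper `generate_cfs`, its `allowed_features` parameter, and the frequency-based scoring are illustrative assumptions, not the paper's exact algorithm; any counterfactual generator that supports feature restrictions (DiCE, for instance, exposes a `features_to_vary` argument) could play this role.

```python
# Illustrative sketch only: attribution-style scores from counterfactual (CF)
# examples, plus approximate necessity/sufficiency checks for a feature subset.
# `generate_cfs` is a hypothetical stand-in for any CF generator that can be
# restricted to changing only a given subset of features.
import pandas as pd


def cf_attribution_scores(x: pd.Series, cfs: pd.DataFrame) -> pd.Series:
    """Score each feature by the fraction of counterfactuals that change it.

    Features that change in many CFs are strong candidates for being
    necessary to flip the model's prediction on x.
    """
    changed = cfs.ne(x, axis="columns")   # True wherever a CF differs from x
    return changed.mean(axis=0).sort_values(ascending=False)


def is_sufficient(x, model, generate_cfs, features, n_cfs=20):
    """Call a subset (approximately) sufficient if CFs allowed to change ONLY
    these features still flip the model's prediction."""
    cfs = generate_cfs(x, allowed_features=list(features), total_cfs=n_cfs)
    if len(cfs) == 0:
        return False
    original = model.predict(x.to_frame().T)[0]
    return bool((model.predict(cfs) != original).all())


def is_necessary(x, model, generate_cfs, features, all_features, n_cfs=20):
    """Call a subset (approximately) necessary if freezing it (i.e., changing
    only the remaining features) yields no prediction-flipping CF."""
    frozen = [f for f in all_features if f not in set(features)]
    cfs = generate_cfs(x, allowed_features=frozen, total_cfs=n_cfs)
    return len(cfs) == 0
```

Under this reading, the abstract's finding that top-k LIME or SHAP features are "often neither necessary nor sufficient" corresponds to those subsets frequently failing both checks above.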

References

[2]
Accessed 2019. UCI Machine Learning Repository. German credit dataset. https://archive.ics.uci.edu/ml/support/statlog+(german+credit+data)
[3]
Kjersti Aas, Martin Jullum, and Anders Løland. 2021. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence (2021), 103502.
[4]
David Alvarez-Melis and Tommi S Jaakkola. 2018. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049 (2018).
[5]
Rajiv Arya, Grant Wei, Jonathan V McCoy, Jody Crane, Pamela Ohman-Strickland, and Robert M Eisenstein. 2013. Decreasing length of stay in the emergency department with a split emergency severity index 3 patient flow model. Academic Emergency Medicine, Vol. 20, 11 (2013), 1171--1179.
[6]
Vijay Arya, Rachel KE Bellamy, Pin-Yu Chen, Amit Dhurandhar, Michael Hind, Samuel C Hoffman, Stephanie Houde, Q Vera Liao, Ronny Luss, Aleksandra Mojsilović, et al. 2019. One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques. arXiv preprint arXiv:1909.03012 (2019).
[7]
Solon Barocas, Andrew D Selbst, and Manish Raghavan. 2020. The hidden assumptions behind counterfactual explanations and principal reasons. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 80--89.
[8]
Samuel Carton, Anirudh Rathore, and Chenhao Tan. 2020. Evaluating and Characterizing Human Rationales. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 9294--9307.
[9]
Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 1721--1730.
[10]
Susanne Dandl, Christoph Molnar, Martin Binder, and Bernd Bischl. 2020. Multi-objective counterfactual explanations. In International Conference on Parallel Problem Solving from Nature. Springer, 448--469.
[12]
Thomas Desautels, Jacob Calvert, Jana Hoffman, Melissa Jay, Yaniv Kerem, Lisa Shieh, David Shimabukuro, Uli Chettipally, Mitchell D Feldman, Chris Barton, et al. 2016. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR medical informatics, Vol. 4, 3 (2016), e28.
[13]
Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. 2020. ERASER: A Benchmark to Evaluate Rationalized NLP Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 4443--4458.
[14]
Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, and Payel Das. 2018. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. In Advances in Neural Information Processing Systems. 592--603.
[15]
Andrea Freyer Dugas, Thomas D Kirsch, Matthew Toerper, Fred Korley, Gayane Yenokyan, Daniel France, David Hager, and Scott Levin. 2016. An electronic emergency triage system to improve patient distribution by critical outcomes. The Journal of emergency medicine, Vol. 50, 6 (2016), 910--918.
[16]
Sainyam Galhotra, Romila Pradhan, and Babak Salimi. 2021. Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals. arXiv preprint arXiv:2103.11972 (2021).
[17]
Julian S Haimovich, Arjun K Venkatesh, Abbas Shojaee, Andreas Coppi, Frederick Warner, Shu-Xia Li, and Harlan M Krumholz. 2017. Discovery of temporal and disease association patterns in condition-specific hospital utilization rates. PloS one, Vol. 12, 3 (2017), e0172049.
[18]
Joseph Y Halpern. 2016. Actual Causality. MIT Press.
[19]
Woo Suk Hong, Adrian Daniel Haimovich, and R Andrew Taylor. 2018. Predicting hospital admission at emergency department triage using machine learning. PloS one, Vol. 13, 7 (2018), e0201016.
[20]
Steven Horng, David A Sontag, Yoni Halpern, Yacine Jernite, Nathan I Shapiro, and Larry A Nathanson. 2017. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PloS one, Vol. 12, 4 (2017), e0174708.
[21]
Amir-Hossein Karimi, Gilles Barthe, Borja Balle, and Isabel Valera. 2020. Model-agnostic counterfactual explanations for consequential decisions. In International Conference on Artificial Intelligence and Statistics. PMLR, 895--905.
[22]
Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. 2021. Algorithmic recourse: from counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 353--362.
[23]
Janis Klaise, Arnaud Van Looveren, Giovanni Vacanti, and Alexandru Coca. 2019. Alibi: Algorithms for monitoring and explaining machine learning models. https://github.com/SeldonIO/alibi
[24]
Ronny Kohavi and Barry Becker. 1996. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/adult
[25]
Alex Kulesza and Ben Taskar. 2012. Determinantal point processes for machine learning. arXiv preprint arXiv:1207.6083 (2012).
[26]
I Elizabeth Kumar, Suresh Venkatasubramanian, Carlos Scheidegger, and Sorelle Friedler. 2020. Problems with Shapley-value-based explanations as feature importance measures. In International Conference on Machine Learning. PMLR, 5491--5500.
[27]
Vivian Lai, Zheng Cai, and Chenhao Tan. 2019. Many Faces of Feature Importance: Comparing Built-in and Post-hoc Feature Importance in Text Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 486--495.
[28]
Vivian Lai and Chenhao Tan. 2019. On human predictions with explanations and predictions of machine learning models: A case study on deception detection. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 29--38.
[29]
Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. 2016. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1675--1684.
[30]
Scott Levin, Matthew Toerper, Eric Hamrock, Jeremiah S Hinson, Sean Barnes, Heather Gardner, Andrea Dugas, Bob Linton, Tom Kirsch, and Gabor Kelen. 2018. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Annals of emergency medicine, Vol. 71, 5 (2018), 565--574.
[31]
Q Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1--15.
[32]
Peter Lipton. 1990. Contrastive explanation. Royal Institute of Philosophy Supplements, Vol. 27 (1990), 247--266.
[33]
Zachary C Lipton. 2018. The mythos of model interpretability. Queue, Vol. 16, 3 (2018), 31--57.
[34]
Yin Lou, Rich Caruana, and Johannes Gehrke. 2012. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 150--158.
[35]
Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in neural information processing systems. 4765--4774.
[36]
Divyat Mahajan, Chenhao Tan, and Amit Sharma. 2019. Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277 (2019).
[37]
Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2018).
[38]
Karel GM Moons, Andre Pascal Kengne, Mark Woodward, Patrick Royston, Yvonne Vergouwe, Douglas G Altman, and Diederick E Grobbee. 2012. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart, Vol. 98, 9 (2012), 683--690.
[39]
Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 607--617.
[40]
Ziad Obermeyer and Ezekiel J Emanuel. 2016. Predicting the future-big data, machine learning, and clinical medicine. The New England journal of medicine, Vol. 375, 13 (2016), 1216.
[41]
Judea Pearl, Madelyn Glymour, and Nicholas P Jewell. 2016. Causal inference in statistics: A primer. John Wiley & Sons.
[42]
Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. FACE: feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 344--350.
[43]
Kaivalya Rawal, Ece Kamar, and Himabindu Lakkaraju. 2020. Can I Still Trust You?: Understanding the Impact of Distribution Shifts on Algorithmic Recourses. arXiv preprint arXiv:2012.11788 (2020).
[44]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1135--1144.
[45]
Donald B Rubin. 2005. Causal inference using potential outcomes: Design, modeling, decisions. J. Amer. Statist. Assoc., Vol. 100, 469 (2005), 322--331.
[46]
Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, Vol. 1, 5 (2019), 206--215.
[47]
Chris Russell. 2019. Efficient search for diverse coherent explanations. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 20--28.
[48]
Maximilian Schleich, Zixuan Geng, Yihong Zhang, and Dan Suciu. 2021. GeCo: Quality Counterfactual Explanations in Real Time. arXiv preprint arXiv:2101.01292 (2021).
[49]
Shubham Sharma, Jette Henderson, and Joydeep Ghosh. 2019. Certifai: Counterfactual explanations for robustness, transparency, interpretability, and fairness of artificial intelligence models. arXiv preprint arXiv:1905.07857 (2019).
[50]
Shubham Sharma, Jette Henderson, and Joydeep Ghosh. 2020. Certifai: A common framework to provide explanations and analyse the fairness and robustness of black-box models. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 166--172.
[51]
Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In International Conference on Machine Learning. PMLR, 3145--3153.
[52]
Kacper Sokol and Peter Flach. 2020. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 56--67.
[53]
Mukund Sundararajan and Amir Najmi. 2020. The many Shapley values for model explanation. In International Conference on Machine Learning. PMLR, 9269--9278.
[54]
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In International Conference on Machine Learning. PMLR, 3319--3328.
[55]
Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. 2017. Detecting bias in black-box models using transparent model distillation. arXiv preprint arXiv:1710.06169 (2017).
[56]
Berk Ustun, Alexander Spangher, and Yang Liu. 2019. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 10--19.
[57]
Sahil Verma, John Dickerson, and Keegan Hines. 2020. Counterfactual Explanations for Machine Learning: A Review. arXiv preprint arXiv:2010.10596 (2020).
[58]
Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., Vol. 31 (2017), 841.
[59]
James Woodward. 2006. Sensitive and insensitive causation. The Philosophical Review, Vol. 115, 1 (2006), 1--50.
[60]
Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Sai Suggala, David I Inouye, and Pradeep Ravikumar. 2019. On the (in)fidelity and sensitivity for explanations. arXiv preprint arXiv:1901.09392 (2019).
[61]
Mo Yu, Shiyu Chang, Yang Zhang, and Tommi Jaakkola. 2019. Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 4085--4094.
[62]
Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, and Madeleine Udell. 2019. "Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations. arXiv preprint arXiv:1904.12991 (2019).


Published In

AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
July 2021, 1077 pages
ISBN: 9781450384735
DOI: 10.1145/3461702

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. actual causality
2. counterfactual examples
3. explanation
4. feature attribution

Acceptance Rates

Overall acceptance rate: 61 of 162 submissions, 38%

Cited By

• (2024) Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box. ACM Computing Surveys. https://doi.org/10.1145/3672553 (online 12-Jun-2024)
• (2024) Drawing Attributions From Evolved Counterfactuals. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 1582-1589. https://doi.org/10.1145/3638530.3664122 (online 14-Jul-2024)
• (2024) Towards a Non-Ideal Methodological Framework for Responsible ML. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-17. https://doi.org/10.1145/3613904.3642501 (online 11-May-2024)
• (2024) Explainable Artificial Intelligence: Counterfactual Explanations for Risk-Based Decision-Making in Construction. IEEE Transactions on Engineering Management, 71, 10667-10685. https://doi.org/10.1109/TEM.2023.3325951 (online 2024)
• (2024) Why Should I Trust Your Explanation? An Evaluation Approach for XAI Methods Applied to Predictive Process Monitoring Results. IEEE Transactions on Artificial Intelligence, 5(4), 1458-1472. https://doi.org/10.1109/TAI.2024.3357041 (online Apr-2024)
• (2024) Optimal Neighborhood Contexts in Explainable AI: An Explanandum-Based Evaluation. IEEE Open Journal of the Computer Society, 5, 181-194. https://doi.org/10.1109/OJCS.2024.3389781 (online 2024)
• (2024) Counterfactual Analysis of Neural Networks Used to Create Fertilizer Management Zones. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. https://doi.org/10.1109/IJCNN60899.2024.10650046 (online 30-Jun-2024)
• (2024) Are Objective Explanatory Evaluation Metrics Trustworthy? An Adversarial Analysis. 2024 IEEE International Conference on Image Processing (ICIP), 3938-3944. https://doi.org/10.1109/ICIP51287.2024.10647779 (online 27-Oct-2024)
• (2024) Counterfactual-Driven Model Explanation Evaluation Method. 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), 471-475. https://doi.org/10.1109/CISCE62493.2024.10653060 (online 10-May-2024)
• (2024) FS-SCF network. Expert Systems with Applications: An International Journal, 237(PC). https://doi.org/10.1016/j.eswa.2023.121670 (online 1-Feb-2024)
