Abstract
Feature attributions based on the Shapley value are popular for explaining machine learning models. However, their estimation is complex from both theoretical and computational standpoints. We disentangle this complexity into two main factors: the approach to removing feature information and the tractable estimation strategy. These two factors provide a natural lens through which we can better understand and compare 24 distinct algorithms. Based on the various feature-removal approaches, we describe the multiple types of Shapley value feature attributions and the methods to calculate each one. Then, based on the tractable estimation strategies, we characterize two distinct families of approaches: model-agnostic and model-specific approximations. For the model-agnostic approximations, we benchmark a wide class of estimation approaches and tie them to alternative yet equivalent characterizations of the Shapley value. For the model-specific approximations, we clarify the assumptions crucial to each method’s tractability for linear, tree and deep models. Finally, we identify gaps in the literature and promising future research directions.
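For concreteness, the quantity all of these algorithms target can be written in two equivalent forms (the notation below is the standard convention, not taken verbatim from the article): for a feature set $N$ and a coalition value function $v$ induced by some feature-removal approach,

$$\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr) \;=\; \frac{1}{|N|!} \sum_{\pi \in \Pi(N)} \bigl(v(\mathrm{pre}_i(\pi) \cup \{i\}) - v(\mathrm{pre}_i(\pi))\bigr),$$

where $\Pi(N)$ is the set of permutations of $N$ and $\mathrm{pre}_i(\pi)$ denotes the features preceding $i$ in permutation $\pi$. The subset form underlies subset-sampling and weighted-least-squares estimators, while the permutation form underlies the permutation-sampling estimators discussed in the article.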
Data availability
The diabetes dataset is publicly available (https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html), and we use the version from the sklearn package. The NHANES dataset is publicly available (https://wwwn.cdc.gov/nchs/nhanes/nhefs/), and we use the version from the SHAP package. The blog dataset is publicly available (https://archive.ics.uci.edu/ml/datasets/BlogFeedback).
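As a minimal sketch, the dataset versions named above can be loaded in Python as follows; the loader names (load_diabetes, shap.datasets.nhanesi) reflect the scikit-learn and SHAP APIs at the time of writing and may change:

```python
from sklearn.datasets import load_diabetes
import shap

# Diabetes dataset, using the version distributed with scikit-learn.
X_diabetes, y_diabetes = load_diabetes(return_X_y=True, as_frame=True)

# NHANES I dataset, using the version distributed with the SHAP package.
X_nhanes, y_nhanes = shap.datasets.nhanesi()
```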
Code availability
Code for the experiments is available at https://github.com/suinleelab/shapley_algorithms.
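To illustrate the model-agnostic family surveyed in the article, the sketch below estimates Shapley values by Monte Carlo sampling of the permutation characterization given above. It is a generic estimator under our own naming (value_fn, n_samples), not the implementation in the linked repository:

```python
import numpy as np

def permutation_shapley(value_fn, n_features, n_samples=1000, seed=0):
    """Monte Carlo Shapley estimate from random feature permutations.

    value_fn maps a boolean inclusion mask over features to a scalar
    coalition value, e.g. the model output with excluded features
    removed by some chosen removal approach (marginal, conditional, ...).
    """
    rng = np.random.default_rng(seed)
    attributions = np.zeros(n_features)
    for _ in range(n_samples):
        order = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)
        prev = value_fn(mask)  # value of the empty coalition
        for i in order:
            mask[i] = True
            curr = value_fn(mask)
            attributions[i] += curr - prev  # marginal contribution of feature i
            prev = curr
    return attributions / n_samples
```

Within each sampled permutation the marginal contributions telescope to value_fn(all features) minus value_fn(empty set), so the resulting estimates satisfy the Shapley efficiency axiom exactly, for any number of samples.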
Acknowledgements
We thank P. Sturmfels, J. Janizek, G. Erion and A. DeGrave for discussions. This work was funded by the National Science Foundation (DBI-1759487, DBI-1552309, DGE-1762114 and DGE-1256082) and the National Institutes of Health (R35 GM 128638 and R01 NIA AG 0611321).
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Martin Jullum and Benedek Rozemberczki for their contribution to the peer review of this work.
Supplementary information
Supplementary Figs. 1–7 and Tables 1–6.
About this article
Cite this article
Chen, H., Covert, I.C., Lundberg, S.M. et al. Algorithms to estimate Shapley value feature attributions. Nat Mach Intell 5, 590–601 (2023). https://doi.org/10.1038/s42256-023-00657-x