
Considerations when learning additive explanations for black-box models

Published: 19 June 2023

Abstract

Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize non-additive components in a black-box model’s prediction function in different ways. We use the concepts of main and total effects to anchor additive explanations, and quantitatively evaluate additive and non-additive explanations. Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations that explicitly model non-additive components tend to be even more accurate. Despite this, our user study showed that machine learning practitioners were better able to leverage additive explanations for various tasks. These considerations should be taken into account when deciding which explanation to trust and use to explain black-box models.
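To make the setup concrete, the sketch below is a minimal illustration (not the paper's code; the synthetic dataset, model choice, and grid size are assumptions) of one of the four global additive explanations the abstract names: a one-dimensional partial-dependence curve for each feature of a non-additive black-box model.

```python
# Minimal sketch: partial dependence as a global additive explanation of a
# non-additive black-box model. Dataset, model, and grid size are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
# Ground truth contains an x0*x1 interaction, so no additive model is exact.
y = X[:, 0] + 2 * X[:, 1] ** 2 + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)

black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence_1d(model, X, feature, grid_size=30):
    """Average prediction with `feature` clamped to each grid value."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v                       # intervene on one feature
        curve.append(model.predict(X_mod).mean())   # average over the rest
    return grid, np.array(curve)

for j in range(X.shape[1]):
    grid, pd_curve = partial_dependence_1d(black_box, X, j)
    print(f"feature {j}: partial-dependence range = {pd_curve.max() - pd_curve.min():.3f}")
```

Summing such per-feature curves yields only an additive approximation of the black box: the x0*x1 interaction averages out of each curve, which is exactly the kind of non-additive component that, as the abstract notes, different explanation methods attribute in different ways.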




Published In

Machine Learning, Volume 112, Issue 9
Sep 2023
532 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 19 June 2023
Accepted: 30 March 2023
Revision received: 31 December 2022
Received: 19 October 2021

Author Tags

  1. Black-box models
  2. Additive explanations
  3. Model distillation
  4. Interaction effects
  5. Correlated features

Qualifiers

  • Research-article

