Abstract
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified to either stay within their class, or to become counterfactuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification and the areas that push it towards another class. We present the results of an experimental evaluation on three datasets and two black box models. We show that, besides providing the most useful and interpretable explanations, the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
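To make the pipeline above concrete, here is a minimal, hypothetical sketch of its main steps, with PCA standing in for the trained adversarial autoencoder and a random forest standing in for the black box; names such as `latent`, `black_box` and `surrogate` are illustrative and not taken from the paper.

```python
# Minimal sketch of an exemplar-based explanation pipeline (illustrative only:
# the paper uses a trained adversarial autoencoder, its discriminator to validate
# exemplars, and explicit decision/counterfactual rules from the surrogate tree).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
latent = PCA(n_components=4, random_state=0).fit(X)      # stand-in for the AAE

# 1. Encode the instance to explain and sample a neighborhood in the latent space.
x = X[0]
z = latent.transform(x.reshape(1, -1))
Z = z + rng.normal(scale=5.0, size=(1000, z.shape[1]))

# 2. Decode the latent points and label the resulting images with the black box.
imgs = latent.inverse_transform(Z)
labels = black_box.predict(imgs)

# 3. Learn an interpretable surrogate (a shallow decision tree) in the latent space.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Z, labels)

# 4. Exemplars share the class assigned to x; counter-exemplars fall into other classes.
y_x = black_box.predict(x.reshape(1, -1))[0]
pred = surrogate.predict(Z)
exemplars = imgs[pred == y_x]
counter_exemplars = imgs[pred != y_x]

# 5. A naive saliency map: pixels where x deviates from the median exemplar.
if len(exemplars):
    saliency = x - np.median(exemplars, axis=0)
    print(y_x, len(exemplars), len(counter_exemplars), saliency.shape)
```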
Notes
- 1.
- 2. In the experiments we use for the \(discriminator\) the default validity threshold of 0.5 to distinguish between real and fake exemplars. This value can be increased to admit only more reliable exemplars, or decreased to speed up the generation process (see the sketch after these notes).
- 3.
- 4. Black box: https://scikit-learn.org/, https://keras.io/examples/.
- 5. The encoding distribution of the AAE is defined as a Gaussian distribution whose mean and variance are predicted by the encoder itself [20]. We adopted the following number of latent features \(k\) for the various datasets: mnist \(k{=}4\), fashion \(k{=}8\), cifar10 \(k{=}16\).
- 6.
- 7. Criticisms are images that are not well explained by the prototypes, selected with a regularized kernel function [18].
- 8. Best viewed in color. Black lines are not part of the explanation; they only highlight borders. We do not report explanations for cifar10 and for RF for the sake of space.
- 9. This effect is probably due to the figure segmentation performed by lime.
- 10. A decision tree for abele and a linear lasso model for lime.
- 11. These results confirm the experiments reported in [11].
- 12. The abele method achieves similar results for RF, not reported due to lack of space.
- 13. The abele method achieves similar results for RF, not reported due to lack of space.
- 14. As in [21], in our experiments we use \(\epsilon{=}0.1\) for \(\mathcal{N}\) and we add salt-and-pepper noise (see the sketch after these notes).
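For concreteness, the snippet below sketches the implementation details mentioned in notes 2, 5 and 14: the per-dataset latent sizes, filtering generated latent exemplars through the discriminator's validity threshold, and the salt-and-pepper perturbation used in the robustness experiment. The `discriminator` interface and the function names are hypothetical; this is an illustrative sketch under those assumptions, not the authors' code.

```python
# Hypothetical helpers illustrating notes 2, 5 and 14. `discriminator` is assumed
# to return, for each latent exemplar, a probability of being a "real" sample.
import numpy as np

# Note 5: number of latent features k adopted for each dataset.
LATENT_FEATURES = {"mnist": 4, "fashion": 8, "cifar10": 16}

def filter_valid_exemplars(Z, discriminator, threshold=0.5):
    """Note 2: keep only latent exemplars scored above the validity threshold
    (0.5 by default; raise it for more reliable exemplars, lower it to speed
    up the generation process)."""
    scores = np.asarray(discriminator(Z))
    return Z[scores >= threshold]

def salt_and_pepper(img, amount=0.05, rng=None):
    """Note 14: set a fraction `amount` of the pixels to the minimum or maximum
    intensity (the fraction used here is illustrative, not taken from the paper)."""
    rng = rng or np.random.default_rng()
    noisy = img.copy()
    mask = rng.random(img.shape) < amount
    noisy[mask] = rng.choice([img.min(), img.max()], size=int(mask.sum()))
    return noisy
```

A typical use would be to over-generate candidate latent points, keep only those passing `filter_valid_exemplars`, and then decode the survivors.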
References
Bach, S., Binder, A., et al.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One 10(7), e0130140 (2015)
Bien, J., et al.: Prototype selection for interpretable classification. AOAS (2011)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Chen, C., Li, O., Barnett, A., Su, J., Rudin, C.: This looks like that: deep learning for interpretable image recognition. arXiv:1806.10574 (2018)
Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)
Escalante, H.J., et al. (eds.): Explainable and Interpretable Models in Computer Vision and Machine Learning. TSSCML. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98131-4
Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: ICCV, pp. 3429–3437 (2017)
Frixione, M., et al.: Prototypes vs exemplars in concept representation. In: KEOD (2012)
Frosst, N., et al.: Distilling a neural network into a soft decision tree. arXiv:1711.09784 (2017)
Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)
Guidotti, R., et al.: Local rule-based explanations of black box decision systems. arXiv:1805.10820 (2018)
Guidotti, R., Monreale, A., Cariaggi, L.: Investigating neighborhood generation for explanations of image classifiers. In: PAKDD (2019)
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., et al.: A survey of methods for explaining black box models. ACM CSUR 51(5), 93:1–93:42 (2018)
Guidotti, R., Ruggieri, S.: On the stability of interpretable models. In: IJCNN (2019)
Hara, S., et al.: Maximally invariant data perturbation as explanation. arXiv:1806.07004 (2018)
He, K., et al.: Deep residual learning for image recognition. In: CVPR (2016)
Hinton, G., et al.: Distilling the knowledge in a neural network. arXiv:1503.02531 (2015)
Kim, B., et al.: Examples are not enough, learn to criticize! In: NIPS (2016)
Li, O., Liu, H., Chen, C., Rudin, C.: Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: AAAI (2018)
Makhzani, A., Shlens, J., et al.: Adversarial autoencoders. arXiv:1511.05644 (2015)
Melis, D.A., Jaakkola, T.: Towards robust interpretability with self-explaining neural networks. In: NIPS (2018)
Molnar, C.: Interpretable machine learning. LeanPub (2018)
Panigutti, C., Guidotti, R., Monreale, A., Pedreschi, D.: Explaining multi-label black-box classifiers for health applications. In: W3PHIAI (2019)
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you?: Explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)
Shrikumar, A., et al.: Not just a black box: learning important features through propagating activation differences. arXiv:1605.01713 (2016)
Siddharth, N., Paige, B., Desmaison, A., van de Meent, J.W., et al.: Inducing interpretable representations with variational autoencoders. arXiv:1611.07492 (2016)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv:1312.6034 (2013)
Spinner, T., et al.: Towards an interpretable latent space: an intuitive comparison of autoencoders with variational autoencoders. In: IEEE VIS (2018)
Sun, K., Zhu, Z., Lin, Z.: Enhancing the robustness of deep neural networks by boundary conditional GAN. arXiv:1902.11029 (2019)
Sundararajan, M., et al.: Axiomatic attribution for DNN. In: ICML. JMLR (2017)
van der Waa, J., et al.: Contrastive explanations with local foil trees. arXiv:1806.07470 (2018)
Xie, J., et al.: Image denoising with deep neural networks. In: NIPS (2012)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Acknowledgements
This work is partially supported by the EC H2020 programme under the funding schemes: Research Infrastructures G.A. 654024 SoBigData, G.A. 78835 Pro-Res, G.A. 825619 AI4EU and G.A. 780754 Track&Know. The third author acknowledges the support of the Natural Sciences and Engineering Research Council of Canada and of the Ocean Frontier Institute.