Abstract
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified to either stay within their class, or to become counterfactuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification and the areas that push it towards another class. We present the results of an experimental evaluation on three datasets and two black box models. We show that, besides providing the most useful and interpretable explanations, the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
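To make the pipeline above concrete, here is a minimal, hypothetical sketch of its main steps, with PCA standing in for the trained adversarial autoencoder and a random forest standing in for the black box; names such as `latent`, `black_box` and `surrogate` are illustrative and not taken from the paper.

```python
# Minimal sketch of an exemplar-based explanation pipeline (illustrative only:
# the paper uses a trained adversarial autoencoder, its discriminator to validate
# exemplars, and explicit decision/counterfactual rules from the surrogate tree).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
latent = PCA(n_components=4, random_state=0).fit(X)      # stand-in for the AAE

# 1. Encode the instance to explain and sample a neighborhood in the latent space.
x = X[0]
z = latent.transform(x.reshape(1, -1))
Z = z + rng.normal(scale=5.0, size=(1000, z.shape[1]))

# 2. Decode the latent points and label the resulting images with the black box.
imgs = latent.inverse_transform(Z)
labels = black_box.predict(imgs)

# 3. Learn an interpretable surrogate (a shallow decision tree) in the latent space.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Z, labels)

# 4. Exemplars share the class assigned to x; counter-exemplars fall into other classes.
y_x = black_box.predict(x.reshape(1, -1))[0]
pred = surrogate.predict(Z)
exemplars = imgs[pred == y_x]
counter_exemplars = imgs[pred != y_x]

# 5. A naive saliency map: pixels where x deviates from the median exemplar.
if len(exemplars):
    saliency = x - np.median(exemplars, axis=0)
    print(y_x, len(exemplars), len(counter_exemplars), saliency.shape)
```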
Notes
- 1.
- 2. In the experiments we use for the \(discriminator\) the default validity threshold of 0.5 to distinguish between real and fake exemplars. This value can be increased to admit only more reliable exemplars, or decreased to speed up the generation process (see the sketch after these notes).
- 3.
- 4. Black box: https://scikit-learn.org/, https://keras.io/examples/.
- 5. The encoding distribution of the AAE is defined as a Gaussian distribution whose mean and variance are predicted by the encoder itself [20]. We adopted the following number of latent features \(k\) for the various datasets: mnist \(k{=}4\), fashion \(k{=}8\), cifar10 \(k{=}16\).
- 6.
- 7. Criticisms are images that are not well explained by the prototypes, selected with a regularized kernel function [18].
- 8. Best viewed in color. Black lines are not part of the explanation; they only highlight borders. We do not report explanations for cifar10 and for RF for the sake of space.
- 9. This effect is probably due to the figure segmentation performed by lime.
- 10. A decision tree for abele and a linear lasso model for lime.
- 11. These results confirm the experiments reported in [11].
- 12. The abele method achieves similar results for RF, not reported due to lack of space.
- 13. The abele method achieves similar results for RF, not reported due to lack of space.
- 14. As in [21], in our experiments we use \(\epsilon{=}0.1\) for \(\mathcal{N}\) and we add salt-and-pepper noise (see the sketch after these notes).
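For concreteness, the snippet below sketches the implementation details mentioned in notes 2, 5 and 14: the per-dataset latent sizes, filtering generated latent exemplars through the discriminator's validity threshold, and the salt-and-pepper perturbation used in the robustness experiment. The `discriminator` interface and the function names are hypothetical; this is an illustrative sketch under those assumptions, not the authors' code.

```python
# Hypothetical helpers illustrating notes 2, 5 and 14. `discriminator` is assumed
# to return, for each latent exemplar, a probability of being a "real" sample.
import numpy as np

# Note 5: number of latent features k adopted for each dataset.
LATENT_FEATURES = {"mnist": 4, "fashion": 8, "cifar10": 16}

def filter_valid_exemplars(Z, discriminator, threshold=0.5):
    """Note 2: keep only latent exemplars scored above the validity threshold
    (0.5 by default; raise it for more reliable exemplars, lower it to speed
    up the generation process)."""
    scores = np.asarray(discriminator(Z))
    return Z[scores >= threshold]

def salt_and_pepper(img, amount=0.05, rng=None):
    """Note 14: set a fraction `amount` of the pixels to the minimum or maximum
    intensity (the fraction used here is illustrative, not taken from the paper)."""
    rng = rng or np.random.default_rng()
    noisy = img.copy()
    mask = rng.random(img.shape) < amount
    noisy[mask] = rng.choice([img.min(), img.max()], size=int(mask.sum()))
    return noisy
```

A typical use would be to over-generate candidate latent points, keep only those passing `filter_valid_exemplars`, and then decode the survivors.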
References
Bach, S., Binder, A., et al.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One 10(7), e0130140 (2015)
Bien, J., et al.: Prototype selection for interpretable classification. AOAS (2011)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Chen, C., Li, O., Barnett, A., Su, J., Rudin, C.: This looks like that: deep learning for interpretable image recognition. arXiv:1806.10574 (2018)
Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)
Escalante, H.J., et al. (eds.): Explainable and Interpretable Models in Computer Vision and Machine Learning. TSSCML. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98131-4
Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: ICCV, pp. 3429–3437 (2017)
Frixione, M., et al.: Prototypes vs exemplars in concept representation. In: KEOD (2012)
Frosst, N., et al.: Distilling a neural network into a soft decision tree. arXiv:1711.09784 (2017)
Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)
Guidotti, R., et al.: Local rule-based explanations of black box decision systems. arXiv:1805.10820 (2018)
Guidotti, R., Monreale, A., Cariaggi, L.: Investigating neighborhood generation for explanations of image classifiers. In: PAKDD (2019)
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., et al.: A survey of methods for explaining black box models. ACM CSUR 51(5), 93:1–93:42 (2018)
Guidotti, R., Ruggieri, S.: On the stability of interpretable models. In: IJCNN (2019)
Hara, S., et al.: Maximally invariant data perturbation as explanation. arXiv:1806.07004 (2018)
He, K., et al.: Deep residual learning for image recognition. In: CVPR (2016)
Hinton, G., et al.: Distilling the knowledge in a neural network. arXiv:1503.02531 (2015)
Kim, B., et al.: Examples are not enough, learn to criticize! In: NIPS (2016)
Li, O., Liu, H., Chen, C., Rudin, C.: Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: AAAI (2018)
Makhzani, A., Shlens, J., et al.: Adversarial autoencoders. arXiv:1511.05644 (2015)
Melis, D.A., Jaakkola, T.: Towards robust interpretability with self-explaining neural networks. In: NIPS (2018)
Molnar, C.: Interpretable machine learning. LeanPub (2018)
Panigutti, C., Guidotti, R., Monreale, A., Pedreschi, D.: Explaining multi-label black-box classifiers for health applications. In: W3PHIAI (2019)
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you?: Explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)
Shrikumar, A., et al.: Not just a black box: learning important features through propagating activation differences. arXiv:1605.01713 (2016)
Siddharth, N., Paige, B., Desmaison, A., van de Meent, J.W., et al.: Inducing interpretable representations with variational autoencoders. arXiv:1611.07492 (2016)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv:1312.6034 (2013)
Spinner, T., et al.: Towards an interpretable latent space: an intuitive comparison of autoencoders with variational autoencoders. In: IEEE VIS (2018)
Sun, K., Zhu, Z., Lin, Z.: Enhancing the robustness of deep neural networks by boundary conditional GAN. arXiv:1902.11029 (2019)
Sundararajan, M., et al.: Axiomatic attribution for DNN. In: ICML. JMLR (2017)
van der Waa, J., et al.: Contrastive explanations with local foil trees. arXiv:1806.07470 (2018)
Xie, J., et al.: Image denoising with deep neural networks. In: NIPS (2012)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Acknowledgements
This work is partially supported by the EC H2020 programme under the funding schemes: Research Infrastructures G.A. 654024 SoBigData, G.A. 78835 Pro-Res, G.A. 825619 AI4EU and G.A. 780754 Track&Know. The third author acknowledges the support of the Natural Sciences and Engineering Research Council of Canada and of the Ocean Frontier Institute.