Abstract
Current methods for black-box NLP interpretability, such as LIME or SHAP, are based on altering the text to be interpreted by removing words and modeling the black-box response. In this paper, we outline the limitations of this approach when using complex BERT-based classifiers: the word-based sampling produces texts that are out-of-distribution for the classifier and further gives rise to a high-dimensional search space, which cannot be sufficiently explored when time or computational power is limited. Both of these challenges can be addressed by using segments as elementary building blocks for NLP interpretability. As an illustration, we show that the simple choice of sentences greatly mitigates both challenges. As a consequence, the resulting explainer attains much better fidelity on a benchmark classification task.
Notes
1. GUTEK, “Gutenberg” in Polish, for Generating Understandable Text Explanations based on Key segments.
2. The scores are also better than the ones obtained for LIME on a random subset of samples using a neighborhood of 1000 samples.
3.
References
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016)
Arras, L., Horn, F., Montavon, G.: Explaining predictions of non-linear classifiers in NLP. ACL 2016, 1 (2016)
Bibal, A., et al.: Is attention explanation? An introduction to the debate. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 3889–3900 (2022)
Bird, S., Klein, E., Loper, E.: Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. (2009)
Chang, S., Zhang, Y., Yu, M., Jaakkola, T.: A game theoretic approach to class-wise selective rationalization. In: Advances in Neural Information Processing Systems, pp. 10055–10065 (2019)
Clark, K., Luong, M.T., Le, Q.V., Manning, C.D.: ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
Dimopoulos, Y., Bourret, P., Lek, S.: Use of some sensitivity criteria for choosing networks with good generalization ability. Neural Process. Lett. 2(6), 1–4 (1995)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016)
Jain, S., Wiegreffe, S., Pinter, Y., Wallace, B.C.: Learning to faithfully rationalize by construction. arXiv preprint arXiv:2005.00115 (2020)
Lampridis, O., Guidotti, R., Ruggieri, S.: Explaining sentiment classification with synthetic exemplars and counter-exemplars. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds.) DS 2020. LNCS (LNAI), vol. 12323, pp. 357–373. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61527-7_24
Laugel, T., Renard, X., Lesot, M.J., Marsala, C., Detyniecki, M.: Defining locality for surrogates in post-hoc interpretability. arXiv preprint arXiv:1806.07498 (2018)
Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 7167–7177 (2018)
Lei, T., Barzilay, R., Jaakkola, T.: Rationalizing neural predictions. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 107–117 (2016)
Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690 (2017)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in neural information processing systems, pp. 4765–4774 (2017)
Miller, J., Krauth, K., Recht, B., Schmidt, L.: The effect of natural distribution shift on question answering models. arXiv preprint arXiv:2004.14444 (2020)
Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1765–1773 (2017)
Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: unanswerable questions for squad. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 784–789 (2018)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383–2392 (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Rychener, Y., Renard, X., Seddah, D., Frossard, P., Detyniecki, M.: QUACKIE: an NLP classification task with ground truth explanations. arXiv preprint arXiv:2012.13190 (2020)
Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)
Zafar, M.B., et al.: More than words: towards better quality interpretations of text classifiers. arXiv preprint arXiv:2112.12444 (2021)
Appendices
A Reproducibility
To ensure reproducibility, we give the implementation details of our experiments. Implementations can also be found on our GitHub (see Footnote 3).
A.1 The Case Against Word-Based Black-Box Interpretability
Distributional Shift. We use the final-layer embedding of the classification token as a representation of the whole text, obtained from base uncased BERT [7]. For the visualisation experiment, we use this embedding directly to calculate the Wasserstein distance. To visualize, we apply t-SNE to the combined dataset (word removed + sentence removed + original) with PCA initialisation and a perplexity of 100. The algorithm is given a maximum of 5000 iterations; for all other parameters, we use SKLearn [21] defaults.
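For concreteness, the following is a minimal sketch of this embedding-and-projection pipeline, assuming the Hugging Face transformers library for BERT; the function names are illustrative and not the authors' implementation.

```python
# Hedged sketch: [CLS] embeddings from base uncased BERT, projected with t-SNE
# (PCA initialisation, perplexity 100, at most 5000 iterations).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.manifold import TSNE

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def cls_embedding(text: str) -> torch.Tensor:
    """Final-layer embedding of the [CLS] token, used to represent the whole text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0, 0]  # [CLS] is the first token

def project(texts: list[str]):
    """t-SNE projection of the combined dataset (original + word-removed + sentence-removed).

    Perplexity 100 requires well over 100 texts, as in the actual experiment;
    parameters not listed in the text are left at SKLearn defaults.
    """
    embeddings = torch.stack([cls_embedding(t) for t in texts]).numpy()
    # `n_iter` is called `max_iter` in recent scikit-learn releases
    return TSNE(init="pca", perplexity=100.0, n_iter=5000).fit_transform(embeddings)
```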
For evaluating distributional shift with classifier accuracy, we use base uncased BERT [7], base RoBERTa [16], base uncased DistilBERT [26] and the small ELECTRA [6] discriminator. The text embeddings are used pairwise to create classification problems, each with a random 75–25 train-test split. We train a Random Forest classifier using default SKLearn parameters, controlling for complexity via the maximum depth with options 2, 5, 7, 10, 15 and 20. The best choice is selected using out-of-bag accuracy. Results in Fig. 2 and Table 2 represent performance on the test set.
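A sketch of this shift-detection setup, under the assumption that the two embedding sets are given as NumPy arrays (the function name and seed handling are illustrative):

```python
# Hedged sketch: train a random forest to distinguish two sets of text embeddings;
# high test accuracy indicates a distributional shift between the two perturbation types.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def shift_detection_accuracy(emb_a: np.ndarray, emb_b: np.ndarray, seed: int = 0) -> float:
    X = np.vstack([emb_a, emb_b])
    y = np.concatenate([np.zeros(len(emb_a)), np.ones(len(emb_b))])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)  # random 75-25 split

    best_clf, best_oob = None, -np.inf
    for depth in (2, 5, 7, 10, 15, 20):  # complexity controlled via maximum depth
        clf = RandomForestClassifier(max_depth=depth, oob_score=True, random_state=seed)
        clf.fit(X_train, y_train)
        if clf.oob_score_ > best_oob:  # model selection by out-of-bag accuracy
            best_clf, best_oob = clf, clf.oob_score_

    return best_clf.score(X_test, y_test)  # reported: accuracy on the held-out test set
```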
Computational Complexity. To obtain naturally flowing text, we use text from Wikipedia, namely contexts from SQuAD 2.0 [22]. We compare the number of sentences and the number of words, obtained using NLTK [4] sent_tokenize and word_tokenize respectively.
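As a small example of the two tokenizers, on an illustrative context (the sentences are placeholders, not SQuAD data):

```python
# Counting sentences vs. words with NLTK; requires the 'punkt' tokenizer models.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)  # newer NLTK versions may also need "punkt_tab"

context = ("The quick brown fox jumps over the lazy dog. "
           "It then rests in the shade of an old oak tree.")

print(len(sent_tokenize(context)))  # 2 sentences
print(len(word_tokenize(context)))  # 22 tokens, punctuation included
```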
A.2 Experiments and Analysis
Fidelity Evaluation with QUACKIE. We use the code provided with QUACKIE [25] to test GUTEK. In our implementation of GUTEK, we use NLTK sent_tokenize to split the text into sentences and the SKLearn implementation of linear regression as the surrogate model. The coefficients of the linear regression are used as sentence scores.
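A hedged sketch of this sentence-level surrogate is given below, assuming a callable `black_box_proba` that returns the black-box probability for a text; the random presence-mask sampling shown here is an illustrative choice, not necessarily the exact sampling used in GUTEK.

```python
# Sentences are toggled via random presence masks, the black box is queried on each
# perturbed text, and linear-regression coefficients serve as sentence importance scores.
import numpy as np
from nltk.tokenize import sent_tokenize
from sklearn.linear_model import LinearRegression

def sentence_scores(text, black_box_proba, n_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    sentences = sent_tokenize(text)
    masks, targets = [], []
    for _ in range(n_samples):
        mask = rng.integers(0, 2, size=len(sentences))   # 1 = keep the sentence
        if mask.sum() == 0:
            mask[rng.integers(len(sentences))] = 1       # keep at least one sentence
        perturbed = " ".join(s for s, keep in zip(sentences, mask) if keep)
        masks.append(mask)
        targets.append(black_box_proba(perturbed))
    surrogate = LinearRegression().fit(np.array(masks), np.array(targets))
    return dict(zip(sentences, surrogate.coef_))          # coefficient = sentence score
```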
B Tabular Results for OOD Classification
In addition to the plot in Fig. 2, we give the results in tabular form in Table 2.
C Qualitative Evaluation
In Figs. 4 and 5, we give further illustrations of the different explanations, analogous to Fig. 3.
D Complete QUACKIE Results
We also give results for all QUACKIE datasets and report the scores of all other methods currently in QUACKIE in Tables 3, 4 and 5.