
On the Granularity of Explanations in Model Agnostic NLP Interpretability

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Current methods for black-box NLP interpretability, such as LIME or SHAP, are based on altering the text to be interpreted by removing words and modeling the black-box response. In this paper, we outline the limitations of this approach when using complex BERT-based classifiers: the word-based sampling produces texts that are out-of-distribution for the classifier and further gives rise to a high-dimensional search space, which cannot be sufficiently explored when time or computational power is limited. Both of these challenges can be addressed by using segments as elementary building blocks for NLP interpretability. As an illustration, we show that the simple choice of sentences as segments greatly mitigates both challenges. As a consequence, the resulting explainer attains much better fidelity on a benchmark classification task.

Notes

  1. GUTEK, “Gutenberg” in Polish, for Generating Understandable Text Explanations based on Key segments.

  2. The scores are also better than the ones obtained for LIME on a random subset of samples using a neighborhood of 1000 samples.

  3. https://github.com/axa-rev-research/gutek.

References

  1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016)

  2. Arras, L., Horn, F., Montavon, G.: Explaining predictions of non-linear classifiers in NLP. ACL 2016, 1 (2016)

  3. Bibal, A., et al.: Is attention explanation? an introduction to the debate. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 3889–3900 (2022)

  4. Bird, S., Klein, E., Loper, E.: Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. (2009)

  5. Chang, S., Zhang, Y., Yu, M., Jaakkola, T.: A game theoretic approach to class-wise selective rationalization. In: Advances in Neural Information Processing Systems, pp. 10055–10065 (2019)

  6. Clark, K., Luong, M.T., Le, Q.V., Manning, C.D.: ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)

  7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)

  8. Dimopoulos, Y., Bourret, P., Lek, S.: Use of some sensitivity criteria for choosing networks with good generalization ability. Neural Process. Lett. 2(6), 1–4 (1995)

  9. Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016)

  10. Jain, S., Wiegreffe, S., Pinter, Y., Wallace, B.C.: Learning to faithfully rationalize by construction. arXiv preprint arXiv:2005.00115 (2020)

  11. Lampridis, O., Guidotti, R., Ruggieri, S.: Explaining sentiment classification with synthetic exemplars and counter-exemplars. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds.) DS 2020. LNCS (LNAI), vol. 12323, pp. 357–373. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61527-7_24

  12. Laugel, T., Renard, X., Lesot, M.J., Marsala, C., Detyniecki, M.: Defining locality for surrogates in post-hoc interpretablity. arXiv preprint arXiv:1806.07498 (2018)

  13. Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 7167–7177 (2018)

  14. Lei, T., Barzilay, R., Jaakkola, T.: Rationalizing neural predictions. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 107–117 (2016)

  15. Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690 (2017)

  16. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  17. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, pp. 4765–4774 (2017)

  18. Miller, J., Krauth, K., Recht, B., Schmidt, L.: The effect of natural distribution shift on question answering models. arXiv preprint arXiv:2004.14444 (2020)

  19. Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1765–1773 (2017)

  20. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)

  21. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  22. Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: unanswerable questions for SQuAD. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 784–789 (2018)

  23. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383–2392 (2016)

  24. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)

  25. Rychener, Y., Renard, X., Seddah, D., Frossard, P., Detyniecki, M.: QUACKIE: a NLP classification task with ground truth explanations. arXiv preprint arXiv:2012.13190 (2020)

  26. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)

  27. Zafar, M.B., et al.: More than words: towards better quality interpretations of text classifiers. arXiv preprint arXiv:2112.12444 (2021)

Author information

Correspondence to Yves Rychener.

Appendices

A Reproducibility

To ensure reproducibility, we give the implementation details of our experiments. The implementations can also be found directly on our GitHub repository (see Footnote 3).

A.1 The Case Against Word-Based Black-Box Interpretability

Distributional Shift. We use the last-layer embedding of the classification token of base uncased BERT [7] as a representation of the whole text. For the visualisation experiment, we directly use this embedding to calculate the Wasserstein distance. To visualise, we use t-SNE on the combined dataset (word-removed + sentence-removed + original) with PCA initialisation and a perplexity of 100. The algorithm is given a maximum of 5000 iterations; for all other parameters, we use the SKLearn [21] defaults.
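
For concreteness, a minimal sketch of this embedding and visualisation step using the transformers and SKLearn libraries is given below; the example texts, variable names and the handling of the perplexity bound are illustrative assumptions and not taken from our repository.

```python
import numpy as np
import torch
from sklearn.manifold import TSNE
from transformers import AutoModel, AutoTokenizer

# Base uncased BERT [7]; the last-layer [CLS] embedding represents the whole text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def cls_embedding(text):
    """Last-layer embedding of the classification token ([CLS]) for one text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        output = model(**inputs)
    return output.last_hidden_state[0, 0].numpy()  # [CLS] is the first token

# Combined dataset: original texts plus word-removed and sentence-removed variants.
texts = [
    "The film was slow, but the acting was superb. I would watch it again.",
    "The film was slow, but the was superb. I would watch it again.",   # word removed
    "The film was slow, but the acting was superb.",                    # sentence removed
    "I would watch it again.",                                          # sentence removed
]
embeddings = np.stack([cls_embedding(t) for t in texts])

# 2-D visualisation with t-SNE: PCA initialisation, up to 5000 iterations
# (the iteration argument is named max_iter in newer scikit-learn releases).
# The paper uses perplexity 100 on the full dataset; it must stay below the sample count.
tsne = TSNE(n_components=2, init="pca",
            perplexity=min(100.0, len(embeddings) - 1), n_iter=5000)
coords = tsne.fit_transform(embeddings)
```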

For evaluating distributional shift with classifier accuracy, we use base uncased BERT [7], base RoBERTa [16], base uncased DistilBERT [26] and the small ELECTRA [6] discriminator. The text embeddings are used pairwise to create a classification problem with a random 75–25 train-test split. We train a Random Forest classifier using the default SKLearn parameters, controlling for complexity through the maximum depth with options 2, 5, 7, 10, 15 and 20; the best choice is selected using out-of-bag accuracy. The results in Fig. 2 and Table 2 represent performance on the test set.
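
A minimal sketch of this pairwise OOD classification with SKLearn follows; the function name, the synthetic demo data and the random seed are illustrative assumptions, not part of our released code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def ood_classification_accuracy(emb_a, emb_b, seed=0):
    """Train a classifier to distinguish two sets of text embeddings.

    High test accuracy means the two text distributions are easy to tell
    apart, i.e. one is out-of-distribution with respect to the other.
    """
    X = np.vstack([emb_a, emb_b])
    y = np.concatenate([np.zeros(len(emb_a)), np.ones(len(emb_b))])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)  # random 75-25 split

    # Control complexity via the maximum depth, selected by out-of-bag accuracy.
    best_depth, best_oob = None, -np.inf
    for depth in [2, 5, 7, 10, 15, 20]:
        rf = RandomForestClassifier(max_depth=depth, oob_score=True, random_state=seed)
        rf.fit(X_train, y_train)
        if rf.oob_score_ > best_oob:
            best_depth, best_oob = depth, rf.oob_score_

    # Refit with the selected depth and report accuracy on the held-out test set.
    rf = RandomForestClassifier(max_depth=best_depth, random_state=seed)
    rf.fit(X_train, y_train)
    return rf.score(X_test, y_test)

# Synthetic demo: two slightly shifted Gaussian "embedding" clouds.
rng = np.random.default_rng(0)
emb_original = rng.normal(0.0, 1.0, size=(200, 32))
emb_word_removed = rng.normal(0.3, 1.0, size=(200, 32))
print(ood_classification_accuracy(emb_original, emb_word_removed))
```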

Computational Complexity. In order to have naturally flowing text, we use text from Wikipedia, notably contexts from SQuAD 2.0 [22]. We compare the number of sentences and the number of words, obtained using NLTK [4] sent_tokenize and word_tokenize, respectively.
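
The counting itself is a one-liner per granularity; the sketch below uses NLTK as described, with an example context of our own rather than an actual SQuAD 2.0 passage.

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models used by both functions

context = ("The Eiffel Tower is a wrought-iron lattice tower in Paris. "
           "It was completed in 1889 for the World's Fair. "
           "It remains one of the most visited monuments in the world.")

n_sentences = len(sent_tokenize(context))
n_words = len(word_tokenize(context))

# Removal-based explainers search over subsets of building blocks, so far fewer
# sentences than words means a far smaller space to explore.
print(f"{n_words} words vs. {n_sentences} sentences")
```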

A.2 Experiments and Analysis

Fidelity Evaluation with QUACKIE. We use the code provided with QUACKIE [25] to test GUTEK. In our implementation of GUTEK, we use NLTK sent_tokenize to split the text into sentences and the SKLearn implementation of linear regression as the surrogate. The coefficients of the linear regression are used as sentence scores.
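
The following is a simplified, self-contained sketch of such a sentence-level surrogate explainer. It follows the description above (NLTK sentence splitting, a linear surrogate whose coefficients serve as sentence scores), but the sampling scheme, the sample count and the classifier_fn interface (a function mapping a list of texts to class-probability arrays) are illustrative assumptions rather than our exact GUTEK implementation.

```python
import numpy as np
from nltk.tokenize import sent_tokenize
from sklearn.linear_model import LinearRegression

def sentence_scores(text, classifier_fn, n_samples=100, label=None, seed=0):
    """Score sentences via a linear surrogate fitted on sentence-removal samples."""
    rng = np.random.default_rng(seed)
    sentences = sent_tokenize(text)

    # Binary masks over sentences: 1 keeps a sentence, 0 removes it.
    masks = rng.integers(0, 2, size=(n_samples, len(sentences)))
    masks[0] = 1  # always include the unperturbed text

    perturbed = [" ".join(s for s, keep in zip(sentences, m) if keep) for m in masks]
    probs = np.asarray(classifier_fn(perturbed))  # shape (n_samples, n_classes)
    if label is None:
        label = int(np.argmax(probs[0]))  # explain the class predicted for the full text

    # The surrogate's coefficients serve as the sentence importance scores.
    surrogate = LinearRegression().fit(masks, probs[:, label])
    return dict(zip(sentences, surrogate.coef_))
```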

B Tabular Results for OOD Classification

In addition to the plot in Fig. 2, we give the results in tabular form in Table 2.

C Qualitative Evaluation

In Figs. 4 and 5, we give some more illustrations of the different explanations, similar to Fig. 3.

Table 2. OOD Classification Accuracy in Tabular Form

D Complete QUACKIE Results

We also give results for all datasets in QUACKIE, together with the scores of all other methods currently included in the benchmark, in Tables 3, 4 and 5.

Fig. 4. Comparison of explanations for TFIDF movie sentiment classifier, GUTEK (left) vs LIME (right) (sample id 370)

Fig. 5. Comparison of explanations for TFIDF movie sentiment classifier, GUTEK (left) vs LIME (right) (sample id 70)

Table 3. IoU results
Table 4. HPD results
Table 5. SNR results (Examples for which noise cannot be estimated are omitted)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rychener, Y., Renard, X., Seddah, D., Frossard, P., Detyniecki, M. (2023). On the Granularity of Explanations in Model Agnostic NLP Interpretability. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_33

  • DOI: https://doi.org/10.1007/978-3-031-23618-1_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23617-4

  • Online ISBN: 978-3-031-23618-1

  • eBook Packages: Computer Science, Computer Science (R0)
