Towards Verified Robustness under Text Deletion Interventions

Johannes Welbl, Po-Sen Huang, Robert Stanforth, Sven Gowal, Krishnamurthy (Dj) Dvijotham, Martin Szummer, Pushmeet Kohli

Published: 20 Dec 2019, Last Modified: 05 May 2023ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: Formal verification of a specification on a model's prediction undersensitivity using Interval Bound Propagation

Abstract: Neural networks are widely used in Natural Language Processing, yet despite their empirical successes, their behaviour is brittle: they are both over-sensitive to small input changes, and under-sensitive to deletions of large fractions of input text. This paper aims to tackle under-sensitivity in the context of natural language inference by ensuring that models do not become more confident in their predictions as arbitrary subsets of words from the input text are deleted. We develop a novel technique for formal verification of this specification for models based on the popular decomposable attention mechanism by employing the efficient yet effective interval bound propagation (IBP) approach. Using this method we can efficiently prove, given a model, whether a particular sample is free from the under-sensitivity problem. We compare different training methods to address under-sensitivity, and compare metrics to measure it. In our experiments on the SNLI and MNLI datasets, we observe that IBP training leads to a significantly improved verified accuracy. On the SNLI test set, we can verify 18.4% of samples, a substantial improvement over only 2.8% using standard training.

Keywords: natural language processing, specification, verification, model undersensitivity, adversarial, interval bound propagation

Data: [MultiNLI](https://paperswithcode.com/dataset/multinli), [SNLI](https://paperswithcode.com/dataset/snli)

Original Pdf: pdf

9 Replies