Abstract
One of the primary challenges limiting the applicability of deep learning is its susceptibility to learning spurious correlations rather than the underlying mechanisms of the task of interest. The resulting failure to generalize cannot be addressed by simply using more data from the same distribution. We propose an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets. We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. We show that such pairs can be identified in a number of existing datasets in computer vision (visual question answering, multi-label image classification) and natural language processing (sentiment analysis, natural language inference). The new training objective orients the gradient of a model’s decision function using pairs of counterfactual examples. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
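To make the objective concrete, below is a minimal PyTorch sketch of one plausible form of this gradient supervision, assuming feature-vector inputs and a cosine alignment penalty. The names (`model`, `x`, `x_cf`) and the exact formulation are illustrative assumptions, not the paper's published implementation.

```python
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, x_cf):
    """Sketch of an auxiliary loss that steers the gradient of the model's
    decision function at x toward its counterfactual partner x_cf.
    The cosine form is an assumption; see the paper for the exact objective."""
    x = x.detach().requires_grad_(True)
    # Scalar decision score; a real model would typically use a class logit.
    score = model(x).sum()
    # create_graph=True keeps this loss differentiable w.r.t. model parameters.
    grad = torch.autograd.grad(score, x, create_graph=True)[0]
    # Direction pointing from the example to its counterfactual.
    target = (x_cf - x).detach()
    cos = F.cosine_similarity(grad.flatten(1), target.flatten(1), dim=1)
    # Penalise misalignment between the decision gradient and that direction.
    return (1.0 - cos).mean()
```

In training, such a term would be added to the usual task loss with a small weight, e.g. `loss = task_loss + 0.1 * gradient_supervision_loss(model, x, x_cf)`, where the weight is a hypothetical hyperparameter to be tuned.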
Notes
1. By input space, we refer to a space of feature representations of the input, i.e. vector representations (\(\boldsymbol{x}\)) obtained with a pretrained CNN or text encoder.
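For concreteness, a minimal sketch of how such a vector representation might be obtained with a pretrained CNN; ResNet-18 and the penultimate layer are illustrative choices, not prescribed by the paper.

```python
import torch
import torchvision.models as models

# Build a feature extractor by dropping the classifier head of a
# pretrained ResNet-18, keeping the penultimate (pooled) activations.
resnet = models.resnet18(pretrained=True)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1])
encoder.eval()

image = torch.randn(1, 3, 224, 224)  # placeholder image batch
with torch.no_grad():
    x = encoder(image).flatten(1)    # x: a (1, 512) feature vector
```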
Acknowledgements
This material is based on research sponsored by Air Force Research Laboratory and DARPA under agreement number FA8750-19-2-0501. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Teney, D., Abbasnejad, E., van den Hengel, A. (2020). Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12355. Springer, Cham. https://doi.org/10.1007/978-3-030-58607-2_34
DOI: https://doi.org/10.1007/978-3-030-58607-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58606-5
Online ISBN: 978-3-030-58607-2
eBook Packages: Computer Science (R0)