
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12355)

Abstract

One of the primary challenges limiting the applicability of deep learning is its susceptibility to learning spurious correlations rather than the underlying mechanisms of the task of interest. The resulting failure to generalise cannot be addressed by simply using more data from the same distribution. We propose an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets. We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. We show that such pairs can be identified in a number of existing datasets in computer vision (visual question answering, multi-label image classification) and natural language processing (sentiment analysis, natural language inference). The new training objective orients the gradient of a model’s decision function with pairs of counterfactual examples. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
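
The abstract's key idea, orienting the gradient of the decision function using counterfactual pairs, can be made concrete with a short sketch. The following is a hedged PyTorch illustration assuming a cosine-alignment formulation over pre-extracted feature vectors; the function and weight names (gradient_supervision_loss, lambda_gs) are illustrative, and the precise loss used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, x_cf):
    """Encourage the input-gradient of the decision score at x to point
    towards the counterfactual example x_cf (cosine-distance penalty)."""
    x = x.clone().requires_grad_(True)
    score = model(x).sum()                           # scalar decision score for the batch
    grad = torch.autograd.grad(score, x, create_graph=True)[0]
    direction = (x_cf - x).detach()                  # direction that flips the label
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return (1.0 - cos).mean()
```

In training, such a term would typically be added to the standard task loss for each counterfactual pair, e.g. total_loss = task_loss + lambda_gs * gradient_supervision_loss(model, x, x_cf), so that the counterfactual direction constrains the model's local decision boundary.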

Notes

  1. By input space, we refer to a space of feature representations of the input, i.e. vector representations (\(\boldsymbol{x}\)) obtained with a pretrained CNN or text encoder (see the sketch below).
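
For concreteness, a feature vector of the kind the footnote refers to might be obtained as in the sketch below. The ResNet-50 backbone and the preprocessing values are assumptions chosen for illustration, not a statement about the encoders actually used in the paper.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained backbone with the classification head removed, so that it returns
# a 2048-dimensional feature vector per image.
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing (illustrative values).
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# x = backbone(preprocess(image).unsqueeze(0))  # image: a PIL.Image; x has shape (1, 2048)
```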

Acknowledgements

This material is based on research sponsored by Air Force Research Laboratory and DARPA under agreement number FA8750-19-2-0501. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

Author information

Corresponding author

Correspondence to Damien Teney.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 9291 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Teney, D., Abbasnedjad, E., van den Hengel, A. (2020). Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12355. Springer, Cham. https://doi.org/10.1007/978-3-030-58607-2_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58607-2_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58606-5

  • Online ISBN: 978-3-030-58607-2

  • eBook Packages: Computer Science, Computer Science (R0)
