
Causal Interventional Training for Image Recognition

Published: 01 January 2023

Abstract

Deep learning models often fit undesired dataset bias in training. In this paper, we formulate the bias using causal inference, which helps us uncover the ever-elusive causalities among the key factors in training, and thus pursue the desired causal effect without the bias. We start by revisiting the process of building a visual recognition system, and then propose a structural causal model (SCM) for the key variables involved in dataset collection and the recognition model: object, common sense, bias, context, and label prediction. Based on the SCM, one can observe that there are "good" and "bad" biases. Intuitively, in an image where a car is driving on a highway in a desert, the "good" bias denoting the common-sense context is the highway, and the "bad" bias accounting for the noisy context factor is the desert. We tackle this problem with a novel causal interventional training (CIT) approach, in which we control the observed context in each object class. We offer theoretical justifications for CIT and validate it with extensive classification experiments on CIFAR-10, CIFAR-100, and ImageNet, e.g., surpassing the standard deep neural networks ResNet-34 and ResNet-50 by 0.95% and 0.70% in accuracy on ImageNet, respectively. Our code is open-sourced on GitHub at https://github.com/qinwei-hfut/CIT.
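The abstract does not spell out the form of the intervention, but "control the observed context in each object class" points at a backdoor-adjustment-style estimate from standard causal inference. As a hedged sketch (the notation is assumed, not the paper's), with X the object, Y the label prediction, and Z the observed context acting as a confounder:

    P(Y \mid do(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

Below is a minimal, hypothetical PyTorch sketch of how such an adjusted prediction could be computed by averaging a classifier head over a fixed bank of observed context embeddings. Every name here (InterventionalClassifier, context_bank, the uniform prior over strata) is an illustrative assumption, not the released CIT implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InterventionalClassifier(nn.Module):
        """Backdoor-style prediction: average class scores over observed context strata."""
        def __init__(self, backbone: nn.Module, context_bank: torch.Tensor,
                     feat_dim: int, num_classes: int):
            super().__init__()
            self.backbone = backbone                              # e.g., a ResNet feature extractor -> (B, feat_dim)
            self.register_buffer("context_bank", context_bank)    # (K, feat_dim): K observed context strata
            self.head = nn.Linear(2 * feat_dim, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feat = self.backbone(x)                               # (B, feat_dim) object features
            B, K = feat.size(0), self.context_bank.size(0)
            f = feat.unsqueeze(1).expand(B, K, -1)                # pair every image with every context z
            z = self.context_bank.unsqueeze(0).expand(B, K, -1)
            logits = self.head(torch.cat([f, z], dim=-1))         # (B, K, C): scores for P(Y | X, z)
            # sum_z P(Y | X, z) P(z), here with a uniform prior P(z) = 1/K over the strata
            probs = F.softmax(logits, dim=-1).mean(dim=1)         # (B, C) adjusted class probabilities
            return torch.log(probs + 1e-12)                       # log-probabilities for an NLL loss

Training would then minimize nn.NLLLoss() on these log-probabilities. Whether the actual CIT objective averages probabilities, logits, or uses class-specific context priors is not stated in the abstract; see the paper and repository for the exact formulation.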




Information

Published In

IEEE Transactions on Multimedia, Volume 25, 2023, 8932 pages

Publisher

IEEE Press

Publication History

Published: 01 January 2023

Qualifiers

  • Research-article

Cited By

  • "A Knowledge-Based Hierarchical Causal Inference Network for Video Action Recognition," IEEE Transactions on Multimedia, vol. 26, pp. 9135–9149, 2024. DOI: 10.1109/TMM.2024.3386339
  • "CDCM: ChatGPT-Aided Diversity-Aware Causal Model for Interactive Recommendation," IEEE Transactions on Multimedia, vol. 26, pp. 6488–6500, 2024. DOI: 10.1109/TMM.2024.3352397
  • "Knowledge-Enhanced Causal Reinforcement Learning Model for Interactive Recommendation," IEEE Transactions on Multimedia, vol. 26, pp. 1129–1142, 2024. DOI: 10.1109/TMM.2023.3276505
  • "An explainable deep reinforcement learning algorithm for the parameter configuration and adjustment in the consortium blockchain," Engineering Applications of Artificial Intelligence, vol. 129, 2024. DOI: 10.1016/j.engappai.2023.107606
  • "Counterfactual Contrastive Learning for Fine Grained Image Classification," Artificial Neural Networks and Machine Learning – ICANN 2024, pp. 169–183, 2024. DOI: 10.1007/978-3-031-72341-4_12
  • "Double-Domain Adaptation Semantics for Retrieval-Based Long-Term Visual Localization," IEEE Transactions on Multimedia, vol. 26, pp. 6050–6064, 2023. DOI: 10.1109/TMM.2023.3345138
  • "A Decomposable Causal View of Compositional Zero-Shot Learning," IEEE Transactions on Multimedia, vol. 25, pp. 5892–5902, 2023. DOI: 10.1109/TMM.2022.3200578
