
Causal Interventional Training for Image Recognition

Published: 01 January 2023

Abstract

Deep learning models often fit undesired dataset bias in training. In this paper, we formulate the bias using causal inference, which helps us uncover the ever-elusive causalities among the key factors in training, and thus pursue the desired causal effect without the bias. We start by revisiting the process of building a visual recognition system, and then propose a structural causal model (SCM) for the key variables involved in dataset collection and the recognition model: object, common sense, bias, context, and label prediction. Based on the SCM, one can observe that there are "good" and "bad" biases. Intuitively, in an image where a car is driving on a highway in a desert, the "good" bias denoting the common-sense context is the highway, and the "bad" bias accounting for the noisy context factor is the desert. We tackle this problem with a novel causal interventional training (CIT) approach, in which we control the observed context in each object class. We offer theoretical justifications for CIT and validate it with extensive classification experiments on CIFAR-10, CIFAR-100, and ImageNet, e.g., surpassing the standard deep neural networks ResNet-34 and ResNet-50 by 0.95% and 0.70% in accuracy on ImageNet, respectively. Our code is open-sourced on GitHub at https://github.com/qinwei-hfut/CIT.
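The abstract does not spell out the form of the intervention, but "control the observed context in each object class" points at a backdoor-adjustment-style estimate from standard causal inference. As a hedged sketch (the notation is assumed, not the paper's), with X the object, Y the label prediction, and Z the observed context acting as a confounder:

    P(Y \mid do(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

Below is a minimal, hypothetical PyTorch sketch of how such an adjusted prediction could be computed by averaging a classifier head over a fixed bank of observed context embeddings. Every name here (InterventionalClassifier, context_bank, the uniform prior over strata) is an illustrative assumption, not the released CIT implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InterventionalClassifier(nn.Module):
        """Backdoor-style prediction: average class scores over observed context strata."""
        def __init__(self, backbone: nn.Module, context_bank: torch.Tensor,
                     feat_dim: int, num_classes: int):
            super().__init__()
            self.backbone = backbone                              # e.g., a ResNet feature extractor -> (B, feat_dim)
            self.register_buffer("context_bank", context_bank)    # (K, feat_dim): K observed context strata
            self.head = nn.Linear(2 * feat_dim, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feat = self.backbone(x)                               # (B, feat_dim) object features
            B, K = feat.size(0), self.context_bank.size(0)
            f = feat.unsqueeze(1).expand(B, K, -1)                # pair every image with every context z
            z = self.context_bank.unsqueeze(0).expand(B, K, -1)
            logits = self.head(torch.cat([f, z], dim=-1))         # (B, K, C): scores for P(Y | X, z)
            # sum_z P(Y | X, z) P(z), here with a uniform prior P(z) = 1/K over the strata
            probs = F.softmax(logits, dim=-1).mean(dim=1)         # (B, C) adjusted class probabilities
            return torch.log(probs + 1e-12)                       # log-probabilities for an NLL loss

Training would then minimize nn.NLLLoss() on these log-probabilities. Whether the actual CIT objective averages probabilities, logits, or uses class-specific context priors is not stated in the abstract; see the paper and repository for the exact formulation.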




Information

Published In

IEEE Transactions on Multimedia, Volume 25, 2023, 8932 pages

Publisher

IEEE Press

Publication History

Published: 01 January 2023

Qualifiers

  • Research-article

Cited By

  • "A Knowledge-Based Hierarchical Causal Inference Network for Video Action Recognition," IEEE Transactions on Multimedia, vol. 26, pp. 9135–9149, 2024. DOI: 10.1109/TMM.2024.3386339
  • "CDCM: ChatGPT-Aided Diversity-Aware Causal Model for Interactive Recommendation," IEEE Transactions on Multimedia, vol. 26, pp. 6488–6500, 2024. DOI: 10.1109/TMM.2024.3352397
  • "Knowledge-Enhanced Causal Reinforcement Learning Model for Interactive Recommendation," IEEE Transactions on Multimedia, vol. 26, pp. 1129–1142, 2024. DOI: 10.1109/TMM.2023.3276505
  • "An explainable deep reinforcement learning algorithm for the parameter configuration and adjustment in the consortium blockchain," Engineering Applications of Artificial Intelligence, vol. 129, 2024. DOI: 10.1016/j.engappai.2023.107606
  • "Counterfactual Contrastive Learning for Fine Grained Image Classification," Artificial Neural Networks and Machine Learning – ICANN 2024, pp. 169–183, 2024. DOI: 10.1007/978-3-031-72341-4_12
  • "Double-Domain Adaptation Semantics for Retrieval-Based Long-Term Visual Localization," IEEE Transactions on Multimedia, vol. 26, pp. 6050–6064, 2023. DOI: 10.1109/TMM.2023.3345138
  • "A Decomposable Causal View of Compositional Zero-Shot Learning," IEEE Transactions on Multimedia, vol. 25, pp. 5892–5902, 2023. DOI: 10.1109/TMM.2022.3200578
