Three-Dimensional Reconstruction Pre-Training as a Prior to Improve Robustness to Adversarial Attacks and Spurious Correlation
Figure 1. <p>(<b>a</b>) The class of 3D reconstruction models we consider, in which a CNN encoder conditions the 3D reconstruction model on shape features of 2D input images. (<b>b</b>) To leverage 3D-based pre-training, we extract the weights of the CNN encoder pre-trained on 3D reconstruction and use them as the initialization for adversarial training on 2D rendered images of 3D objects. The goal of this paper is to investigate the effect of 3D reconstruction pre-training of these image encoders on adversarial robustness.</p>
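The weight-transfer step in (b) can be sketched as follows. This is an illustrative mock-up, not the paper's actual code: the `encoder.` prefix, parameter names, and toy array shapes are assumptions standing in for the DVR/ResNet-18 state dicts.

```python
import numpy as np

def transfer_encoder_weights(pretrained, classifier, encoder_prefix="encoder."):
    """Copy backbone parameters from a 3D-reconstruction-pretrained model into a
    classifier's parameter dict; the classification head keeps its own (random)
    initialization and is trained during adversarial training."""
    init = dict(classifier)  # start from the classifier's own parameters
    for name, weight in pretrained.items():
        if name.startswith(encoder_prefix):
            backbone_name = name[len(encoder_prefix):]
            if backbone_name in init:
                init[backbone_name] = weight.copy()
    return init

rng = np.random.default_rng(0)
# Toy stand-ins for state dicts: the 3D model has an encoder and a decoder;
# only the encoder weights are transferred to the classifier.
pretrained = {"encoder.conv1": rng.normal(size=(4, 4)),
              "decoder.head": rng.normal(size=(4,))}
classifier = {"conv1": np.zeros((4, 4)),       # backbone layer: overwritten
              "fc": rng.normal(size=(4, 10))}  # fresh classification head: kept

init = transfer_encoder_weights(pretrained, classifier)
```

The decoder of the 3D model plays no further role; only the image encoder is reused as the initialization.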
Figure 2. <p>Examples of 10 geon categories from Geon3D. The full list of the 40 geons we construct (Geon3D-40) is provided in <a href="#app1-entropy-26-00258" class="html-app">Appendix A</a>.</p>
Figure 3. <p>(<b>Left</b>) Example images from Geon3D with textured backgrounds. (<b>Right</b>) Example images from ShapeNet.</p>
Figure 4. <p>Adversarial robustness of vanilla adversarial training (AT) and 3D-based pre-training with increasing perturbation budget for the <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> threat model on Geon3D with black and textured backgrounds. DVR stands for Differentiable Volumetric Rendering. For textured backgrounds, we run the experiment three times with different random initializations of the classification linear layer, using DVR-pretrained ResNet-18 and ImageNet-pretrained ResNet-18 as the backbone, and report the mean and standard deviation over the three runs. For black backgrounds, we run AT with different attack learning rates (0.1, 0.2, and 0.3) and report its adversarial accuracy. For both textured and black backgrounds, the perturbation budget during adversarial training is 0.05, which corresponds to 12.75 on the <span class="html-italic">x</span>-axis. Between the simplest setting of Geon3D with black backgrounds and Geon3D with textured backgrounds, we observe that the benefit of 3D reconstruction pre-training (DVR) emerges only under the latter.</p>
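To make the ε = 0.05 ↔ 12.75 correspondence concrete (pixel intensities in [0, 1] vs. [0, 255]), a PGD attack under the L∞ threat model can be sketched on a toy logistic model. The model, step size, and iteration count here are illustrative assumptions, not the paper's ResNet-18 setup:

```python
import numpy as np

def pgd_linf(x, y, w, eps=0.05, step=0.01, iters=10):
    """PGD under an L-infinity budget for a toy logistic model
    p(y=1|x) = sigmoid(w.x); the cross-entropy gradient with respect to x
    is analytic, so the sketch needs no autodiff framework."""
    x_adv = x.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w)))
        grad = (p - y) * w                        # dL/dx for cross-entropy
        x_adv = x_adv + step * np.sign(grad)      # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the L_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

# eps = 0.05 on [0,1] pixels is 0.05 * 255 = 12.75 on a 0-255 scale,
# matching the x-axis of the figure.
rng = np.random.default_rng(1)
x = np.full(8, 0.5)            # a "pixel" vector well inside [0, 1]
w = rng.normal(size=8)
x_adv = pgd_linf(x, y=0.0, w=w)
```

By construction, no coordinate of `x_adv` ever moves more than ε away from the clean input, which is what the perturbation budget on the x-axis measures.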
Figure 5. <p>Adversarial robustness of AT and DVR+AT with increasing perturbation budget for the <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> threat model on Geon3D. For <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> textured backgrounds, we run the experiment three times with different random initializations of the classification linear layer, using DVR-pretrained ResNet-18 and ImageNet-pretrained ResNet-18 as the backbone. We report the mean and standard deviation over the three runs, where we see a small variance for <math display="inline"><semantics> <mrow> <mi>A</mi> <mi>T</mi> <mo>−</mo> <msub> <mi>L</mi> <mn>2</mn> </msub> </mrow> </semantics></math>. For <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> black backgrounds, we run AT with different attack learning rates (0.2, 0.3, and 0.4) and report its adversarial accuracy. The adversarial perturbation budget during training is 3.0 for textured backgrounds and 1.0 for black backgrounds. In aggregate, 3D pre-training does not improve, and in fact lowers, the performance of AT for black backgrounds. However, similar to the <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> case, we continue to see the trend that 3D-based pre-training helps more for textured backgrounds.</p>
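For the L2 threat model, the only change to a PGD-style attack relative to the L∞ case is the projection step: rather than clipping each coordinate, the whole perturbation is rescaled onto the L2 ball of radius ε (e.g., ε = 3.0 for textured and 1.0 for black backgrounds above). A minimal sketch of that projection:

```python
import numpy as np

def project_l2(delta, eps):
    """Project a perturbation onto the L2 ball of radius eps: rescale it
    if it is too large, otherwise leave it unchanged."""
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta = delta * (eps / norm)
    return delta

# A perturbation of norm 5 is scaled down onto the radius-3 sphere;
# one of norm 0.5 already satisfies the budget and is left alone.
big = project_l2(np.array([3.0, 4.0]), eps=3.0)
small = project_l2(np.array([0.3, 0.4]), eps=3.0)
```

This rescaling preserves the direction of the perturbation while enforcing the budget, which is why L2 attacks tend to spread their perturbation over many pixels instead of saturating each one.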
Figure 6. <p>Adversarial robustness of AT and PxN+AT with increasing perturbation budget on ShapeNet. PxN stands for pixelNeRF. We see that 3D reconstruction pre-training (PxN+AT) improves over vanilla adversarial training (AT) for both <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> across all perturbation budgets. The perturbation budget during adversarial training is 0.05, which corresponds to 12.75 on the <span class="html-italic">x</span>-axis, for the <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> threat model and 1.0 for the <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> threat model.</p>
Figure 7. <p>Adversarial robustness comparison between PxN+AT, DVR+AT, AE+AT, VAE+AT, and AT for both <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> threat models with increasing perturbation budget <math display="inline"><semantics> <mi>ϵ</mi> </semantics></math> on ShapeNet. The perturbation budget during adversarial training is 0.05, which corresponds to 12.75 on the <span class="html-italic">x</span>-axis, for the <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> threat model and 1.0 for the <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> threat model.</p>
Figure 8. <p>Reconstructed ShapeNet images. (<b>Left</b>) AutoEncoder; (<b>Right</b>) VAE.</p>
Figure A1. <p>The list of 40 geons we constructed.</p>
Figure A2. <p>Adversarial robustness of 3D pre-trained ResNet-18 for both <math display="inline"><semantics> <msub> <mi>L</mi> <mo>∞</mo> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi>L</mi> <mn>2</mn> </msub> </semantics></math> threat models with increasing perturbation budget <math display="inline"><semantics> <mi>ϵ</mi> </semantics></math> on Geon3D with black backgrounds.</p>
Abstract
1. Introduction
2. Related Work
3. Three-Dimensional Reconstruction as Pre-Training
Problem Setup for 3D Reconstruction
4. Geon3D Benchmark
4.1. Data Preparation
4.2. Rendering and Data Splits
5. General Methods for Experiments
5.1. Pre-Training
5.2. Adversarial Training
5.3. Evaluation
5.4. Additional Training Details
5.4.1. DVR
5.4.2. PixelNeRF
5.4.3. AE and VAE
5.4.4. Dataset
5.4.5. Background Textures
6. Experiments Using Geon3D
6.1. Adversarial Robustness
6.1.1. Setup
6.1.2. Results
6.2. Robustness to Spurious Correlations between Shape and Background
6.2.1. Setup
6.2.2. Results
6.2.3. Summary
7. Experiments Using More Complex Objects: ShapeNet
7.1. Setup
7.2. Results
7.3. Adversarial Robustness on ShapeNet
8. Limitations and Discussion
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Dataset
Appendix A.1. List of 40 Geons
Appendix B. Additional Experiments
Appendix C. Details of 3D Reconstruction Training
References
| Feature | Values |
|---|---|
| Axis | S, C |
| Cross-section | S, C |
| Sweep function | Co, M, EC, CE |
| Termination | T, P, CS |
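Taking the Cartesian product of the feature values above gives 2 × 2 × 4 × 3 = 48 raw combinations; since Geon3D-40 contains 40 geons, some combinations are evidently excluded (which ones is not specified in this table). A quick enumeration of the raw product:

```python
from itertools import product

# Feature values from the table above (abbreviations as given there).
features = {
    "Axis": ["S", "C"],
    "Cross-section": ["S", "C"],
    "Sweep function": ["Co", "M", "EC", "CE"],
    "Termination": ["T", "P", "CS"],
}

# Every raw combination of one value per feature.
combos = list(product(*features.values()))
assert len(combos) == 2 * 2 * 4 * 3  # 48 raw combinations, before exclusions
```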
| Geon Category | Difference |
|---|---|
| Cone vs. Horn | Axis |
| Handle vs. Arch | Cross-section |
| Cuboid vs. Cylinder | Cross-section |
| T. Pyramid vs. T. Cone | Cross-section |
| Cuboid vs. Pyramid | Sweep function |
| Barrel vs. T. Cone | Sweep function |
| Horn vs. E. Handle | Termination |
| AT- | DVR+AT- | AT- | DVR+AT- |
|---|---|---|---|
| 10.8 | 35.6 | 79.0 | 84.20 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yamada, Y.; Zhang, F.W.; Kluger, Y.; Yildirim, I. Three-Dimensional Reconstruction Pre-Training as a Prior to Improve Robustness to Adversarial Attacks and Spurious Correlation. Entropy 2024, 26, 258. https://doi.org/10.3390/e26030258