
Pre-training Without Natural Images

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12627)

Abstract

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? This paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law present in the background knowledge of the real world. Theoretically, using automatically generated images instead of natural images in the pre-training phase allows us to generate a labeled image dataset of unlimited scale. Although models pre-trained with the proposed Fractal DataBase (FractalDB), a database without natural images, do not necessarily outperform models pre-trained with human-annotated datasets in all settings, we are able to partially surpass the accuracy of ImageNet/Places pre-trained models. The image representation learned from the proposed FractalDB exhibits distinctive features in visualizations of convolutional layers and attention maps.
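
To make the formula-driven idea concrete, below is a minimal sketch that renders labeled fractal images with the chaos game on randomly sampled iterated function systems (IFS). This is an illustrative sketch under assumed simplifications, not the authors' FractalDB code: the helper names (random_contractive_maps, render_ifs), the uniform map selection, and all parameter ranges are assumptions made here for brevity.

```python
import numpy as np

def random_contractive_maps(rng, n_maps=3):
    """Sample 2-D affine maps with contractive linear parts,
    so the iterated function system has a bounded attractor.
    (Illustrative sampling scheme, not the paper's exact recipe.)"""
    maps = []
    for _ in range(n_maps):
        lin = rng.uniform(-1.0, 1.0, size=(2, 2))
        # Rescale the linear part to spectral norm < 1 (contractive).
        lin *= rng.uniform(0.3, 0.8) / (np.linalg.norm(lin, 2) + 1e-8)
        shift = rng.uniform(-1.0, 1.0, size=2)
        maps.append(np.concatenate([lin.ravel(), shift]))
    return np.array(maps)

def render_ifs(maps, n_points=100_000, size=256, seed=0):
    """Rasterize an IFS attractor via the chaos game.

    Each row of `maps` is (a, b, c, d, e, f) for the affine map
    (x, y) -> (a*x + b*y + e, c*x + d*y + f).
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    pts = np.empty((n_points, 2))
    for i in range(n_points + 20):
        a, b, c, d, e, f = maps[rng.integers(len(maps))]
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= 20:              # discard a short burn-in
            pts[i - 20] = x, y
    # Normalize the point cloud into the image grid and rasterize.
    pts -= pts.min(axis=0)
    pts /= pts.max(axis=0) + 1e-8
    img = np.zeros((size, size), dtype=np.uint8)
    rows = ((1.0 - pts[:, 1]) * (size - 1)).astype(int)
    cols = (pts[:, 0] * (size - 1)).astype(int)
    img[rows, cols] = 255
    return img

# One category = one sampled set of IFS parameters; re-rendering the same
# parameters with different seeds yields labeled instances of that category.
rng = np.random.default_rng(42)
categories = [random_contractive_maps(rng) for _ in range(10)]
dataset = [(render_ifs(maps, seed=s), label)
           for label, maps in enumerate(categories)
           for s in range(5)]
```

Because the categories are defined by the sampled parameters rather than by human labels, both the number of classes and the number of instances per class can be grown without annotation cost, which is the "infinite scale" property noted in the abstract.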

Notes

  1. https://groups.csail.mit.edu/vision/TinyImages/
  2. http://image-net.org/update-sep-17-2019

Acknowledgement

– This work was supported by JSPS KAKENHI Grant Number JP19H01134.

– We thank Takuma Yagi and Munetaka Minoguchi for their helpful comments during research discussions.

– The computational resources of the AI Bridging Cloud Infrastructure (ABCI), provided by the National Institute of Advanced Industrial Science and Technology (AIST), were used.

Author information

Corresponding author

Correspondence to Hirokatsu Kataoka.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 4931 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kataoka, H., et al. (2021). Pre-training Without Natural Images. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol. 12627. Springer, Cham. https://doi.org/10.1007/978-3-030-69544-6_35

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-69544-6_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69543-9

  • Online ISBN: 978-3-030-69544-6

  • eBook Packages: Computer Science, Computer Science (R0)
