Abstract
Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? This paper proposes a novel concept, Formula-driven Supervised Learning: we automatically generate image patterns and their category labels from fractals, which follow a natural law underlying the structure of the real world. In principle, using automatically generated images instead of natural images in the pre-training phase allows us to build a labeled dataset of unlimited scale. Although models pre-trained with the proposed Fractal DataBase (FractalDB), a database containing no natural images, do not outperform models pre-trained on human-annotated datasets in every setting, they partially surpass the accuracy of ImageNet/Places pre-trained models. Moreover, the image representation learned from FractalDB exhibits distinctive features in visualizations of convolutional layers and attention maps.
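Although the paper's released generation pipeline is not reproduced here, the core idea of Formula-driven Supervised Learning can be sketched in a few lines of code: each randomly sampled iterated function system (IFS) of affine maps defines one fractal category, and rendering its attractor with the chaos game yields a training image, so the label comes for free from the generating formula. The parameter ranges, point count, and burn-in length below are illustrative assumptions, not the authors' exact settings.

```python
# Minimal sketch of formula-driven image generation with an IFS.
# All hyperparameters here are illustrative assumptions.
import numpy as np
from PIL import Image

def sample_ifs(n_maps=4, rng=None):
    """Sample affine maps w_i(x) = A_i x + b_i; one IFS = one category."""
    rng = rng if rng is not None else np.random.default_rng()
    A = rng.uniform(-1.0, 1.0, size=(n_maps, 2, 2))
    b = rng.uniform(-1.0, 1.0, size=(n_maps, 2))
    # Pick each map with probability proportional to |det A_i| (chaos game).
    p = np.abs(np.linalg.det(A)) + 1e-6
    return A, b, p / p.sum()

def render_fractal(A, b, p, n_points=100_000, size=256, seed=0):
    """Render the IFS attractor as a binary image via the chaos game."""
    rng = np.random.default_rng(seed)
    x, pts = np.zeros(2), []
    for t in range(n_points):
        i = rng.choice(len(p), p=p)
        x = A[i] @ x + b[i]
        if t > 20:                    # discard burn-in before the orbit
            pts.append(x)             # settles onto the attractor
    pts = np.asarray(pts)
    if not np.isfinite(pts).all():    # non-contractive systems diverge;
        return None                   # a real pipeline would reject them
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    uv = ((pts - lo) / ((hi - lo).max() + 1e-8) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    img[uv[:, 1], uv[:, 0]] = 255
    return Image.fromarray(img)

# One IFS per category; perturbing A and b yields intra-class variation,
# so a full dataset repeats this over many sampled systems.
A, b, p = sample_ifs()
img = render_fractal(A, b, p)
if img is not None:
    img.save("fractal_cat0_instance0.png")
```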
Acknowledgement
– This work was supported by JSPS KAKENHI Grant Number JP19H01134.
– We would like to thank Takuma Yagi and Munetaka Minoguchi for their helpful comments during research discussions.
– The computational resources of the AI Bridging Cloud Infrastructure (ABCI), provided by the National Institute of Advanced Industrial Science and Technology (AIST), were used.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kataoka, H., et al. (2021). Pre-training Without Natural Images. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol. 12627. Springer, Cham. https://doi.org/10.1007/978-3-030-69544-6_35
DOI: https://doi.org/10.1007/978-3-030-69544-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69543-9
Online ISBN: 978-3-030-69544-6
eBook Packages: Computer Science, Computer Science (R0)