Abstract
Although large-scale labeled data are essential for deep convolutional neural networks (ConvNets) to learn high-level semantic visual representations, collecting and annotating such datasets is time-consuming and often impractical. This study proposes ScaleNet, a simple and efficient unsupervised representation learning method based on multi-scale images, to improve the performance of ConvNets when limited information is available. The input images are first resized to a smaller size and fed to the ConvNet, which is trained to recognize the rotation degree. Next, the ConvNet learns the rotation-prediction task on the original-size images, starting from the parameters transferred from the previous model. The method is evaluated on the CIFAR-10 and ImageNet datasets with different architectures, such as AlexNet and ResNet50. The study demonstrates that specific image features, such as Harris corner information, play a critical role in the efficiency of the rotation-prediction task. ScaleNet outperforms RotNet by \(\approx 7\%\) on the limited CIFAR-10 dataset. The parameters transferred from a ScaleNet model trained with limited data improve the ImageNet classification task by about \(6\%\) compared to the RotNet model. The study also shows that the ScaleNet method can improve other cutting-edge models, such as SimCLR, by learning effective features for classification tasks.
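The two-stage procedure described in the abstract can be illustrated with a minimal data-preparation sketch. It is not the authors' implementation: the function names (`make_rotation_batch`, `downscale`, `scalenet_stages`), the four-way rotation set (0°, 90°, 180°, 270°), the average-pooling resize, and the scale factor of 2 are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def make_rotation_batch(images):
    """Build a rotation-prediction batch from square images.

    images: array of shape (N, H, W, C) with H == W.
    Returns (rotated, labels), where rotated has shape (4N, H, W, C)
    and labels[i] in {0, 1, 2, 3} is the number of 90-degree turns.
    The four-way rotation set is an assumption, not taken from the abstract.
    """
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels


def downscale(images, factor=2):
    """Shrink images by average pooling (a stand-in for any resize method)."""
    n, h, w, c = images.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = images.reshape(n, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))


def scalenet_stages(images, factor=2):
    """Return the pretext-task data for both stages, in training order.

    Stage 1 uses the downscaled images; stage 2 uses the original size.
    A network would be trained on stage 1 first and its parameters reused
    to initialise training on stage 2 (training loop not shown here).
    """
    small_x, small_y = make_rotation_batch(downscale(images, factor))
    full_x, full_y = make_rotation_batch(images)
    return [("small", small_x, small_y), ("full", full_x, full_y)]
```

In a full pipeline, the same ConvNet would be trained on the `"small"` stage and then on the `"full"` stage. Reusing the parameters across input sizes presumably requires an architecture whose classifier head does not depend on the spatial resolution (for example, one with global pooling), which is a further assumption of this sketch.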
Acknowledgment
This study was supported by Google Cloud Platform (GCP) Research, which provided credits for implementing all deep learning algorithms related to SimCLR and ImageNet on virtual machines. The author would like to thank J. David Frost, Kevin Tynes, and Russell Strauss for their feedback on the draft.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, H., Roozbahani, M.M. (2021). ScaleNet: An Unsupervised Representation Learning Method for Limited Information. In: Bauckhage, C., Gall, J., Schwing, A. (eds) Pattern Recognition. DAGM GCPR 2021. Lecture Notes in Computer Science, vol 13024. Springer, Cham. https://doi.org/10.1007/978-3-030-92659-5_11
DOI: https://doi.org/10.1007/978-3-030-92659-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92658-8
Online ISBN: 978-3-030-92659-5
eBook Packages: Computer Science, Computer Science (R0)