Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been one of the most authoritative academic competitions in computer vision (CV) in recent years. However, applying ILSVRC's annual champion models directly to fine-grained visual categorization (FGVC) tasks does not achieve good performance. FGVC is challenging because inter-class variations are small while intra-class variations are large. Our attention object location module (AOLM) predicts the position of the object, and our attention part proposal module (APPM) proposes informative part regions, both without requiring bounding-box or part annotations. The resulting object images contain almost the entire structure of the object along with more detail; the part images cover many different scales and carry more fine-grained features; and the raw images contain the complete object. These three kinds of training images are supervised by our multi-branch network. As a result, our multi-branch and multi-scale learning network (MMAL-Net) has good classification ability and is robust to images of different scales. Our approach can be trained end-to-end and has a short inference time. Comprehensive experiments demonstrate that our approach achieves state-of-the-art results on the CUB-200-2011, FGVC-Aircraft and Stanford Cars datasets. Our code will be available at https://github.com/ZF1044404254/MMAL-Net.
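
The abstract does not describe how AOLM derives an object position from attention, so the following is only a rough illustration of annotation-free localization in general, not the authors' implementation. It assumes a 2D activation map (for example, a feature map summed over channels), keeps the cells above the map's mean, and returns the bounding box of the largest connected region. The mean threshold, the 4-connectivity, and the names `locate_object` and `toy_map` are all assumptions made for this sketch.

```python
from collections import deque


def locate_object(activation):
    """Return (top, left, bottom, right) of the largest above-mean region.

    `activation` is a 2D list of floats. Thresholding at the mean and using
    4-connectivity are assumptions of this sketch, not taken from the paper.
    """
    rows, cols = len(activation), len(activation[0])
    mean = sum(sum(row) for row in activation) / (rows * cols)
    mask = [[value > mean for value in row] for row in activation]

    seen = [[False] * cols for _ in range(rows)]
    best = []  # cells of the largest connected component found so far
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected neighbours.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component

    if not best:
        return None  # flat map: no cell exceeds the mean
    ys = [y for y, _ in best]
    xs = [x for _, x in best]
    return min(ys), min(xs), max(ys), max(xs)


# Hypothetical 4x5 activation map: one strong 2x2 blob plus an isolated cell.
toy_map = [
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.8, 0.9, 0.0, 0.0],
    [0.0, 0.7, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
box = locate_object(toy_map)
```

In this toy case the isolated high cell in the corner is ignored because it forms a smaller component than the 2x2 blob. In a real pipeline the returned box would be in feature-map coordinates and would need scaling back to image coordinates before cropping.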

Author information

Authors and Affiliations

School of MEIE, China University of Mining and Technology (Beijing), Beijing, China
Fan Zhang, Meng Li, Guisheng Zhai & Yizhao Liu

Corresponding author

Correspondence to Fan Zhang.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras

About this paper

Cite this paper

Zhang, F., Li, M., Zhai, G., Liu, Y. (2021). Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_12

DOI: https://doi.org/10.1007/978-3-030-67832-6_12
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science, Computer Science (R0)
