Abstract
The efficacy of depth in boosting the performance of residual convolutional neural networks (CNNs) has been well established by abundant empirical and theoretical evidence. However, although the attention module (AM) is a crucial component of high-performance CNNs, most existing research focuses on its structural design and overlooks a direct investigation of how AM depth affects performance. In this paper, we therefore study the influence of AM depth in detail under various settings. We observe that (1) appropriately increasing AM depth significantly boosts performance, and (2) deepening the AM is more cost-effective than the traditional approach of deepening the backbone. Deepening the AM, however, introduces inherent overheads in parameter count and inference cost. To mitigate these overheads while retaining the benefits of a deeper AM, we propose a novel AM called DEEPAM, which leverages mechanisms from recurrent neural networks together with the design of lightweight AMs. Extensive experiments on widely used benchmarks and popular attention networks validate the effectiveness of the proposed DEEPAM.
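The abstract's core idea is to make the AM deeper by reusing one lightweight attention cell recurrently, so effective depth grows while the parameter count stays fixed. Below is a minimal PyTorch sketch of that idea only; the class name RecurrentChannelAttention, the SE-style squeeze-and-excitation cell, the residual update, and the steps hyperparameter are illustrative assumptions, not the authors' exact DEEPAM architecture.

```python
# Minimal sketch (not the authors' code): an SE-style channel attention module
# whose effective depth is increased by unrolling one shared recurrent cell,
# so parameters stay constant as depth grows.
import torch
import torch.nn as nn


class RecurrentChannelAttention(nn.Module):
    """Lightweight channel gate unrolled `steps` times with shared weights."""

    def __init__(self, channels: int, reduction: int = 16, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, 1, 1)
        # One lightweight cell reused at every step -> depth without new params.
        self.cell = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        h = self.pool(x).view(b, c)          # initial channel descriptor
        for _ in range(self.steps):          # "deepen" the AM by unrolling
            h = h + self.cell(h)             # residual update with shared weights
        return x * self.gate(h).view(b, c, 1, 1)  # recalibrate the feature map


if __name__ == "__main__":
    am = RecurrentChannelAttention(channels=64, steps=3)
    y = am(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Note the trade-off this sketch illustrates: unrolling the shared cell adds sequential computation per forward pass but no new weights, which is one plausible way to reconcile deeper AMs with the parameter overhead the abstract identifies.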
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grants No. 62206314, No. 623B2099, and No. U1711264, by the GuangDong Basic and Applied Basic Research Foundation under Grant No. 2022A1515011835, and by the Science and Technology Projects in Guangzhou under Grant No. 2024A04J4388.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhong, S., Wen, W., Qin, J., Huang, Z. (2024). DEEPAM: Toward Deeper Attention Module in Residual Convolutional Neural Networks. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_26
DOI: https://doi.org/10.1007/978-3-031-72332-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72331-5
Online ISBN: 978-3-031-72332-2