Abstract
The efficacy of depth in boosting the performance of residual convolutional neural networks (CNNs) has been well established by abundant empirical and theoretical evidence. However, although the attention module (AM) is a crucial component of high-performance CNNs, most existing research focuses on its structural design and overlooks a direct investigation of how AM depth affects performance. In this paper, we therefore study the influence of AM depth in detail under various settings. We observe that (1) appropriately increasing AM depth significantly boosts performance, and (2) deepening the AM is more cost-effective than the traditional approach of deepening the backbone. Deepening the AM, however, introduces inherent overheads in parameter count and inference cost. To mitigate these overheads while retaining the benefits of a deeper AM, we propose a novel AM called DEEPAM, which leverages mechanisms from recurrent neural networks together with the design of lightweight AMs. Extensive experiments on widely used benchmarks and popular attention networks validate the effectiveness of the proposed DEEPAM.
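The abstract's core idea is to make the AM deeper by reusing one lightweight attention cell recurrently, so effective depth grows while the parameter count stays fixed. Below is a minimal PyTorch sketch of that idea only; the class name RecurrentChannelAttention, the SE-style squeeze-and-excitation cell, the residual update, and the steps hyperparameter are illustrative assumptions, not the authors' exact DEEPAM architecture.

```python
# Minimal sketch (not the authors' code): an SE-style channel attention module
# whose effective depth is increased by unrolling one shared recurrent cell,
# so parameters stay constant as depth grows.
import torch
import torch.nn as nn


class RecurrentChannelAttention(nn.Module):
    """Lightweight channel gate unrolled `steps` times with shared weights."""

    def __init__(self, channels: int, reduction: int = 16, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, 1, 1)
        # One lightweight cell reused at every step -> depth without new params.
        self.cell = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        h = self.pool(x).view(b, c)          # initial channel descriptor
        for _ in range(self.steps):          # "deepen" the AM by unrolling
            h = h + self.cell(h)             # residual update with shared weights
        return x * self.gate(h).view(b, c, 1, 1)  # recalibrate the feature map


if __name__ == "__main__":
    am = RecurrentChannelAttention(channels=64, steps=3)
    y = am(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Note the trade-off this sketch illustrates: unrolling the shared cell adds sequential computation per forward pass but no new weights, which is one plausible way to reconcile deeper AMs with the parameter overhead the abstract identifies.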
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grants No. 62206314, No. 623B2099, and No. U1711264, by the GuangDong Basic and Applied Basic Research Foundation under Grant No. 2022A1515011835, and by the Science and Technology Projects in Guangzhou under Grant No. 2024A04J4388.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhong, S., Wen, W., Qin, J., Huang, Z. (2024). DEEPAM: Toward Deeper Attention Module in Residual Convolutional Neural Networks. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_26
DOI: https://doi.org/10.1007/978-3-031-72332-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72331-5
Online ISBN: 978-3-031-72332-2