Abstract
Traditional object detection models based on convolutional neural networks contain a large number of parameters, so they perform poorly in scenarios that demand both high real-time performance and high precision. To address this problem, this paper proposes a deep recurrent attention object detection dynamic model (DRA-ODM). Built on a time-domain attention mechanism, the model combines a recurrent neural network with a dynamic sampling-point mechanism, simulating the way human eyes ignore irrelevant information and attend to key information when observing a scene. DRA-ODM completes object detection while extracting only part of the image features. In addition, this paper visualizes the sampling position at each step, making it easy to observe where the sampling points fall during each recurrent cycle. Extensive experimental results demonstrate that DRA-ODM achieves object detection within 5 time steps using about 20 M parameters, with an average accuracy of 87.4%.
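The core idea the abstract describes, processing only a small glimpse of the image per time step while a recurrent state decides where to sample next, can be sketched as follows. This is a minimal illustrative toy, not the paper's actual architecture: the `glimpse` helper, the weight matrices `W_g`, `W_h`, `W_l`, the patch size, and the state dimension are all hypothetical names and values chosen for demonstration; only the 5-time-step budget is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def glimpse(image, center, size=8):
    """Crop a size x size patch around a (row, col) sampling point,
    clamped to the image borders. Only this patch is processed per
    step, never the full image."""
    h, w = image.shape
    r = int(np.clip(center[0], size // 2, h - size // 2))
    c = int(np.clip(center[1], size // 2, w - size // 2))
    return image[r - size // 2:r + size // 2, c - size // 2:c + size // 2]

# Hypothetical tiny recurrent core: the state accumulates what
# earlier glimpses have seen.
state_dim, patch = 16, 8
W_g = rng.normal(scale=0.1, size=(patch * patch, state_dim))  # glimpse -> state
W_h = rng.normal(scale=0.1, size=(state_dim, state_dim))      # state -> state
W_l = rng.normal(scale=0.1, size=(state_dim, 2))              # state -> next point

image = rng.normal(size=(64, 64))      # stand-in for an input image
state = np.zeros(state_dim)
loc = np.array([32.0, 32.0])           # start sampling at the image centre

for t in range(5):                     # 5 time steps, as in the abstract
    g = glimpse(image, loc, patch).ravel()
    state = np.tanh(g @ W_g + state @ W_h)
    loc = 32.0 + 31.0 * np.tanh(state @ W_l)  # next sampling point, kept in-bounds

print(state.shape, loc.shape)
```

In the real model the recurrent state would additionally feed detection heads (class and box predictions), and the sampling-point policy would be trained rather than random; the sketch only shows why the per-step cost stays small: each iteration touches an 8×8 patch instead of the 64×64 image.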
Acknowledgements
This work was supported by Natural Science Project of Shaanxi Education Department (18JK0399).
This article belongs to the Topical Collection: Special Issue on Emerging Blockchain Applications and Technology
Guest Editors: Huimin Lu, Xing Xu, Jože Guna, and Gautam Srivastava
Li, G., Xu, F., Li, H. et al. DRA-ODM: a faster and more accurate deep recurrent attention dynamic model for object detection. World Wide Web 25, 1625–1648 (2022). https://doi.org/10.1007/s11280-021-00971-7