Abstract
Traditional object detection models based on convolutional neural networks contain a large number of parameters, so they perform poorly in scenarios that demand both high real-time performance and high precision. To address this problem, this paper proposes a deep recurrent attention object detection dynamic model (DRA-ODM). Built on a time-domain attention mechanism, the model combines a recurrent neural network with a dynamic sampling-point mechanism, simulating the way human eyes ignore irrelevant information and attend to key information when observing a scene. DRA-ODM completes object detection while extracting only part of the image features. In addition, this paper visualizes the sampling position at each step, making it easy to observe where the sampling points fall during each recurrent cycle. Extensive experimental results demonstrate that DRA-ODM achieves object detection within 5 time steps using about 20 M parameters, with an average accuracy of 87.4%.
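The core idea the abstract describes, processing only a small glimpse of the image per time step while a recurrent state decides where to sample next, can be sketched as follows. This is a minimal illustrative toy, not the paper's actual architecture: the `glimpse` helper, the weight matrices `W_g`, `W_h`, `W_l`, the patch size, and the state dimension are all hypothetical names and values chosen for demonstration; only the 5-time-step budget is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def glimpse(image, center, size=8):
    """Crop a size x size patch around a (row, col) sampling point,
    clamped to the image borders. Only this patch is processed per
    step, never the full image."""
    h, w = image.shape
    r = int(np.clip(center[0], size // 2, h - size // 2))
    c = int(np.clip(center[1], size // 2, w - size // 2))
    return image[r - size // 2:r + size // 2, c - size // 2:c + size // 2]

# Hypothetical tiny recurrent core: the state accumulates what
# earlier glimpses have seen.
state_dim, patch = 16, 8
W_g = rng.normal(scale=0.1, size=(patch * patch, state_dim))  # glimpse -> state
W_h = rng.normal(scale=0.1, size=(state_dim, state_dim))      # state -> state
W_l = rng.normal(scale=0.1, size=(state_dim, 2))              # state -> next point

image = rng.normal(size=(64, 64))      # stand-in for an input image
state = np.zeros(state_dim)
loc = np.array([32.0, 32.0])           # start sampling at the image centre

for t in range(5):                     # 5 time steps, as in the abstract
    g = glimpse(image, loc, patch).ravel()
    state = np.tanh(g @ W_g + state @ W_h)
    loc = 32.0 + 31.0 * np.tanh(state @ W_l)  # next sampling point, kept in-bounds

print(state.shape, loc.shape)
```

In the real model the recurrent state would additionally feed detection heads (class and box predictions), and the sampling-point policy would be trained rather than random; the sketch only shows why the per-step cost stays small: each iteration touches an 8×8 patch instead of the 64×64 image.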
Acknowledgements
This work was supported by Natural Science Project of Shaanxi Education Department (18JK0399).
This article belongs to the Topical Collection: Special Issue on Emerging Blockchain Applications and Technology
Guest Editors: Huimin Lu, Xing Xu, Jože Guna, and Gautam Srivastava
Li, G., Xu, F., Li, H. et al. DRA-ODM: a faster and more accurate deep recurrent attention dynamic model for object detection. World Wide Web 25, 1625–1648 (2022). https://doi.org/10.1007/s11280-021-00971-7