Abstract
Object detection is one of the most challenging problems in computer vision, and practical object detection demands both accuracy and real-time speed. YOLOv3 is a strong real-time object detection algorithm, but its recall and localization accuracy are insufficient. The attention mechanism in deep learning resembles that of human vision: it focuses on the important parts of a large amount of information, selects the key information, and ignores the rest. In this paper, we integrate the Convolutional Block Attention Module (CBAM) into YOLOv3 to improve detection accuracy while preserving real-time performance. We experimentally demonstrate the effectiveness and accuracy of the proposed method against conventional YOLOv3 on the PASCAL VOC and MS-COCO datasets.
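To make the attention idea concrete, the following is a minimal NumPy sketch of a CBAM-style block as described in the CBAM reference: channel attention followed by spatial attention, each producing sigmoid weights that rescale the feature map. It is an illustration under stated assumptions, not the authors' implementation. The weights are random rather than trained, the reduction ratio `r = 2` and the tensor sizes are arbitrary choices, and the abstract does not say where in YOLOv3 the module is placed, so the sketch operates on a single standalone feature map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1) for a feature map of shape (C, H, W).

    w1 (C//r, C) and w2 (C, C//r) form a two-layer MLP shared by the
    average-pooled and max-pooled channel descriptors.
    """
    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, kernel):
    """Per-location weights in (0, 1), shape (H, W).

    kernel has shape (2, k, k) with k odd; it is applied to the stacked
    channel-wise average and max maps with zero padding.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to feat."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]

# Illustrative sizes and random (untrained) weights: assumptions only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = cbam(feat, w1, w2, kernel)
```

The refined map has the same shape as the input, so a block of this kind can be placed between existing convolutional layers without changing the surrounding architecture.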
Acknowledgement
This research was supported by the National Natural Science Foundation of China (Nos. 61672203, 61976079 & U1836102) and Anhui Natural Science Funds for Distinguished Young Scholar (No. 170808J08).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ban, MY., Tian, WD., Zhao, ZQ. (2020). Real-Time Object Detection Based on Convolutional Block Attention Module. In: Huang, DS., Premaratne, P. (eds) Intelligent Computing Methodologies. ICIC 2020. Lecture Notes in Computer Science, vol 12465. Springer, Cham. https://doi.org/10.1007/978-3-030-60796-8_4
DOI: https://doi.org/10.1007/978-3-030-60796-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60795-1
Online ISBN: 978-3-030-60796-8
eBook Packages: Computer Science, Computer Science (R0)