Abstract
Multi-scale vehicle detection is an important application of object detection, and the Feature Pyramid Network (FPN) is a key tool for multi-scale object detection tasks. However, most existing network structures follow a baseline approach that represents the input image using a single FPN output layer and discards the others. This limits network performance and works poorly when object scales differ widely. To address this problem, this paper proposes a novel candidate region aggregation network (CRAN), which aggregates the candidate regions of different feature layers to improve the generalization performance of the network. Specifically, a feature quality score module calculates the similarity between different feature layers, and this similarity serves as a quantity factor that determines the number of candidate regions retained for each feature layer. The retained regions are then aggregated into a more comprehensive candidate region group. Furthermore, to improve the detection efficiency for small objects, an area cross-entropy loss function is proposed; it makes the model pay more attention to small targets by adding a monotonically decreasing function of the object area. Finally, the proposed CRAN and the area cross-entropy loss function are applied to advanced detectors. Experimental results on the KITTI and UA-DETRAC datasets show that the method performs well on vehicle objects in different scenarios and can meet the requirements of practical applications.
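The two ideas summarized above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the abstract does not give the feature quality score or the exact area-dependent term, so the per-layer scores are taken as given inputs, and the function names (`allocate_proposals`, `aggregate_proposals`, `area_weight`, `area_cross_entropy`), the proportional allocation rule, and the weight form `1 + exp(-area / ref_area)` are all hypothetical choices made only for illustration.

```python
import math


def allocate_proposals(scores, total):
    """Split a budget of `total` candidate regions across feature layers
    in proportion to per-layer scores (assumed positive).

    The proportional rule is an assumption; the abstract only says that
    the score acts as a quantity factor for each layer.
    """
    score_sum = sum(scores)
    raw = [total * s / score_sum for s in scores]
    counts = [int(math.floor(r)) for r in raw]
    # Give the regions lost to rounding to the layers with the
    # largest fractional parts, so the counts sum to `total`.
    leftover = total - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def aggregate_proposals(per_layer, scores, total):
    """Keep the top-k candidate regions of each layer and merge them.

    `per_layer` is a list (one entry per feature layer) of lists of
    (confidence, box) tuples; k comes from `allocate_proposals`.
    """
    counts = allocate_proposals(scores, total)
    merged = []
    for proposals, k in zip(per_layer, counts):
        ranked = sorted(proposals, key=lambda p: p[0], reverse=True)
        merged.extend(ranked[:k])
    return merged


def area_weight(area, ref_area=32.0 * 32.0):
    """Hypothetical weight that decreases monotonically with box area."""
    return 1.0 + math.exp(-area / ref_area)


def area_cross_entropy(p_true, area):
    """Cross-entropy of the true-class probability, scaled by the area weight,
    so that small objects contribute a larger loss."""
    return -area_weight(area) * math.log(p_true)
```

With three layers of equal score and a budget of 10 regions, `allocate_proposals([1, 1, 1], 10)` keeps 4, 3, and 3 regions. For the same predicted probability, `area_cross_entropy` returns a larger value for a 10×10 box than for a 100×100 box, which is the intended emphasis on small targets.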
Acknowledgements
This work was supported by the Nondestructive Detection and Monitoring Technology for High Speed Transportation Facilities, Key Laboratory of Ministry of Industry and Information Technology, and by the Fundamental Research Funds for the Central Universities (No. NJ2020014).
About this article
Cite this article
Zhang, L., Wang, H., Wang, X. et al. Vehicle object detection method based on candidate region aggregation. Pattern Anal Applic 24, 1635–1647 (2021). https://doi.org/10.1007/s10044-021-01009-4
DOI: https://doi.org/10.1007/s10044-021-01009-4