Abstract
Anchor-free detectors have become popular because they move prediction from the region level to the pixel level and require fewer hyper-parameters. Most anchor-free algorithms add a center-ness branch to suppress predictions from points far from the center of the target, which indirectly weakens the head features that matter more in pedestrian datasets. In a dense crowd, however, human head features are critical to alleviating occlusion. To address this problem, we analyze the target scale statistics of a dense pedestrian dataset and introduce the Double Parallel Branches FCOS (DPB-FCOS) detector. Alongside the original prediction branch, we add a head branch that generates additional prediction boxes, and we redefine the positive sample selection method for this branch so that it generates more prediction boxes at the head position of the human body. In addition, considering three factors (overlap area, distance, and aspect ratio), we design a regression loss that is better suited to anchor-free detectors: the center point distance in DIoU is replaced by the distance between the upper-left and lower-right corner points, which significantly improves the model’s performance. We verify our method on two popular models. Compared with the baselines, our method improves accuracy by 5.9% for FCOS and by 3.8% for ATSS on the CrowdHuman dataset.
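The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. The `centerness` function follows the standard FCOS definition. The `corner_distance_loss` function is only one plausible reading of the proposed loss: it assumes boxes in `(x1, y1, x2, y2)` form, compares the matching upper-left and lower-right corners of the predicted and ground-truth boxes, and normalizes by twice the squared diagonal of the enclosing box. The function names and the normalization are assumptions made for illustration, not the paper's exact formulation.

```python
import math


def centerness(l, t, r, b):
    """FCOS center-ness of a point at distances l, t, r, b from the box sides."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def corner_distance_loss(pred, gt):
    """Hypothetical DIoU-style loss using corner distances instead of center distance."""
    # Squared diagonal of the smallest box enclosing both boxes (the DIoU normalizer).
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # Squared distances between matching corners of the two boxes.
    tl_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    br_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    # Assumed normalization: average the two corner penalties so the term stays in [0, 1].
    return 1.0 - iou(pred, gt) + (tl_sq + br_sq) / (2.0 * diag_sq)


# A point at the center of a 100 x 200 box versus a point near the top (head region).
center_score = centerness(50, 100, 50, 100)
head_score = centerness(50, 20, 50, 180)
shifted_loss = corner_distance_loss((0, 0, 10, 10), (5, 0, 15, 10))
```

In this example the point near the top of the box receives a much lower center-ness score than the central point, which is the down-weighting of head-region predictions that the abstract describes. Under this reading, a mismatch in box width or height moves at least one corner, so the corner term also reacts to aspect ratio differences.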
Ethics declarations
Conflict of Interests
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript.
About this article
Cite this article
Song, Q., Wang, H., Yang, L. et al. Double parallel branches FCOS for human detection in a crowd. Multimed Tools Appl 81, 15707–15723 (2022). https://doi.org/10.1007/s11042-022-12439-5
DOI: https://doi.org/10.1007/s11042-022-12439-5