DOI: 10.1145/3460426.3463588
short-paper

NMS-Loss: Learning with Non-Maximum Suppression for Crowded Pedestrian Detection

Published: 01 September 2021

Abstract

Non-Maximum Suppression (NMS) is essential for object detection and affects evaluation results by introducing False Positives (FP) and False Negatives (FN), especially in crowded, occluded scenes. In this paper, we raise the problem of the weak connection between training targets and evaluation metrics caused by NMS, and propose a novel NMS-Loss that allows the NMS procedure to be trained end-to-end without any additional network parameters. NMS-Loss penalizes two cases: an FP that is not suppressed by NMS, and a detection that is wrongly eliminated by NMS and therefore becomes an FN. Specifically, we propose a pull loss that pulls predictions with the same target close to each other, and a push loss that pushes predictions with different targets away from each other. Experimental results show that, with the help of NMS-Loss, our detector, NMS-Ped, achieves a Miss Rate of 5.92% on the Caltech dataset and 10.08% on the CityPersons dataset, both better than state-of-the-art competitors.
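
The two failure cases named in the abstract can be made concrete with a small sketch. The code below is not the paper's loss formulation, which the abstract does not give. It is a plain greedy NMS in pure Python that records where a "pull" or a "push" penalty would apply. The function names `iou` and `nms_with_audit`, the `targets` list (the ground-truth index each prediction is assigned to), and the 0.5 threshold are assumptions made for illustration only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_with_audit(boxes, scores, targets, threshold=0.5):
    """Greedy NMS that also reports the two error cases.

    Returns (keep, failures). Each failure is (kind, kept_index, other_index):
      "pull": a lower-scored box of the SAME target survives NMS (an FP),
              so it should have been pulled closer to the kept box.
      "push": a box of a DIFFERENT target is suppressed (a possible FN),
              so it should have been pushed away from the kept box.
    """
    # Visit predictions from highest to lowest score, as greedy NMS does.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, failures = [], []
    for i in order:
        # First kept box that overlaps box i by more than the threshold.
        suppressor = next(
            (k for k in keep if iou(boxes[i], boxes[k]) > threshold), None)
        if suppressor is None:
            # Box i survives; if its target already has a kept box, it is an FP.
            twin = next((k for k in keep if targets[k] == targets[i]), None)
            if twin is not None:
                failures.append(("pull", twin, i))
            keep.append(i)
        elif targets[suppressor] != targets[i]:
            # Box i is removed by a box that belongs to another pedestrian.
            failures.append(("push", suppressor, i))
    return keep, failures


# Hypothetical example: boxes 0 and 2 predict pedestrian 0, box 1 predicts
# an overlapping pedestrian 1.
boxes = [(0, 0, 10, 20), (3, 0, 13, 20), (6, 0, 16, 20)]
scores = [0.9, 0.8, 0.7]
targets = [0, 1, 0]
keep, failures = nms_with_audit(boxes, scores, targets)
print(keep)
print(failures)
```

In this example box 1 overlaps box 0 with an IoU of about 0.54 and is suppressed although it belongs to a different pedestrian (the "push" case), while box 2 overlaps box 0 with an IoU of 0.25 and survives although it duplicates pedestrian 0 (the "pull" case). A trainable loss would turn these discrete events into differentiable penalties on the box coordinates; how NMS-Loss does so is described in the paper itself, not here.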

    Information & Contributors

    Information

    Published In

    ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
    August 2021
    715 pages
    ISBN: 9781450384636
    DOI: 10.1145/3460426
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. loss function
    2. non-maximum suppression
    3. pedestrian detection

    Qualifiers

    • Short-paper

    Conference

    ICMR '21
    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 46
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 30 Dec 2024

    Citations

    Cited By

    • (2025) Boundary-Aware Feature Fusion With Dual-Stream Attention for Remote Sensing Small Object Detection. IEEE Transactions on Geoscience and Remote Sensing 63 (1-13). DOI: 10.1109/TGRS.2024.3514376. Online publication date: 2025.
    • (2024) FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything. Sensors 24:9 (2889). DOI: 10.3390/s24092889. Online publication date: 30-Apr-2024.
    • (2024) Complex Scene Occluded Object Detection with Fusion of Mixed Local Channel Attention and Multi-Detection Layer Anchor-Free Optimization. Automation 5:2 (176-189). DOI: 10.3390/automation5020011. Online publication date: 17-Jun-2024.
    • (2024) A Pedestrian Detection Network Based on an Attention Mechanism and Pose Information. Applied Sciences 14:18 (8214). DOI: 10.3390/app14188214. Online publication date: 12-Sep-2024.
    • (2024) PSTL-Net: A Patchwise Self-Texture-Learning Network for Transmission Line Inspection. IEEE Transactions on Instrumentation and Measurement 73 (1-14). DOI: 10.1109/TIM.2023.3341118. Online publication date: 2024.
    • (2024) H2P×PKD: Progressive Training Pipeline with Knowledge Distillation for Lightweight Backbones in Pedestrian Detection. 2024 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) (1-6). DOI: 10.1109/MAPR63514.2024.10660994. Online publication date: 15-Aug-2024.
    • (2024) Multi-Scale Structure Perception and Global Context-Aware Method for Small-Scale Pedestrian Detection. IEEE Access 12 (76392-76403). DOI: 10.1109/ACCESS.2024.3406968. Online publication date: 2024.
    • (2024) CPH DETR: Comprehensive Regression Loss for End-to-End Object Detection. Artificial Neural Networks and Machine Learning – ICANN 2024 (93-107). DOI: 10.1007/978-3-031-72335-3_7. Online publication date: 17-Sep-2024.
    • (2024) Segmenting hydrogen‐induced cracking defects in steel through scanning acoustic microscopy and deep neural networks. Engineering Reports. DOI: 10.1002/eng2.12933. Online publication date: 5-Aug-2024.
    • (2023) Використання нейромережевих засобів для розпізнавання об'єктів у мобільних системах з обходом перешкод [Using neural network tools for object recognition in mobile systems with obstacle avoidance]. Scientific Bulletin of UNFU 33:4 (84-89). DOI: 10.36930/40330412. Online publication date: 31-Aug-2023.
