DOI: 10.1609/aaai.v37i1.25190
Research article

Correlation loss: enforcing correlation between classification and localization

Published: 07 February 2023

Abstract

Object detectors are conventionally trained with a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We analyze the effects of correlation between the classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, devise measures to evaluate the effect of correlation, and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients. For example, Correlation Loss on Sparse R-CNN, an NMS-free method, yields a 1.6 AP gain on COCO and a 1.8 AP gain on the Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, reaching state-of-the-art.
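
The idea of a plug-in term that optimizes a correlation coefficient can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Pearson coefficient between the classification scores and the IoUs of positive detections, and the names `pearson_correlation`, `correlation_penalty`, `total_loss`, and the default `weight` are hypothetical choices made for this illustration only.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # Assumption for this sketch: constant inputs count as uncorrelated.
        return 0.0
    return cov / denom


def correlation_penalty(scores: Sequence[float], ious: Sequence[float]) -> float:
    """Penalty that shrinks as scores and IoUs become positively correlated.

    Ranges from about 0 (perfect positive correlation) to about 2
    (perfect negative correlation).
    """
    return 1.0 - pearson_correlation(scores, ious)


def total_loss(cls_loss: float, loc_loss: float,
               scores: Sequence[float], ious: Sequence[float],
               weight: float = 0.2) -> float:
    """Conventional weighted sum plus the correlation term.

    `weight` is a hypothetical value, not one reported in the paper.
    """
    return cls_loss + loc_loss + weight * correlation_penalty(scores, ious)


if __name__ == "__main__":
    # Scores that rise with IoU give a penalty close to 0.
    print(correlation_penalty([0.1, 0.5, 0.9], [0.2, 0.6, 1.0]))
    # Scores that fall as IoU rises give a penalty close to 2.
    print(correlation_penalty([0.9, 0.5, 0.1], [0.2, 0.6, 1.0]))
```

In a real detector the same quantity would be computed on tensors inside an automatic-differentiation framework so that its gradient reaches the classification head. A rank-based coefficient could be substituted for Pearson, at the cost of needing a differentiable ranking operator.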

References

[1]
Blondel, M.; Teboul, O.; Berthet, Q.; and Djolonga, J. 2020. Fast differentiable sorting and ranking. In International Conference on Machine Learning (ICML).
[2]
Bolya, D.; Zhou, C.; Xiao, F.; and Lee, Y. J. 2019. YOLACT: Real-time Instance Segmentation. In IEEE/CVF International Conference on Computer Vision (ICCV).
[3]
Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; and Zagoruyko, S. 2020. End-to-End Object Detection with Transformers. In European Conference on Computer Vision (ECCV).
[4]
Chen, K.; Lin, W.; Li, J.; See, J.; Wang, J.; and Zou, J. 2020. AP-Loss for Accurate One-Stage Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 1-1.
[5]
Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; Zhang, Z.; Cheng, D.; Zhu, C.; Cheng, T.; Zhao, Q.; Li, B.; Lu, X.; Zhu, R.; Wu, Y.; Dai, J.; Wang, J.; Shi, J.; Ouyang, W.; Loy, C. C.; and Lin, D. 2019. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv, 1906.07155.
[6]
Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; and Schiele, B. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7]
Dai, X.; Chen, Y.; Yang, J.; Zhang, P.; Yuan, L.; and Zhang, L. 2021. Dynamic DETR: End-to-End Object Detection With Dynamic Attention. In IEEE/CVF International Conference on Computer Vision (ICCV).
[8]
Feng, C.; Zhong, Y.; Gao, Y.; Scott, M. R.; and Huang, W. 2021. TOOD: Task-aligned One-stage Object Detection. In The International Conference on Computer Vision (ICCV).
[9]
He, K.; Gkioxari, G.; Dollár, P.; and Girshick, R. 2017. Mask R-CNN. In IEEE/CVF International Conference on Computer Vision (ICCV).
[10]
He, Y.; Zhu, C.; Wang, J.; Savvides, M.; and Zhang, X. 2019. Bounding Box Regression With Uncertainty for Accurate Object Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11]
Huang, Z.; Huang, L.; Gong, Y.; Huang, C.; and Wang, X. 2019. Mask Scoring R-CNN. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12]
Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; and Jiang, Y. 2018. Acquisition of Localization Confidence for Accurate Object Detection. In The European Conference on Computer Vision (ECCV).
[13]
Kim, K.; and Lee, H. S. 2020. Probabilistic Anchor Assignment with IoU Prediction for Object Detection. In The European Conference on Computer Vision (ECCV).
[14]
Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; and Shi, J. 2020. FoveaBox: Beyound Anchor-Based Object Detection. IEEE Transactions on Image Processing, 29: 7389-7398.
[15]
Law, H.; and Deng, J. 2018. CornerNet: Detecting Objects as Paired Keypoints. In The European Conference on Computer Vision (ECCV).
[16]
Li, S.; He, C.; Li, R.; and Zhang, L. 2022. A Dual Weighting Label Assignment Scheme for Object Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17]
Li, X.; Wang, W.; Hu, X.; Li, J.; Tang, J.; and Yang, J. 2019. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18]
Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; and Yang, J. 2020. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. In Advances in Neural Information Processing Systems (NeurIPS).
[19]
Lin, T.; Dollár, P.; Girshick, R. B.; He, K.; Hariharan, B.; and Belongie, S. J. 2017. Feature Pyramid Networks for Object Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20]
Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; and Dollár, P. 2020. Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 42(2): 318-327.
[21]
Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft COCO: Common Objects in Context. In The European Conference on Computer Vision (ECCV).
[22]
Liu, J.; Li, D.; Zheng, R.; Tian, L.; and Shan, Y. 2021. RankDetNet: Delving Into Ranking Constraints for Object Detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 264-273.
[23]
Oksuz, K.; Cam, B. C.; Akbas, E.; and Kalkan, S. 2020. A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection. In Advances in Neural Information Processing Systems (NeurIPS).
[24]
Oksuz, K.; Cam, B. C.; Akbas, E.; and Kalkan, S. 2021a. Rank & Sort Loss for Object Detection and Instance Segmentation. In The International Conference on Computer Vision (ICCV).
[25]
Oksuz, K.; Cam, B. C.; Kalkan, S.; and Akbas, E. 2021b. One Metric to Measure them All: Localisation Recall Precision (LRP) for Evaluating Visual Detection Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1.
[26]
Ren, S.; He, K.; Girshick, R.; and Sun, J. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39(6): 1137-1149.
[27]
Roh, B.; Shin, J.; Shin, W.; and Kim, S. 2022. Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity. In The International Conference on Learning Representations (ICLR).
[28]
Sun, P.; Jiang, Y.; Xie, E.; Shao, W.; Yuan, Z.; Wang, C.; and Luo, P. 2021a. What Makes for End-to-End Object Detection? In International Conference on Machine Learning (ICML).
[29]
Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; and Luo, P. 2021b. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30]
Sun, Z.; Cao, S.; Yang, Y.; and Kitani, K. M. 2021c. Rethinking Transformer-Based Set Prediction for Object Detection. In IEEE/CVF International Conference on Computer Vision (ICCV).
[31]
Tian, Z.; Shen, C.; Chen, H.; and He, T. 2019. FCOS: Fully Convolutional One-Stage Object Detection. In IEEE/CVF International Conference on Computer Vision (ICCV).
[32]
Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C. C.; and Lin, D. 2019. CARAFE: Content-Aware ReAssembly of FEatures. In IEEE/CVF International Conference on Computer Vision (ICCV).
[33]
Zhang, H.; Wang, Y.; Dayoub, F.; and Sünderhauf, N. 2021. VarifocalNet: An IoU-aware Dense Object Detector. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34]
Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; and Li, S. Z. 2020. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35]
Zhang, X.; Wan, F.; Liu, C.; Ji, R.; and Ye, Q. 2019. FreeAnchor: Learning to Match Anchors for Visual Object Detection. In Advances in Neural Information Processing Systems (NeurIPS).
[36]
Zhu, X.; Hu, H.; Lin, S.; and Dai, J. 2019. Deformable ConvNets V2: More Deformable, Better Results. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37]
Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; and Dai, J. 2021. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In International Conference on Learning Representations (ICLR).

Published In

AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence
February 2023
16496 pages
ISBN:978-1-57735-880-0

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
