
Gaussian guided IoU: A better metric for balanced learning on object detection

Published: 01 June 2022

Abstract

Most anchor-based detectors use intersection over union (IoU) to assign targets to anchors during training. However, IoU pays insufficient attention to the proximity of the anchor's centre to the centre of the ground-truth box, which causes two problems: (1) most slender objects are assigned only one anchor, so they receive insufficient supervision during training; (2) IoU cannot accurately represent how well the receptive field of the feature at the anchor's centre aligns with the object. As a result, some well-aligned features are discarded while poorly aligned ones are used, reducing the model's localisation accuracy. To address these issues, we first design a Gaussian guided IoU (GGIoU), which prioritises the proximity of the anchor's centre to the centre of the ground-truth box. We then propose GGIoU-balanced learning methods, comprising a GGIoU-guided assignment strategy and a GGIoU-balanced localisation loss. This method assigns multiple anchors to each slender object and favours features that are well aligned with the objects during training. Extensive experiments show that GGIoU-balanced learning solves the aforementioned problems and significantly improves the performance of the detection model.
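The exact GGIoU formulation is not reproduced on this page, so the following Python sketch only illustrates the idea the abstract describes: plain IoU modulated by a Gaussian of the distance between the anchor centre and the ground-truth centre. The Gaussian parameterisation (sigma_scale, per-axis normalisation by the ground-truth width and height) and the positive-sample threshold in assign_anchors are assumptions for illustration, not the authors' definitions.

```python
import math

def iou(box_a, box_b):
    """Plain IoU between two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def ggiou(anchor, gt, sigma_scale=0.5):
    """Hypothetical GGIoU: IoU weighted by a Gaussian of the offset
    between the anchor centre and the ground-truth centre. Each axis
    is normalised by the ground-truth box size, so an anchor sitting
    on the long axis of a slender object is penalised only mildly."""
    acx, acy = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw = max(gt[2] - gt[0], 1e-6)
    gh = max(gt[3] - gt[1], 1e-6)
    dx = (acx - gcx) / (sigma_scale * gw)
    dy = (acy - gcy) / (sigma_scale * gh)
    return iou(anchor, gt) * math.exp(-0.5 * (dx * dx + dy * dy))

def assign_anchors(anchors, gt, pos_thr=0.5):
    """Toy GGIoU-guided assignment: every anchor whose GGIoU with the
    ground-truth box exceeds pos_thr becomes a positive sample, so a
    slender object can collect several positives along its length."""
    return [i for i, a in enumerate(anchors) if ggiou(a, gt) >= pos_thr]
```

Because the Gaussian factor equals 1 when the two centres coincide and decays with the normalised centre distance, GGIoU never exceeds plain IoU and explicitly rewards centre alignment, which is the property the abstract attributes to it.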



Published In

IET Computer Vision, Volume 16, Issue 6, September 2022, 88 pages
EISSN: 1751-9640
DOI: 10.1049/cvi2.v16.6
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley & Sons, Inc., United States
