Object Detection of Flexible Objects with Arbitrary Orientation Based on Rotation-Adaptive YOLOv5
Figure 1. Comparison of the actual scene detection results between HBB and RBB.
Figure 2. Two popular representations of rotated bounding boxes. (a) OpenCV representation. (b) Long-side representation.
Figure 3. Boundary problems of the OpenCV and long-side representations. (a) Boundary problems of the OpenCV representation. (b) Boundary problems of the long-side representation.
Figure 4. Overall structure of the proposed R_YOLOv5 network.
Figure 5. The training process for rectangular box coordinates, category confidence, class probability ($\vec{\lambda}$), and the 180-dimensional category angle probability ($\vec{\theta}$).
Figure 6. Gaussian function mapping.
Figure 7. Inference process.
Figure 8. Visualization results of the R_YOLOv5 evaluation.
Figure 9. Comparison of PR curves between R_YOLOv5 and YOLOv5 on the FO dataset.
Figure 10. Comparison of detection results before and after the modifications to YOLOv5. The blue rectangles (left) are HBB detections; the green rectangles (right) are RBB detections. The improved RBB not only detects the seine accurately but also provides its orientation.
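Figures 5 and 6 refer to learning the angle as a 180-way classification smoothed by a Gaussian window, i.e., the circular smooth label (CSL) of Yang and Yan [32]. The sketch below is a minimal illustration of how such a label could be encoded: the smoothing window wraps around the 0°/180° boundary, which avoids the discontinuity shown in Figure 3. The window radius of 6 is an assumed value taken from the CSL paper, not one stated in this article.

```python
import numpy as np

def circular_smooth_label(angle_deg: int, num_bins: int = 180, radius: int = 6) -> np.ndarray:
    """Encode an angle in [0, 180) as a smoothed 180-dimensional target.

    Bins near the true angle receive values close to 1, and the window
    wraps around (bin 0 and bin 179 are neighbors), which removes the
    boundary discontinuity of the long-side representation.
    """
    bins = np.arange(num_bins)
    # Circular distance from each bin to the ground-truth angle.
    dist = np.minimum(np.abs(bins - angle_deg), num_bins - np.abs(bins - angle_deg))
    label = np.exp(-(dist.astype(float) ** 2) / (2 * radius ** 2))  # Gaussian window
    label[dist > radius] = 0.0  # truncate outside the window
    return label

# An angle of 2 degrees also rewards bins 176-179 through the wrap-around.
print(circular_smooth_label(2)[[0, 2, 8, 176, 179]])
```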
Abstract
1. Introduction
2. Related Work
2.1. Deep General Object Detection
2.2. Arbitrarily Oriented Object Detection
2.3. Discussion of Related Works
3. Proposed Method
3.1. Ground Truth Generation
3.2. Network Architecture
3.3. Training Objective
3.4. Inference
4. Experiments
4.1. Datasets
4.2. Metrics
4.3. Results
4.3.1. Results on DOTA-v1.5
4.3.2. Results on FO Dataset
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xie, T.; Wang, K.; Li, R.; Tang, X.; Zhao, L. PANet: A pixel-level attention network for 6d pose estimation with embedding vector features. IEEE Robot. Autom. Lett. 2021, 7, 1840–1847. [Google Scholar] [CrossRef]
- Liu, K.; Peng, L.; Tang, S. Underwater Object Detection Using TC-YOLO with Attention Mechanisms. Sensors 2023, 23, 2567. [Google Scholar] [CrossRef] [PubMed]
- Wu, Y.; Li, J. YOLOv4 with Deformable-Embedding-Transformer Feature Extractor for Exact Object Detection in Aerial Imagery. Sensors 2023, 23, 2522. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv 2016, arXiv:1605.06409. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
- Chen, X.; Chen, W.; Su, L.; Li, T. Slender Flexible Object Segmentation Based on Object Correlation Module and Loss Function Optimization. IEEE Access 2023, 11, 29684–29697. [Google Scholar] [CrossRef]
- Kong, Z.; Zhang, N.; Guan, X.; Le, X. Detecting slender objects with uncertainty based on keypoint-displacement representation. Neural Netw. 2021, 139, 246–254. [Google Scholar] [CrossRef] [PubMed]
- Wan, Z.; Chen, Y.; Deng, S.; Chen, K.; Yao, C.; Luo, J. Slender object detection: Diagnoses and improvements. arXiv 2020, arXiv:2011.08529. [Google Scholar]
- Jiang, S.; Yao, W.; Wong, M.S.; Li, G.; Hong, Z.; Kuc, T.Y.; Tong, X. An optimized deep neural network detecting small and narrow rectangular objects in Google Earth images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1068–1081. [Google Scholar] [CrossRef]
- Wu, B.; Iandola, F.; Jin, P.H.; Keutzer, K. Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 129–137. [Google Scholar]
- Tao, A.; Barker, J.; Sarathy, S. DetectNet: Deep Neural Network for Object Detection in DIGITS. Parallel Forall 2016, 4. Available online: https://devblogs.nvidia.com/detectnet-deep-neural-network-object-detection-digits (accessed on 8 March 2023).
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Wong, A.; Famuori, M.; Shafiee, M.J.; Li, F.; Chwyl, B.; Chung, J. YOLO nano: A highly compact you only look once convolutional neural network for object detection. In Proceedings of the 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS), Vancouver, BC, Canada, 13 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 22–25. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Wang, X.; Zhang, H.; Huang, W.; Scott, M.R. Cross-batch memory for embedding learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6388–6397. [Google Scholar]
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; pp. 390–391. [Google Scholar]
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. RepPoints: Point set representation for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9657–9666. [Google Scholar]
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
- Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 2018, 10, 132. [Google Scholar] [CrossRef]
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
- Qian, W.; Yang, X.; Peng, S.; Guo, Y.; Yan, J. Learning modulated loss for rotated object detection. arXiv 2019, arXiv:1911.08299. [Google Scholar] [CrossRef]
- Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 677–694. [Google Scholar]
- Azimi, S.M.; Vig, E.; Bahmanyar, R.; Körner, M.; Reinartz, P. Towards multi-class object detection in unconstrained remote sensing imagery. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 150–165. [Google Scholar]
- Yang, X.; Liu, Q.; Yan, J.; Li, A.; Zhang, Z.; Yu, G. R3Det: Refined single-stage detector with feature refinement for rotating object. arXiv 2019, arXiv:1908.05612. [Google Scholar] [CrossRef]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Wu, Y.; Zhang, K.; Wang, J.; Wang, Y.; Wang, Q.; Li, Q. CDD-Net: A context-driven detection network for multiclass object detection. IEEE Geosci. Remote Sens. Lett. 2020, 19, 8004905. [Google Scholar] [CrossRef]
- Zhang, K.; Wu, Y.; Wang, J.; Wang, Y.; Wang, Q. Semantic context-aware network for multiscale object detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8009705. [Google Scholar] [CrossRef]
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4974–4983. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Han, J.; Ding, J.; Xue, N.; Xia, G.S. ReDet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795. [Google Scholar]
Category | Center X | Center Y | Long Side | Short Side | Angle |
---|---|---|---|---|---|
0 (seine) | 0–1 | 0–1 | 0–1 | 0–1 | 0–179 |
1 (fence) | 0–1 | 0–1 | 0–1 | 0–1 | 0–179 |
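For concreteness, the sketch below parses one annotation record in the long-side representation the table describes. The space-separated plain-text layout and the integer-degree angle field are assumptions about the on-disk format, which the table does not specify.

```python
from dataclasses import dataclass

@dataclass
class RotatedLabel:
    category: int      # 0 = seine, 1 = fence
    cx: float          # box center x, normalized to 0-1
    cy: float          # box center y, normalized to 0-1
    long_side: float   # normalized to 0-1
    short_side: float  # normalized to 0-1
    angle: int         # degrees in [0, 180), long-side representation

def parse_label_line(line: str) -> RotatedLabel:
    """Parse an assumed record layout: '<category> <cx> <cy> <long> <short> <angle>'."""
    c, cx, cy, ls, ss, a = line.split()
    label = RotatedLabel(int(c), float(cx), float(cy), float(ls), float(ss), int(a))
    assert 0 <= label.angle < 180 and label.long_side >= label.short_side
    return label

print(parse_label_line("0 0.48 0.52 0.61 0.07 135"))
```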
CSP Structure | R_YOLOv5s | R_YOLOv5m | R_YOLOv5l | R_YOLOv5x |
---|---|---|---|---|
① | CSP1_1 | CSP1_2 | CSP1_3 | CSP1_4 |
② | CSP1_3 | CSP1_6 | CSP1_9 | CSP1_12 |
③ | CSP1_3 | CSP1_6 | CSP1_9 | CSP1_12 |
④ | CSP2_1 | CSP2_2 | CSP2_3 | CSP2_4 |
⑤ | CSP2_1 | CSP2_2 | CSP2_3 | CSP2_4 |
⑥ | CSP2_1 | CSP2_2 | CSP2_3 | CSP2_4 |
⑦ | CSP2_1 | CSP2_2 | CSP2_3 | CSP2_4 |
⑧ | CSP2_1 | CSP2_2 | CSP2_3 | CSP2_4 |
Convolution Kernels Number | R_YOLOv5s | R_YOLOv5m | R_YOLOv5l | R_YOLOv5x |
---|---|---|---|---|
(1) | 32 | 48 | 64 | 80 |
(2) | 64 | 96 | 128 | 160 |
(3) | 128 | 192 | 256 | 320
(4) | 256 | 384 | 512 | 640 |
(5) | 512 | 768 | 1024 | 1280 |
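The repeat counts (CSP1_1 … CSP1_12, CSP2_1 … CSP2_4) and kernel numbers (32 … 1280) in the two tables above are consistent with YOLOv5's compound scaling. The sketch below reproduces them, assuming R_YOLOv5 inherits YOLOv5's published depth_multiple and width_multiple factors; those factors are an assumption here, since the article tabulates only the resulting values.

```python
import math

# YOLOv5's published (depth_multiple, width_multiple) factors; we assume
# R_YOLOv5 inherits them unchanged.
SCALES = {"s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.00, 1.00), "x": (1.33, 1.25)}

def csp_repeats(base: int, depth_multiple: float) -> int:
    """Number of bottleneck repeats in a CSP block (at least 1)."""
    return max(round(base * depth_multiple), 1)

def kernels(base: int, width_multiple: float, divisor: int = 8) -> int:
    """Kernel count rounded up to a multiple of 8, as in YOLOv5's make_divisible."""
    return math.ceil(base * width_multiple / divisor) * divisor

for name, (d, w) in SCALES.items():
    # Reproduces e.g. CSP1_1/2/3/4 (base 3), CSP1_3/6/9/12 (base 9), and the
    # kernel columns 32/48/64/80 (base 64) and 512/768/1024/1280 (base 1024).
    print(name, csp_repeats(3, d), csp_repeats(9, d), kernels(64, w), kernels(1024, w))
```

Under this scaling, row (3) for R_YOLOv5m comes out as kernels(256, 0.75) = 192, which is why the table above reads 192 rather than 196.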
Methods | Detector | mAP/% | BD | TC | SH | BC | PL | GTF | HA | BR | SV | SBF | LV | HC | SP | ST | RA | CC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FPN with Faster RCNN [9] | HBB | 57.30 | 70.10 | 86.00 | 40.30 | 69.40 | 78.60 | 68.50 | 59.50 | 55.10 | 23.70 | 61.10 | 45.40 | 68.30 | 64.50 | 46.40 | 56.20 | 24.40 |
RetinaNet [36] | HBB | 33.50 | 44.50 | 75.10 | 33.40 | 30.80 | 76.00 | 32.50 | 35.80 | 32.60 | 10.70 | 13.00 | 33.30 | 0.20 | 43.90 | 31.20 | 42.40 | 0.00 |
YOLOv4 [37] | HBB | 55.60 | 61.70 | 88.30 | 79.50 | 55.60 | 85.20 | 35.20 | 69.80 | 32.60 | 37.00 | 34.40 | 64.00 | 67.60 | 58.50 | 64.80 | 54.20 | 0.70 |
PANet [26] | HBB | 61.20 | 74.10 | 89.60 | 58.40 | 67.00 | 85.90 | 64.50 | 67.90 | 51.50 | 27.70 | 63.40 | 56.20 | 71.30 | 73.40 | 61.30 | 59.20 | 7.60 |
CDD-Net [38] | HBB | 61.30 | 74.70 | 89.80 | 49.20 | 71.40 | 81.40 | 70.10 | 69.90 | 55.30 | 25.30 | 65.60 | 51.50 | 71.30 | 60.40 | 53.30 | 58.20 | 32.70 |
SCANet [39] | HBB | 64.00 | 77.20 | 90.30 | 53.70 | 73.20 | 81.10 | 72.50 | 70.50 | 62.40 | 25.60 | 65.30 | 52.70 | 77.60 | 68.80 | 52.80 | 63.50 | 36.70 |
HTC [40] | RBB | 64.47 | 74.41 | 90.34 | 79.89 | 75.17 | 78.41 | 63.17 | 72.13 | 53.41 | 52.45 | 48.44 | 63.56 | 56.42 | 74.02 | 67.64 | 69.94 | 12.14 |
Mask R-CNN [41] | RBB | 64.54 | 77.41 | 90.31 | 79.74 | 74.28 | 78.36 | 56.94 | 70.77 | 53.36 | 52.17 | 45.49 | 63.60 | 61.49 | 73.87 | 66.41 | 71.32 | 17.11 |
ReDet [42] | RBB | 67.66 | 82.63 | 90.83 | 87.82 | 75.81 | 79.51 | 69.82 | 75.57 | 53.81 | 52.76 | 49.11 | 75.64 | 58.29 | 75.17 | 68.78 | 71.65 | 15.36 |
R_YOLOv5s | RBB | 71.20 | 82.00 | 96.10 | 93.30 | 77.10 | 96.40 | 66.90 | 83.00 | 53.30 | 67.80 | 58.40 | 82.00 | 67.30 | 67.90 | 70.50 | 56.50 | 20.10 |
R_YOLOv5m | RBB | 73.10 | 84.20 | 96.20 | 94.90 | 80.00 | 97.00 | 62.80 | 85.20 | 56.00 | 70.20 | 61.40 | 83.60 | 73.40 | 71.70 | 74.70 | 59.90 | 17.70 |
R_YOLOv5l | RBB | 73.60 | 86.00 | 96.50 | 94.90 | 83.40 | 97.40 | 69.20 | 84.70 | 56.00 | 70.70 | 66.60 | 84.10 | 76.50 | 71.50 | 70.40 | 63.40 | 5.60 |
R_YOLOv5x | RBB | 74.50 | 86.00 | 97.00 | 96.00 | 83.10 | 97.60 | 66.70 | 86.80 | 60.20 | 74.30 | 62.80 | 84.60 | 67.30 | 70.10 | 78.60 | 57.70 | 23.90 |
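The RBB rows above are evaluated with the IoU between rotated boxes rather than axis-aligned ones. A minimal sketch of such an overlap computation follows, using shapely polygons; the library choice is an assumption for illustration, as the article does not name its evaluation code.

```python
import math
from shapely.geometry import Polygon  # assumed helper library, not named in the article

def rbox_to_polygon(cx: float, cy: float, long_side: float,
                    short_side: float, angle_deg: float) -> Polygon:
    """Corner polygon of a rotated box in the long-side representation."""
    a = math.radians(angle_deg)
    dx, dy = long_side / 2.0, short_side / 2.0
    corners = [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]
    return Polygon([(cx + x * math.cos(a) - y * math.sin(a),
                     cy + x * math.sin(a) + y * math.cos(a)) for x, y in corners])

def rotated_iou(box1, box2) -> float:
    p1, p2 = rbox_to_polygon(*box1), rbox_to_polygon(*box2)
    inter = p1.intersection(p2).area
    return inter / (p1.area + p2.area - inter)

# Two identical elongated boxes rotated 90 degrees apart overlap only partially
# (IoU = 0.25 here), which is why axis-aligned IoU overstates matches for
# slender, arbitrarily oriented objects.
print(rotated_iou((0, 0, 10, 4, 0), (0, 0, 10, 4, 90)))
```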
Method | Object | AP/% | mAP/% | Recall/% | Precision/% | Time/ms | FPS
---|---|---|---|---|---|---|---
YOLOv5s | Seine | 51.60 | 47.70 | 50.70 | 74.50 | 22.68 | 44.10
YOLOv5s | Fence | 43.80 | 47.70 | 39.20 | 89.80 | 22.68 | 44.10
R_YOLOv5s | Seine | 67.72 | 57.90 | 57.70 | 70.20 | 23.87 | 41.90
R_YOLOv5s | Fence | 48.08 | 57.90 | 42.00 | 93.30 | 23.87 | 41.90
YOLOv5m | Seine | 57.60 | 60.70 | 55.30 | 62.80 | 35.21 | 28.40
YOLOv5m | Fence | 63.70 | 60.70 | 76.00 | 86.90 | 35.21 | 28.40
R_YOLOv5m | Seine | 75.70 | 62.90 | 53.50 | 71.70 | 38.17 | 26.20
R_YOLOv5m | Fence | 48.90 | 62.90 | 75.50 | 85.30 | 38.17 | 26.20
YOLOv5l | Seine | 62.20 | 65.50 | 70.50 | 69.90 | 42.02 | 23.80
YOLOv5l | Fence | 68.90 | 65.50 | 77.10 | 85.70 | 42.02 | 23.80
R_YOLOv5l | Seine | 79.40 | 68.90 | 80.00 | 80.10 | 44.84 | 22.30
R_YOLOv5l | Fence | 58.40 | 68.90 | 76.20 | 84.90 | 44.84 | 22.30
YOLOv5x | Seine | 62.00 | 69.60 | 80.50 | 74.50 | 51.02 | 19.60
YOLOv5x | Fence | 77.30 | 69.60 | 87.20 | 85.80 | 51.02 | 19.60
R_YOLOv5x | Seine | 74.80 | 71.25 | 80.30 | 74.10 | 54.05 | 18.50
R_YOLOv5x | Fence | 67.70 | 71.25 | 76.90 | 74.80 | 54.05 | 18.50
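As a sanity check on the table above, the summary columns are tied together by simple arithmetic: mAP is the unweighted mean of the two per-object APs, and FPS is the reciprocal of the per-image inference time. A minimal sketch:

```python
def mean_ap(aps_percent: list) -> float:
    """mAP as the unweighted mean of per-object APs (two classes here)."""
    return sum(aps_percent) / len(aps_percent)

def fps_from_latency(time_ms: float) -> float:
    """Frames per second from the per-image inference time in milliseconds."""
    return 1000.0 / time_ms

# YOLOv5s row: (51.60 + 43.80) / 2 = 47.70 and 1000 / 22.68 ≈ 44.1 FPS.
assert round(mean_ap([51.60, 43.80]), 2) == 47.70
assert round(fps_from_latency(22.68), 1) == 44.1
```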
Citation: Wu, J.; Su, L.; Lin, Z.; Chen, Y.; Ji, J.; Li, T. Object Detection of Flexible Objects with Arbitrary Orientation Based on Rotation-Adaptive YOLOv5. Sensors 2023, 23, 4925. https://doi.org/10.3390/s23104925