Oriented R-CNN and Beyond

Xingxing Xie¹,
Gong Cheng ORCID: orcid.org/0000-0001-5030-0683¹,
Jiabao Wang¹,
Ke Li²,
Xiwen Yao¹ &
Junwei Han¹

Abstract

Two-stage oriented detectors currently outperform single-stage competitors in accuracy, but generating oriented proposals remains time-consuming and limits inference speed. This paper proposes an Oriented Region Proposal Network (Oriented RPN) that produces high-quality oriented proposals at almost no extra cost. To this end, we present a novel representation of oriented objects, named the midpoint offset representation, which avoids a complicated design for the oriented proposal generation network. Built on Oriented RPN, we develop a simple yet effective oriented object detection framework, called Oriented R-CNN, which detects oriented objects accurately and efficiently. Moreover, we extend Oriented R-CNN to instance segmentation and obtain a new proposal-based instance segmentation method, termed Oriented Mask R-CNN. Without bells and whistles, Oriented R-CNN achieves state-of-the-art accuracy on all seven commonly used oriented object detection datasets. More importantly, our method is the fastest among all detectors. For instance segmentation, Oriented Mask R-CNN also achieves the top results on iSAID, a large-scale aerial instance segmentation dataset. We hope our methods can serve as solid baselines for oriented object detection and instance segmentation. Code is available at https://github.com/jbwang1997/OBBDetection.
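
The abstract names the midpoint offset representation without defining it. The minimal sketch below assumes the six-parameter form (x, y, w, h, Δα, Δβ) described in the earlier conference version of Oriented R-CNN: (x, y, w, h) is the external horizontal box of the oriented proposal, Δα is the horizontal offset of a vertex from the midpoint of the top side, and Δβ is the vertical offset of a vertex from the midpoint of the right side. The function and argument names are hypothetical and are not taken from the OBBDetection code. Image coordinates with y growing downward are also an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def midpoint_offset_to_vertices(
    x: float, y: float, w: float, h: float, d_alpha: float, d_beta: float
) -> List[Point]:
    """Decode an assumed midpoint offset encoding into four vertices.

    (x, y) is the centre of the external horizontal box, (w, h) its size.
    d_alpha shifts a vertex along the top side, d_beta along the right side.
    Assumes image coordinates, so the top side lies at y - h / 2.
    """
    v1 = (x + d_alpha, y - h / 2)  # on the top side, offset from its midpoint
    v2 = (x + w / 2, y + d_beta)   # on the right side, offset from its midpoint
    v3 = (x - d_alpha, y + h / 2)  # on the bottom side, mirror of v1 about the centre
    v4 = (x - w / 2, y - d_beta)   # on the left side, mirror of v2 about the centre
    return [v1, v2, v3, v4]


if __name__ == "__main__":
    # Hypothetical proposal: a 4 x 2 external box centred at the origin.
    for vertex in midpoint_offset_to_vertices(0.0, 0.0, 4.0, 2.0, 1.0, 0.5):
        print(vertex)
```

Under this assumed encoding, opposite vertices are mirror images about the box centre, so the decoded shape is always a parallelogram. Converting it to an oriented rectangle for the detection head would be a separate step, which is not sketched here.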

Data Availability

The datasets generated during the current study are available in the DOTA repository (https://captain-whu.github.io/DOTA/), the DIOR-R repository (https://gcheng-nwpu.github.io/#Datasets), the HRSC2016 repository (https://sites.google.com/site/hrsc2016), and the iSAID repository (https://captain-whu.github.io/iSAID/). The source code is available on GitHub at https://github.com/jbwang1997/OBBDetection.

Acknowledgements

This work was supported in part by the National Science Foundation of China under Grants 62376223 and 62136007, in part by the Natural Science Basic Research Program of Shaanxi under Grants 2021JC-16 and 2023-JCZD-36, and in part by the Doctorate Foundation of Northwestern Polytechnical University under Grant CX2021082. We also thank Chunbo Lang for his valuable and constructive suggestions during the revision of the manuscript.

Author information

Authors and Affiliations

Northwestern Polytechnical University, Xi’an, China
Xingxing Xie, Gong Cheng, Jiabao Wang, Xiwen Yao & Junwei Han
Zhengzhou Institute of Surveying and Mapping, Zhengzhou, China
Ke Li

Corresponding author

Correspondence to Gong Cheng.

Additional information

Communicated by Jifeng Dai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, X., Cheng, G., Wang, J. et al. Oriented R-CNN and Beyond. Int J Comput Vis 132, 2420–2442 (2024). https://doi.org/10.1007/s11263-024-01989-w

Received: 19 February 2023
Accepted: 01 January 2024
Published: 29 January 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s11263-024-01989-w