Abstract
Currently, two-stage oriented detectors are more accurate than their single-stage competitors, but generating oriented proposals is still time-consuming, which slows down inference. This paper proposes an Oriented Region Proposal Network (Oriented RPN) that produces high-quality oriented proposals at almost no extra cost. To this end, we present a novel representation of oriented objects, named the midpoint offset representation, which avoids a complicated design for the oriented proposal generation network. Built on Oriented RPN, we develop a simple yet effective oriented object detection framework, called Oriented R-CNN, which detects oriented objects accurately and efficiently. Moreover, we extend Oriented R-CNN to instance segmentation and obtain a new proposal-based instance segmentation method, termed Oriented Mask R-CNN. Without bells and whistles, Oriented R-CNN achieves state-of-the-art accuracy on all seven commonly used oriented object detection datasets. More importantly, our method is the fastest among all the compared detectors. For instance segmentation, Oriented Mask R-CNN also achieves top results on iSAID, a large-scale aerial instance segmentation dataset. We hope our methods can serve as solid baselines for oriented object detection and instance segmentation. Code is available at https://github.com/jbwang1997/OBBDetection.
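The abstract only names the midpoint offset representation. As a hedged illustration, the sketch below assumes the six-parameter form (x, y, w, h, Δα, Δβ) described in the ICCV 2021 conference version of Oriented R-CNN: (x, y, w, h) is the external horizontal box of the object, Δα shifts one vertex along the top side away from that side's midpoint, Δβ shifts another vertex along the right side away from its midpoint, and the two remaining vertices follow by point symmetry about the center. The second function assumes a post-processing step that stretches the shorter diagonal of the decoded parallelogram to the length of the longer one. Both function names (`decode_midpoint_offset`, `to_rectangle`) are hypothetical and written in plain Python; they are not the OBBDetection API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def decode_midpoint_offset(x: float, y: float, w: float, h: float,
                           da: float, db: float) -> List[Point]:
    """Hypothetical decoder: (x, y, w, h, da, db) -> vertices v1..v4.

    (x, y) is the center of the external horizontal box, (w, h) its size.
    da: horizontal offset of v1 from the top-side midpoint (x, y - h/2).
    db: vertical offset of v2 from the right-side midpoint (x + w/2, y).
    Image coordinates are assumed, so y grows downward.
    """
    v1 = (x + da, y - h / 2)  # lies on the top side
    v2 = (x + w / 2, y + db)  # lies on the right side
    v3 = (x - da, y + h / 2)  # bottom side, point-symmetric to v1
    v4 = (x - w / 2, y - db)  # left side, point-symmetric to v2
    return [v1, v2, v3, v4]


def to_rectangle(vertices: List[Point]) -> List[Point]:
    """Hypothetical post-processing: parallelogram -> rectangle.

    The diagonals v1-v3 and v2-v4 bisect each other at the center. The
    shorter one is stretched (direction kept) to the longer one's length;
    a parallelogram with equal diagonals is a rectangle.
    """
    v1, v2, v3, v4 = vertices
    cx, cy = (v1[0] + v3[0]) / 2, (v1[1] + v3[1]) / 2  # shared center
    a = (v1[0] - cx, v1[1] - cy)  # half-diagonal pointing to v1
    b = (v2[0] - cx, v2[1] - cy)  # half-diagonal pointing to v2
    la, lb = math.hypot(*a), math.hypot(*b)
    target = max(la, lb)
    # Leave a degenerate (zero-length) half-diagonal unchanged.
    sa = target / la if la > 0 else 1.0
    sb = target / lb if lb > 0 else 1.0
    a = (a[0] * sa, a[1] * sa)
    b = (b[0] * sb, b[1] * sb)
    return [(cx + a[0], cy + a[1]), (cx + b[0], cy + b[1]),
            (cx - a[0], cy - a[1]), (cx - b[0], cy - b[1])]
```

As a sanity check on the encoding, setting Δα = w/2 and Δβ = h/2 recovers the four corners of the horizontal box itself, so under these assumptions a horizontal box is just a special case that needs only two extra numbers to become oriented.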
Data Availability
The datasets analyzed during the current study are available in the DOTA repository (https://captain-whu.github.io/DOTA/), the DIOR-R repository (https://gcheng-nwpu.github.io/#Datasets), the HRSC2016 repository (https://sites.google.com/site/hrsc2016), and the iSAID repository (https://captain-whu.github.io/iSAID/). The source code is available on GitHub at https://github.com/jbwang1997/OBBDetection.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 62376223 and 62136007, in part by the Natural Science Basic Research Program of Shaanxi under Grants 2021JC-16 and 2023-JCZD-36, and in part by the Doctorate Foundation of Northwestern Polytechnical University under Grant CX2021082. We also thank Chunbo Lang for his valuable and constructive suggestions during the revision of the manuscript.
Additional information
Communicated by Jifeng Dai.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xie, X., Cheng, G., Wang, J. et al. Oriented R-CNN and Beyond. Int J Comput Vis 132, 2420–2442 (2024). https://doi.org/10.1007/s11263-024-01989-w