Oriented R-CNN and Beyond

International Journal of Computer Vision

Abstract

Two-stage oriented detectors currently outperform their single-stage competitors in accuracy, but generating oriented proposals remains time-consuming, which limits inference speed. This paper proposes an Oriented Region Proposal Network (Oriented RPN) that produces high-quality oriented proposals at almost no extra cost. To this end, we present a novel representation of oriented objects, named the midpoint offset representation, which avoids the complicated design of an oriented proposal generation network. Built on Oriented RPN, we develop a simple yet effective oriented object detection framework, called Oriented R-CNN, which detects oriented objects accurately and efficiently. Moreover, we extend Oriented R-CNN to the task of instance segmentation and realize a new proposal-based instance segmentation method, termed Oriented Mask R-CNN. Without bells and whistles, Oriented R-CNN achieves state-of-the-art accuracy on all seven commonly used oriented object detection datasets. More importantly, our method has the fastest speed among all detectors. For instance segmentation, Oriented Mask R-CNN also achieves the top results on iSAID, a large-scale aerial instance segmentation dataset. We hope our methods can serve as solid baselines for oriented object detection and instance segmentation. Code is available at https://github.com/jbwang1997/OBBDetection.
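
The abstract names the midpoint offset representation without defining it. As a minimal sketch, assume the six-parameter form (x, y, w, h, Δα, Δβ) described in the authors' earlier conference paper on Oriented R-CNN: (x, y, w, h) is the external horizontal box of the object, and Δα and Δβ are the offsets of two vertices from the midpoints of the top and right sides of that box. This parameterization, the function name, and the argument names are assumptions for illustration and are not taken from the text on this page. Image coordinates are assumed, with y growing downward.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def midpoint_offset_to_vertices(
    x: float, y: float, w: float, h: float, d_alpha: float, d_beta: float
) -> List[Point]:
    """Decode an assumed midpoint offset tuple into four vertices.

    (x, y) is the centre of the external horizontal box, w and h are its
    width and height. d_alpha shifts the top vertex along the top side,
    d_beta shifts the right vertex along the right side. The bottom and
    left vertices mirror those shifts, so the result is a parallelogram
    centred on (x, y).
    """
    top = (x + d_alpha, y - h / 2)
    right = (x + w / 2, y + d_beta)
    bottom = (x - d_alpha, y + h / 2)
    left = (x - w / 2, y - d_beta)
    return [top, right, bottom, left]


if __name__ == "__main__":
    # Zero offsets give the diamond through the side midpoints of the box.
    print(midpoint_offset_to_vertices(0.0, 0.0, 4.0, 2.0, 0.0, 0.0))
    # Non-zero offsets slide the vertices along the box sides.
    print(midpoint_offset_to_vertices(0.0, 0.0, 4.0, 2.0, 1.0, 0.5))
```

Under this assumption, every decoded vertex lies on a side of the horizontal box, so a proposal can be regressed from a horizontal anchor with two extra scalars rather than an explicit angle. The decoded shape is in general a parallelogram, so a further step would be needed to turn it into an oriented rectangle; that step is not sketched here.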

Data Availability

The datasets used in the current study are available in the DOTA repository (https://captain-whu.github.io/DOTA/), the DIOR-R repository (https://gcheng-nwpu.github.io/#Datasets), the HRSC2016 repository (https://sites.google.com/site/hrsc2016), and the iSAID repository (https://captain-whu.github.io/iSAID/). The source code is available on GitHub at https://github.com/jbwang1997/OBBDetection.

Acknowledgements

This work was supported in part by the National Science Foundation of China under Grants 62376223 and 62136007, in part by the Natural Science Basic Research Program of Shaanxi under Grants 2021JC-16 and 2023-JCZD-36, and in part by the Doctorate Foundation of Northwestern Polytechnical University under Grant CX2021082. We also thank Chunbo Lang for his valuable and constructive suggestions during the revision of the manuscript.

Author information

Corresponding author

Correspondence to Gong Cheng.

Additional information

Communicated by Jifeng Dai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, X., Cheng, G., Wang, J. et al. Oriented R-CNN and Beyond. Int J Comput Vis 132, 2420–2442 (2024). https://doi.org/10.1007/s11263-024-01989-w

  • DOI: https://doi.org/10.1007/s11263-024-01989-w
