Feature refinement with multi-level context for object detection

  • ORIGINAL PAPER
Machine Vision and Applications

Abstract

Robust multi-scale object detection is challenging because it requires both spatial detail and semantic knowledge to cope with large scale variation and cluttered backgrounds. Appropriately fusing high-resolution features with deep semantic features is key to better performance. Various approaches, such as the feature pyramid network, have been developed to extract deep features and combine them with shallow-layer spatial features. However, high-resolution feature maps contain noisy and distracting features, so directly combining shallow features with semantic features might degrade detection accuracy. Contextual information is also important for multi-scale object detection. In this work, we present a feature refinement scheme to tackle the feature fusion problem. The proposed feature refinement module increases feature resolution and refines feature maps progressively under the guidance of deep features. We also propose a context extraction method to capture global and local contextual information. The method uses a multi-level cross-pooling unit to extract global context and a cascaded context module to extract local context. The proposed object detection framework has been evaluated on the PASCAL VOC and MS COCO datasets. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art approaches.
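The fusion problem described in the abstract can be illustrated with a small sketch. This is not the refinement module from the paper, whose details are not given on this page; it is a generic gated top-down fusion on single-channel maps, written only to contrast plain additive fusion with fusion in which the deep map first re-weights the shallow one. The function names (`upsample_nearest`, `plain_fusion`, `guided_fusion`) and the sigmoid gate are assumptions made for illustration.

```python
import math


def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2-D map stored as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plain_fusion(shallow, deep):
    """Additive fusion: upsample the deep map and add it to the shallow map."""
    deep_up = upsample_nearest(deep, len(shallow) // len(deep))
    return [[s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow, deep_up)]


def guided_fusion(shallow, deep):
    """Hypothetical guided fusion (an assumption, not the paper's module):
    the upsampled deep map produces a sigmoid gate that scales each shallow
    value before the two maps are added."""
    deep_up = upsample_nearest(deep, len(shallow) // len(deep))
    return [[s * sigmoid(d) + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow, deep_up)]


# A 4x4 high-resolution map fused with a 2x2 deep map.
shallow = [[1.0] * 4 for _ in range(4)]
deep = [[0.0, 4.0], [-4.0, 0.0]]
fused = guided_fusion(shallow, deep)
```

Where the deep response is strongly negative, the gate is close to zero and the shallow value contributes little; where it is strongly positive, the shallow value passes almost unchanged. Plain addition has no such selectivity, which is the degradation risk the abstract points to.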

Data availability

The data that support the findings of this study are available in “The PASCAL Visual Object Classes (VOC) Challenge,” https://doi.org/10.1007/s11263-009-0275-4, and “Microsoft COCO,” https://doi.org/10.1007/978-3-319-10602-1_48.

Code availability

Not applicable.

Funding

Not applicable.

Author information

Contributions

YM contributed to the central idea and programming and wrote the draft of the manuscript; YW collected the data and did the programming. All authors discussed the results and revised the manuscript.

Corresponding author

Correspondence to Yingdong Ma.

Ethics declarations

Conflicts of interest

The authors declare that they have no competing interests.

Consent to participate

Not applicable.

Consent for publication

The manuscript has been approved by all authors for publication. On behalf of my co-authors, I declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, Y., Wang, Y. Feature refinement with multi-level context for object detection. Machine Vision and Applications 34, 49 (2023). https://doi.org/10.1007/s00138-023-01402-5

  • DOI: https://doi.org/10.1007/s00138-023-01402-5
