Feature refinement with multi-level context for object detection

Abstract

Robust multi-scale object detection is challenging because it requires both spatial detail and semantic knowledge to cope with problems such as large scale variation and cluttered backgrounds. Appropriately fusing high-resolution features with deep semantic features is the key to achieving better performance. Various approaches, such as the feature pyramid network, have been developed to extract deep features and combine them with shallow-layer spatial features. However, high-resolution feature maps contain noisy and distracting features, so directly combining shallow features with semantic features may degrade detection accuracy. In addition, contextual information is important for multi-scale object detection. In this work, we present a feature refinement scheme to tackle the feature fusion problem. The proposed feature refinement module increases feature resolution and progressively refines feature maps under the guidance of deep features. We also propose a context extraction method to capture global and local contextual information: a multi-level cross-pooling unit extracts global context, and a cascaded context module extracts local context. The proposed object detection framework has been evaluated on the PASCAL VOC and MS COCO datasets. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art approaches.
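The abstract does not specify layer shapes, channel counts, or operators. The following is therefore only a minimal, single-channel sketch in plain Python of the three ideas it names, written under loudly stated assumptions. All function names and operator choices here are hypothetical illustrations, not the authors' modules:

* `guided_refine` gates shallow features with a sigmoid of upsampled deep features.
* `cross_pool` uses row and column averaging as a strip-pooling-style stand-in for the multi-level cross-pooling unit.
* `cascaded_context` runs a cascade of dilated 3x3 mean filters as a stand-in for the cascaded context module.

A real detector would use learned multi-channel convolutions instead.

```python
import math
from typing import List, Sequence

Map = List[List[float]]  # hypothetical single-channel H x W feature map


def upsample2x(x: Map) -> Map:
    """Nearest-neighbour 2x upsampling: repeat each value twice per row, each row twice."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def guided_refine(shallow: Map, deep: Map) -> Map:
    """ASSUMED refinement step: the deep map, upsampled to the shallow resolution,
    acts as a soft gate on the shallow map, and the gated result is added to it."""
    up = upsample2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("deep map must be exactly half the shallow resolution")
    return [[s * sigmoid(d) + d for s, d in zip(srow, drow)]
            for srow, drow in zip(shallow, up)]


def cross_pool(x: Map) -> Map:
    """ASSUMED global context: average every row and every column, then
    broadcast the two strip summaries back to each position and sum them."""
    h, w = len(x), len(x[0])
    row_mean = [sum(r) / w for r in x]
    col_mean = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    return [[row_mean[i] + col_mean[j] for j in range(w)] for i in range(h)]


def dilated_mean3x3(x: Map, rate: int) -> Map:
    """3x3 mean filter with the given dilation rate; out-of-bounds taps count as zero."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for di in (-rate, 0, rate):
                for dj in (-rate, 0, rate):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj]
            row.append(acc / 9.0)
        out.append(row)
    return out


def cascaded_context(x: Map, rates: Sequence[int] = (1, 2, 4)) -> Map:
    """ASSUMED local context: each stage filters the previous stage's output with a
    larger dilation, and the input plus every stage output are summed."""
    total = [list(r) for r in x]
    cur = x
    for r in rates:
        cur = dilated_mean3x3(cur, r)
        total = [[t + c for t, c in zip(trow, crow)] for trow, crow in zip(total, cur)]
    return total
```

Because `sigmoid(d)` lies strictly between 0 and 1, shallow activations are attenuated wherever the deep map is weak rather than passed through unchanged. This is one simple way to express the abstract's concern that raw high-resolution features are noisy. Feeding a 2x2 shallow map of ones and a 1x1 deep map of zeros to `guided_refine` yields 0.5 everywhere.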

Data availability

The data that support the findings of this study are available from the PASCAL Visual Object Classes (VOC) challenge, https://doi.org/10.1007/s11263-009-0275-4, and from Microsoft COCO, https://doi.org/10.1007/978-3-319-10602-1_48.

Code availability

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, 235 West Daxue Road, Hohhot, China
Yingdong Ma & Yanan Wang

Contributions

YM contributed the central idea and programming and wrote the manuscript draft; YW collected the data and carried out programming. All authors discussed the results and revised the manuscript.

Corresponding author

Correspondence to Yingdong Ma.

Ethics declarations

Conflicts of interest

The authors declare that they have no competing interests.

Consent to participate

Not applicable.

Consent for publication

The manuscript has been approved by all authors for publication. On behalf of my co-authors, I declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, Y., Wang, Y. Feature refinement with multi-level context for object detection. Machine Vision and Applications 34, 49 (2023). https://doi.org/10.1007/s00138-023-01402-5

Received: 17 November 2022
Revised: 08 April 2023
Accepted: 19 April 2023
Published: 12 May 2023
DOI: https://doi.org/10.1007/s00138-023-01402-5
