Abstract
Because camouflage soldier objects closely resemble their background, traditional deep learning-based object detection networks suffer from notably high false detection and missed detection rates when detecting camouflage soldiers. To address these challenges, we propose a camouflage soldier object detection network (AFSNet) based on an attention mechanism and a multi-scale feature fusion strategy. We employ an attention module to enhance the network’s feature extraction capability. Furthermore, we propose a novel multi-scale feature fusion strategy based on pyramidal feature shrinking, which aims to mitigate the interference caused by interpolation and to prevent the information loss caused by pooling during feature fusion. Moreover, we introduce a novel information handle module that enhances the network’s feature fusion capability by regulating the information transmission pathway. Experiments demonstrate that our network achieves better camouflage object detection performance than state-of-the-art networks. Our network achieves 93% AP, an improvement of 6.7% over YOLOv7, with almost no additional computational overhead.
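The abstract does not state whether the 6.7% gain over YOLOv7 is an absolute difference in AP (percentage points) or a relative improvement. As a small worked check, using only the two figures quoted above and treating each reading as an assumption, the implied YOLOv7 baseline can be computed as follows. The variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
afsnet_ap = 93.0  # AP (%) reported for the proposed network
gain = 6.7        # improvement over YOLOv7 quoted in the abstract

# Reading 1 (assumption): the gain is an absolute difference in AP,
# i.e. percentage points.
baseline_absolute = afsnet_ap - gain

# Reading 2 (assumption): the gain is relative to the YOLOv7 AP,
# i.e. afsnet_ap = baseline * (1 + gain / 100).
baseline_relative = afsnet_ap / (1 + gain / 100)

print(f"Implied YOLOv7 AP, absolute reading: {baseline_absolute:.1f}%")
print(f"Implied YOLOv7 AP, relative reading: {baseline_relative:.1f}%")
```

The two readings differ by less than one AP point, so the choice affects the exact baseline but not the overall conclusion that the reported gain over YOLOv7 is substantial.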
Availability of data and materials
The datasets generated during the current study are not available.
Code Availability
The code generated during the current study is not available.
Funding
This research was supported by the Defense Industrial Technology Development Program (JCKY2021602B029).
Author information
Authors and Affiliations
Contributions
Conceptualization, Y.P. and J.W.; methodology, Y.P.; software, Y.P.; validation, Y.P. and Z.Y.; formal analysis, Y.P. and Y.Y.; investigation, Y.P. and Y.S.; resources, J.W.; data curation, Y.P. and Z.Y.; writing-original draft preparation, Y.P.; writing-review and editing, J.W.; visualization, J.W.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
All authors approved the final manuscript and the submission to this journal.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Peng, Y., Wang, J., Yu, Z. et al. Camouflage soldier object detection network based on the attention mechanism and pyramidal feature shrinking. Multimed Tools Appl 83, 79917–79938 (2024). https://doi.org/10.1007/s11042-024-18618-w
DOI: https://doi.org/10.1007/s11042-024-18618-w