MCEENet: Multi-Scale Context Enhancement and Edge-Assisted Network for Few-Shot Semantic Segmentation
Figure 1. Examples of areas where semantic segmentation is used: (a) medical imaging; (b) 3D point clouds; (c) remote sensing; (d) lane mark detection.

Figure 2. Overall network framework of the proposed MCEENet. The query image and support image are fed into a weight-shared feature extraction network to extract middle-level features (inside the larger gray dotted box); green and blue denote the ViT and ResNet-50 branches, respectively. The extracted middle-level features are then enhanced by the MCE modules. The prior generation unit produces a prior mask for the query image from the support image, support mask, and query image. The EAS module uses the Sobel operator to obtain an edge guidance feature for the query image. Finally, the segmentation result is obtained through a feature aggregation unit and an upsampling unit.

Figure 3. The MCE module, which receives two ResNet-50 features and one ViT feature and uses multi-scale pooling operations with different pooling rates and four parallel ASPP modules to generate enhanced features.

Figure 4. The EAS module, which receives three shallow ResNet-50 features of the query image and uses the Sobel operator to generate the edge guidance feature.

Figure 5. Qualitative ablation results for 1-way 1-shot segmentation on PASCAL-5^i. (a) Support images with ground truths; (b) query images with ground truths; (c) MCEENet without the Vision Transformer; (d) MCEENet without the MCE modules; (e) MCEENet without the EAS module; (f) MCEENet.

Figure 6. Qualitative comparison for 1-way 1-shot segmentation on PASCAL-5^i. (a) Support images with ground truths; (b) query images with ground truths; (c) CANet; (d) ASGNet; (e) PFENet; (f) MCEENet.
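The prior generation unit in the Figure 2 caption follows the prior-mask idea of PFENet [39]: each query pixel is scored by its best cosine match against masked support features. Below is a minimal PyTorch sketch of that idea, not the authors' exact implementation; the function name, masking, and min-max normalization details are assumptions.

```python
import torch
import torch.nn.functional as F

def prior_mask(query_feat, support_feat, support_mask):
    """PFENet-style cosine-similarity prior (sketch).

    query_feat, support_feat: (B, C, H, W) high-level backbone features.
    support_mask: (B, 1, H, W) binary foreground mask at feature resolution.
    Returns a (B, 1, H, W) prior map in [0, 1] for the query image.
    """
    s = support_feat * support_mask               # keep only support foreground
    B, C, H, W = query_feat.shape
    q = F.normalize(query_feat.view(B, C, -1), dim=1)   # (B, C, HW)
    s = F.normalize(s.view(B, C, -1), dim=1)            # (B, C, HW)
    sim = torch.bmm(q.transpose(1, 2), s)         # (B, HW_q, HW_s) cosine similarity
    prior = sim.max(dim=2).values                 # best support match per query pixel
    pmin = prior.min(dim=1, keepdim=True).values  # min-max normalize each map
    pmax = prior.max(dim=1, keepdim=True).values
    prior = (prior - pmin) / (pmax - pmin + 1e-7)
    return prior.view(B, 1, H, W)
```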
Abstract
1. Introduction
- We proposed two MCE modules to enhance the contextual information of the support and query image features. Each MCE module first concatenates the ResNet-50 and Vision Transformer features and applies pooling operations with different pooling rates to generate multi-scale features. It then fuses features at adjacent scales through cross-scale feature fusion and uses multi-scale dilated convolutions to mine and enrich the contextual information of the fused features (a sketch follows this list);
- We designed an EAS module to improve the edge regions of the segmentation results. The EAS module combines the shallow query-image features extracted by ResNet-50 (which capture object details) with the edge features computed by the Sobel operator (which capture object boundaries) to generate an edge guidance feature. This edge guidance feature is then used as a cue for segmentation prediction, thereby improving edge details in FSS (see the second sketch after this list);
- The effectiveness of MCEENet was demonstrated on the PASCAL-5^i dataset. The comparative results show that MCEENet achieves superior semantic segmentation performance compared with state-of-the-art FSS methods.
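The following is a minimal PyTorch sketch of the multi-scale context enhancement idea in the first contribution. It is not the authors' exact MCE module: the pooling sizes, channel widths, top-down adjacent-scale fusion, and the ASPP dilation rates are assumptions chosen to match the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPBlock(nn.Module):
    """Parallel dilated 3x3 convolutions, in the spirit of ASPP."""
    def __init__(self, channels, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.project = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class MCEModule(nn.Module):
    """Sketch of a multi-scale context enhancement module."""
    def __init__(self, cnn_channels, vit_channels, out_channels,
                 pool_sizes=(60, 30, 15, 8)):   # assumed pooling rates
        super().__init__()
        self.reduce = nn.Conv2d(cnn_channels + vit_channels, out_channels, 1)
        self.pool_sizes = pool_sizes
        self.aspp = nn.ModuleList(ASPPBlock(out_channels) for _ in pool_sizes)
        self.fuse = nn.Conv2d(out_channels * len(pool_sizes), out_channels, 1)

    def forward(self, cnn_feat, vit_feat):
        # cnn_feat and vit_feat are assumed resized to a common spatial size.
        x = self.reduce(torch.cat([cnn_feat, vit_feat], dim=1))
        h, w = x.shape[-2:]
        outs, prev = [], None
        for size, aspp in zip(self.pool_sizes, self.aspp):
            y = F.adaptive_avg_pool2d(x, size)         # one pooling rate per scale
            if prev is not None:                       # cross-scale fusion with the
                y = y + F.interpolate(prev, size=y.shape[-2:],
                                      mode='bilinear', align_corners=True)
            y = aspp(y)                                # enrich context via dilated convs
            prev = y
            outs.append(F.interpolate(y, size=(h, w),
                                      mode='bilinear', align_corners=True))
        return self.fuse(torch.cat(outs, dim=1))       # enhanced feature
```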
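Likewise, here is a minimal sketch of the edge-assisted idea in the second contribution: fixed Sobel kernels applied to shallow query features, with the resulting edge map fused back in as guidance. The depthwise kernel setup and the summation fusion of the shallow features are assumptions, not the paper's exact EAS module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelEdge(nn.Module):
    """Apply frozen Sobel filters channel-wise and return gradient magnitude."""
    def __init__(self, channels):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        gy = gx.t()
        # Depthwise (grouped) convolution weights: (gx, gy) per input channel.
        kernel = torch.stack([gx, gy]).unsqueeze(1)          # (2, 1, 3, 3)
        self.register_buffer('kernel', kernel.repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):
        g = F.conv2d(x, self.kernel, padding=1, groups=self.channels)
        gx, gy = g[:, 0::2], g[:, 1::2]                      # split x/y responses
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)          # edge magnitude

class EASModule(nn.Module):
    """Sketch: fuse shallow query features with their Sobel edge responses."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.sobel = SobelEdge(in_channels)
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels * 2, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow_feats):
        # shallow_feats: shallow ResNet-50 features of the query image,
        # assumed already projected to a common channel width.
        size = shallow_feats[0].shape[-2:]
        x = sum(F.interpolate(f, size=size, mode='bilinear', align_corners=True)
                for f in shallow_feats)
        edges = self.sobel(x)
        return self.fuse(torch.cat([x, edges], dim=1))       # edge guidance feature
```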
2. Related Work
3. Methodology
3.1. Problem Definition
3.2. Architecture Overview
3.3. Feature Extraction Networks
3.4. MCE Module
3.5. EAS Module
3.6. Loss Function
4. Experimental Studies
4.1. Dataset and Evaluation Metrics
4.2. Experimental Design
4.3. Ablation Study
4.4. Comparison with State-of-the-Art Methods
4.5. Computational Complexity
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Voulodimos, A.; Protopapadakis, E.; Katsamenis, I.; Doulamis, A.; Doulamis, N. A few-shot U-Net deep learning model for COVID-19 infected area segmentation in CT images. Sensors 2021, 21, 2215.
2. Bello, S.A.; Yu, S.; Wang, C.; Adam, J.M.; Li, J. Deep learning on 3D point clouds. Remote Sens. 2020, 12, 1729.
3. He, M.; Jiang, P.; Deng, F. A study of microseismic first arrival pickup based on image semantic segmentation. In Proceedings of the 2022 3rd International Conference on Geology, Mapping and Remote Sensing (ICGMRS), Zhoushan, China, 22–24 April 2022; pp. 269–274.
4. Lu, C.; Xia, M.; Lin, H. Multi-scale strip pooling feature aggregation network for cloud and cloud shadow segmentation. Neural Comput. Appl. 2022, 34, 6149–6162.
5. Qu, Y.; Xia, M.; Zhang, Y. Strip pooling channel spatial attention network for the segmentation of cloud and cloud shadow. Comput. Geosci. 2021, 157, 104940.
6. Chen, B.; Xia, M.; Qian, M.; Huang, J. MANet: A multi-level aggregation network for semantic segmentation of high-resolution remote sensing images. Int. J. Remote Sens. 2022, 43, 5874–5894.
7. Gao, J.; Weng, L.; Xia, M.; Lin, H. MLNet: Multichannel feature fusion lozenge network for land segmentation. J. Appl. Remote Sens. 2022, 16, 016513.
8. Miao, S.; Xia, M.; Qian, M.; Zhang, Y.; Liu, J.; Lin, H. Cloud/shadow segmentation based on multi-level feature enhanced network for remote sensing imagery. Int. J. Remote Sens. 2022, 43, 5940–5960.
9. Song, L.; Xia, M.; Weng, L.; Lin, H.; Qian, M.; Chen, B. Axial cross attention meets CNN: Bibranch fusion network for change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 32–43.
10. Dong, G.; Yan, Y.; Shen, C.; Wang, H. Real-time high-performance semantic image segmentation of urban street scenes. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3258–3274.
11. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
12. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv 2014, arXiv:1412.7062.
13. Shaban, A.; Bansal, S.; Liu, Z.; Essa, I.; Boots, B. One-shot learning for semantic segmentation. arXiv 2017, arXiv:1709.03410.
14. Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. PANet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9197–9206.
15. Nguyen, K.; Todorovic, S. Feature weighting and boosting for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 622–631.
16. Yang, B.; Liu, C.; Li, B.; Jiao, J.; Ye, Q. Prototype mixture models for few-shot semantic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 763–778.
17. Liu, Y.; Zhang, X.; Zhang, S.; He, X. Part-aware prototype network for few-shot semantic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 142–158.
18. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 4080–4090.
19. Fan, Q.; Pei, W.; Tai, Y.W.; Tang, C.K. Self-support few-shot semantic segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 701–719.
20. Zhang, C.; Lin, G.; Liu, F.; Guo, J.; Wu, Q.; Yao, R. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9587–9595.
21. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
22. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
23. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
24. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 801–818.
25. Bhunia, A.K.; Bhunia, A.K.; Ghose, S.; Das, A.; Roy, P.P.; Pal, U. A deep one-shot network for query-based logo retrieval. Pattern Recognit. 2019, 96, 106965.
26. Tian, P.; Wu, Z.; Qi, L.; Wang, L.; Shi, Y.; Gao, Y. Differentiable meta-learning model for few-shot semantic segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12087–12094.
27. Dong, N.; Xing, E.P. Few-shot semantic segmentation with prototype learning. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 2–6 September 2018; Volume 3, pp. 6–18.
28. Yang, Y.; Meng, F.; Li, H.; Wu, Q.; Xu, X.; Chen, S. A new local transformation module for few-shot segmentation. In Proceedings of the International Conference on Multimedia Modeling, Daejeon, Republic of Korea, 5–8 January 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 76–87.
29. Gairola, S.; Hemani, M.; Chopra, A.; Krishnamurthy, B. SimPropNet: Improved similarity propagation for few-shot image segmentation. arXiv 2020, arXiv:2004.15014.
30. Zhang, X.; Wei, Y.; Yang, Y.; Huang, T.S. SG-One: Similarity guidance network for one-shot semantic segmentation. IEEE Trans. Cybern. 2020, 50, 3855–3865.
31. Li, G.; Jampani, V.; Sevilla-Lara, L.; Sun, D.; Kim, J.; Kim, J. Adaptive prototype learning and allocation for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8334–8343.
32. Yang, L.; Zhuo, W.; Qi, L.; Shi, Y.; Gao, Y. Mining latent classes for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 8721–8730.
33. Liu, C.; Fu, Y.; Xu, C.; Yang, S.; Li, J.; Wang, C.; Zhang, L. Learning a few-shot embedding model with contrastive learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Held Virtually, 2–9 February 2021; Volume 35, pp. 8635–8643.
34. Xie, G.S.; Liu, J.; Xiong, H.; Shao, L. Scale-aware graph neural network for few-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5475–5484.
35. Lu, Z.; He, S.; Zhu, X.; Zhang, L.; Song, Y.Z.; Xiang, T. Simpler is better: Few-shot semantic segmentation with classifier weight transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 8741–8750.
36. Siam, M.; Doraiswamy, N.; Oreshkin, B.N.; Yao, H.; Jagersand, M. Weakly supervised few-shot object segmentation using co-attention with visual and semantic embeddings. arXiv 2020, arXiv:2001.09540.
37. Liu, L.; Cao, J.; Liu, M.; Guo, Y.; Chen, Q.; Tan, M. Dynamic extension nets for few-shot semantic segmentation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1441–1449.
38. Zhang, C.; Lin, G.; Liu, F.; Yao, R.; Shen, C. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5217–5226.
39. Tian, Z.; Zhao, H.; Shu, M.; Yang, Z.; Li, R.; Jia, J. Prior guided feature enrichment network for few-shot segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1050–1065.
40. Zhang, B.; Xiao, J.; Qin, T. Self-guided and cross-guided learning for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8312–8321.
41. Liu, W.; Zhang, C.; Lin, G.; Liu, F. CRNet: Cross-reference networks for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4165–4173.
42. Yang, X.; Wang, B.; Chen, K.; Zhou, X.; Yi, S.; Ouyang, W.; Zhou, L. BriNet: Towards bridging the intra-class and inter-class gaps in one-shot segmentation. arXiv 2020, arXiv:2008.06226.
43. Xie, G.S.; Xiong, H.; Liu, J.; Yao, Y.; Shao, L. Few-shot semantic segmentation with cyclic memory network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 7293–7302.
44. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
45. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
46. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
48. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
49. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
50. Hariharan, B.; Arbeláez, P.; Bourdev, L.; Maji, S.; Malik, J. Semantic contours from inverse detectors. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 991–998.
51. Min, J.; Kang, D.; Cho, M. Hypercorrelation squeeze for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6941–6952.
52. Li, X.; Wei, T.; Chen, Y.P.; Tai, Y.W.; Tang, C.K. FSS-1000: A 1000-class dataset for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2869–2878.
| Fold | Test Classes |
|---|---|
| Fold-0 | aeroplane, bicycle, bird, boat, bottle |
| Fold-1 | bus, car, cat, chair, cow |
| Fold-2 | dining table, dog, horse, motorbike, person |
| Fold-3 | potted plant, sheep, sofa, train, tv/monitor |
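For concreteness, here is a small sketch of how the PASCAL-5^i folds above can be materialized for episodic evaluation: each fold holds out five of PASCAL VOC's 20 classes for testing and trains on the remaining 15. The helper name is illustrative, not from the paper.

```python
# PASCAL VOC's 20 classes in their conventional order; PASCAL-5^i carves
# them into four folds of five classes each (see the table above).
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow',
    'dining table', 'dog', 'horse', 'motorbike', 'person',
    'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor',
]

def pascal_5i_split(fold: int):
    """Return (test_classes, train_classes) for fold 0..3."""
    assert fold in range(4)
    test = VOC_CLASSES[fold * 5:(fold + 1) * 5]
    train = [c for c in VOC_CLASSES if c not in test]
    return test, train

# Example: fold-1 holds out bus, car, cat, chair, cow for testing;
# the remaining 15 classes are used for training.
test_classes, train_classes = pascal_5i_split(1)
```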
| Method | 1-Shot mIoU (%) | 5-Shot mIoU (%) |
|---|---|---|
| MCEENet without Vision Transformer | 62.6 | 63.2 |
| MCEENet without the MCE modules | 61.3 | 62.6 |
| MCEENet without the EAS module | 63.1 | 64.2 |
| MCEENet | 63.5 | 64.7 |
All results are mIoU (%) on PASCAL-5^i; "−" denotes a result that is not available.

| Methods | Backbone | 1-Shot Fold-0 | Fold-1 | Fold-2 | Fold-3 | Average | 5-Shot Fold-0 | Fold-1 | Fold-2 | Fold-3 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OSLSM [13] (BMVC'18) | VGG-16 | 33.6 | 55.3 | 40.9 | 33.5 | 40.8 | 35.9 | 58.1 | 42.7 | 39.1 | 44.0 |
| FWB [15] (ICCV'19) | VGG-16 | 47.0 | 59.6 | 52.6 | 48.3 | 51.9 | 50.9 | 62.9 | 56.5 | 50.1 | 55.1 |
| PANet [14] (ICCV'19) | VGG-16 | 42.3 | 58.0 | 51.1 | 41.2 | 48.1 | 51.8 | 64.6 | 59.8 | 46.5 | 55.7 |
| SG-One [30] (TCYB'20) | VGG-16 | 42.2 | 58.4 | 48.4 | 38.4 | 46.3 | 41.9 | 58.6 | 48.6 | 39.4 | 47.1 |
| CRNet [41] (CVPR'20) | VGG-16 | − | − | − | − | 55.2 | − | − | − | − | 58.5 |
| FSS-1000 [52] (CVPR'20) | VGG-16 | − | − | − | − | − | 37.4 | 60.9 | 46.6 | 42.2 | 56.8 |
| HSNet [51] (ICCV'21) | VGG-16 | 59.6 | 65.7 | 59.6 | 54.0 | 59.7 | 64.9 | 69.0 | 64.1 | 58.6 | 64.1 |
| CANet [38] (CVPR'19) | ResNet-50 | 52.5 | 65.9 | 51.3 | 51.9 | 55.4 | 55.5 | 67.8 | 51.9 | 53.2 | 57.1 |
| PFENet [39] (TPAMI'20) | ResNet-50 | 61.7 | 69.5 | 55.4 | 56.3 | 60.8 | 63.1 | 70.7 | 55.8 | 57.9 | 61.9 |
| CWT [35] (ICCV'21) | ResNet-50 | 56.3 | 62.0 | − | 47.2 | 56.4 | 61.3 | 68.5 | − | 56.6 | 63.7 |
| SCL_PFENet [40] (CVPR'21) | ResNet-50 | 63.0 | 70.0 | 56.5 | − | 61.8 | 64.5 | 70.9 | 57.3 | 58.7 | 62.9 |
| ASGNet [31] (CVPR'21) | ResNet-50 | 58.8 | 67.9 | 56.8 | 53.7 | 59.3 | 63.7 | 70.6 | 64.1 | 57.4 | 63.9 |
| SAGNN [34] (CVPR'21) | ResNet-50 | 64.7 | 69.6 | 57.0 | 57.3 | 62.1 | 64.9 | 70.0 | 57.0 | − | 62.8 |
| MCEENet | ResNet-50 and Vision Transformer | − | − | 59.4 | 57.0 | 63.5 | − | − | 60.0 | 58.8 | 64.7 |
Share and Cite
Zhou, H.; Zhang, R.; He, X.; Li, N.; Wang, Y.; Shen, S. MCEENet: Multi-Scale Context Enhancement and Edge-Assisted Network for Few-Shot Semantic Segmentation. Sensors 2023, 23, 2922. https://doi.org/10.3390/s23062922