DSC-Net: Enhancing Blind Road Semantic Segmentation with Visual Sensor Using a Dual-Branch Swin-CNN Architecture
Figure 1. (**a**) CNN-based methods excel at handling detailed information but struggle to capture long-range dependencies, so they have difficulty understanding context when external conditions change significantly. (**b**) Transformer-based methods, in contrast, capture global context but yield unclear edge information in the results. (**c**) DSC-Net includes both a CNN-based branch and a transformer-based branch; this design effectively addresses both context and edge details.
Figure 2. Overview of DSC-Net. An encoder–decoder structure with skip connections links the encoder and decoder. The encoder incorporates a transformer-based global-context branch and a CNN-based detail branch, which process images and capture multi-scale information. These branches are merged and upsampled by the decoder to generate segmentation outputs.
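To make the dual-branch design of Figure 2 concrete, the following is a minimal PyTorch sketch of an encoder with a CNN detail branch and a transformer global-context branch whose outputs are fused and decoded. It is an illustration under stated assumptions, not the authors' implementation: a generic `nn.TransformerEncoder` stands in for the Swin blocks, the multi-scale stages and skip connections mentioned in the caption are omitted, and all layer sizes are invented.

```python
import torch
import torch.nn as nn

class DualBranchSegNet(nn.Module):
    """Illustrative dual-branch segmentation net: a CNN detail branch and a
    (stand-in) transformer global-context branch are fused and decoded."""
    def __init__(self, num_classes=2, dim=64):
        super().__init__()
        # CNN-based detail branch: preserves local/edge information.
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=4, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
        )
        # Transformer-based global-context branch (generic encoder layers as a
        # stand-in for Swin blocks).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        # Decoder: fuse the two branches, then upsample back to input resolution.
        self.fuse = nn.Conv2d(2 * dim, dim, 1)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, num_classes, 1))

    def forward(self, x):
        detail = self.cnn_branch(x)                      # B x C x H/4 x W/4
        tokens = self.patch_embed(x)                     # B x C x H/4 x W/4
        b, c, h, w = tokens.shape
        ctx = self.transformer(tokens.flatten(2).transpose(1, 2))
        ctx = ctx.transpose(1, 2).reshape(b, c, h, w)    # back to a feature map
        fused = self.fuse(torch.cat([detail, ctx], dim=1))
        return self.head(fused)                          # B x num_classes x H x W
```

In the full DSC-Net, the caption indicates that multi-scale encoder features are also passed to the decoder through skip connections, which this single-scale sketch leaves out.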
Figure 3. The structure of the Spatial Blending Module (SBM). Statistical features are captured along the horizontal and vertical directions and recombined through matrix multiplication; finally, they are integrated with the input features. SBM further enhances the interaction of global contextual information.
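A minimal sketch of the directional-statistics idea behind SBM is given below, assuming strip-style average pooling along each axis and an outer-product (matrix multiplication) that rebuilds a dense context map. The specific layer choices (1-D convolutions, sigmoid gating, residual fusion) are assumptions, not the published design.

```python
import torch
import torch.nn as nn

class SpatialBlendingModule(nn.Module):
    """Sketch of the SBM idea: directional statistics along rows and columns
    are recombined by matrix multiplication into a full-resolution context map
    that modulates the input features."""
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # B x C x H x 1
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # B x C x 1 x W
        self.conv_h = nn.Conv1d(channels, channels, 3, padding=1)
        self.conv_w = nn.Conv1d(channels, channels, 3, padding=1)
        self.proj = nn.Conv2d(channels, channels, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Horizontal / vertical statistical features.
        feat_h = self.conv_h(self.pool_h(x).squeeze(-1))   # B x C x H
        feat_w = self.conv_w(self.pool_w(x).squeeze(-2))   # B x C x W
        # Matrix multiplication restores a dense H x W context map per channel.
        context = torch.matmul(feat_h.unsqueeze(-1), feat_w.unsqueeze(-2))
        # Integrate the global context with the input features.
        return x + x * self.sigmoid(self.proj(context))
```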
Figure 4. The structure of the Inverted Residual Module (IRM), which is designed to accelerate computation. The number of image channels is expanded so that features can be extracted from each channel, and the channel count is then reduced back to the original.
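The expand–depthwise–project pattern described for the IRM matches the familiar inverted-residual block; a hedged PyTorch sketch, with the expansion ratio and activation chosen as assumptions, is:

```python
import torch.nn as nn

class InvertedResidualModule(nn.Module):
    """Sketch of the IRM: expand channels, apply a cheap depthwise convolution,
    then project back to the original width (MobileNetV2-style inverted
    residual; the expansion ratio of 4 is an assumption)."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            # 1x1 expansion: widen the representation per pixel.
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution: one filter per channel, few FLOPs.
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 projection back to the original channel count (no activation).
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)   # residual connection keeps gradients stable
```

Depthwise separable convolution is what gives this block its speed advantage: each 3 × 3 filter touches only one channel, so the expensive full 3 × 3 convolution is replaced by a cheap per-channel filter plus 1 × 1 projections.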
Figure 5. The structure of the Hybrid Attention Module (HAM). Input features are processed through channel and spatial attention branches. Global pooling, multilayer perceptrons, and convolutions extract key channel and spatial features, which are then integrated with the input features to produce the output features. HAM focuses more on the edge information around occlusions.
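A minimal sketch of a channel-plus-spatial attention block of the kind the caption describes follows (global pooling and an MLP for channel weights, a convolution over pooled maps for spatial weights). The reduction ratio, 7 × 7 kernel, and fusion by element-wise multiplication are assumptions rather than the exact HAM.

```python
import torch
import torch.nn as nn

class HybridAttentionModule(nn.Module):
    """Sketch of the HAM idea: a channel branch (global pooling + MLP) and a
    spatial branch (convolution over pooled maps) re-weight the input features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Channel attention: squeeze spatial dims, score each channel with an MLP.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid())
        # Spatial attention: convolve channel-pooled statistics into a spatial mask.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_mlp(x)                                  # channel re-weighting
        avg_map = x.mean(dim=1, keepdim=True)                        # B x 1 x H x W
        max_map = x.amax(dim=1, keepdim=True)                        # B x 1 x H x W
        mask = self.spatial_conv(torch.cat([avg_map, max_map], 1))   # B x 1 x H x W
        return x * mask                                              # spatial re-weighting
```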
Figure 6. Comparison of semantic segmentation results on the Cityscapes dataset. (**a**) U-Net. (**b**) Bisenetv1. (**c**) Deeplabv3. (**d**) Swin-Transformer. (**e**) TransUnet. (**f**) ViT. (**g**) DSC-Net. The rectangles highlight areas where our approach exhibits superior performance. DSC-Net delivers enhanced edge precision for objects including poles, traffic signs, and motorcycles.
Figure 7. Comparison of an enlarged view of the results from the Cityscapes dataset. (**a**) U-Net. (**b**) Bisenetv1. (**c**) Deeplabv3. (**d**) Swin-Transformer. (**e**) TransUnet. (**f**) ViT. (**g**) DSC-Net.
Figure 8. Comparison of semantic segmentation results on the Blind Roads and Crosswalks dataset. (**a**) U-Net. (**b**) Bisenetv1. (**c**) SEM_FPN. (**d**) Deeplabv3. (**e**) Swin-Transformer. (**f**) TransUnet. (**g**) ViT-large. (**h**) DSC-Net. The rectangles highlight areas where our approach exhibits superior performance. DSC-Net precisely discerns horizontal blind roads and crosswalks. Additionally, it demonstrates enhanced accuracy on discontinuous vertical blind roads.
Figure 9. Comparison of an enlarged view of the results from the Blind Roads and Crosswalks dataset. (**a**) U-Net. (**b**) Bisenetv1. (**c**) SEM_FPN. (**d**) Deeplabv3. (**e**) Swin-Transformer. (**f**) TransUnet. (**g**) ViT. (**h**) DSC-Net.
Figure 10. Comparison of semantic segmentation results on the Blind Roads dataset. (**a**) U-Net. (**b**) Bisenetv1. (**c**) Deeplabv3. (**d**) Swin-Transformer. (**e**) TransUnet. (**f**) ViT. (**g**) DSC-Net. The rectangles highlight areas where our approach exhibits superior performance. DSC-Net sustains improved contextual relationships on discontinuous blind roads and delivers more distinct edges in the presence of obstructions.
Figure 11. Comparison of an enlarged view of the results from the Blind Roads dataset. (**a**) U-Net. (**b**) Bisenetv1. (**c**) Deeplabv3. (**d**) Swin-Transformer. (**e**) TransUnet. (**f**) ViT. (**g**) DSC-Net.
Abstract
1. Introduction
- We propose a parallel architecture combining CNN and transformer technologies to precisely detect blind roads. We have also created a semantic segmentation dataset for blind roads that includes samples from complex environments.
- The Inverted Residual Module with depthwise separable convolution enhances segmentation speed, while the hybrid attention module optimizes feature representation. The Spatial Blending Module is engineered to improve global information perception.
- Performance tests on the Cityscapes dataset, Blind Roads and Crosswalks dataset, and Blind Roads dataset were conducted to validate the efficacy of our method.
2. Related Work
2.1. Semantic Segmentation of Blind Roads
2.2. Global Context Information
2.3. Occlusion Edge Features
3. Methods
3.1. Architecture
3.2. Spatial Blending Module
3.3. Inverted Residual Module
3.4. Hybrid Attention Module
4. Experiments
4.1. Datasets
4.2. Implementation
5. Experimental Results and Analysis
5.1. Comparative Experiment
5.1.1. Cityscapes Dataset
5.1.2. Blind Roads and Crosswalks Dataset
5.1.3. Blind Roads Dataset
5.2. Module Effectiveness
5.2.1. Module Ablation
5.2.2. Loss Function Weight Ablation
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Lv, H.; Du, Y.; Ma, Y.; Yuan, Y. Object detection and monocular stable distance estimation for road environments: A fusion architecture using yolo-redeca and abnormal jumping change filter. Electronics 2024, 13, 3058. [Google Scholar] [CrossRef]
- Tapu, R.; Mocanu, B.; Zaharia, T. Wearable assistive devices for visually impaired: A state of the art survey. Pattern Recognit. Lett. 2020, 137, 37–52. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; proceedings, part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Li, Y.; Wang, Z.; Yin, L.; Zhu, Z.; Qi, G.; Liu, Y. X-net: A dual encoding–decoding method in medical image segmentation. Vis. Comput. 2023, 39, 2223–2233. [Google Scholar] [CrossRef]
- Xu, G.; Zhang, X.; He, X.; Wu, X. Levit-unet: Make faster encoders with transformer for medical image segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 42–53. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. [Google Scholar]
- Dewi, C.; Chen, R.C.; Yu, H.; Jiang, X. Robust detection method for improving small traffic sign recognition based on spatial pyramid pooling. J. Ambient Intell. Humaniz. Comput. 2023, 14, 8135–8152. [Google Scholar] [CrossRef]
- Quan, Y.; Zhang, D.; Zhang, L.; Tang, J. Centralized feature pyramid for object detection. IEEE Trans. Image Process. 2023, 32, 4341–4354. [Google Scholar] [CrossRef]
- Yuan, H.; Zhu, J.; Wang, Q.; Cheng, M.; Cai, Z. An improved DeepLab v3+ deep learning network applied to the segmentation of grape leaf black rot spots. Front. Plant Sci. 2022, 13, 795410. [Google Scholar] [CrossRef]
- Wu, Y.; Jiang, J.; Huang, Z.; Tian, Y. FPANet: Feature pyramid aggregation network for real-time semantic segmentation. Appl. Intell. 2022, 52, 3319–3336. [Google Scholar] [CrossRef]
- Hong, Y.; Pan, H.; Sun, W.; Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv 2021, arXiv:2101.06085. [Google Scholar]
- Zhang, J.; Li, X.; Tian, J.; Luo, H.; Yin, S. An integrated multi-head dual sparse self-attention network for remaining useful life prediction. Reliab. Eng. Syst. Saf. 2023, 233, 109096. [Google Scholar] [CrossRef]
- Kavianpour, P.; Kavianpour, M.; Jahani, E.; Ramezani, A. A CNN-BiLSTM model with attention mechanism for earthquake prediction. J. Supercomput. 2023, 79, 19194–19226. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Cambridge, MA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
- Wu, J.; Ji, W.; Fu, H.; Xu, M.; Jin, Y.; Xu, Y. MedSegDiff-V2: Diffusion-Based Medical Image Segmentation with Transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 6030–6038. [Google Scholar]
- Chen, K.; Liu, C.; Chen, H.; Zhang, H.; Li, W.; Zou, Z.; Shi, Z. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 4701117. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhao, J. Algorithm for occluded blind track detection based on edge feature points screening. Sci. Technol. Eng. 2021, 21, 14567–14664. [Google Scholar]
- Wei, T.; Yuan, L. Highly real-time blind sidewalk recognition algorithm based on boundary tracking. Opto-Electron. Eng. 2017, 44, 676–684. [Google Scholar]
- Liu, X.; Zhao, X.; Wang, S. Blind sidewalk segmentation based on the lightweight semantic segmentation network. J. Phys. Conf. Ser. 2021, 1976, 012004. [Google Scholar] [CrossRef]
- Cao, Z.; Xu, X.; Hu, B.; Zhou, M. Rapid detection of blind roads and crosswalks by using a lightweight semantic segmentation network. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6188–6197. [Google Scholar] [CrossRef]
- Nguyen, T.N.A.; Phung, S.L.; Bouzerdoum, A. Hybrid deep learning-Gaussian process network for pedestrian lane detection in unstructured scenes. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5324–5338. [Google Scholar] [CrossRef]
- Chen, J.; Bai, X. Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with a Large-Scale Dataset TBRSD. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 1053–1063. [Google Scholar]
- Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. Levit: A vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 12259–12269. [Google Scholar]
- Gupta, A.; Narayan, S.; Joseph, K.; Khan, S.; Khan, F.S.; Shah, M. Ow-detr: Open-world detection transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9235–9244. [Google Scholar]
- Dehmeshki, J.; Amin, H.; Valdivieso, M.; Ye, X. Segmentation of pulmonary nodules in thoracic CT scans: A region growing approach. IEEE Trans. Med. Imaging 2008, 27, 467–480. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- Mei, Y.; Fan, Y.; Zhang, Y.; Yu, J.; Zhou, Y.; Liu, D.; Fu, Y.; Huang, T.S.; Shi, H. Pyramid attention network for image restoration. Int. J. Comput. Vis. 2023, 131, 3207–3225. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
- Chen, Z.; Xu, Q.; Cong, R.; Huang, Q. Global context-aware progressive aggregation network for salient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10599–10606. [Google Scholar]
- Fu, L.; Zhang, D.; Ye, Q. Recurrent thrifty attention network for remote sensing scene recognition. IEEE Trans. Geosci. Remote Sens. 2020, 59, 8257–8268. [Google Scholar] [CrossRef]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Zhang, T.; Li, L.; Cao, S.; Pu, T.; Peng, Z. Attention-guided pyramid context networks for detecting infrared small target under complex background. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 4250–4261. [Google Scholar] [CrossRef]
- Yang, H.; Yang, D. CSwin-PNet: A CNN-Swin Transformer combined pyramid network for breast lesion segmentation in ultrasound images. Expert Syst. Appl. 2023, 213, 119024. [Google Scholar] [CrossRef]
- Xu, X.; Li, J.; Chen, Z. TCIANet: Transformer-based context information aggregation network for remote sensing image change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1951–1971. [Google Scholar] [CrossRef]
- Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500. [Google Scholar] [CrossRef]
- Li, X.; Diao, W.; Mao, Y.; Gao, P.; Mao, X.; Li, X.; Sun, X. OGMN: Occlusion-guided multi-task network for object detection in UAV images. ISPRS J. Photogramm. Remote Sens. 2023, 199, 242–257. [Google Scholar] [CrossRef]
- Zheng, C.; Nie, J.; Wang, Z.; Song, N.; Wang, J.; Wei, Z. High-order semantic decoupling network for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5401415. [Google Scholar] [CrossRef]
- Qi, J.; Gao, Y.; Hu, Y.; Wang, X.; Liu, X.; Bai, X.; Belongie, S.; Yuille, A.; Torr, P.H.; Bai, S. Occluded video instance segmentation: A benchmark. Int. J. Comput. Vis. 2022, 130, 2022–2039. [Google Scholar] [CrossRef]
- Zhang, T.; Tian, X.; Wu, Y.; Ji, S.; Wang, X.; Zhang, Y.; Wan, P. Dvis: Decoupled video instance segmentation framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 1282–1291. [Google Scholar]
- Qin, Z.; Lu, X.; Nie, X.; Liu, D.; Yin, Y.; Wang, W. Coarse-to-fine video instance segmentation with factorized conditional appearance flows. IEEE/CAA J. Autom. Sin. 2023, 10, 1192–1208. [Google Scholar] [CrossRef]
- Chen, H.; Hou, L.; Zhang, G.K.; Wu, S. Using Context-Guided data Augmentation, lightweight CNN, and proximity detection techniques to improve site safety monitoring under occlusion conditions. Saf. Sci. 2023, 158, 105958. [Google Scholar] [CrossRef]
- Ke, L.; Tai, Y.W.; Tang, C.K. Deep occlusion-aware instance segmentation with overlapping bilayers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4019–4028. [Google Scholar]
- Chen, S.; Zou, X.; Zhou, X.; Xiang, Y.; Wu, M. Study on fusion clustering and improved YOLOv5 algorithm based on multiple occlusion of Camellia oleifera fruit. Comput. Electron. Agric. 2023, 206, 107706. [Google Scholar] [CrossRef]
- Wang, M.; Fu, B.; Fan, J.; Wang, Y.; Zhang, L.; Xia, C. Sweet potato leaf detection in a natural scene based on faster R-CNN with a visual attention mechanism and DIoU-NMS. Ecol. Inform. 2023, 73, 101931. [Google Scholar] [CrossRef]
- He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408715. [Google Scholar] [CrossRef]
- Li, Y.; He, J.; Zhang, T.; Liu, X.; Zhang, Y.; Wu, F. Diverse part discovery: Occluded person re-identification with part-aware transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2898–2907. [Google Scholar]
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341. [Google Scholar]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
- Kirillov, A.; Girshick, R.; He, K.; Dollár, P. Panoptic feature pyramid networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6399–6408. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
Environment | Version |
---|---|
Operating System | Ubuntu 22.04 |
CPU | Intel Xeon Gold 6326 |
GPU | NVIDIA Tesla V100 32 GB |
Compiling Environment | Python 3.8 |
CUDA | 11.7 |
Deep Learning Framework | Pytorch 1.12.1 |
Parameter | Value |
---|---|
Batch Size | 8 |
Init Learning Rate | 0.001 |
Min Learning Rate | |
Image Size | 512 × 512 |
Optimizer | Adam |
Epoch | 100 |
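For orientation, a minimal PyTorch training-loop sketch wired to the listed hyper-parameters follows (batch size 8, Adam with an initial learning rate of 0.001, 100 epochs, 512 × 512 inputs). The model, data, loss, and the cosine schedule with its floor value are placeholders and assumptions; the paper's minimum learning rate is not reproduced in this table.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hyper-parameters from the table: batch size 8, Adam at lr 0.001, 100 epochs,
# 512 x 512 inputs. The cosine schedule and its floor (eta_min) are assumptions.
model = nn.Conv2d(3, 2, kernel_size=1)          # placeholder standing in for DSC-Net
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    images = torch.randn(8, 3, 512, 512)        # stand-in batch of images
    labels = torch.randint(0, 2, (8, 512, 512)) # stand-in segmentation masks
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()                            # decay the learning rate once per epoch
```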
Method | mIoU (%) | F1-Score | Params | FPS |
---|---|---|---|---|
U-Net [4] | 63.88 | 71.06 | 28.99 M | 4.97 |
Bisenetv1 [54] | 67.42 | 79.18 | 13.27 M | 48.31 |
Deeplabv3 [35] | 65.87 | 77.75 | 65.74 M | 2.72 |
Swin-Transformer [20] | 73.27 | 80.83 | 58.94 M | 9.39 |
TransUnet [55] | 74.84 | 81.54 | 100.44 M | 7.78 |
ViT-base [19] | 71.47 | 78.16 | 142 M | 7.33 |
ViT-large [19] | 73.12 | 80.28 | 307 M | 5.23 |
DSC-Net | 76.31 | 83.20 | 133.08 M | 7.02 |
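The comparison tables report mIoU and F1-score. As a reference for how such numbers are typically computed from predicted and ground-truth label maps, a small NumPy sketch of the standard per-class definitions is given below; it is not the authors' evaluation script, and details such as the handling of absent classes may differ.

```python
import numpy as np

def miou_and_f1(pred, gt, num_classes):
    """Mean IoU and mean F1 over classes from integer label maps."""
    ious, f1s = [], []
    for c in range(num_classes):
        tp = np.logical_and(pred == c, gt == c).sum()
        fp = np.logical_and(pred == c, gt != c).sum()
        fn = np.logical_and(pred != c, gt == c).sum()
        if tp + fp + fn == 0:            # class absent from both maps: skip it
            continue
        ious.append(tp / (tp + fp + fn))         # IoU = TP / (TP + FP + FN)
        f1s.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return np.mean(ious), np.mean(f1s)
```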
Method | mIoU (%) | F1-Score | Params | FPS |
---|---|---|---|---|
U-Net [4] | 93.19 | 96.69 | 28.99 M | 46.40 |
Bisenetv1 [54] | 89.81 | 95.48 | 13.27 M | 134.66 |
SEM_FPN [56] | 92.47 | 96.34 | 28.49 M | 69.37 |
Deeplabv3 [57] | 92.21 | 95.85 | 65.74 M | 37.93 |
Swin-Transformer [20] | 93.31 | 96.64 | 58.94 M | 26.11 |
TransUnet [55] | 93.62 | 96.80 | 100.44 M | 22.26 |
ViT [19] | 93.75 | 96.87 | 307 M | 13.64 |
DSC-Net | 94.54 | 97.07 | 133.08 M | 20.29 |
Method | mIoU (%) | F1-Score | Params | FPS |
---|---|---|---|---|
U-Net [4] | 95.18 | 97.22 | 28.99 M | 47.14 |
Bisenetv1 [54] | 91.91 | 95.46 | 13.27 M | 115.55 |
Deeplabv3 [57] | 94.97 | 96.91 | 65.74 M | 36.53 |
Swin-Transformer [20] | 95.68 | 97.53 | 58.94 M | 26.13 |
TransUnet [55] | 96.17 | 97.78 | 100.44 M | 21.81 |
ViT-large [19] | 97.33 | 98.36 | 307 M | 12.33 |
DSC-Net | 97.72 | 98.83 | 133.08 M | 19.59 |
Method | SBM | IRM | HAM | IoU (%) | mIoU (%) | F1-Score | FPS |
---|---|---|---|---|---|---|---|
TransUnet | | | | 93.54 | 96.54 | 98.22 | 18.00 |
TransUnet | ✔ | | | 94.42 | 97.02 | 98.47 | 17.38 |
TransUnet | | ✔ | | 92.34 | 95.91 | 97.87 | 21.84 |
TransUnet | | | ✔ | 94.73 | 97.18 | 98.55 | 15.70 |
TransUnet | ✔ | ✔ | | 93.93 | 96.75 | 98.33 | 20.35 |
TransUnet | | ✔ | ✔ | 94.41 | 97.01 | 98.46 | 20.09 |
TransUnet | ✔ | | ✔ | 95.82 | 97.78 | 98.86 | 16.56 |
TransUnet | ✔ | ✔ | ✔ | 95.73 | 97.72 | 98.83 | 19.59 |
Method | Dice Loss | BCE Loss | mIoU (%) | F1-Score |
---|---|---|---|---|
DSC-Net | 0.1 | 0.9 | 97.29 | 98.66 |
DSC-Net | 0.3 | 0.7 | 97.26 | 98.65 |
DSC-Net | 0.5 | 0.5 | 97.49 | 98.73 |
DSC-Net | 0.7 | 0.3 | 97.72 | 98.83 |
DSC-Net | 0.9 | 0.1 | 97.09 | 98.51 |
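The loss-weight ablation varies a combined Dice + BCE objective, with the 0.7 / 0.3 weighting performing best. A minimal PyTorch sketch of such a weighted loss is shown below; the smoothing constant and the logits-based BCE are implementation assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class DiceBCELoss(nn.Module):
    """Weighted sum of Dice loss and binary cross-entropy for segmentation.

    Targets are expected as float masks in [0, 1] with the same shape as logits."""
    def __init__(self, dice_weight=0.7, bce_weight=0.3, smooth=1.0):
        super().__init__()
        self.dice_weight, self.bce_weight, self.smooth = dice_weight, bce_weight, smooth
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        inter = (probs * targets).sum()
        dice = 1 - (2 * inter + self.smooth) / (probs.sum() + targets.sum() + self.smooth)
        return self.dice_weight * dice + self.bce_weight * self.bce(logits, targets)
```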
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yuan, Y.; Du, Y.; Ma, Y.; Lv, H. DSC-Net: Enhancing Blind Road Semantic Segmentation with Visual Sensor Using a Dual-Branch Swin-CNN Architecture. Sensors 2024, 24, 6075. https://doi.org/10.3390/s24186075