Efficient Dual-Branch Bottleneck Networks of Semantic Segmentation Based on CCD Camera
Figure 1. EDB module. (BN: Batch normalization; ReLU: Activation function, rectified linear unit; c: Channel number; M: A positive number divisible by 2; s: Convolution stride; d: Dilation rate; g: Group number).
Figure 2. Structure of EDBNet. (c: Channel number; s: Convolution stride; d: Dilation rate; g: Group number).
Figure 3. Semantic segmentation results on CamVid. (a) Original images. (b) EDBNet. (c) Ground truth.
Figure 4. Semantic segmentation results on Cityscapes. (a) Original images. (b) EDBNet. (c) Ground truth.
Figure 5. Semantic segmentation results of road edge information in the campus environment.
Figure 6. Hardware and software architecture of the robotic distributed system.
Figure 7. Semantic segmentation results in the real-world environment using the six wheel-legged mobile robot. (a) Original images. (b) EDBNet (proposed). (c) DABNet. (d) ENet.
Abstract
1. Introduction
- A shallow EDB module is proposed to capture rich information in two ways for instrument detection on mobile robots. First, the module consists of two branches that jointly extract local and contextual information. Second, in each branch the standard two-dimensional convolution is split into two parallel one-dimensional convolutions, which widens the non-linear layers and strengthens the non-linear relationship.
- The mobile robot system can interpret a scene accurately and quickly. Experiments on the CamVid and Cityscapes datasets, together with two real-world experiments on mobile robot systems, show that EDBNet achieves high accuracy and fast inference with few parameters.
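To make the second design point of the EDB module concrete, the sketch below counts the weights of one standard k × k convolution against two parallel one-dimensional convolutions (k × 1 and 1 × k). It is a minimal illustration, not the EDB module itself: the function names, the channel width of 64, and the kernel size of 3 are assumptions chosen for the example, and biases and batch normalization are ignored.

```python
def conv_params(c_in, c_out, k_h, k_w, groups=1):
    """Weight count of a 2D convolution layer (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    return (c_in // groups) * c_out * k_h * k_w


def parallel_1d_params(c_in, c_out, k, groups=1):
    """Weights of a k x 1 and a 1 x k convolution applied in parallel."""
    return (conv_params(c_in, c_out, k, 1, groups)
            + conv_params(c_in, c_out, 1, k, groups))


# Hypothetical setting: 64 channels in and out, kernel size 3.
standard = conv_params(64, 64, 3, 3)        # 64 * 64 * 9
factorized = parallel_1d_params(64, 64, 3)  # 2 * 64 * 64 * 3
```

Under these assumptions the parallel pair needs 6c² weights against 9c² for the standard convolution, and passing `groups` greater than 1 reduces both counts further.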
2. Related Works
2.1. Multi-Scale Strategies
2.2. Lightweight Networks
3. Proposed Network
3.1. EDB Module
3.2. EDBNet Architecture Design
4. Experiments
4.1. Implementation Details
4.2. Ablation Experiment
4.3. Performance Evaluation of the Accuracy and Parameters
4.4. Performance Evaluation of the Inference Speed on a Single GTX 1070Ti Card
4.5. Results on a Practical Mobile Robot in the Real World
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Hasheminasab, S.M.; Zhou, T.; Lin, Y.C.; Habib, A. Linear Feature-Based Triangulation for Large-Scale Orthophoto Generation Over Mechanized Agricultural Fields. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5621718.
2. Lin, Y.C.; Shao, J.; Shin, S.Y.; Saka, Z.; Joseph, M.; Manish, R.; Fei, S.; Habib, A. Comparative Analysis of Multi-Platform, Multi-Resolution, Multi-Temporal LiDAR Data for Forest Inventory. Remote Sens. 2022, 14, 649.
3. Lin, Y.C.; Zhou, T.; Wang, T.; Crawford, M.; Habib, A. New Orthophoto Generation Strategies from UAV and Ground Remote Sensing Platforms for High-Throughput Phenotyping. Remote Sens. 2021, 13, 860.
4. Chen, X.; Li, Y.; Fan, J.; Wang, R. RGAM: A novel network architecture for 3D point cloud semantic segmentation in indoor scenes. Inf. Sci. 2021, 571, 87–103.
5. Tang, X.; Tu, W.; Li, K.; Cheng, J. DFFNet: An IoT-perceptive dual feature fusion network for general real-time semantic segmentation. Inf. Sci. 2021, 565, 326–343.
6. He, J.; Gu, H.; Wang, Z. Multi-instance multi-label learning based on Gaussian process with application to visual mobile robot navigation. Inf. Sci. 2012, 190, 162–177.
7. Li, L.; Dong, Z.; Yang, T.; Cao, H. Deep Learning-Based Automatic Monitoring Method for Grain Quantity Change in Warehouse Using Semantic Segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 3056743.
8. Su, X.; Philip Chen, C.; Liu, Z. Adaptive fuzzy control for uncertain nonlinear systems subject to full state constraints and actuator faults. Inf. Sci. 2021, 581, 553–566.
9. Peng, G.; Chen, C.L.P.; Yang, C. Neural Networks Enhanced Optimal Admittance Control of Robot-Environment Interaction Using Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–11.
10. Yang, C.; Peng, G.; Cheng, L.; Na, J.; Li, Z. Force Sensorless Admittance Control for Teleoperation of Uncertain Robot Manipulator Using Neural Networks. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 3282–3292.
11. Li, J.; Li, R.; Li, J.; Wang, J.; Wu, Q.; Liu, X. Dual-view 3D object recognition and detection via Lidar point cloud and camera image. Robot. Auton. Syst. 2022, 150, 103999.
12. Qiu, Z.; Zhuang, Y.; Yan, F.; Hu, H.; Wang, W. RGB-DI Images and Full Convolution Neural Network-Based Outdoor Scene Understanding for Mobile Robots. IEEE Trans. Instrum. Meas. 2019, 68, 27–37.
13. Jia, C.; Shi, F.; Zhao, M.; Zhang, Y.; Cheng, X.; Wang, M.; Chen, S. Semantic Segmentation with Light Field Imaging and Convolutional Neural Networks. IEEE Trans. Instrum. Meas. 2021, 70, 3115204.
14. Li, J.; Wang, J.; Peng, H.; Hu, Y.; Su, H. Fuzzy-Torque Approximation-Enhanced Sliding Mode Control for Lateral Stability of Mobile Robot. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 2491–2500.
15. Yang, C.; Wu, H.; Li, Z.; He, W.; Wang, N.; Su, C.Y. Mind Control of a Robotic Arm With Visual Fusion Technology. IEEE Trans. Ind. Inform. 2018, 14, 3822–3830.
16. Li, J.; Wang, J.; Peng, H.; Zhang, L.; Hu, Y.; Su, H. Neural fuzzy approximation enhanced autonomous tracking control of the wheel-legged robot under uncertain physical interaction. Neurocomputing 2020, 410, 342–353.
17. Li, J.; Wang, J.; Wang, S.; Yang, C. Human-robot skill transmission for mobile robot via learning by demonstration. Neural Comput. Appl. 2021, 1–11.
18. Li, J.; Zhang, X.; Li, J.; Liu, Y.; Wang, J. Building and optimization of 3D semantic map based on Lidar and camera fusion. Neurocomputing 2020, 409, 394–407.
19. LeCun, Y. Back propagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
20. Siam, M.; Gamal, M.; Abdel-Razek, M.; Yogamani, S.; Jägersand, M. Real-time semantic segmentation comparative study. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 1603–1607.
21. Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
22. Howard, A.G.; Zhu, M.L.; Chen, B.; Kalenichenko, D.; Wang, W.J.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
23. Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147.
24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
25. Dai, Y.; Wang, J.; Li, J.; Li, J. MDRNet: A lightweight network for real-time semantic segmentation in street scenes. Assem. Autom. 2021, 41, 725–733.
26. Li, J.; Qin, H.; Wang, J.; Li, J. OpenStreetMap-based autonomous navigation for the four wheel-legged robot via 3D-Lidar and CCD camera. IEEE Trans. Ind. Electron. 2022, 69, 2708–2717.
27. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
28. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
29. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
30. Zhao, H.S.; Shi, J.P.; Qi, X.J.; Wang, X.G.; Jia, J.Y. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
31. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
32. Lin, G.S.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5168–5177.
33. Zhao, H.S.; Qi, X.J.; Shen, X.Y.; Shi, J.P.; Jia, J.Y. ICNet for Real-Time Semantic Segmentation on High-Resolution Images. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434.
34. Li, G.; Yun, I.; Kim, J.; Kim, J. DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation. In Proceedings of the 30th British Machine Vision Conference (BMVC), Cardiff, UK, 9–12 September 2019; pp. 418–434.
35. Yu, C.Q.; Wang, J.B.; Peng, C.; Gao, C.X.; Yu, G.; Sang, N. BiseNet: Bilateral Segmentation Network for Real-time Semantic Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 334–349.
36. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5mb model size. In Proceedings of the 5th International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
37. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
38. Gao, G.; Xu, G.; Yu, Y.; Xie, J.; Yang, J.; Yue, D. MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation. IEEE Trans. Intell. Transp. Syst. 2021, 1–11.
39. Sun, Y.; Pan, B.; Fu, Y. Lightweight Deep Neural Network for Real-Time Instrument Semantic Segmentation in Robot Assisted Minimally Invasive Surgery. IEEE Robot. Autom. Lett. 2021, 6, 3870–3877.
40. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
41. Zhu, S.L.; Dong, X.; Su, H. Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4918–4927.
42. Sandler, M.; Howard, A.; Zhu, M.L.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
43. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
44. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
45. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586.
46. Szegedy, C.; Liu, W.; Jia, Y.Q.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
47. Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation. IEEE Trans. Intell. Transp. Syst. 2018, 19, 263–272.
48. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
49. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1440–1448.
50. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for Semantic Urban Scene Understanding. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
51. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the 2015 International Conference on Learning Representations (ICLR), Beijing, China, 6–9 December 2015; pp. 1–13.
52. Ding, H.; Jiang, X.; Shuai, B.; Liu, A.Q.; Wang, G. Semantic Correlation Promoted Shape-Variant Context for Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8877–8886.
53. Ding, H.H.; Jiang, X.D.; Shuai, B.; Liu, A.Q.; Wang, G. Semantic Segmentation With Context Encoding and Multi-Path Decoding. IEEE Trans. Image Process. 2020, 29, 3520–3533.
54. Mehta, S.; Rastegari, M.; Shapiro, L.; Hajishirzi, H. ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9182–9192.
55. Zhang, X.T.; Chen, Z.X.; Jonathan, W.Q.M.; Cai, L.; Lu, D.; Li, X.M. Fast Semantic Segmentation for Scene Perception. IEEE Trans. Ind. Inform. 2019, 15, 1183–1192.
56. Li, H.C.; Xiong, P.F.; Fan, H.Q.; Sun, J. DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9514–9523.
57. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation. arXiv 2020, arXiv:2004.02147.
58. Wang, S.; Chen, Z.; Li, J.; Wang, J.; Li, J.; Zhao, J. Flexible motion framework of the six wheel-legged robot: Experimental results. IEEE/ASME Trans. Mechatronics 2021, 1–9.
59. Li, J.; Dai, Y.; Wang, J.; Su, X.; Ma, R. Towards broad learning networks on unmanned mobile robot for semantic segmentation. In Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 9228–9234.
Models | mIoU (%) | FPS | Parameters |
---|---|---|---|
EDBNet without Branch 2 | 61.77 | 78.13 | 0.80 M |
EDBNet without Branch 1 | 66.70 | 68.49 | 0.81 M |
EDBNet with extended Stage 2 | 68.45 | 46.08 | 1.40 M |
EDBNet with extended Stage 3 | 67.88 | 35.97 | 1.06 M |
EDBNet with fixed dilation rate | 67.26 | 61.73 | 1.03 M |
EDBNet (ours) | 68.58 | 61.73 | 1.03 M |
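For readers unfamiliar with the accuracy metric in this table, the sketch below computes mean intersection over union (mIoU) from a confusion matrix using the standard definition: per class, IoU = TP / (TP + FP + FN), averaged over classes. It is an illustrative assumption about how such a score is obtained, not the authors' evaluation code; the function name and the toy pixel counts are made up for the example.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] = number of pixels of true class i predicted as class j.
    Classes whose union is zero (never present, never predicted) are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed pixels
        fp = sum(confusion[r][c] for r in range(n)) - tp     # false alarms
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (made-up pixel counts, not data from the paper).
toy = [[8, 2],
       [1, 9]]
score = mean_iou(toy)  # mean of 8/11 and 9/12
```

Multiplying the returned fraction by 100 gives a percentage comparable in form to the mIoU (%) column.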
Category | Models | CamVid mIoU (%) | Cityscapes mIoU (%) | Parameters |
---|---|---|---|---|
Large Models | FCN-8s [27] | 57.0 | 65.3 | 134.5 M |
Large Models | SegNet [29] | 60.1 | - | 29.45 M |
Large Models | Dilation10 [51] | 65.3 | 67.1 | 140.5 M |
Large Models | PSPNet [30] | 69.1 | 78.4 | 65.7 M |
Large Models | DeepLab v3 [31] | - | 81.3 | >30 M |
Large Models | SVCNet [52] | 75.4 | 81.0 | - |
Large Models | CGBNet [53] | - | 81.2 | - |
Lightweight Models | ENet [23] | 51.3 | 58.3 | 0.37 M |
Lightweight Models | ICNet [33] | 67.1 | 69.5 | 26.6 M |
Lightweight Models | BiseNet [35] | 65.5 | 68.4 | 12.5 M |
Lightweight Models | ERFNet [47] | - | 68.0 | 2.1 M |
Lightweight Models | ESPNet V2 [54] | - | 66.2 | <10 M |
Lightweight Models | FSSNet [55] | 58.6 | 58.8 | 0.2 M |
Lightweight Models | DABNet [34] | 66.4 | 70.1 | 0.76 M |
Lightweight Models | DFANet [56] | 64.7 | 70.3 | 7.8 M |
Lightweight Models | BiseNet v2 [57] | 72.4 | 72.6 | 49 M |
Lightweight Models | EDBNet (proposed) | 68.6 | 71.2 | 1.03 M |
Models | Time at 512 × 1024 (ms) | Speed at 512 × 1024 (fps) |
---|---|---|
SegNet | 80.6 | 12.4 |
ENet | 18.2 | 54.9 |
ICNet | 15.0 | 67.2 |
DABNet | 14.6 | 68.5 |
ESPNet | 12.7 | 78.7 |
DFANet | 12.6 | 79.4 |
BiseNet v2 | 9.7 | 103.1 |
EDBNet (proposed) | 12.3 | 81.3 |
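The two columns of this table are related by a simple reciprocal: frames per second is 1000 divided by the per-image latency in milliseconds. The short sketch below illustrates that conversion for a few rows; the function and variable names are hypothetical, and the assumption that the fps column was derived this way is ours rather than something the table states.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies copied from the table above (512 x 1024 input).
latency_ms = {"SegNet": 80.6, "ENet": 18.2, "DABNet": 14.6, "EDBNet": 12.3}

# Round to one decimal place, matching the precision used in the table.
fps = {name: round(fps_from_latency_ms(ms), 1) for name, ms in latency_ms.items()}
```

For the rows chosen here, the rounded results agree with the reported fps values, for example 12.3 ms gives 81.3 fps for EDBNet.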
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, J.; Dai, Y.; Su, X.; Wu, W. Efficient Dual-Branch Bottleneck Networks of Semantic Segmentation Based on CCD Camera. Remote Sens. 2022, 14, 3925. https://doi.org/10.3390/rs14163925