An Encoder-Decoder Based Convolution Neural Network (CNN) for Future Advanced Driver Assistance System (ADAS)
Figure 1. Basic Testing CNN Model (BTCM).
Figure 2. An illustration of the proposed CSSA architecture.
Figure 3. Comparison of different activation functions.
Figure 4. Pooling methods’ accuracy curves.
Figure 5. CNN network results without Batch Normalization (BN) layers.
Figure 6. Analysis of CNN by skipping pooling layers.
Figure 7. CSSA vs. SegNet.
Figure 8. CSSA image dataset results.
Abstract
1. Introduction
2. Related Work
3. Algorithm Overview
- It is a novel encoder-decoder-based engine for pixel-wise semantic segmentation.
- It offers a simple training process that trains the encoder and decoder networks simultaneously.
- It significantly reduces memory consumption owing to its decoder architecture.
- It employs the exponential linear unit (ELU) activation function to improve performance.
- It offers a flexible architecture that can be tuned to input images of any size.
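The activation function referred to in the list above appears in Algorithm 1 and in the architecture table as the exponential linear unit (ELU). The following is a minimal sketch of ELU next to ReLU and Leaky ReLU (both listed under Abbreviations), assuming the usual textbook definitions. The parameter values `alpha=1.0` and `slope=0.01` are illustrative assumptions; the settings used by the authors are not stated in this section.

```python
import math

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a small linear slope for negative inputs (slope is assumed)."""
    return x if x > 0 else slope * x

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise (alpha is assumed)."""
    return x if x > 0 else alpha * math.expm1(x)

# Compare the three functions on a few sample inputs.
for value in (-2.0, -0.5, 0.0, 1.5):
    print(value, relu(value), leaky_relu(value), elu(value))
```

Unlike ReLU, ELU returns smooth negative values that saturate at `-alpha`, which keeps mean activations closer to zero.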
Algorithm 1 Training a CNN encoder: The network will accept input having a volume of size . It requires the following hyperparameters: K number of filters, F spatial extent, S stride rate and P amount of zero padding. The size of output volume is . Here, C denotes the cost function for mini-batch. The learning rate decay factor is denoted by , and the numbers of layers are represented as L. denotes the batch-normalize activations on given parameters. The performs the exponential linear units’ activation of the given input. The next function performs the convolution operations with a constant rate of , and . The performs the pooling operation with Max pooling filter and a constant rate of and . |
Require: a mini-batch of inputs and targets , previous weights W, previous BatchNorm parameters , weights initialization coefficients from , and previous learning rate . The network weights are initialized using the method proposed by He et al. [45]. |
Ensure: updated weights , updated BatchNorm parameters and updated learning rate . |
1. Encoder Computations : |
1.1. Producing Feature Maps: |
Input image |
Initialize |
for k = 1 to L do |
if k < L then |
else |
return |
end if |
end for |
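The relation between the input volume, the hyperparameters K, F, S and P, and the output volume in Algorithm 1 can be illustrated with the standard convolution arithmetic. This is a sketch under assumptions: the formula is the common floor-based one, and the concrete values (3 × 3 filters with stride 1 and padding 1, 2 × 2 pooling with stride 2) are illustrative choices, not values recovered from the algorithm text.

```python
def output_size(size_in, f, s, p):
    """Spatial output size of a conv or pooling layer: floor((W - F + 2P) / S) + 1."""
    return (size_in - f + 2 * p) // s + 1

def conv_output_volume(w1, h1, k, f, s, p):
    """Output volume (width, height, depth) for K filters of extent F, stride S, padding P."""
    return output_size(w1, f, s, p), output_size(h1, f, s, p), k

# Assumed example: a 360 x 480 input passed through 64 filters of size 3 x 3.
conv_volume = conv_output_volume(360, 480, k=64, f=3, s=1, p=1)

# Assumed example: 2 x 2 max pooling with stride 2 applied to that result.
pooled_size = (output_size(conv_volume[0], 2, 2, 0),
               output_size(conv_volume[1], 2, 2, 0))
print(conv_volume, pooled_size)
```

With these assumed settings the convolution preserves the spatial size and each pooling step halves it. Frameworks differ in how they round odd sizes, so the exact sizes in deeper stages depend on the implementation.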
Algorithm 2 Training a CNN decoder: The network will accept input having a volume of size pooling mask and pooling indices from encoder network. It requires the following hyperparameters: K number of filters, F spatial extent, S stride rate and P amount of zero padding. The size of the output volume is . Here, C denotes the cost function for mini-batch. The learning rate decay factor is denoted by , and the numbers of layers are represented as L. denotes the function that up-samples the given inputs. This operation is similar to un-pooling, where the given input is merged to produce an extended-sized feature map. The and will perform the same as the encoder configurations. is a multi-class soft-max classifier that outputs the class probabilities for every pixel. |
Require: a mini-batch of feature maps extracted from encoder network . Previous weights W, previous BatchNorm parameters , weights initialization coefficients from , and previous learning rate . The network weights are initialized using the method proposed by He et al. [45]. |
Ensure: updated weights , updated BatchNorm parameters and updated learning rate . |
2. Decoder Computations: |
2.1 Up-sampling Feature Maps: |
Input |
for k = L to 1 do |
if k > 1 then |
else |
return |
end if |
end for |
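Algorithm 2 describes up-sampling that is similar to un-pooling and that uses the pooling indices passed from the encoder. The sketch below shows one common way to realise that idea on a single 2-D feature map, in plain Python. The function names and the 2 × 2 non-overlapping window are assumptions for illustration, not the authors' implementation.

```python
def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max pooling that also records the (row, col) of each maximum."""
    height, width = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, height - size + 1, size):
        pooled_row, index_row = [], []
        for c in range(0, width - size + 1, size):
            window = [(fmap[r + i][c + j], (r + i, c + j))
                      for i in range(size) for j in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def unpool_with_indices(pooled, indices, out_height, out_width):
    """Place each pooled value back at its recorded position; all other cells stay zero."""
    out = [[0.0] * out_width for _ in range(out_height)]
    for value_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(value_row, index_row):
            out[r][c] = value
    return out

fmap = [[1.0, 2.0, 5.0, 6.0],
        [3.0, 4.0, 7.0, 8.0],
        [9.0, 1.0, 1.0, 2.0],
        [0.0, 3.0, 4.0, 3.0]]
pooled, indices = max_pool_with_indices(fmap)          # encoder side
restored = unpool_with_indices(pooled, indices, 4, 4)  # decoder side
```

The decoder only needs the stored positions rather than the full encoder feature maps. The resulting sparse map is then densified by the convolution layers that follow.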
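Algorithm 3 Batch normalization [46]: Let $\hat{x}_{1 \ldots m}$ be the normalized values and $y_{1 \ldots m}$ be their linear transformations. The transformation is referred to as: $\mathrm{BN}_{\gamma,\beta}: x_{1 \ldots m} \rightarrow y_{1 \ldots m}$. In this algorithm, $\epsilon$ is a constant added to the mini-batch variance for numerical stability. |
Require: Values of x over a mini-batch: $\mathcal{B} = \{x_{1 \ldots m}\}$; Parameters to be learned: $\gamma$, $\beta$ |
Ensure: $\{y_i = \mathrm{BN}_{\gamma,\beta}(x_i)\}$ |
$\mu_{\mathcal{B}} \leftarrow \frac{1}{m} \sum_{i=1}^{m} x_i$ //mini-batch mean |
$\sigma_{\mathcal{B}}^{2} \leftarrow \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^{2}$ //mini-batch variance |
$\hat{x}_i \leftarrow \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}$ //normalize |
$y_i \leftarrow \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)$ //scale and shift |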
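The four commented steps of Algorithm 3 (mini-batch mean, mini-batch variance, normalize, scale and shift) can be sketched in plain Python for a mini-batch of scalar activations. This follows the standard batch normalization transform of Ioffe and Szegedy [46]. The names `gamma`, `beta` and `eps` and the value `eps=1e-5` are assumptions chosen for illustration.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch normalization of one feature over a mini-batch."""
    m = len(xs)
    mean = sum(xs) / m                                   # mini-batch mean
    var = sum((x - mean) ** 2 for x in xs) / m           # mini-batch variance
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in xs]  # normalize
    return [gamma * v + beta for v in x_hat]             # scale and shift

batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)
shifted = batch_norm(batch, gamma=2.0, beta=3.0)
```

With `gamma=1` and `beta=0` the output has approximately zero mean and unit variance. The learned parameters let the network undo the normalization where that helps.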
4. CSSA Architecture
5. Experiments
5.1. Implementation Details
5.2. Training
5.2.1. Datasets
5.2.2. Optimization
6. Results and Analysis
7. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Abbreviations
ADAS | Advanced Driver Assistance System |
CNN | Convolution Neural Network |
CSSA | CNN for Semantic Segmentation for ADAS |
BN | Batch Normalization |
FCN | Fully Convolutional Network |
ReLU | Rectified Linear Unit |
RNN | Recurrent Neural Network |
LReLU | Leaky Rectified Linear Unit |
References
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Proc. Mag. 2012, 29, 82–97.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2012; pp. 1097–1105.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Desai, P.R.; Desai, P.N.; Ajmera, K.D.; Mehta, K. A review paper on oculus rift-a virtual reality headset. arXiv, 2014; arXiv:1408.1173.
- Gottmer, M. Merging Reality and Virtuality with Microsoft HoloLens. Master’s Thesis, Universiteit Utrecht, Utrecht, The Netherlands, 2015.
- Shotton, J.; Sharp, T.; Kipman, A.; Fitzgibbon, A.; Finocchio, M.; Blake, A.; Cook, M.; Moore, R. Real-time human pose recognition in parts from single depth images. Commun. ACM 2013, 56, 116–124.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Couprie, C.; Farabet, C.; Najman, L.; LeCun, Y. Indoor semantic segmentation using depth information. arXiv, 2013; arXiv:1301.3572.
- Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; Darrell, T. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. arXiv, 2014; arXiv:1310.1531.
- Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Simultaneous detection and segmentation. In European Conference on Computer Vision; Springer: Zurich, Switzerland, 2014; pp. 297–312.
- Ciresan, D.; Giusti, A.; Gambardella, L.M.; Schmidhuber, J. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; pp. 2843–2851.
- Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1915–1929.
- Ravì, D.; Bober, M.; Farinella, G.M.; Guarnera, M.; Battiato, S. Semantic segmentation of images exploiting DCT based features and random forest. Pattern Recognit. 2016, 52, 260–273.
- Visin, F.; Ciccone, M.; Romero, A.; Kastner, K.; Cho, K.; Bengio, Y.; Matteucci, M.; Courville, A. Reseg: A recurrent neural network-based model for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 27–30 June 2016; pp. 41–48.
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv, 2014; arXiv:1412.7062.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Shotton, J.; Winn, J.; Rother, C.; Criminisi, A. Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 2009, 81, 2–23.
- Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010; Springer: Paris, France, 2010; pp. 177–186.
- Al Machot, F.; Ali, M.; Haj Mosa, A.; Schwarzlmüller, C.; Gutmann, M.; Kyamakya, K. Real-time raindrop detection based on cellular neural networks for ADAS. J. Real-Time Image Proc. 2016, 1–13.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Zbontar, J.; LeCun, Y. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1592–1599.
- Tompson, J.J.; Jain, A.; LeCun, Y.; Bregler, C. Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems; NIPS: Montreal, QC, Canada, 2014; pp. 1799–1807.
- Gupta, S.; Girshick, R.; Arbeláez, P.; Malik, J. Learning rich features from RGB-D images for object detection and segmentation. In European Conference on Computer Vision; Springer: Zurich, Switzerland, 2014; pp. 345–360.
- Guo, X.; Shen, C.; Chen, L. Deep Fault Recognizer: An Integrated Model to Denoise and Extract Features for Fault Diagnosis in Rotating Machinery. Appl. Sci. 2016, 7, 41.
- Wang, J.; Zhang, G.; Shi, J. 2D Gaze Estimation Based on Pupil-Glint Vector Using an Artificial Neural Network. Appl. Sci. 2016, 6, 174.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
- Matan, O.; Burges, C.J.; LeCun, Y.; Denker, J.S. Multi-Digit Recognition Using a Space Displacement Neural Network; NIPS: San Mateo, CA, USA, 1991; pp. 488–495.
- Wolf, R.; Platt, J.C. Postal address block location using a convolutional locator network. In Advances in Neural Information Processing Systems; NIPS: Denver, CO, USA, 1994; p. 745.
- Ning, F.; Delhomme, D.; LeCun, Y.; Piano, F.; Bottou, L.; Barbano, P.E. Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Proc. 2005, 14, 1360–1371.
- Krähenbühl, P.; Koltun, V. Efficient inference in fully connected crfs with gaussian edge potentials. Adv. Neural Inf. Process. Syst. 2011, 2, 1–9.
- Mostajabi, M.; Yadollahpour, P.; Shakhnarovich, G. Feedforward semantic segmentation with zoom-out features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3376–3385.
- Dai, J.; He, K.; Sun, J. Convolutional feature masking for joint object and stuff segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3992–4000.
- Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Hypercolumns for object segmentation and fine-grained localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 447–456.
- Shotton, J.; Johnson, M.; Cipolla, R. Semantic texton forests for image categorization and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Sturgess, P.; Alahari, K.; Ladicky, L.; Torr, P.H. Combining appearance and structure from motion features for road scene understanding. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 7–10 September 2009.
- Brostow, G.J.; Fauqueur, J.; Cipolla, R. Semantic object classes in video: A high-definition ground truth database. Pattern Recognit. Lett. 2009, 30, 88–97.
- Ladickỳ, L.; Sturgess, P.; Alahari, K.; Russell, C.; Torr, P.H. What, where and how many? Combining object detectors and crfs. In European Conference on Computer Vision; Springer: Crete, Greece, 2010; pp. 424–437.
- Huang, F.J.; Boureau, Y.L.; LeCun, Y. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 18–23 June 2007; pp. 1–8.
- Badrinarayanan, V.; Handa, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. arXiv, 2015; arXiv:1505.07293.
- Kendall, A.; Badrinarayanan, V.; Cipolla, R. Bayesian segnet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv, 2015; arXiv:1511.02680.
- Jung, H.; Choi, M.K.; Soon, K.; Jung, W.Y. End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation. arXiv, 2016; arXiv:1612.06558.
- Xie, K.; Ge, S.; Ye, Q.; Luo, Z. Traffic Sign Recognition Based on Attribute-Refinement Cascaded Convolutional Neural Networks. In Pacific Rim Conference on Multimedia; Springer: Xi’an, China, 2016; pp. 201–210.
- Huval, B.; Wang, T.; Tandon, S.; Kiske, J.; Song, W.; Pazhayampallil, J.; Andriluka, M.; Rajpurkar, P.; Migimatsu, T.; Cheng-Yue, R.; et al. An empirical evaluation of deep learning on highway driving. arXiv, 2015; arXiv:1504.01716.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv, 2015; arXiv:1502.03167.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv, 2014; arXiv:1409.1556.
- Srivastava, N.; Hinton, G.E.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv, 2015; arXiv:1511.07289.
- Drucker, H.; Le Cun, Y. Improving generalization performance in character recognition. In Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, New York, NY, USA, 30 September–1 October 1991; pp. 198–207.
- Goodfellow, I.J.; Warde-Farley, D.; Mirza, M.; Courville, A.C.; Bengio, Y. Maxout networks. ICML (3) 2013, 28, 1319–1327.
- Zeiler, M.D.; Fergus, R. Stochastic pooling for regularization of deep convolutional neural networks. arXiv, 2013; arXiv:1301.3557.
- Schilling, F. The Effect of Batch Normalization on Deep Convolutional Neural Networks; DiVA Publisher: Uppsala, Sweden, 2016.
- Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv, 2014; arXiv:1412.6806.
- Yasrab, R.; GU, N.; Xiaoci, Z. SCNet: A Simplified Encoder-Decoder CNN for Semantic Segmentation. In Proceedings of the 2016 IEEE Sponsored 5th International Conference on Computer Science and Networks Technology (ICCSNT 2016), Changchun, China, 10–11 December 2016; pp. 1–6.
- Yasrab, R.; GU, N.; Xiaoci, Z.; Asad-Khan. DCSeg: Decoupled CNN for Classification and Semantic Segmentation. In Proceedings of the 2017 IEEE Conference on Knowledge and Smart Technologies (KST), Pattaya, Thailand, 1–4 February 2017; pp. 1–6.
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678.
- Jarrett, K.; Kavukcuoglu, K.; LeCun, Y. What is the best multi-stage architecture for object recognition? In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 2146–2153.
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
- Yang, Y.; Li, Z.; Zhang, L.; Murphy, C.; Ver Hoeve, J.; Jiang, H. Local label descriptor for example based semantic image labeling. In European Conference on Computer Vision; Springer: Zurich, Switzerland, 2012; pp. 361–375.
- Brostow, G.J.; Shotton, J.; Fauqueur, J.; Cipolla, R. Segmentation and recognition using structure from motion point clouds. In European Conference on Computer Vision; Springer: Marseille, France, 2008; pp. 44–57.
- Zhang, C.; Wang, L.; Yang, R. Semantic segmentation of urban scenes using dense depth maps. In European Conference on Computer Vision; Springer: Crete, Greece, 2010; pp. 708–721.
- Kontschieder, P.; Bulo, S.R.; Bischof, H.; Pelillo, M. Structured class-labels in random forests for semantic image labelling. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 2190–2197.
- Rota Bulo, S.; Kontschieder, P. Neural decision forests for semantic image labelling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 81–88.
- Tighe, J.; Lazebnik, S. Superparsing. Int. J. Comput. Vis. 2013, 101, 329–349.
- Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
- Lim, S.S.; Vos, T.; Flaxman, A.D.; Danaei, G.; Shibuya, K.; Adair-Rohani, H.; AlMazroa, M.A.; Amann, M.; Anderson, H.R.; Andrews, K.G.; et al. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: A systematic analysis for the Global Burden of Disease Study 2010. Lancet 2013, 380, 2224–2260.
- Thrun, S.; Montemerlo, M.; Dahlkamp, H.; Stavens, D.; Aron, A.; Diebel, J.; Fong, P.; Gale, J.; Halpenny, M.; Hoffmann, G.; et al. Stanley: The robot that won the DARPA Grand Challenge. J. Field Robot. 2006, 23, 661–692.
- SAE, T. Definitions for Terms Related to On-Road Motor Vehicle Automated Driving Systems-J3016. In Society of Automotive Engineers: On-Road Automated Vehicle Standards Committee; SAE Pub. Inc.: Warrendale, PA, USA, 2013.
- Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. The German traffic sign recognition benchmark: A multi-class classification competition. In Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN), San Jose, CA, USA, 31 July–5 August 2011; pp. 1453–1460.
- Lee, S.S.; Lee, E.; Hwang, Y.; Jang, S.J. Low-complexity hardware architecture of traffic sign recognition with IHSL color space for advanced driver assistance systems. In Proceedings of the IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Nagoya, Japan, 24–27 October 2016; pp. 1–2.
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv, 2013; arXiv:1312.6229.
Encoder stage | Kernel, filters | Encoder operations | Decoder stage | Kernel, filters | Decoder operations |
---|---|---|---|---|---|
Input 360 × 480 + Norm | | | Output, Soft-max-with-Loss, Accuracy | | |
Conv1 | 3 × 3, 64 | BN, ELU, Pooling | De-Conv | 3 × 3, 64 | Up-sample, BN, ELU |
 | 3 × 3, 64 | | | 3 × 3, 64 | |
Conv2 | 3 × 3, 128 | BN, ELU, Pooling | De-Conv | 3 × 3, 128 | Up-sample, BN, ELU |
 | 3 × 3, 128 | | | 3 × 3, 128 | |
Conv3 | 3 × 3, 256 | BN, ELU, Pooling, Dropout | De-Conv | 3 × 3, 256 | Up-sample, BN, ELU, Dropout |
 | 3 × 3, 256 | | | 3 × 3, 256 | |
 | 3 × 3, 256 | | | 3 × 3, 256 | |
Conv4 | 3 × 3, 512 | BN, ELU, Pooling, Dropout | De-Conv | 3 × 3, 512 | Up-sample, BN, ELU, Dropout |
 | 3 × 3, 512 | | | 3 × 3, 512 | |
 | 3 × 3, 512 | | | 3 × 3, 512 | |
Conv5 | 3 × 3, 512 | BN, ELU, Pooling, Dropout | De-Conv | 3 × 3, 512 | Up-sample, BN, ELU, Dropout |
 | 3 × 3, 512 | | | 3 × 3, 512 | |
 | 3 × 3, 512 | | | 3 × 3, 512 | |
The encoder feeds pooling indices to the decoder. |
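To make the size of the encoder in the architecture table concrete, the sketch below encodes its five stages as data and counts the convolution layers and their weights. The stage list is read off the table. The 3-channel (RGB) input and the exclusion of bias and BN parameters are assumptions made here for illustration.

```python
# (stage name, output channels, number of 3 x 3 convolution layers), from the encoder table
ENCODER_STAGES = [
    ("Conv1", 64, 2),
    ("Conv2", 128, 2),
    ("Conv3", 256, 3),
    ("Conv4", 512, 3),
    ("Conv5", 512, 3),
]

def count_conv_layers(stages):
    """Total number of convolution layers across all stages."""
    return sum(n_layers for _, _, n_layers in stages)

def count_conv_weights(stages, in_channels=3, kernel=3):
    """Convolution weights only (no biases, no BN parameters); in_channels=3 is assumed."""
    total = 0
    for _, out_channels, n_layers in stages:
        for _ in range(n_layers):
            total += kernel * kernel * in_channels * out_channels
            in_channels = out_channels
    return total

n_layers = count_conv_layers(ENCODER_STAGES)
n_weights = count_conv_weights(ENCODER_STAGES)
print(n_layers, n_weights)  # 13 14710464
```

Under these assumptions the encoder has 13 convolution layers, the same layout as the convolutional part of the VGG network cited in the references [47].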
Method | Building | Tree | Sky | Car | Sign-Symbol | Road | Pedestrian | Fence | Column-Pole | Side-Walk | Bicyclist | Class avg. | Global avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Local Label Descriptors [60] | 80.7 | 61.5 | 88.8 | 16.4 | n/a | 98.0 | 1.09 | 0.05 | 4.13 | 12.4 | 0.07 | 36.3 | 73.6 |
SfM + Appearance [61] | 46.2 | 61.9 | 89.7 | 68.6 | 42.9 | 89.5 | 53.6 | 46.6 | 0.7 | 60.5 | 22.5 | 53.0 | 69.1 |
Boosting [36] | 61.9 | 67.3 | 91.1 | 71.1 | 58.5 | 92.9 | 49.5 | 37.6 | 25.8 | 77.8 | 24.7 | 59.8 | 76.4 |
Dense Depth Maps [62] | 85.3 | 57.3 | 95.4 | 69.2 | 46.5 | 98.5 | 23.8 | 44.3 | 22.0 | 38.1 | 28.7 | 55.4 | 82.1 |
Structured Random Forests [63] | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 51.4 | 72.5 |
Neural Decision Forests [64] | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 56.1 | 82.1 |
Super Parsing [65] | 87.0 | 67.1 | 96.9 | 62.7 | 30.1 | 95.9 | 14.7 | 17.9 | 1.7 | 70.0 | 19.4 | 51.2 | 83.3 |
Boosting + pairwise CRF [36] | 70.7 | 70.8 | 94.7 | 74.4 | 55.9 | 94.1 | 45.7 | 37.2 | 13.0 | 79.3 | 23.1 | 59.9 | 79.8 |
Boosting + Higher order [36] | 84.5 | 72.6 | 97.5 | 72.7 | 34.1 | 95.3 | 34.2 | 45.7 | 8.1 | 77.6 | 28.5 | 59.2 | 83.8 |
Boosting + Detectors + CRF [38] | 81.5 | 76.6 | 96.2 | 78.7 | 40.2 | 93.9 | 43.0 | 47.6 | 14.3 | 81.5 | 33.9 | 62.5 | 83.8 |
ReSeg [14] | 86.8 | 84.7 | 93.0 | 87.3 | 48.6 | 98.0 | 63.3 | 20.9 | 35.6 | 87.3 | 43.5 | 68.1 | 88.7 |
SegNet-Basic (layer-wise training) [41] | 75.0 | 84.6 | 91.2 | 82.7 | 36.9 | 93.3 | 55.0 | 37.5 | 44.8 | 74.1 | 16.0 | 62.9 | 84.3 |
SegNet-Basic [40] | 80.6 | 72.0 | 93.0 | 78.5 | 21.0 | 94.0 | 62.5 | 31.4 | 36.6 | 74.0 | 42.5 | 62.3 | 82.8 |
SegNet [40] | 88.0 | 87.3 | 92.3 | 80.0 | 29.5 | 97.6 | 57.2 | 49.4 | 27.8 | 84.8 | 30.7 | 65.9 | 88.6 |
Ravi et al. [13] | 49.1 | 77.1 | 93.5 | 80.8 | 63.9 | 88.0 | 75.0 | 76.2 | 28.6 | 88.5 | 76.1 | 72.4 | 76.3 |
Bayesian SegNet-Basic [41] | 75.1 | 68.8 | 91.4 | 77.7 | 52.0 | 92.5 | 71.5 | 44.9 | 52.9 | 79.1 | 69.6 | 70.5 | 81.6 |
Bayesian SegNet [41] | 80.4 | 85.5 | 90.1 | 86.4 | 67.9 | 93.8 | 73.8 | 64.5 | 50.8 | 91.7 | 54.6 | 76.3 | 86.9 |
CSSA | 94.8 | 83.7 | 83.6 | 95.0 | 92.0 | 86.9 | 97.3 | 87.8 | 92.1 | 93.3 | 90.1 | 87.3 | 90.6 |
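The last two columns of the comparison table report a class average and a global average. The sketch below assumes the usual definitions for semantic segmentation: the class average is the mean of the per-class accuracies, and the global average is the fraction of all pixels labelled correctly. The confusion matrix `toy` is hypothetical data for illustration only. The `segnet_row` values are copied from the SegNet [40] row of the table.

```python
def class_and_global_average(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    class_avg = sum(per_class) / len(per_class)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return class_avg, correct / total

# Hypothetical two-class example: a rare class (10 pixels) and a frequent one (90 pixels).
toy = [[8, 2],
       [1, 89]]
class_avg, global_avg = class_and_global_average(toy)

# Per-class accuracies of the SegNet [40] row; their mean matches the listed class average.
segnet_row = [88.0, 87.3, 92.3, 80.0, 29.5, 97.6, 57.2, 49.4, 27.8, 84.8, 30.7]
segnet_class_avg = round(sum(segnet_row) / len(segnet_row), 1)
```

The toy example shows why the two numbers differ: the global average is dominated by frequent classes such as road or sky, whereas the class average weights rare classes such as sign-symbol or pedestrian equally.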
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yasrab, R.; Gu, N.; Zhang, X. An Encoder-Decoder Based Convolution Neural Network (CNN) for Future Advanced Driver Assistance System (ADAS). Appl. Sci. 2017, 7, 312. https://doi.org/10.3390/app7040312