Analysis and Synthesis of Traffic Scenes from Road Image Sequences
List of Figures
- Figure 1. The flow diagram of the improved BiFPN method.
- Figure 2. Global context attention mechanism.
- Figure 3. Flow diagram of optical flow inpainting.
- Figure 4. The network structure of CycleGAN.
- Figure 5. mAP index comparison on the UA-DETRAC dataset test.
- Figure 6. mAP index comparison on the VOC2007 dataset test.
- Figure 7. Qualitative comparative test.
- Figure 8. Qualitative results for image inpainting on TSD-max.
- Figure 9. Quantitative results for image inpainting with black masks on TSD-max.
- Figure 10. Quantitative results in terms of PSNR on TSD-max.
- Figure 11. Quantitative results in terms of SSIM on TSD-max.
- Figure 12. Traffic scenes construction from road images.
Abstract
1. Introduction
- Traffic elements detection: Feature fusion is integrated into SSD feature extraction. Building on BiFPN, we replace its element-wise addition of feature maps at the spatial level with channel-level concatenation, which improves how efficiently the model obtains and exploits feature information (see the sketch after this list). An attention mechanism is applied to enable the model to make full use of the desired features.
- Road scene inpainting: An unsupervised CycleGAN is developed to inpaint the missing regions in the optical flow generated from adjacent frames. The inconsistency between foreground and background optical flow is exploited to restore the missing pixels of undesired regions. A Gaussian mixture model is adopted to further refine the undesired regions.
- Road scene modeling: A novel road scene modeling method is developed using object detection and image inpainting, which can be applied to traffic scene simulation and evaluation.
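To make the first contribution concrete, the sketch below shows one fusion node of the modified BiFPN, replacing the weighted element-wise addition of the original BiFPN with channel-level concatenation followed by a 1x1 projection. This is a minimal sketch in PyTorch; the module and variable names are illustrative, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """One fusion node of the modified BiFPN: incoming feature maps are
    concatenated along the channel axis and projected back to the working
    width with a 1x1 convolution, instead of being added element-wise."""

    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(num_inputs * channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, features):
        # All inputs are assumed to be resized to a common spatial
        # resolution before fusion, as in the original BiFPN.
        return self.project(torch.cat(features, dim=1))

# Example: fusing two 64-channel feature maps of the same resolution.
fuse = ConcatFusion(num_inputs=2, channels=64)
p4_td = fuse([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)])
```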
2. Related Works
2.1. Traffic Elements Detection
2.2. Road Scene Modeling
3. Traffic Elements Detection
3.1. Improved BiFPN Feature Fusion
Algorithm 1 SSD feature extraction.
Require: Input image sequence I.
for each image i in I do
  Generate feature maps through CNN feature extraction;
  for each feature layer k do
    Extract the feature map f_k;
    Refine f_k with the attention mechanism to obtain more efficient feature information;
    Fuse the refined feature maps with the improved BiFPN;
    Construct bounding boxes of different sizes;
  end for
  Apply the NMS algorithm;
  Output the default boxes after selection;
end for
Algorithm stops.
Ensure: Bounding boxes and classes of objects.
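The detection loop of Algorithm 1 can be written out as follows. This is a minimal sketch assuming PyTorch and torchvision; `backbone`, `attention`, `bifpn`, and `head` are hypothetical callables standing in for the components described in Sections 3.1-3.3, not the paper's released code.

```python
import torch
from torchvision.ops import nms

def detect(images, backbone, attention, bifpn, head,
           score_thresh=0.5, iou_thresh=0.45):
    """Mirror of Algorithm 1: per image, extract multi-scale feature maps,
    refine each with the attention mechanism, fuse them with the modified
    BiFPN, predict boxes, and suppress duplicates with NMS."""
    results = []
    for image in images:
        feature_maps = backbone(image)                  # CNN feature extraction
        feature_maps = [attention(f) for f in feature_maps]
        fused = bifpn(feature_maps)                     # improved BiFPN fusion
        boxes, scores, labels = head(fused)             # default boxes per scale
        keep = scores > score_thresh
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
        keep = nms(boxes, scores, iou_thresh)           # NMS selection
        results.append((boxes[keep], labels[keep], scores[keep]))
    return results
```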
3.2. More Efficient Activation Functions
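The activation referenced here is Mish (Misra, see the reference list), defined as f(x) = x * tanh(softplus(x)). A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

class Mish(torch.nn.Module):
    """Mish activation: f(x) = x * tanh(softplus(x)).
    Smooth and non-monotonic, which the cited work reports to help
    optimization compared with ReLU."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))
```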
3.3. Attention Mechanism
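The global context attention mechanism follows GCNet (Cao et al., see the reference list): a single spatial attention map pools the feature map into one context vector, which is transformed by a bottleneck and added back to every position. A minimal sketch, assuming PyTorch; the reduction ratio is illustrative.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Global context attention block in the spirit of GCNet."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)
        hidden = max(channels // reduction, 1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Context modeling: softmax-normalized spatial weights.
        mask = self.context_mask(x).view(b, 1, h * w).softmax(dim=-1)
        context = torch.bmm(x.view(b, c, h * w), mask.transpose(1, 2))
        context = context.view(b, c, 1, 1)
        # Feature transform and broadcast fusion.
        return x + self.transform(context)
```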
4. Road Scene Modeling
4.1. Optical Flow Inpainting Based on CycleGAN
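CycleGAN learns unpaired translation between two domains; here, between flow fields with missing regions and complete flow fields. Its training objective combines adversarial losses with an L1 cycle-consistency term. A minimal sketch of the generator-side losses, assuming PyTorch; the generator and discriminator definitions are omitted and the loss weighting is illustrative.

```python
import torch
import torch.nn as nn

def cyclegan_losses(G, F, D_X, D_Y, real_x, real_y, lambda_cyc=10.0):
    """Generator-side CycleGAN objective: least-squares adversarial terms
    plus L1 cycle consistency. G maps domain X (flow with missing regions)
    to domain Y (complete flow); F maps back."""
    mse, l1 = nn.MSELoss(), nn.L1Loss()
    fake_y, fake_x = G(real_x), F(real_y)
    # Adversarial: each generator tries to make its discriminator output 1.
    pred_fake_y, pred_fake_x = D_Y(fake_y), D_X(fake_x)
    adv = mse(pred_fake_y, torch.ones_like(pred_fake_y)) \
        + mse(pred_fake_x, torch.ones_like(pred_fake_x))
    # Cycle consistency: x -> G(x) -> F(G(x)) should return to x.
    cyc = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)
    return adv + lambda_cyc * cyc
```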
4.2. Inpainting of Image Sequences Based on GMM
Algorithm 2 Inpainting of image sequences based on the Gaussian mixture model.
Require: Optical-flow inpainting results of images I_t and I_{t+1}.
for each pixel in the missing region do
  for k in the N Gaussian models do
    Compute the mixture of Gaussian distributions;
    Update the weight w_k;
  end for
  Choose the mean of the selected Gaussian model as the pixel value;
end for
Algorithm stops.
Ensure: Inpainted image sequences.
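A per-pixel sketch of Algorithm 2 is given below, using the standard Stauffer-Grimson style update for a mixture of N Gaussians; the learning rate, matching threshold, and initial variance are illustrative, not tuned values from the experiments.

```python
import numpy as np

def gmm_fill_pixel(samples, n_models=3, alpha=0.05, match_sigma=2.5):
    """Per-pixel sketch of Algorithm 2: maintain a small mixture of
    Gaussians over the values this pixel takes across the flow-inpainted
    frames, then return the mean of the dominant component as the
    restored value. Assumes len(samples) >= n_models."""
    samples = np.asarray(samples, dtype=float)
    means = samples[:n_models].copy()
    variances = np.full(n_models, 15.0 ** 2)   # illustrative initial variance
    weights = np.full(n_models, 1.0 / n_models)

    for x in samples:
        dist = np.abs(x - means) / np.sqrt(variances)
        matched = dist < match_sigma
        # Weight update: w_k <- (1 - alpha) * w_k + alpha * M_k,
        # where M_k = 1 for matched components and 0 otherwise.
        weights = (1 - alpha) * weights + alpha * matched
        if matched.any():
            k = int(np.argmin(np.where(matched, dist, np.inf)))
            means[k] += alpha * (x - means[k])
            variances[k] += alpha * ((x - means[k]) ** 2 - variances[k])
        else:
            # No component explains x: replace the weakest one.
            k = int(np.argmin(weights))
            means[k], variances[k], weights[k] = x, 15.0 ** 2, alpha
        weights /= weights.sum()

    return means[np.argmax(weights)]
```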
4.3. Road Scene Construction and Simulation
5. Experiments and Comparisons
5.1. Object Detection Experiment
5.1.1. Datasets and Metrics
5.1.2. Object Detection Experiments
- mAP: the mean average precision across categories (see the sketch after this list). The experimental results show that the proposed BiSSD model achieves the highest mAP on all three datasets, which further demonstrates the effectiveness of the asymmetric convolution and the global context attention mechanism.
- FPS: In intelligent transportation systems, real-time detection speed is critical. The proposed model uses the GCNet attention framework and a large number of small convolution kernels, which greatly reduces computation. The results in Tables 1 and 2 demonstrate that the proposed method has a strong real-time advantage.
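For reference, the per-class AP underlying the mAP metric can be computed as below (all-point interpolation; VOC2007 officially uses 11-point interpolation, but the structure is the same). A minimal NumPy sketch; the function and argument names are illustrative.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """All-point interpolated AP for one class: rank detections by
    confidence, build the precision-recall curve, make precision
    monotonically non-increasing, and integrate over recall.
    mAP is the mean of this value over all classes."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / max(num_gt, 1)
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

# Example: three detections, two of which match ground truth boxes.
print(average_precision([0.9, 0.8, 0.6], [1, 0, 1], num_gt=2))  # 0.8333...
```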
5.2. Road Scene Modeling Experiment
5.2.1. Image Inpainting
5.2.2. Road Scene Models Construction
6. Conclusions and Future Works
Author Contributions
Funding
Conflicts of Interest
References
- Li, Y.; Cui, Z.; Liu, Y.; Zhu, J.; Zhao, D.; Jian, Y. Road scene simulation based on vehicle sensors: An intelligent framework using random walk detection and scene stage reconstruction. Sensors 2018, 18, 3782.
- Wexler, Y.; Shechtman, E.; Irani, M. Space-time completion of video. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 463–476.
- Horry, Y.; Anjyo, K.; Arai, K. Tour into the picture: Using a spidery interface to make animation from a single image. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 97), Los Angeles, CA, USA, 3–8 August 1997; pp. 225–232. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.211.8170 (accessed on 3 December 2020).
- Anguelov, D.; Dulong, C.; Filip, D.; Frueh, C.; Lafon, S.; Lyon, R.; Ogale, A.; Vincent, L.; Weaver, J. Google street view: Capturing the world at street level. Computer 2010, 43, 32–38.
- Li, L.; Wen, D.; Zheng, N.; Shen, L. Cognitive cars: A new frontier for ADAS research. IEEE Trans. Intell. Transp. Syst. 2012, 13, 395–407.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the ECCV 2016, Amsterdam, The Netherlands, 8–16 October 2016.
- Lin, T.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Villéger, E.; Aubert, G.; Blanc-Féraud, L. Image disocclusion using a probabilistic gradient orientation. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), Cambridge, UK, 26 August 2004; pp. 52–55.
- Nitzberg, M.; Mumford, D.; Shiota, T. Filtering, Segmentation and Depth; Springer: Berlin/Heidelberg, Germany, 1993; Volume 662.
- Masnou, S.; Morel, J.-M. Level lines based disocclusion. In Proceedings of the 1998 International Conference on Image Processing (ICIP), Chicago, IL, USA, 4–7 October 1998; pp. 259–263.
- Bertalmio, M.; Sapiro, G.; Caselles, V.; Ballester, C. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2000), New Orleans, LA, USA, 23–28 July 2000; pp. 417–424.
- Pathak, D.; Krähenbühl, P.; Donahue, J. Context encoders: Feature learning by inpainting. In Proceedings of the CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
- Xie, J.; Xu, L.; Chen, E. Image denoising and inpainting with deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 341–349. Available online: http://citeseerx.ist.psu.edu/viewdoc/versions?doi=10.1.1.421.2977 (accessed on 3 December 2020).
- Li, Y.; Liu, S.; Yang, J.; Yang, M. Generative face completion. In Proceedings of the CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919.
- Zhu, J.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251.
- Li, Y.; Liu, Y.C.; Su, Y. Three-dimensional traffic scenes simulation from road image sequences. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1121–1134.
- Lakshmi, T.R.V.; Reddy, C.V.K. Object Classification Using SIFT Algorithm and Transformation Techniques; Springer: Berlin/Heidelberg, Germany, 2019; Volume 768.
- Lienhart, R.; Maydt, J. An extended set of Haar-like features for rapid object detection. In Proceedings of the 2002 IEEE International Conference on Image Processing (ICIP), Rochester, NY, USA, 22–25 September 2002; pp. 900–903.
- Al Jarouf, Y.A.; Kurdy, M.B. A hybrid method to detect and verify vehicle crash with Haar-like features and SVM over the web. In Proceedings of the International Conference on Computer and Applications (ICCA), Beirut, Lebanon, 25–26 August 2018; pp. 177–182. Available online: https://ieeexplore.ieee.org/document/8460417/ (accessed on 3 December 2020).
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
- Kapoor, R.; Gupta, R.; Son, S.H.; Jha, S.; Kumar, R. Detection of power quality event using histogram of oriented gradients and support vector machine. Measurement 2018, 120, 52–75.
- Subasi, A.; Dammas, D.H.; Alghamdi, R.D.; Makawi, R.A.; Albiety, E.A.; Brahimi, T.; Sarirete, A. Sensor based human activity recognition using AdaBoost ensemble classifier. Procedia Comput. Sci. 2018, 140, 104–111.
- Faris, H.; Hassonah, M.A.; Ala'm, A.Z.; Mirjalili, S.; Aljarah, I. A multi-verse optimizer approach for feature selection and optimizing SVM parameters based on a robust system architecture. Neural Comput. Appl. 2018, 30, 2355–2369.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zhang, S.F.; Wen, L.Y.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Ghiasi, G.; Lin, T.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7029–7038.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. Available online: http://dl.acm.org/citation.cfm?id=2969125 (accessed on 3 December 2020).
- Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. arXiv 2019, arXiv:1908.08681.
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019; pp. 1971–1980.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
- TSD-max Dataset. Available online: http://trafficdata.xjtu.edu.cn/index.do (accessed on 3 December 2020).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yuan, S.; Chen, Y.; Huo, H.; Zhu, L. Analysis and Synthesis of Traffic Scenes from Road Image Sequences. Sensors 2020, 20, 6939. https://doi.org/10.3390/s20236939