
Joint Optimization of Depth and Ego-Motion for Intelligent Autonomous Vehicles

Published: 01 July 2023

Abstract

The three-dimensional (3D) perception of autonomous vehicles is crucial for localization and for analyzing the driving environment, yet it requires massive computing resources for deep learning that vehicle-mounted devices cannot provide. This calls for the seamless, reliable, and efficient massive connectivity of 6G networks to offload computation to the cloud. In this paper, we propose a novel deep learning framework with a 6G-enabled transport system for the joint optimization of depth and ego-motion estimation, an important task in 3D perception for autonomous driving. We propose a novel loss based on feature maps and quadtrees, which replaces the photometric loss with a feature-value loss under quadtree coding so that feature information in texture-less regions is merged. In addition, we propose a novel multi-level V-shaped residual network to estimate image depth, which combines the advantages of V-shaped and residual networks and avoids the poor feature extraction that can result from simply fusing low-level and high-level features. Finally, to alleviate the influence of image noise on pose estimation, we propose several parallel sub-networks that take the RGB image and its feature map as network inputs. Experimental results show that our method significantly improves depth-map quality and localization accuracy, achieving state-of-the-art performance.
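The abstract describes the multi-level V-shaped residual network only at a high level. As a rough illustration of the general technique it alludes to (learned residual fusion of low-level and high-level features rather than plain concatenation), the following PyTorch sketch shows one possible fusion block. The module and parameter names are hypothetical and do not reproduce the paper's actual architecture.

```python
# Minimal sketch (hypothetical module, not the paper's exact layers):
# fuse a fine low-level skip feature with an upsampled coarse high-level
# feature, then refine the fusion with a residual block so the combination
# is learned rather than being a simple concatenation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualFusionBlock(nn.Module):
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch + high_ch, out_ch, kernel_size=1)
        self.conv1 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, low_feat, high_feat):
        # Upsample the coarse high-level feature to the skip resolution.
        high_feat = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        fused = self.reduce(torch.cat([low_feat, high_feat], dim=1))
        # Residual refinement of the fused features.
        out = F.relu(self.conv1(fused))
        out = self.conv2(out)
        return F.relu(fused + out)

# Example usage with dummy encoder features:
low = torch.randn(1, 64, 96, 320)    # fine, low-level skip feature
high = torch.randn(1, 256, 24, 80)   # coarse, high-level feature
block = ResidualFusionBlock(64, 256, 128)
print(block(low, high).shape)        # torch.Size([1, 128, 96, 320])
```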


Cited By

  • Geometry-Aware Network for Unsupervised Learning of Monocular Camera’s Ego-Motion, IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 12, pp. 14226–14236, Dec. 2023. 10.1109/TITS.2023.3298715
  • Survey on video anomaly detection in dynamic scenes with moving cameras, Artificial Intelligence Review, vol. 56, suppl. 3, pp. 3515–3570, Dec. 2023. 10.1007/s10462-023-10609-x

Published In

IEEE Transactions on Intelligent Transportation Systems, Volume 24, Issue 7
July 2023
1120 pages

Publisher

IEEE Press

Publication History

Published: 01 July 2023

Qualifiers

  • Research-article
