
Panorama-LiDAR Fusion for Dense Omnidirectional Depth Completion in 3D Street Scene

Published: 06 May 2024

Abstract

Immersive 3D metaverse and digital twin experiences require consumer electronic products to perceive 3D scenes, and therefore to produce precise, dense depth estimates. Existing methods yield acceptable depth in small-scale indoor scenes, yet practical applications also demand depth estimation for large-scale outdoor omnidirectional 3D scenes, where current techniques fail to deliver high-quality results. To address this gap, we propose a novel omnidirectional depth completion network (DODCNet) designed specifically for outdoor scenes and driven by panorama-LiDAR sensors. The framework combines cross-modal fusion with distortion sensing and comprises two stages: a panoramic depth feature completion network (PDFCN) and an RGB-guided panoramic depth refinement network (RGB-PDRN). The PDFCN generates density-balanced geometric depth features that bridge the gap between modalities, and the RGB-PDRN further integrates cross-modal features at the channel level through attention mechanisms. In addition, we introduce deformable spherical convolution to extract panoramic features efficiently and employ a panoramic depth-aware loss function to improve the accuracy of omnidirectional depth estimation. Extensive experiments show that DODCNet outperforms state-of-the-art methods on our panorama-LiDAR 360RGBD dataset and on the HoliCity dataset.
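The abstract states that the RGB-PDRN integrates cross-modal features at the channel level with attention. As a rough illustration of that idea only (not the authors' implementation), the sketch below applies a squeeze-and-excitation-style channel gate to concatenated RGB and depth feature maps in PyTorch; the module name, layer sizes, and reduction ratio are illustrative assumptions.

```python
# Hypothetical sketch of channel-level attention fusion for RGB and depth
# features; layer sizes and the SE-style gating are assumptions, not the
# DODCNet code.
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    """Fuse RGB and depth features by re-weighting channels of their concatenation."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        fused = 2 * channels  # RGB features concatenated with depth features
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                              # squeeze: global average per channel
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),                                         # excitation: per-channel weights in (0, 1)
        )
        self.project = nn.Conv2d(fused, channels, kernel_size=1)  # project back to a single stream

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, depth_feat], dim=1)
        x = x * self.gate(x)  # broadcast weights over the spatial dims
        return self.project(x)


if __name__ == "__main__":
    fusion = ChannelAttentionFusion(channels=64)
    rgb = torch.randn(1, 64, 128, 256)    # equirectangular feature map (H x 2H)
    depth = torch.randn(1, 64, 128, 256)
    print(fusion(rgb, depth).shape)        # torch.Size([1, 64, 128, 256])
```

Because the learned weights lie in (0, 1) and are broadcast over the spatial dimensions, informative channels from either modality can dominate the fused representation while uninformative ones are suppressed.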



Published In

IEEE Transactions on Consumer Electronics, Volume 70, Issue 2, May 2024, 364 pages

Publisher

IEEE Press


Qualifiers

  • Research-article
