3D scene understanding is crucial for robotics, augmented reality, and autonomous vehicles. In these applications, the 3D structure of a scene can be captured with stereo cameras or depth sensors, and the resulting measurements can be processed with deep learning techniques to achieve remarkable performance on a variety of perception tasks. However, unlike images, which have a single dominant representation as 2D pixel arrays, 3D data has many representations, including voxels, meshes, depth images, and point clouds. Among these, depth images and point clouds are closest to the direct output of 3D sensors: depth images are computed from stereo cameras, and point clouds are generated by LiDAR. The growing accessibility of such measurements creates a need for algorithms that can interpret them. This dissertation therefore develops deep learning algorithms for 3D scene understanding from depth images and point clouds.

The first portion of this dissertation describes an algorithm for estimating accurate depth maps from stereo images. In particular, by solving the stereo matching problem, one can generate a disparity map and convert it into a dense depth image. During this process, a semantic embedding learned from semantic segmentation further helps guide the disparity estimation, especially in smooth, reflective, and occluded regions. With the computed depth images and semantic segments, semantic 3D models can be produced efficiently.

The second portion of the dissertation addresses the challenge of processing point cloud data that may be arbitrarily rotated. To handle the random rotations present in real-world point cloud data, traditional techniques employ data augmentation; however, this can increase training time and may require more complex deep learning models.
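As a minimal illustration of why rotation-invariant representations can replace rotation augmentation, the sketch below shows that a pairwise-distance representation of a point cloud is unchanged under arbitrary rotations. This is a generic example of the underlying idea, not the specific representation proposed in the dissertation; all function names are illustrative.

```python
import numpy as np

def pairwise_distances(points):
    """Pairwise Euclidean distances between rows of an (N, 3) point array.

    Distances depend only on relative geometry, so they are unchanged by
    any rigid rotation of the whole cloud.
    """
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def random_rotation(rng):
    """Random 3D rotation matrix drawn via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))   # fix column signs
    if np.linalg.det(q) < 0:   # ensure a proper rotation, not a reflection
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
cloud = rng.standard_normal((16, 3))       # toy point cloud
rotated = cloud @ random_rotation(rng).T   # arbitrarily rotated copy

# The distance-based representation is identical for both clouds,
# so a network consuming it needs no rotation augmentation.
assert np.allclose(pairwise_distances(cloud), pairwise_distances(rotated))
```

Because the representation is invariant by construction, a single training pose covers all rotated test poses of the same shape.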
To address rotations that may not exist in the training data, this dissertation proposes a rotationally invariant 3D representation of point clouds and introduces a novel neural network architecture that exploits this representation.

The third portion of the dissertation devises methods for handling real-world point clouds that are only partially observed. It models the output of each local point set with a multivariate Gaussian distribution and illustrates how each such local point set can be used to infer the latent features encoding the information contained in a complete point cloud. This strategy ensures accurate prediction from a partially observed point set across tasks such as shape classification, part segmentation, and point cloud completion.

The final portion of the dissertation focuses on point cloud completion. When processing point clouds, existing approaches adopt encoder-decoder structures that output sparsely distributed embeddings, which may hurt generalization at test time. In addition, analyses of point cloud completion trained jointly with other tasks are lacking. To address these limitations, this dissertation proposes a novel module, which includes a normalization layer that scales embeddings to unit norm, and which can be integrated into existing approaches. Both theoretical and empirical results demonstrate the effectiveness of the proposed method in improving point cloud completion performance.
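The "normalize embeddings to unit norm" operation described above plausibly corresponds to ordinary L2 normalization of the encoder's latent codes, which constrains them to the unit hypersphere. The sketch below is a minimal numpy version of such a layer, not the dissertation's actual module; the function name and shapes are assumptions.

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-12):
    """Scale each embedding vector to unit L2 norm.

    embeddings: (batch, dim) array, e.g. latent codes produced by a
    point cloud completion encoder before being passed to the decoder.
    The eps floor avoids division by zero for degenerate embeddings.
    """
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    return embeddings / np.maximum(norms, eps)

z = np.array([[3.0, 4.0],
              [0.0, 2.0]])
z_hat = l2_normalize(z)

# Every row now lies on the unit sphere, regardless of its original scale.
assert np.allclose(np.linalg.norm(z_hat, axis=-1), 1.0)
```

Constraining embeddings to a compact manifold in this way is a common recipe for keeping the latent space densely covered, which is consistent with the generalization argument made in the abstract.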