Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.11131 (cs)

[Submitted on 17 Mar 2024 (v1), last revised 20 Sep 2024 (this version, v3)]

Title: Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

Authors: Yonggan Fu, Huaizhi Qu, Zhifan Ye, Chaojian Li, Kevin Zhao, Yingyan Celine Lin

Abstract: Recent breakthroughs in Neural Radiance Fields (NeRFs) have sparked significant demand for their integration into real-world 3D applications. However, the varied functionalities required by different 3D applications often necessitate diverse NeRF models with various pipelines, leading to tedious NeRF training for each target task and cumbersome trial-and-error experiments. Drawing inspiration from the generalization capability and adaptability of emerging foundation models, our work aims to develop one general-purpose NeRF for handling diverse 3D tasks. We achieve this by proposing a framework called Omni-Recon, which is capable of (1) generalizable 3D reconstruction and zero-shot multitask scene understanding, and (2) adaptability to diverse downstream 3D applications such as real-time rendering and scene editing. Our key insight is that an image-based rendering pipeline, with accurate geometry and appearance estimation, can lift 2D image features into their 3D counterparts, thus extending widely explored 2D tasks to the 3D world in a generalizable manner. Specifically, our Omni-Recon features a general-purpose NeRF model using image-based rendering with two decoupled branches: one complex transformer-based branch that progressively fuses geometry and appearance features for accurate geometry estimation, and one lightweight branch for predicting blending weights of source views. This design achieves state-of-the-art (SOTA) generalizable 3D surface reconstruction quality with blending weights reusable across diverse tasks for zero-shot multitask scene understanding. In addition, it can enable real-time rendering after baking the complex geometry branch into meshes, swift adaptation to achieve SOTA generalizable 3D understanding performance, and seamless integration with 2D diffusion models for text-guided 3D editing.
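
Illustrative sketch (not the authors' implementation): the abstract says that blending weights predicted for the source views can be reused across tasks. The NumPy snippet below shows one plausible reading of that idea. A single set of per-view weights is normalized with a softmax and applied first to source-view colors and then to per-view 2D features. The function names, the tensor shapes, the softmax normalization, and the random inputs are all assumptions made for illustration. None of them are taken from the paper or its code.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def blend_source_views(weight_logits, per_view_values):
    """Blend per-view values with weights shared across tasks.

    weight_logits:   (V, P) unnormalized score per source view and 3D point
                     (hypothetical output of a lightweight weight branch).
    per_view_values: (V, P, C) values sampled from each source view at the
                     projection of each 3D point (colors, 2D features, ...).
    Returns an array of shape (P, C).
    """
    weights = softmax(weight_logits, axis=0)              # (V, P), sums to 1 over views
    return (weights[..., None] * per_view_values).sum(axis=0)


# Hypothetical inputs: 4 source views, 5 sampled 3D points.
rng = np.random.default_rng(0)
num_views, num_points = 4, 5
weight_logits = rng.normal(size=(num_views, num_points))
source_colors = rng.uniform(size=(num_views, num_points, 3))      # RGB per view
source_features = rng.normal(size=(num_views, num_points, 16))    # 2D task features per view

# The same weights blend colors (rendering) and 2D features (scene understanding).
blended_color = blend_source_views(weight_logits, source_colors)
blended_features = blend_source_views(weight_logits, source_features)
```

Because the weights form a convex combination over the views, each blended color stays within the range of the source-view colors at that point. The same call lifts any per-view 2D quantity without retraining the weights, which is the sense of "reusable" assumed in this sketch.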

Comments:	Accepted by ECCV 2024 as an Oral Paper
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.11131 [cs.CV]
	(or arXiv:2403.11131v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.11131

Submission history

From: Yonggan Fu
[v1] Sun, 17 Mar 2024 07:47:26 UTC (37,204 KB)
[v2] Thu, 18 Jul 2024 12:21:15 UTC (37,196 KB)
[v3] Fri, 20 Sep 2024 14:30:33 UTC (37,196 KB)
