Automatic Scene Inference for 3D Object Compositing

Published: 02 June 2014

Abstract

We present a user-friendly image editing system that supports drag-and-drop object insertion (where the user merely drags objects into the image, and the system automatically places them in 3D and relights them appropriately), postprocess illumination editing, and depth-of-field manipulation. Underlying our system is a fully automatic technique for recovering a comprehensive 3D scene model (geometry, illumination, diffuse albedo, and camera parameters) from a single, low dynamic range photograph. This is made possible by two novel contributions: an illumination inference algorithm that recovers a full lighting model of the scene (including light sources that are not directly visible in the photograph), and a depth estimation algorithm that combines data-driven depth transfer with geometric reasoning about the scene layout. A user study shows that our system produces perceptually convincing results, and achieves the same level of realism as techniques that require significant user interaction.
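
The abstract names depth-of-field manipulation as one use of the recovered depth, but this page does not say how the system performs it. Purely as an illustration, the sketch below shows one common way a per-pixel depth map can drive synthetic defocus: a thin-lens camera model that assigns each pixel a blur-circle diameter. The thin-lens model, the function names, and the default lens parameters are all assumptions made for this example and are not taken from the paper.

```python
def circle_of_confusion(depth_m, focus_m, focal_length_m, aperture_m):
    """Thin-lens blur-circle diameter (metres, on the sensor) for a point at depth_m.

    Assumed model, not the paper's: c = A * f * |d - s| / (d * (s - f)),
    with aperture diameter A, focal length f, focus distance s, depth d.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    return (aperture_m * focal_length_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_length_m)))


def blur_map(depth_map, focus_m, focal_length_m=0.05, f_number=2.0):
    """Map a 2D list of depths (metres) to per-pixel blur-circle diameters.

    The default 50 mm lens at f/2 is a hypothetical choice for illustration.
    """
    aperture_m = focal_length_m / f_number  # aperture diameter from the f-number
    return [
        [circle_of_confusion(d, focus_m, focal_length_m, aperture_m) for d in row]
        for row in depth_map
    ]


# Hypothetical depth map: one row of pixels at 2 m, 4 m, and 8 m.
example_blur = blur_map([[2.0, 4.0, 8.0]], focus_m=2.0)
```

In this sketch, pixels at the focus distance get zero blur and the blur grows as depth moves away from it. A renderer would then apply a spatially varying blur whose radius follows these diameters.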

Supplementary Material

MP4 File (a32-sidebyside.mp4)

Information & Contributors

Information

Published In

ACM Transactions on Graphics, Volume 33, Issue 3
May 2014
145 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2631978
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 June 2014
Accepted: 01 February 2014
Revised: 01 February 2014
Received: 01 July 2013
Published in TOG Volume 33, Issue 3

Author Tags

  1. Illumination inference
  2. depth estimation
  3. image-based editing
  4. image-based rendering
  5. physically grounded
  6. scene reconstruction

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 57
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 11 Dec 2024

Cited By

  • (2024) Colorful Diffuse Intrinsic Image Decomposition in the Wild. ACM Transactions on Graphics 43:6 (1-12). DOI: 10.1145/3687984. Online publication date: 19-Nov-2024
  • (2024) Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments. Proceedings of the 32nd ACM International Conference on Multimedia (7639-7648). DOI: 10.1145/3664647.3681640. Online publication date: 28-Oct-2024
  • (2024) CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model. Proceedings of the 32nd ACM International Conference on Multimedia (3647-3656). DOI: 10.1145/3664647.3681283. Online publication date: 28-Oct-2024
  • (2024) SAC-GAN: Structure-Aware Image Composition. IEEE Transactions on Visualization and Computer Graphics 30:7 (3151-3165). DOI: 10.1109/TVCG.2022.3226689. Online publication date: Jul-2024
  • (2024) Monocular Depth Estimation: A Thorough Review. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:4 (2396-2414). DOI: 10.1109/TPAMI.2023.3330944. Online publication date: Apr-2024
  • (2024) Panoramic Ray Tracing for Interactive Mixed Reality Rendering Based on 360° RGBD Video. IEEE Computer Graphics and Applications 44:1 (62-75). DOI: 10.1109/MCG.2023.3327383. Online publication date: 1-Jan-2024
  • (2024) FastPlane: A Fully Convolutional Network for Real-time 3D Plane Segmentation. 2024 IEEE International Conference on Consumer Electronics (ICCE) (1-6). DOI: 10.1109/ICCE59016.2024.10444446. Online publication date: 6-Jan-2024
  • (2024) Shadow Generation for Composite Image Using Diffusion Model. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (8121-8130). DOI: 10.1109/CVPR52733.2024.00776. Online publication date: 16-Jun-2024
  • (2024) DiffusionLight: Light Probes for Free by Painting a Chrome Ball. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (98-108). DOI: 10.1109/CVPR52733.2024.00018. Online publication date: 16-Jun-2024
  • (2024) Illuminator: Image-based illumination editing for indoor scene harmonization. Computational Visual Media 10:6 (1137-1155). DOI: 10.1007/s41095-023-0397-6. Online publication date: 5-Jul-2024
