
Online Temporally Consistent Indoor Depth Video Enhancement via Static Structure

Published: 01 July 2015

Abstract

In this paper, we propose a new method to enhance the quality of a depth video online, using the so-called static structure of the captured scene as an intermediary. The static and dynamic regions of each input depth frame are robustly separated by a layer assignment procedure, in which the dynamic part stays in front while the static part fits, and helps to update, this structure through a novel online variational generative model with added spatial refinement. The dynamic content is enhanced spatially, while the static region is substituted with the updated static structure so as to enable long-range spatio-temporal enhancement. The proposed method thus enforces long-range temporal consistency in the static region while preserving the necessary depth variations in the dynamic content, producing flicker-free, spatially optimized depth videos with reduced motion blur and depth distortion. Our experimental results show that the proposed method is effective in both static and dynamic indoor scenes and is compatible with depth videos captured by Kinect and time-of-flight cameras. We also demonstrate that it achieves excellent performance in comparison with existing spatio-temporal approaches. In addition, the enhanced depth videos and static structures can act as effective cues for various applications, including depth-aided background subtraction and novel view synthesis, yielding satisfactory results with few visual artifacts.
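The per-frame pipeline the abstract describes (assign each pixel to a static or dynamic layer against a maintained static structure, update the structure from the static pixels, then substitute the static region with the updated structure) can be sketched as below. This is an illustrative simplification, not the paper's variational generative model: the layer test here is a plain depth-agreement threshold (`depth_tol` is an assumed parameter), and the structure update is a running weighted average rather than online variational inference.

```python
import numpy as np

def update_static_structure(struct, conf, frame, depth_tol=0.05, valid_min=1e-3):
    """One online step: split the incoming depth frame into static/dynamic
    layers against the current static structure, refine the structure with
    the static pixels, and substitute the static region in the output.

    struct : HxW current static-structure depth (0 where unknown)
    conf   : HxW accumulated per-pixel observation count
    frame  : HxW new depth frame (values <= valid_min are invalid holes)
    """
    valid = frame > valid_min
    known = conf > 0
    # A pixel is "static" if it agrees with the structure within a
    # depth-proportional tolerance; valid pixels at unknown locations
    # initialize the structure.
    agree = np.abs(frame - struct) <= depth_tol * np.maximum(struct, 1.0)
    static = valid & (~known | agree)
    # Disagreeing pixels that moved closer to the camera are treated as
    # dynamic foreground in front of the static structure.
    dynamic = valid & known & ~agree & (frame < struct)

    # Running weighted mean over the static observations.
    new_conf = np.where(static, conf + 1.0, conf)
    upd = (struct * conf + frame) / np.maximum(new_conf, 1.0)
    struct = np.where(static, upd, struct)

    # Enhanced output: static regions come from the (temporally consistent)
    # structure; dynamic measurements are kept as observed.
    enhanced = np.where(static, struct, frame)
    return struct, new_conf, enhanced, dynamic
```

In the paper's terms, the averaging step stands in for the online generative-model update, and `enhanced` corresponds to the substitution that favors long-range temporal consistency in the static region.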




          Published In

IEEE Transactions on Image Processing, Volume 24, Issue 7, July 2015, 269 pages

          Publisher

          IEEE Press


          Author Tags

1. static structure
2. layer assignment
3. temporally consistent depth video enhancement
4. online estimation

          Qualifiers

          • Research-article
