Abstract
Online 3D reconstruction of real-world scenes has attracted increasing interest from both academia and industry, especially as consumer-level depth cameras have become widely available. Most recent online reconstruction systems take live depth data from a moving Kinect camera and incrementally fuse them into a single high-quality 3D model in real time. Although most real-world scenes have a static environment, everyday objects in a scene often move, and such objects are non-trivial to reconstruct, especially when the camera is also moving. To solve this problem, we propose a real-time approach, based on a single depth camera, for simultaneous reconstruction of a dynamic object and the static environment, and provide solutions for its key issues. In particular, we first introduce a robust optimization scheme that takes advantage of raycasted maps to segment the moving object and the background from the live depth map. The corresponding depth data are then fused into their respective volumes. These volumes are raycasted to extract views of the implicit surface, which can be used as a consistent reference frame for the next iteration of segmentation and tracking. Furthermore, to handle fast motion of the dynamic object and the handheld camera in the fusion stage, we propose a sequential 6D pose prediction method that greatly increases registration robustness and avoids the registration failures that occur in conventional methods. Experimental results show that our approach can reconstruct the moving object as well as the static environment with rich details, and outperforms conventional methods in multiple aspects.
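The abstract does not spell out how the sequential 6D pose prediction works. As a rough illustration of the general idea of predicting a pose before registration, the sketch below uses a plain constant-velocity model: it assumes the motion between the last two frames repeats, and uses the result as the initial guess for the next registration step. This is a swapped-in, generic technique and not necessarily the authors' method; the function names (`matmul`, `rigid_inverse`, `predict_pose`, `rot_z_translate`) and the 4x4 nested-list pose representation are hypothetical choices made for this example.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def predict_pose(prev_pose, curr_pose):
    """Constant-velocity guess (an assumption, not the paper's scheme).

    The last inter-frame motion, inverse(prev_pose) * curr_pose, is
    applied once more on top of curr_pose.
    """
    delta = matmul(rigid_inverse(prev_pose), curr_pose)
    return matmul(curr_pose, delta)


def rot_z_translate(angle, tx):
    """Helper for the example: rotation about z plus a shift along x."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


# Usage: two consecutive poses produced by the same per-frame motion.
step = rot_z_translate(0.1, 0.05)
pose_t1 = step
pose_t2 = matmul(step, step)
initial_guess = predict_pose(pose_t1, pose_t2)
```

In a fusion pipeline of the kind described above, a guess like `initial_guess` would seed the registration of the next depth frame (for the camera and for the moving object separately), so that the alignment starts closer to the true pose when motion is fast.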
Acknowledgements
This study was funded by the National Natural Science Foundation of China (Grant Nos. 61502023 and U1736217).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lu, F., Zhou, B., Zhang, Y. et al. Real-time 3D scene reconstruction with dynamically moving object using a single depth camera. Vis Comput 34, 753–763 (2018). https://doi.org/10.1007/s00371-018-1540-8
DOI: https://doi.org/10.1007/s00371-018-1540-8