Abstract
The reconstruction of high-quality 3D clothed humans from monocular images or videos has gained popularity in recent years due to its significant practical applications. While several surveys have addressed the reconstruction of full-body parametric human models from images or videos, this survey specifically delves into the challenges and methodologies of reconstructing 3D clothed humans. It covers both pose-dependent and dynamic approaches to clothed human reconstruction. Regarding pose-dependent clothed human reconstruction from monocular images, we investigate methodologies that employ regression models trained on high-quality 3D scans to estimate human geometry with clothing. Additionally, we explore research leveraging texture priors within large-scale diffusion models to enhance the inference of human appearance in occluded or unseen areas. In terms of dynamic clothed human reconstruction from monocular and sparse multi-view videos, we analyze human modeling techniques utilizing neural radiance fields and 3D Gaussian representations, which employ deformation fields to capture human movements across frames. Furthermore, we provide an overview of the datasets and commonly used quantitative evaluation metrics in these studies. Finally, we conclude by discussing open issues and proposing future research directions in the realistic reconstruction of clothed humans, emphasizing areas that warrant additional investigation.
Data availability
No datasets were generated or analysed during the current study.
References
Salagean, A., Crellin, E., Parsons, M., Cosker, D., Fraser, D.S.: Meeting your virtual twin: effects of photorealism and personalization on embodiment, self-identification and perception of self-avatars in virtual reality. In: CHI, pp. 499–149916 (2023). https://doi.org/10.1145/3544548.3581182
Panda, P., Nicholas, M.J., González-Franco, M., Inkpen, K., Ofek, E., Cutler, R., Hinckley, K., Lanier, J.: AllTogether: effect of avatars in mixed-modality conferencing environments. In: CHIWORK, pp. 8–1810 (2022). https://doi.org/10.1145/3533406.3539658
Manfredi, G., Gilio, G., Baldi, V., Youssef, H., Erra, U.: VICO-DR: a collaborative virtual dressing room for image consulting. J. Imaging 9(4), 76 (2023). https://doi.org/10.3390/JIMAGING9040076
Szolin, K., Kuss, D.J., Nuyens, F.M., Griffiths, M.D.: Exploring the user-avatar relationship in videogames: a systematic review of the Proteus effect. Hum. Comput. Interact. 38(5–6), 374–399 (2023). https://doi.org/10.1080/07370024.2022.2103419
Guo, K., Lincoln, P., Davidson, P.L., Busch, J., Yu, X., Whalen, M., Harvey, G., Orts-Escolano, S., Pandey, R., Dourgarian, J., Tang, D., Tkach, A., Kowdle, A., Cooper, E., Dou, M., Fanello, S.R., Fyffe, G., Rhemann, C., Taylor, J., Debevec, P.E., Izadi, S.: The relightables: volumetric performance capture of humans with realistic relighting. ACM Trans. Graph. 38(6), 217–121719 (2019). https://doi.org/10.1145/3355089.3356571
Collet, A., Chuang, M., Sweeney, P., Gillett, D., Evseev, D., Calabrese, D., Hoppe, H., Kirk, A.G., Sullivan, S.: High-quality streamable free-viewpoint video. ACM Trans. Graph. 34(4), 69–16913 (2015). https://doi.org/10.1145/2766945
Saito, S., Huang, Z., Natsume, R., Morishima, S., Li, H., Kanazawa, A.: PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In: ICCV, pp. 2304–2314 (2019). https://doi.org/10.1109/ICCV.2019.00239
Saito, S., Simon, T., Saragih, J.M., Joo, H.: PIFuHD: multi-level pixel-aligned implicit function for high-resolution 3D human digitization. In: CVPR, pp. 81–90 (2020). https://doi.org/10.1109/CVPR42600.2020.00016
Xiu, Y., Yang, J., Tzionas, D., Black, M.J.: ICON: implicit clothed humans obtained from normals. In: CVPR, pp. 13286–13296 (2022). https://doi.org/10.1109/CVPR52688.2022.01294
Weng, C., Curless, B., Srinivasan, P.P., Barron, J.T., Kemelmacher-Shlizerman, I.: HumanNeRF: free-viewpoint rendering of moving people from monocular video. In: CVPR, pp. 16189–16199 (2022). https://doi.org/10.1109/CVPR52688.2022.01573
Hu, S., Liu, Z.: GauHuman: articulated Gaussian splatting from monocular human videos. In: CVPR (2024)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 248–124816 (2015). https://doi.org/10.1145/2816795.2818013
Feng, Y., Choutas, V., Bolkart, T., Tzionas, D., Black, M.J.: Collaborative regression of expressive bodies using moderation. In: 3DV, pp. 792–804 (2021). https://doi.org/10.1109/3DV53792.2021.00088
Alldieck, T., Magnor, M.A., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: CVPR, pp. 1175–1186 (2019). https://doi.org/10.1109/CVPR.2019.00127
Alldieck, T., Pons-Moll, G., Theobalt, C., Magnor, M.A.: Tex2Shape: detailed full human body geometry from a single image. In: ICCV, pp. 2293–2303 (2019). https://doi.org/10.1109/ICCV.2019.00238
Xiu, Y., Yang, J., Cao, X., Tzionas, D., Black, M.J.: ECON: explicit clothed humans optimized via normal integration. In: CVPR, pp. 512–523 (2023). https://doi.org/10.1109/CVPR52729.2023.00057
Corona, E., Hodan, T., Vo, M., Moreno-Noguer, F., Sweeney, C., Newcombe, R.A., Ma, L.: LISA: learning implicit shape and appearance of hands. In: CVPR, pp. 20501–20511 (2022). https://doi.org/10.1109/CVPR52688.2022.01988
Chen, X., Wang, B., Shum, H.: Hand Avatar: free-pose hand animation and rendering from monocular video. In: CVPR, pp. 8683–8693 (2023). https://doi.org/10.1109/CVPR52729.2023.00839
Chen, Z., Moon, G., Guo, K., Cao, C., Pidhorskyi, S., Simon, T., Joshi, R., Dong, Y., Xu, Y., Pires, B., Wen, H., Evans, L., Peng, B., Buffalini, J., Trimble, A., McPhail, K., Schoeller, M., Yu, S.-I., Romero, J., Zollhöfer, M., Sheikh, Y., Liu, Z., Saito, S.: URHand: universal relightable hands. In: CVPR (2024)
Saito, S., Schwartz, G., Simon, T., Li, J., Nam, G.: Relightable Gaussian codec avatars. In: CVPR (2024)
Bi, S., Lombardi, S., Saito, S., Simon, T., Wei, S., McPhail, K., Ramamoorthi, R., Sheikh, Y., Saragih, J.M.: Deep relightable appearance models for animatable faces. ACM Trans. Graph. 40(4), 89–18915 (2021). https://doi.org/10.1145/3450626.3459829
Li, X., Sheng, B., Li, P., Kim, J., Feng, D.D.: Voxelized facial reconstruction using deep neural network. In: CGI, pp. 1–4 (2018). https://doi.org/10.1145/3208159.3208170
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P.V., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: ECCV, pp. 561–578 (2016). https://doi.org/10.1007/978-3-319-46454-1_34
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR, pp. 7122–7131 (2018).https://doi.org/10.1109/CVPR.2018.00744
Yu, T., Zheng, Z., Guo, K., Liu, P., Dai, Q., Liu, Y.: Function4D: real-time human volumetric capture from very sparse consumer RGBD sensors. In: CVPR, pp. 5746–5756 (2021). https://doi.org/10.1109/CVPR46437.2021.00569
Wang, L., Zhao, X., Yu, T., Wang, S., Liu, Y.: NormalGAN: learning detailed 3D human from a single RGB-D image. In: ECCV, vol. 12365, pp. 430–446 (2020). https://doi.org/10.1007/978-3-030-58565-5_26
Tian, Y., Zhang, H., Liu, Y., Wang, L.: Recovering 3D human mesh from monocular images: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45(12), 15406–15425 (2023). https://doi.org/10.1109/TPAMI.2023.3298850
Chen, L., Peng, S., Zhou, X.: Towards efficient and photorealistic 3D human reconstruction: a brief survey. Vis. Inform. 5(4), 11–19 (2021). https://doi.org/10.1016/J.VISINF.2021.10.003
Sun, M., Yang, D., Kou, D., Jiang, Y., Shan, W., Yan, Z., Zhang, L.: Human 3D avatar modeling with implicit neural representation: a brief survey. In: 2022 14th International Conference on Signal Processing Systems (ICSPS), pp. 818–827. IEEE (2022)
Ma, Q., Saito, S., Yang, J., Tang, S., Black, M.J.: SCALE: modeling clothed humans with a surface codec of articulated local elements. In: CVPR, pp. 16082–16093 (2021). https://doi.org/10.1109/CVPR46437.2021.01582
Ma, Q., Yang, J., Tang, S., Black, M.J.: The power of points for modeling humans in clothing. In: ICCV, pp. 10954–10964 (2021). https://doi.org/10.1109/ICCV48922.2021.01079
Manfredi, G., Capece, N., Erra, U., Gilio, G., Baldi, V., Domenico, S.G.D.: TryItOn: a virtual dressing room with motion tracking and physically based garment simulation. In: XR, vol. 13445, pp. 63–76 (2022). https://doi.org/10.1007/978-3-031-15546-8_5
Fan, T., Yang, B., Bao, C., Wang, L., Zhang, G., Cui, Z.: HybridAvatar: efficient mesh-based human avatar generation from few-shot monocular images with implicit mesh displacement. In: IEEE International Symposium on Mixed and Augmented Reality Adjunct, ISMAR 2023, Sydney, Australia, October 16–20, 2023, pp. 371–376 (2023).https://doi.org/10.1109/ISMAR-ADJUNCT60411.2023.00080
Varol, G., Ceylan, D., Russell, B.C., Yang, J., Yumer, E., Laptev, I., Schmid, C.: BodyNet: volumetric inference of 3D human body shapes. In: ECCV, pp. 20–38 (2018). https://doi.org/10.1007/978-3-030-01234-2_2
Zheng, Z., Yu, T., Wei, Y., Dai, Q., Liu, Y.: DeepHuman: 3D human reconstruction from a single image. In: ICCV, pp. 7738–7748 (2019).https://doi.org/10.1109/ICCV.2019.00783
Tang, S., Tan, F., Cheng, K., Li, Z., Zhu, S., Tan, P.: A neural network for detailed human depth estimation from a single image. In: ICCV, pp. 7749–7758 (2019). https://doi.org/10.1109/ICCV.2019.00784
Smith, D., Loper, M., Hu, X., Mavroidis, P., Romero, J.: FACSIMILE: fast and accurate scans from an image in less than a second. In: ICCV, pp. 5329–5338 (2019). https://doi.org/10.1109/ICCV.2019.00543
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–113914 (2023). https://doi.org/10.1145/3592433
Park, J.J., Florence, P.R., Straub, J., Newcombe, R.A., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: CVPR, pp. 165–174 (2019). https://doi.org/10.1109/CVPR.2019.00025
Mescheder, L.M., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: CVPR, pp. 4460–4470 (2019). https://doi.org/10.1109/CVPR.2019.00459
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: ECCV, pp. 405–421 (2020). https://doi.org/10.1007/978-3-030-58452-8_24
Tewari, A., Thies, J., Mildenhall, B., Srinivasan, P.P., Tretschk, E., Wang, Y., Lassner, C., Sitzmann, V., Martin-Brualla, R., Lombardi, S., Simon, T., Theobalt, C., Nießner, M., Barron, J.T., Wetzstein, G., Zollhöfer, M., Golyanik, V.: Advances in neural rendering. Comput. Graph. Forum 41(2), 703–735 (2022). https://doi.org/10.1111/CGF.14507
Pfister, H., Zwicker, M., Baar, J., Gross, M.H.: Surfels: surface elements as rendering primitives. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH, pp. 335–342 (2000). https://doi.org/10.1145/344779.344936
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. ACM Trans. Graph. 24(3), 408–416 (2005). https://doi.org/10.1145/1073204.1073207
Xu, H., Bazavan, E.G., Zanfir, A., Freeman, W.T., Sukthankar, R., Sminchisescu, C.: GHUM & GHUML: generative 3D human shape and articulated pose models. In: CVPR, pp. 6183–6192 (2020). https://doi.org/10.1109/CVPR42600.2020.00622
Osman, A.A.A., Bolkart, T., Black, M.J.: STAR: sparse trained articulated human body regressor. In: ECCV, vol. 12351, pp. 598–613 (2020). https://doi.org/10.1007/978-3-030-58539-6_36
Zheng, Z., Yu, T., Liu, Y., Dai, Q.: PaMIR: parametric model-conditioned implicit representation for image-based human reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 3170–3184 (2022). https://doi.org/10.1109/TPAMI.2021.3050505
Hong, F., Chen, Z., Lan, Y., Pan, L., Liu, Z.: EVA3D: compositional 3D human generation from 2D image collections. In: ICLR (2023)
Dong, Z., Chen, X., Yang, J., Black, M.J., Hilliges, O., Geiger, A.: AG3D: learning to generate 3D avatars from 2D image collections. In: ICCV, pp. 14870–14881 (2023). https://doi.org/10.1109/ICCV51070.2023.01370
Huang, Y., Yi, H., Xiu, Y., Liao, T., Tang, J., Cai, D., Thies, J.: TeCH: text-guided reconstruction of lifelike clothed humans. In: 3DV (2024)
Albahar, B., Saito, S., Tseng, H., Kim, C., Kopf, J., Huang, J.: Single-image 3D human digitization with shape-guided diffusion. In: SIGGRAPH Asia 2023 Conference Papers, pp. 62–16211 (2023). https://doi.org/10.1145/3610548.3618153
Yao, J., Chen, J., Niu, L., Sheng, B.: Scene-aware human pose generation using transformer. In: MM, pp. 2847–2855 (2023). https://doi.org/10.1145/3581783.3612439
Kamel, A., Liu, B., Li, P., Sheng, B.: An investigation of 3D human pose estimation for learning Tai Chi: a human factor perspective. Int. J. Hum. Comput. Interact. 35(4–5), 427–439 (2019). https://doi.org/10.1080/10447318.2018.1543081
Kamel, A., Sheng, B., Li, P., Kim, J., Feng, D.D.: Efficient body motion quantification and similarity evaluation using 3-D joints skeleton coordinates. IEEE Trans. Syst. Man Cybern. Syst. 51(5), 2774–2788 (2021). https://doi.org/10.1109/TSMC.2019.2916896
Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph. 36(6), 194–119417 (2017). https://doi.org/10.1145/3130800.3130813
Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM Trans. Graph. 36(6), 245–124517 (2017). https://doi.org/10.1145/3130800.3130883
Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A.A., Tzionas, D., Black, M.J.: Expressive body capture: 3D hands, face, and body from a single image. In: CVPR, pp. 10975–10985 (2019). https://doi.org/10.1109/CVPR.2019.01123
Zhu, H., Zuo, X., Wang, S., Cao, X., Yang, R.: Detailed human shape estimation from a single image by hierarchical mesh deformation. In: CVPR, pp. 4491–4500 (2019). https://doi.org/10.1109/CVPR.2019.00462
Lazova, V., Insafutdinov, E., Pons-Moll, G.: 360-Degree textures of people in clothing from a single image. In: 3DV, pp. 643–653 (2019). https://doi.org/10.1109/3DV.2019.00076
Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G., Tang, S., Black, M.J.: Learning to dress 3D people in generative clothing. In: CVPR, pp. 6468–6477 (2020). https://doi.org/10.1109/CVPR42600.2020.00650
Bhatnagar, B.L., Tiwari, G., Theobalt, C., Pons-Moll, G.: Multi-Garment Net: learning to dress 3D people from images. In: ICCV, pp. 5419–5429 (2019). https://doi.org/10.1109/ICCV.2019.00552
Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: BCNet: learning body and cloth shape from a single image. In: ECCV, vol. 12365, pp. 18–35 (2020). https://doi.org/10.1007/978-3-030-58565-5_2
Patel, C., Liao, Z., Pons-Moll, G.: TailorNet: predicting clothing in 3D as a function of human pose, shape and garment style. In: CVPR, pp. 7363–7373 (2020).https://doi.org/10.1109/CVPR42600.2020.00739
Corona, E., Pumarola, A., Alenyà, G., Pons-Moll, G., Moreno-Noguer, F.: SMPLicit: topology-aware generative model for clothed people. In: CVPR, pp. 11875–11885 (2021). https://doi.org/10.1109/CVPR46437.2021.01170
Luigi, L.D., Li, R., Guillard, B., Salzmann, M., Fua, P.: DrapeNet: garment generation and self-supervised draping. In: CVPR, pp. 1451–1460 (2023). https://doi.org/10.1109/CVPR52729.2023.00146
Mikić, I., Trivedi, M., Hunter, E., Cosman, P.: Human body model acquisition and tracking using voxel data. Int. J. Comput. Vis. 53, 199–223 (2003)
Gilbert, A., Volino, M., Collomosse, J.P., Hilton, A.: Volumetric performance capture from minimal camera viewpoints. In: ECCV, vol. 11215, pp. 591–607 (2018). https://doi.org/10.1007/978-3-030-01252-6_35
Stoll, C., Hasler, N., Gall, J., Seidel, H., Theobalt, C.: Fast articulated motion tracking using a sums of Gaussians body model. In: ICCV, pp. 951–958 (2011).https://doi.org/10.1109/ICCV.2011.6126338
Robertini, N., Casas, D., Rhodin, H., Seidel, H., Theobalt, C.: Model-based outdoor performance capture. In: 3DV, pp. 166–175 (2016). https://doi.org/10.1109/3DV.2016.25
Chen, G., Wang, W.: A survey on 3D Gaussian splatting (2024). arXiv preprint arXiv:2401.03890
Bai, S., Li, J.: Progress and prospects in 3D generative AI: a technical overview including 3D human (2024). arXiv preprint arXiv:2401.02620
Wu, T., Yuan, Y.-J., Zhang, L.-X., Yang, J., Cao, Y.-P., Yan, L.-Q., Gao, L.: Recent advances in 3D Gaussian Splatting. Comput. Vis. Media (2024). https://doi.org/10.1007/s41095-024-0436-y
Xu, Z., Peng, S., Lin, H., He, G., Sun, J., Shen, Y., Bao, H., Zhou, X.: 4K4D: real-time 4D view synthesis at 4K resolution. In: CVPR (2024)
Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Xinggang, W.: 4D Gaussian splatting for real-time dynamic scene rendering. In: CVPR (2024)
Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. In: 3DV (2024)
Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. In: CVPR (2024)
Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2D Gaussian splatting for geometrically accurate radiance fields. In: ACM SIGGRAPH 2024 Conference Papers, SIGGRAPH 2024, Denver, CO, USA, 27 July 2024–1 August 2024, pp. 32 (2024). https://doi.org/10.1145/3641519.3657428
Guédon, A., Lepetit, V.: Sugar: Surface-aligned gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. In: CVPR (2024)
Chen, H., Li, C., Lee, G.H.: NeuSG: neural implicit surface reconstruction with 3D Gaussian splatting guidance (2023). arXiv preprint arXiv:2312.00846
Chen, Z., Wang, F., Liu, H.: Text-to-3D using Gaussian splatting (2023). arXiv preprint arXiv:2309.16585
Li, X., Wang, H., Tseng, K.-K.: GaussianDiffusion: 3D Gaussian splatting for denoising diffusion probabilistic models with structured noise (2023). arXiv preprint arXiv:2311.11221
Tang, J., Ren, J., Zhou, H., Liu, Z., Zeng, G.: DreamGaussian: generative Gaussian splatting for efficient 3D content creation (2023). arXiv preprint arXiv:2309.16653
Zielonka, W., Bagautdinov, T., Saito, S., Zollhöfer, M., Thies, J., Romero, J.: Drivable 3D Gaussian avatars (2023). arXiv preprint arXiv:2311.13404
Shao, Z., Wang, Z., Li, Z., Wang, D., Lin, X., Zhang, Y., Fan, M., Wang, Z.: SplattingAvatar: realistic real-time human avatars with mesh-embedded Gaussian splatting. In: CVPR (2024)
Liu, X., Wu, C., Liu, J., Liu, X., Zhao, C., Feng, H., Ding, E., Wang, J.: GVA: reconstructing Vivid 3D Gaussian avatars from monocular videos. Arxiv (2024)
Svitov, D., Morerio, P., Agapito, L., Del Bue, A.: HAHA: highly articulated Gaussian human avatars with textured mesh prior (2024). arXiv preprint arXiv:2404.01053
Wen, J., Zhao, X., Ren, Z., Schwing, A., Wang, S.: GoMAvatar: efficient animatable human modeling from monocular video using Gaussians-on-mesh. In: CVPR (2024)
Jiang, Y., Liao, Q., Li, X., Ma, L., Zhang, Q., Zhang, C., Lu, Z., Shan, Y.: UV Gaussians: joint learning of mesh deformation and gaussian textures for human avatar modeling (2024). arXiv preprint arXiv:2403.11589
Liu, X., Zhan, X., Tang, J., Shan, Y., Zeng, G., Lin, D., Liu, X., Liu, Z.: HumanGaussian: text-driven 3D human generation with Gaussian splatting. In: CVPR (2024)
Abdal, R., Yifan, W., Shi, Z., Xu, Y., Po, R., Kuang, Z., Chen, Q., Yeung, D.-Y., Wetzstein, G.: Gaussian shell maps for efficient 3D human generation. In: CVPR (2024)
Cheng, W., Chen, R., Fan, S., Yin, W., Chen, K., Cai, Z., Wang, J., Gao, Y., Yu, Z., Lin, Z., Ren, D., Yang, L., Liu, Z., Loy, C.C., Qian, C., Wu, W., Lin, D., Dai, B., Lin, K.: DNA-rendering: a diverse neural actor repository for high-fidelity human-centric rendering. In: ICCV, pp. 19925–19936 (2023). https://doi.org/10.1109/ICCV51070.2023.01829
Bonopera, S., Hedman, P., Esnault, J., Prakash, S., Rodriguez, S., Thonat, T., Benadel, M., Chaurasia, G., Philip, J., Drettakis, G.: SIBR: a system for image based rendering (2020). https://sibr.gitlabpages.inria.fr/
Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction algorithm. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1987, pp. 163–169 (1987). https://doi.org/10.1145/37401.37422
Alldieck, T., Zanfir, M., Sminchisescu, C.: Photorealistic monocular 3D reconstruction of humans wearing clothing. In: CVPR, pp. 1496–1505 (2022). https://doi.org/10.1109/CVPR52688.2022.00156
Corona, E., Zanfir, M., Alldieck, T., Bazavan, E.G., Zanfir, A., Sminchisescu, C.: Structured 3D features for reconstructing controllable avatars. In: CVPR, pp. 16954–16964 (2023). https://doi.org/10.1109/CVPR52729.2023.01626
Lin, L., Zhu, J.: Topology-preserved human reconstruction with details. Vis. Comput. 39(8), 3609–3619 (2023). https://doi.org/10.1007/S00371-023-02957-0
Hu, S., Hong, F., Pan, L., Mei, H., Yang, L., Liu, Z.: SHERF: generalizable human nerf from a single image. In: ICCV, pp. 9318–9330 (2023). https://doi.org/10.1109/ICCV51070.2023.00858
Huang, Y., Yi, H., Liu, W., Wang, H., Wu, B., Wang, W., Lin, B., Zhang, D., Cai, D.: One-shot implicit animatable avatars with model-based priors. In: ICCV, pp. 8940–8951 (2023). https://doi.org/10.1109/ICCV51070.2023.00824
Alldieck, T., Magnor, M.A., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3D people models. In: CVPR, pp. 8387–8397 (2018). https://doi.org/10.1109/CVPR.2018.00875
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020). https://doi.org/10.1145/3422622
Zhu, H., Qiu, L., Qiu, Y., Han, X.: Registering explicit to implicit: towards high-fidelity garment mesh reconstruction from single images. In: CVPR, pp. 3835–3844 (2022). https://doi.org/10.1109/CVPR52688.2022.00382
Cao, X., Santo, H., Shi, B., Okura, F., Matsushita, Y.: Bilateral normal integration. In: ECCV 13661, 552–567 (2022). https://doi.org/10.1007/978-3-031-19769-7_32
Han, S., Park, M., Yoon, J.H., Kang, J., Park, Y., Jeon, H.: High-fidelity 3D human digitization from single 2K resolution images. In: CVPR, pp. 12869–12879 (2023).https://doi.org/10.1109/CVPR52729.2023.01237
Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: ARCH: animatable reconstruction of clothed humans. In: CVPR, pp. 3090–3099 (2020). https://doi.org/10.1109/CVPR42600.2020.00316
He, T., Xu, Y., Saito, S., Soatto, S., Tung, T.: ARCH++: animation-ready clothed human reconstruction revisited. In: ICCV, pp. 11026–11036 (2021). https://doi.org/10.1109/ICCV48922.2021.01086
Liao, T., Zhang, X., Xiu, Y., Yi, H., Liu, X., Qi, G., Zhang, Y., Wang, X., Zhu, X., Lei, Z.: High-fidelity clothed avatar reconstruction from a single image. In: CVPR, pp. 8662–8672 (2023). https://doi.org/10.1109/CVPR52729.2023.00837
Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: DreamFusion: text-to-3D using 2D diffusion. In: ICLR (2023)
Chen, M., Chen, J., Ye, X., Gao, H.-a., Chen, X., Fan, Z., Zhao, H.: Ultraman: single image 3D human reconstruction with ultra speed and detail. arXiv preprint arXiv:2403.12028 (2024)
Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR, pp. 5967–5976 (2017). https://doi.org/10.1109/CVPR.2017.632
Moon, G., Nam, H., Shiratori, T., Lee, K.M.: 3D clothed human reconstruction in the wild. In: ECCV, vol. 13662, pp. 184–200 (2022). https://doi.org/10.1007/978-3-031-20086-1_11
Gabeur, V., Franco, J., Martin, X., Schmid, C., Rogez, G.: Moulding humans: non-parametric 3D human shape estimation from single images. In: ICCV, pp. 2232–2241 (2019).https://doi.org/10.1109/ICCV.2019.00232
Kazhdan, M.M., Bolitho, M., Hoppe, H.: Poisson surface reconstruction. In: Proceedings of the Fourth Eurographics Symposium on Geometry Processing, Cagliari, Sardinia, Italy, June 26–28, 2006. ACM International Conference Proceeding Series, vol. 256, pp. 61–70 (2006). https://doi.org/10.2312/SGP/SGP06/061-070
Chibane, J., Alldieck, T., Pons-Moll, G.: Implicit functions in feature space for 3D shape reconstruction and completion. In: CVPR, pp. 6968–6979 (2020). https://doi.org/10.1109/CVPR42600.2020.00700
Kazhdan, M.M., Hoppe, H.: Screened Poisson surface reconstruction. ACM Trans. Graph. 32(3), 29–12913 (2013). https://doi.org/10.1145/2487228.2487237
Gao, J., Chen, W., Xiang, T., Jacobson, A., McGuire, M., Fidler, S.: Learning deformable tetrahedral meshes for 3D reconstruction. In: NeurIPS (2020)
Shen, T., Gao, J., Yin, K., Liu, M., Fidler, S.: Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis. In: NeurIPS, pp. 6087–6101 (2021)
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR, pp. 10674–10685 (2022). https://doi.org/10.1109/CVPR52688.2022.01042
Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: ICCV, pp. 3813–3824 (2023). https://doi.org/10.1109/ICCV51070.2023.00355
Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., Aberman, K.: DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation. In: CVPR, pp. 22500–22510 (2023). https://doi.org/10.1109/CVPR52729.2023.02155
Li, J., Li, D., Xiong, C., Hoi, S.C.H.: BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: ICML, vol. 162, pp. 12888–12900 (2022)
Xiu, Y., Ye, Y., Liu, Z., Tzionas, D., Black, M.J.: PuzzleAvatar: assembling 3D avatars from personal albums (2024). arXiv preprint arXiv:2405.14869
Gao, X., Li, X., Zhang, C., Zhang, Q., Cao, Y., Shan, Y., Quan, L.: ConTex-Human: free-view rendering of human from a single image with texture-consistent synthesis (2023). arXiv preprint arXiv:2311.17123
Liu, R., Wu, R., Hoorick, B.V., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-1-to-3: Zero-shot one image to 3D object. In: ICCV, pp. 9264–9275 (2023). https://doi.org/10.1109/ICCV51070.2023.00853
He, T., Collomosse, J.P., Jin, H., Soatto, S.: Geo-PIFu: geometry and pixel aligned implicit functions for single-view human reconstruction. In: NeurIPS (2020)
Wang, T., Liu, M., Zhu, J., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: CVPR, pp. 8798–8807 (2018). https://doi.org/10.1109/CVPR.2018.00917
Yang, X., Luo, Y., Xiu, Y., Wang, W., Xu, H., Fan, Z.: D-IF: uncertainty-aware human digitization via implicit distribution field. In: ICCV, pp. 9088–9098 (2023). https://doi.org/10.1109/ICCV51070.2023.00837
Cao, Y., Han, K., Wong, K.K.: SeSDF: self-evolved signed distance field for implicit 3D clothed human reconstruction. In: CVPR, pp. 4647–4657 (2023). https://doi.org/10.1109/CVPR52729.2023.00451
Song, D., Lee, H., Seo, J., Cho, D.: DIFu: depth-guided implicit function for clothed human reconstruction. In: CVPR, pp. 8738–8747 (2023). https://doi.org/10.1109/CVPR52729.2023.00844
Zhang, Z., Sun, L., Yang, Z., Chen, L., Yang, Y.: Global-correlated 3D-decoupling transformer for clothed avatar reconstruction. In: NeurIPS (2023)
Choi, H., Moon, G., Armando, M., Leroy, V., Lee, K.M., Rogez, G.: MonoNHR: monocular neural human renderer. In: 3DV, pp. 242–251 (2022). https://doi.org/10.1109/3DV57658.2022.00036
Weng, Z., Liu, J., Tan, H., Xu, Z., Zhou, Y., Yeung-Levy, S., Yang, J.: Single-view 3D human digitalization with large reconstruction models. arXiv preprint arXiv:2401.12175 (2024)
Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., Tan, H.: LRM: large reconstruction model for single image to 3D (2023). arXiv preprint arXiv:2311.04400
Xu, X., Loy, C.C.: 3D human texture estimation from a single image with transformers. In: ICCV, pp. 13829–13838 (2021). https://doi.org/10.1109/ICCV48922.2021.01359
Svitov, D., Gudkov, D., Bashirov, R., Lempitsky, V.: DINAR: diffusion inpainting of neural textures for one-shot human avatars. In: ICCV, pp. 7039–7049 (2023). https://doi.org/10.1109/ICCV51070.2023.00650
Zhan, X., Yang, J., Li, Y., Guo, J., Guo, Y., Wang, W.: Semantic human mesh reconstruction with textures (2024). arXiv preprint arXiv:2403.02561
Zhang, J., Li, X., Zhang, Q., Cao, Y., Shan, Y., Liao, J.: HumanRef: single image to 3D human generation via reference-guided diffusion. arXiv preprint arXiv:2311.16961 (2023)
Natsume, R., Saito, S., Huang, Z., Chen, W., Ma, C., Li, H., Morishima, S.: SiCloPe: silhouette-based clothed people. In: CVPR, pp. 4480–4490 (2019). https://doi.org/10.1109/CVPR.2019.00461
Sengupta, A., Alldieck, T., Kolotouros, N., Corona, E., Zanfir, A., Sminchisescu, C.: DiffHuman: probabilistic photorealistic 3D reconstruction of humans (2024). arXiv preprint arXiv:2404.00485
Wang, J., Zhong, Y., Li, Y., Zhang, C., Wei, Y.: Re-identification supervised texture generation. In: CVPR, pp. 11846–11856 (2019). https://doi.org/10.1109/CVPR.2019.01212
Xu, X., Chen, H., Moreno-Noguer, F., Jeni, L.A., Torre, F.D.: 3D human pose, shape and texture from low-resolution images and videos. IEEE Trans. Pattern Anal. Mach. Intell. 44(9), 4490–4504 (2022). https://doi.org/10.1109/TPAMI.2021.3070002
Altindis, S.F., Meric, A., Dalva, Y., Gudukbay, U., Dundar, A.: Refining 3D human texture estimation from a single image (2023). arXiv preprint arXiv:2303.03471
Fang, Q., Shuai, Q., Dong, J., Bao, H., Zhou, X.: Reconstructing 3D human pose by watching humans in the mirror. In: CVPR, pp. 12814–12823 (2021). https://doi.org/10.1109/CVPR46437.2021.01262
Peng, S., Zhang, Y., Xu, Y., Wang, Q., Shuai, Q., Bao, H., Zhou, X.: Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In: CVPR, pp. 9054–9063 (2021). https://doi.org/10.1109/CVPR46437.2021.00894
Xu, W., Chatterjee, A., Zollhöfer, M., Rhodin, H., Mehta, D., Seidel, H., Theobalt, C.: MonoPerfCap: human performance capture from monocular video. ACM Trans. Graph. 37(2), 27 (2018). https://doi.org/10.1145/3181973
Habermann, M., Xu, W., Zollhöfer, M., Pons-Moll, G., Theobalt, C.: LiveCap: real-time human performance capture from monocular video. ACM Trans. Graph. 38(2), 14–11417 (2019). https://doi.org/10.1145/3311970
Habermann, M., Xu, W., Zollhöfer, M., Pons-Moll, G., Theobalt, C.: DeepCap: monocular human performance capture using weak supervision. In: CVPR, pp. 5051–5062 (2020)
Alldieck, T., Magnor, M.A., Xu, W., Theobalt, C., Pons-Moll, G.: Detailed human avatars from monocular video. In: 3DV, pp. 98–109 (2018). https://doi.org/10.1109/3DV.2018.00022
Jiang, B., Hong, Y., Bao, H., Zhang, J.: SelfRecon: self reconstruction your digital avatar from monocular video. In: CVPR, pp. 5595–5605 (2022). https://doi.org/10.1109/CVPR52688.2022.00552
Peng, S., Dong, J., Wang, Q., Zhang, S., Shuai, Q., Zhou, X., Bao, H.: Animatable neural radiance fields for modeling dynamic human bodies. In: ICCV, pp. 14294–14303 (2021). https://doi.org/10.1109/ICCV48922.2021.01405
Chen, J., Zhang, Y., Kang, D., Zhe, X., Bao, L., Jia, X., Lu, H.: Animatable neural radiance fields from monocular RGB videos (2021). arXiv preprint arXiv:2106.13629
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR, pp. 586–595 (2018). https://doi.org/10.1109/CVPR.2018.00068
Li, R., Tanke, J., Vo, M., Zollhöfer, M., Gall, J., Kanazawa, A., Lassner, C.: TAVA: template-free animatable volumetric actors. In: ECCV, vol. 13692, pp. 419–436 (2022). https://doi.org/10.1007/978-3-031-19824-3_25
Jiang, W., Yi, K.M., Samei, G., Tuzel, O., Ranjan, A.: NeuMan: neural human radiance field from a single video. In: ECCV, vol. 13692, pp. 402–418 (2022). https://doi.org/10.1007/978-3-031-19824-3_24
Yu, Z., Cheng, W., Liu, X., Wu, W., Lin, K.: MonoHuman: animatable human neural field from monocular video. In: CVPR, pp. 16943–16953 (2023). https://doi.org/10.1109/CVPR52729.2023.01625
Wang, S., Schwarz, K., Geiger, A., Tang, S.: ARAH: animatable volume rendering of articulated human SDFs. In: ECCV, vol. 13692, pp. 1–19 (2022). https://doi.org/10.1007/978-3-031-19824-3_1
Gropp, A., Yariv, L., Haim, N., Atzmon, M., Lipman, Y.: Implicit geometric regularization for learning shapes. In: ICML. Proceedings of Machine Learning Research, vol. 119, pp. 3789–3799 (2020)
Jiang, T., Chen, X., Song, J., Hilliges, O.: InstantAvatar: learning avatars from monocular video in 60 seconds. In: CVPR, pp. 16922–16932 (2023). https://doi.org/10.1109/CVPR52729.2023.01623
Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41(4), 102:1–102:15 (2022). https://doi.org/10.1145/3528223.3530127
Feng, Y., Yang, J., Pollefeys, M., Black, M.J., Bolkart, T.: Capturing and animation of body and clothing from monocular video. In: SIGGRAPH Asia 2022 Conference Papers, pp. 45:1–45:9 (2022). https://doi.org/10.1145/3550469.3555423
Zheng, Z., Huang, H., Yu, T., Zhang, H., Guo, Y., Liu, Y.: Structured local radiance fields for human avatar modeling. In: CVPR, pp. 15872–15882 (2022). https://doi.org/10.1109/CVPR52688.2022.01543
Su, S., Yu, F., Zollhöfer, M., Rhodin, H.: A-NeRF: articulated neural radiance fields for learning human shape, appearance, and pose. In: NeurIPS, pp. 12278–12291 (2021)
Xu, T., Fujita, Y., Matsumoto, E.: Surface-aligned neural radiance fields for controllable 3D human synthesis. In: CVPR, pp. 15862–15871 (2022). https://doi.org/10.1109/CVPR52688.2022.01542
Liu, L., Habermann, M., Rudnev, V., Sarkar, K., Gu, J., Theobalt, C.: Neural actor: neural free-view synthesis of human actors with pose control. ACM Trans. Graph. 40(6), 219:1–219:16 (2021). https://doi.org/10.1145/3478513.3480528
Chen, Y., Wang, X., Chen, X., Zhang, Q., Li, X., Guo, Y., Wang, J., Wang, F.: UV volumes for real-time rendering of editable free-view human performance. In: CVPR, pp. 16621–16631 (2023). https://doi.org/10.1109/CVPR52729.2023.01595
Li, Z., Zheng, Z., Wang, L., Liu, Y.: Animatable Gaussians: learning pose-dependent gaussian maps for high-fidelity human avatar modeling. In: CVPR (2024)
Lei, J., Wang, Y., Pavlakos, G., Liu, L., Daniilidis, K.: GART: Gaussian articulated template models. In: CVPR (2024)
Kocabas, M., Chang, J.-H.R., Gabriel, J., Tuzel, O., Ranjan, A.: HUGS: human Gaussian splats. In: CVPR (2024)
Hu, L., Zhang, H., Zhang, Y., Zhou, B., Liu, B., Zhang, S., Nie, L.: GaussianAvatar: towards realistic human avatar modeling from a single video via animatable 3D Gaussians. In: CVPR (2024)
Pang, H., Zhu, H., Kortylewski, A., Theobalt, C., Habermann, M.: ASH: animatable Gaussian splats for efficient and photoreal human rendering. In: CVPR (2024)
Guo, C., Jiang, T., Chen, X., Song, J., Hilliges, O.: Vid2Avatar: 3D avatar reconstruction from videos in the wild via self-supervised scene decomposition. In: CVPR, pp. 12858–12868 (2023). https://doi.org/10.1109/CVPR52729.2023.01236
Feng, Y., Liu, W., Bolkart, T., Yang, J., Pollefeys, M., Black, M.J.: Learning disentangled avatars with hybrid 3D representations. arXiv (2023)
Wang, K., Zhang, G., Cong, S., Yang, J.: Clothed human performance capture with a double-layer neural radiance fields. In: CVPR, pp. 21098–21107 (2023). https://doi.org/10.1109/CVPR52729.2023.02021
Chen, M., Zhang, J., Xu, X., Liu, L., Cai, Y., Feng, J., Yan, S.: Geometry-guided progressive NeRF for generalizable and efficient neural human rendering. In: ECCV, vol. 13683, pp. 222–239 (2022). https://doi.org/10.1007/978-3-031-20050-2_14
Peng, B., Hu, J., Zhou, J., Zhang, J.: SelfNeRF: fast training NeRF for human from monocular self-rotating video (2022). arXiv preprint arXiv:2210.01651
Geng, C., Peng, S., Xu, Z., Bao, H., Zhou, X.: Learning neural volumetric representations of dynamic humans in minutes. In: CVPR, pp. 8759–8770 (2023). https://doi.org/10.1109/CVPR52729.2023.00846
Kwon, Y., Kim, D., Ceylan, D., Fuchs, H.: Neural human performer: learning generalizable radiance fields for human performance rendering. In: NeurIPS, pp. 24741–24752 (2021)
Li, C., Lin, J., Lee, G.H.: GHuNeRF: generalizable human NeRF from a monocular video (2023). arXiv preprint arXiv:2308.16576
Chen, X., Zheng, Y., Black, M.J., Hilliges, O., Geiger, A.: SNARF: differentiable forward skinning for animating non-rigid neural implicit shapes. In: ICCV, pp. 11574–11584 (2021). https://doi.org/10.1109/ICCV48922.2021.01139
Chen, X., Jiang, T., Song, J., Rietmann, M., Geiger, A., Black, M.J., Hilliges, O.: Fast-SNARF: a fast deformer for articulated neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 45(10), 11796–11809 (2023). https://doi.org/10.1109/TPAMI.2023.3271569
Zhi, Y., Qian, S., Yan, X., Gao, S.: Dual-space NeRF: learning animatable avatars and scene lighting in separate spaces. In: 3DV, pp. 1–10 (2022). https://doi.org/10.1109/3DV57658.2022.00048
Mu, J., Sang, S., Vasconcelos, N., Wang, X.: ActorsNeRF: animatable few-shot human rendering with generalizable NeRFs. In: ICCV, pp. 18345–18355 (2023). https://doi.org/10.1109/ICCV51070.2023.01686
Noguchi, A., Sun, X., Lin, S., Harada, T.: Neural articulated radiance field. In: ICCV, pp. 5742–5752 (2021). https://doi.org/10.1109/ICCV48922.2021.00571
Te, G., Li, X., Li, X., Wang, J., Hu, W., Lu, Y.: Neural capture of animatable 3D human from monocular video. In: ECCV, vol. 13666, pp. 275–291 (2022). https://doi.org/10.1007/978-3-031-20068-7_16
Su, S., Bagautdinov, T.M., Rhodin, H.: DANBO: disentangled articulated neural body representations via graph neural networks. In: ECCV, vol. 13662, pp. 107–124 (2022). https://doi.org/10.1007/978-3-031-20086-1_7
Zhang, R., Chen, J.: NDF: neural deformable fields for dynamic human modelling. In: ECCV, vol. 13692, pp. 37–52 (2022). https://doi.org/10.1007/978-3-031-19824-3_3
Li, M., Tao, J., Yang, Z., Yang, Y.: Human101: training 100+FPS human Gaussians in 100s from 1 view (2023). arXiv preprint arXiv:2312.15258
Moreau, A., Song, J., Dhamo, H., Shaw, R., Zhou, Y., Pérez-Pellitero, E.: Human Gaussian splatting: real-time rendering of animatable avatars (2023). arXiv preprint arXiv:2311.17113
Qian, Z., Wang, S., Mihajlovic, M., Geiger, A., Tang, S.: 3DGS-Avatar: animatable avatars via deformable 3D Gaussian splatting. In: CVPR (2024)
Li, M., Yao, S., Xie, Z., Chen, K., Jiang, Y.-G.: GaussianBody: clothed human reconstruction via 3D Gaussian splatting (2024). arXiv preprint arXiv:2401.09720
Jung, H., Brasch, N., Song, J., Perez-Pellitero, E., Zhou, Y., Li, Z., Navab, N., Busam, B.: Deformable 3D Gaussian splatting for animatable human avatars (2023). arXiv preprint arXiv:2312.15059
Jena, R., Iyer, G.S., Choudhary, S., Smith, B., Chaudhari, P., Gee, J.: SplatArmor: articulated Gaussian splatting for animatable humans from monocular RGB videos (2023). arXiv preprint arXiv:2311.10812
Kamel, A., Sheng, B., Yang, P., Li, P., Shen, R., Feng, D.D.: Deep convolutional neural networks for human action recognition using depth maps and postures. IEEE Trans. Syst. Man Cybern. Syst. 49(9), 1806–1819 (2019). https://doi.org/10.1109/TSMC.2018.2850149
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1325–1339 (2014). https://doi.org/10.1109/TPAMI.2013.248
Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B.C., Matthews, I.A., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: a massively multiview system for social motion capture. In: ICCV, pp. 3334–3342 (2015). https://doi.org/10.1109/ICCV.2015.381
Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O., Xu, W., Theobalt, C.: Monocular 3D human pose estimation in the wild using improved CNN supervision. In: 3DV, pp. 506–516 (2017). https://doi.org/10.1109/3DV.2017.00064
Marcard, T., Henschel, R., Black, M.J., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3D human pose in the wild using IMUs and a moving camera. In: ECCV, vol. 11214, pp. 614–631 (2018). https://doi.org/10.1007/978-3-030-01249-6_37
Tsuchida, S., Fukayama, S., Hamasaki, M., Goto, M.: AIST dance video database: multi-genre, multi-dancer, and multi-camera database for dance information processing. In: ISMIR, pp. 501–510 (2019)
Li, R., Yang, S., Ross, D.A., Kanazawa, A.: AI choreographer: music conditioned 3D dance generation with AIST++. In: ICCV, pp. 13381–13392 (2021). https://doi.org/10.1109/ICCV48922.2021.01315
Isik, M., Rünz, M., Georgopoulos, M., Khakhulin, T., Starck, J., Agapito, L., Nießner, M.: HumanRF: high-fidelity neural radiance fields for humans in motion. ACM Trans. Graph. 42(4), 160:1–160:12 (2023). https://doi.org/10.1145/3592415
Cai, Z., Ren, D., Zeng, A., Lin, Z., Yu, T., Wang, W., Fan, X., Gao, Y., Yu, Y., Pan, L., Hong, F., Zhang, M., Loy, C.C., Yang, L., Liu, Z.: HuMMan: multi-modal 4D human dataset for versatile sensing and modeling. In: ECCV, vol. 13667, pp. 557–577 (2022). https://doi.org/10.1007/978-3-031-20071-7_33
Cheng, W., Xu, S., Piao, J., Qian, C., Wu, W., Lin, K.-Y., Li, H.: Generalizable neural performer: learning robust radiance fields for human novel view synthesis (2022). arXiv preprint arXiv:2204.11798
Xiong, Z., Li, C., Liu, K., Liao, H., Hu, J., Zhu, J., Ning, S., Qiu, L., Wang, C., Wang, S., et al.: MVHumanNet: a large-scale dataset of multi-view daily dressing human captures (2023). arXiv preprint arXiv:2312.02963
Zhang, C., Pujades, S., Black, M.J., Pons-Moll, G.: Detailed, accurate, human shape estimation from clothed 3D scan sequences. In: CVPR, pp. 5484–5493 (2017). https://doi.org/10.1109/CVPR.2017.582
Su, Z., Yu, T., Wang, Y., Liu, Y.: DeepCloth: neural garment representation for shape and style editing. IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1581–1593 (2023). https://doi.org/10.1109/TPAMI.2022.3168569
Habermann, M., Liu, L., Xu, W., Zollhöfer, M., Pons-Moll, G., Theobalt, C.: Real-time deep dynamic characters. ACM Trans. Graph. 40(4), 94:1–94:16 (2021). https://doi.org/10.1145/3450626.3459749
Yu, Z., Yoon, J.S., Lee, I.K., Venkatesh, P., Park, J., Yu, J., Park, H.S.: HUMBI: a large multiview dataset of human body expressions. In: CVPR, pp. 2987–2997 (2020). https://doi.org/10.1109/CVPR42600.2020.00306
Yoon, J.S., Yu, Z., Park, J., Park, H.S.: HUMBI: a large multiview dataset of human body expressions and benchmark challenge. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 623–640 (2023). https://doi.org/10.1109/TPAMI.2021.3138762
Over 4,000 Scanned 3D People Models. https://renderpeople.com/
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
Zheng, Z., Zhao, X., Zhang, H., Liu, B., Liu, Y.: AvatarReX: real-time expressive full-body avatars. ACM Trans. Graph. 42(4), 158:1–158:19 (2023). https://doi.org/10.1145/3592101
Dong, J., Fang, Q., Guo, Y., Peng, S., Shuai, Q., Zhou, X., Bao, H.: TotalSelfScan: learning full-body avatars from self-portrait videos of faces, hands, and bodies. In: NeurIPS (2022)
Yu, T., Zheng, Z., Guo, K., Zhao, J., Dai, Q., Li, H., Pons-Moll, G., Liu, Y.: DoubleFusion: real-time capture of human performances with inner body shapes from a single depth sensor. In: CVPR, pp. 7287–7296 (2018). https://doi.org/10.1109/CVPR.2018.00761
Lin, S., Li, Z., Su, Z., Zheng, Z., Zhang, H., Liu, Y.: LayGA: layered Gaussian avatars for animatable clothing transfer (2024). arXiv preprint arXiv:2405.07319
Khirodkar, R., Tripathi, S., Kitani, K.: Occluded human mesh recovery. In: CVPR, pp. 1705–1715 (2022). https://doi.org/10.1109/CVPR52688.2022.00176
Wang, J., Yoon, J.S., Wang, T.Y., Singh, K.K., Neumann, U.: Complete 3D human reconstruction from a single incomplete image. In: CVPR, pp. 8748–8758 (2023). https://doi.org/10.1109/CVPR52729.2023.00845
Xiang, T., Sun, A., Wu, J., Adeli, E., Fei-Fei, L.: Rendering humans from object-occluded monocular videos. In: ICCV, pp. 3216–3227 (2023). https://doi.org/10.1109/ICCV51070.2023.00300
Ye, J., Zhang, Z., Jiang, Y., Liao, Q., Yang, W., Lu, Z.: OccGaussian: 3D Gaussian splatting for occluded human rendering (2024). arXiv preprint arXiv:2404.08449
Acknowledgements
This work was supported by the National Science Foundation of China (No. 62471168, No. 61802100 and No. 62372147), the Zhejiang Provincial Natural Science Foundation of China (No. LDT23F02025F02, No. LY21F020019 and No. LY22F020028), and the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University (No. A2314, No. A2304 and No. A2306). It was also partially supported by the Aeronautical Science Foundation of China (No. 2022Z0710T5001).
Author information
Contributions
SY and XG wrote the main manuscript text and SY prepared all figures and tables. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, S., Gu, X., Kuang, Z. et al. Innovative AI techniques for photorealistic 3D clothed human reconstruction from monocular images or videos: a survey. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03641-7
DOI: https://doi.org/10.1007/s00371-024-03641-7