
AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars

Published: 22 July 2022

Abstract

3D avatar creation plays a crucial role in the digital age. However, the production process is prohibitively time-consuming and labor-intensive. To democratize this technology to a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers layman users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with the described motions, using natural language alone. Our key insight is to take advantage of the powerful vision-language model CLIP to supervise neural human generation, in terms of 3D geometry, texture, and animation. Specifically, driven by natural language descriptions, we initialize 3D human geometry generation with a shape VAE network. Based on the generated 3D human shapes, a volume rendering model is utilized to further facilitate geometry sculpting and texture generation. Moreover, by leveraging the priors learned by a motion VAE, we propose a CLIP-guided, reference-based motion synthesis method to animate the generated 3D avatar. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of AvatarCLIP on a wide range of avatars. Remarkably, AvatarCLIP can generate unseen 3D avatars with novel animations, achieving superior zero-shot capability. Code is available at https://github.com/hongfz16/AvatarCLIP.
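The CLIP-supervised generation the abstract describes follows a common pattern: render the avatar, embed the render with the CLIP image encoder, and optimize the avatar's parameters to maximize cosine similarity with the text embedding. The sketch below illustrates only that optimization pattern in plain numpy; the random linear map `W` is a hypothetical stand-in for the renderer plus CLIP image encoder, not the paper's actual models.

```python
import numpy as np

# Toy illustration of CLIP-guided generation: avatar parameters are optimized
# so that the embedding of their rendering matches a fixed text embedding.
# The "renderer + CLIP image encoder" is a hypothetical random linear map W;
# AvatarCLIP uses a real differentiable renderer and the real CLIP encoders.

rng = np.random.default_rng(0)
D_PARAM, D_EMB = 16, 8

W = rng.normal(size=(D_EMB, D_PARAM))    # stand-in for render + image encode
text_emb = rng.normal(size=D_EMB)
text_emb /= np.linalg.norm(text_emb)     # CLIP embeddings are L2-normalized

def clip_loss(params):
    """Negative cosine similarity between the 'image' and text embeddings."""
    img = W @ params
    img = img / np.linalg.norm(img)
    return -float(img @ text_emb)

def num_grad(params, eps=1e-5):
    """Central-difference gradient, to keep the sketch dependency-free."""
    g = np.zeros_like(params)
    for i in range(params.size):
        e = np.zeros_like(params)
        e[i] = eps
        g[i] = (clip_loss(params + e) - clip_loss(params - e)) / (2 * eps)
    return g

params = rng.normal(size=D_PARAM)        # "avatar" parameters to optimize
for _ in range(1000):
    params -= 0.1 * num_grad(params)     # gradient descent on the CLIP loss

final_similarity = -clip_loss(params)    # cosine similarity after optimization
```

In the real system the loss is additionally averaged over multiple rendered viewpoints and augmentations so the geometry and texture look right from all sides, and analytic gradients flow through the volume renderer rather than finite differences.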

Supplemental Material

- MP4 File: supplemental material
- MP4 File: presentation
- SRT File: presentation (subtitles)



        Published In

        ACM Transactions on Graphics, Volume 41, Issue 4
        July 2022, 1978 pages
        ISSN: 0730-0301
        EISSN: 1557-7368
        DOI: 10.1145/3528223
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. 3D avatar animation
        2. 3D avatar generation
        3. text-driven generation
        4. zero-shot generation

        Qualifiers

        • Research-article

        Funding Sources

        • RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative
        • NTU NAP, MOE AcRF Tier 2


