Corona et al., 2024 - Google Patents
VLOGGER: Multimodal diffusion for embodied avatar synthesis
- Document ID
- 17639355026261715266
- Author
- Corona E
- Zanfir A
- Bazavan E
- Kolotouros N
- Alldieck T
- Sminchisescu C
- Publication year
- 2024
- Publication venue
- arXiv preprint arXiv:2403.08764
Snippet
We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2) a novel …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Zhang et al. | Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset | |
Athar et al. | Rignerf: Fully controllable neural 3d portraits | |
Yi et al. | Audio-driven talking face video generation with learning-based personalized head pose | |
Bagautdinov et al. | Modeling facial geometry using compositional vaes | |
Martin et al. | Scangan360: A generative model of realistic scanpaths for 360 images | |
Zhong et al. | Identity-preserving talking face generation with landmark and appearance priors | |
Corona et al. | VLOGGER: Multimodal diffusion for embodied avatar synthesis | |
Ye et al. | Geneface++: Generalized and stable real-time audio-driven 3d talking face generation | |
Sinha et al. | Identity-preserving realistic talking face generation | |
Ye et al. | Real3d-portrait: One-shot realistic 3d talking portrait synthesis | |
Wang et al. | Learning how to smile: Expression video generation with conditional adversarial recurrent nets | |
Elgharib et al. | Egocentric videoconferencing | |
Shen et al. | Sd-nerf: Towards lifelike talking head animation via spatially-adaptive dual-driven nerfs | |
Ling et al. | Stableface: Analyzing and improving motion stability for talking face generation | |
Liu et al. | Moda: Mapping-once audio-driven portrait animation with dual attentions | |
Sun et al. | Twostreamvan: Improving motion modeling in video generation | |
Song et al. | Unpaired person image generation with semantic parsing transformation | |
Sheng et al. | Stochastic latent talking face generation towards emotional expressions and head poses | |
Hong et al. | Dagan++: Depth-aware generative adversarial network for talking head video generation | |
Paier et al. | Unsupervised learning of style-aware facial animation from real acting performances | |
Park et al. | DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion | |
Meng et al. | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing | |
Tang et al. | DPHMs: Diffusion Parametric Head Models for Depth-based Tracking | |
Lei et al. | A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | |
Lin et al. | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer |