Corona et al., 2024 - Google Patents

VLOGGER: Multimodal diffusion for embodied avatar synthesis

Document ID: 17639355026261715266
Author: Corona E; Zanfir A; Bazavan E; Kolotouros N; Alldieck T; Sminchisescu C
Publication year: 2024
Publication venue: arXiv preprint arXiv:2403.08764

Snippet

We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2) a novel …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Similar Documents

Authors	Year	Title
Zhang et al.	2021	Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
Athar et al.	2022	RigNeRF: Fully controllable neural 3D portraits
Yi et al.	2020	Audio-driven talking face video generation with learning-based personalized head pose
Bagautdinov et al.	2018	Modeling facial geometry using compositional VAEs
Martin et al.	2022	ScanGAN360: A generative model of realistic scanpaths for 360° images
Zhong et al.	2023	Identity-preserving talking face generation with landmark and appearance priors
Corona et al.	2024	VLOGGER: Multimodal diffusion for embodied avatar synthesis
Ye et al.	2023	GeneFace++: Generalized and stable real-time audio-driven 3D talking face generation
Sinha et al.	2020	Identity-preserving realistic talking face generation
Ye et al.	2024	Real3D-Portrait: One-shot realistic 3D talking portrait synthesis
Wang et al.	2020	Learning how to smile: Expression video generation with conditional adversarial recurrent nets
Elgharib et al.	2020	Egocentric videoconferencing
Shen et al.	2023	SD-NeRF: Towards lifelike talking head animation via spatially-adaptive dual-driven NeRFs
Ling et al.	2023	StableFace: Analyzing and improving motion stability for talking face generation
Liu et al.	2023	MODA: Mapping-once audio-driven portrait animation with dual attentions
Sun et al.	2020	TwoStreamVAN: Improving motion modeling in video generation
Song et al.	2020	Unpaired person image generation with semantic parsing transformation
Sheng et al.	2023	Stochastic latent talking face generation towards emotional expressions and head poses
Hong et al.	2023	DaGAN++: Depth-aware generative adversarial network for talking head video generation
Paier et al.	2023	Unsupervised learning of style-aware facial animation from real acting performances
Park et al.	2023	DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion
Meng et al.	2024	A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Tang et al.	2024	DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Lei et al.	2024	A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Lin et al.	2024	GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer