VLOGGER: Multimodal diffusion for embodied avatar synthesis

Corona et al., 2024

Document ID
17639355026261715266
Author
Corona E
Zanfir A
Bazavan E
Kolotouros N
Alldieck T
Sminchisescu C
Publication year
2024
Publication venue
arXiv preprint arXiv:2403.08764

Snippet

We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2) a novel …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268 Feature extraction; Face representation
    • G06K9/00281 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00335 Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00362 Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
    • G06K9/00369 Recognition of whole body, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Similar Documents

Publication Title
Zhang et al. Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
Athar et al. Rignerf: Fully controllable neural 3d portraits
Yi et al. Audio-driven talking face video generation with learning-based personalized head pose
Bagautdinov et al. Modeling facial geometry using compositional vaes
Martin et al. Scangan360: A generative model of realistic scanpaths for 360 images
Zhong et al. Identity-preserving talking face generation with landmark and appearance priors
Corona et al. VLOGGER: Multimodal diffusion for embodied avatar synthesis
Ye et al. Geneface++: Generalized and stable real-time audio-driven 3d talking face generation
Sinha et al. Identity-preserving realistic talking face generation
Ye et al. Real3d-portrait: One-shot realistic 3d talking portrait synthesis
Wang et al. Learning how to smile: Expression video generation with conditional adversarial recurrent nets
Elgharib et al. Egocentric videoconferencing
Shen et al. Sd-nerf: Towards lifelike talking head animation via spatially-adaptive dual-driven nerfs
Ling et al. Stableface: Analyzing and improving motion stability for talking face generation
Liu et al. Moda: Mapping-once audio-driven portrait animation with dual attentions
Sun et al. Twostreamvan: Improving motion modeling in video generation
Song et al. Unpaired person image generation with semantic parsing transformation
Sheng et al. Stochastic latent talking face generation towards emotional expressions and head poses
Hong et al. Dagan++: Depth-aware generative adversarial network for talking head video generation
Paier et al. Unsupervised learning of style-aware facial animation from real acting performances
Park et al. DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion
Meng et al. A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Tang et al. DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Lei et al. A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Lin et al. GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer