Liu et al., 2023 - Google Patents

Moda: Mapping-once audio-driven portrait animation with dual attentions

Document ID: 12958348119986055500
Author: Liu Y; Lin L; Yu F; Zhou C; Li Y
Publication year: 2023
Publication venue: Proceedings of the IEEE/CVF International Conference on Computer Vision

Snippet

Audio-driven portrait animation aims to synthesize portrait videos that are conditioned by given audio. Animating high-fidelity and multimodal video portraits has a variety of applications. Previous methods have attempted to capture different motion modes and …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Similar Documents

Publication	Publication Date	Title
Zhang et al.	2021	Facial: Synthesizing dynamic talking face with implicit attribute learning
Wang et al.	2023	Seeing what you said: Talking face generation guided by a lip reading expert
Lu et al.	2021	Live speech portraits: real-time photorealistic talking-head animation
Ye et al.	2023	Geneface: Generalized and high-fidelity audio-driven 3d talking face synthesis
Liang et al.	2022	Expressive talking head generation with granular audio-visual control
Guo et al.	2021	Ad-nerf: Audio driven neural radiance fields for talking head synthesis
Yi et al.	2020	Audio-driven talking face video generation with learning-based personalized head pose
Tian et al.	2024	Emo: Emote portrait alive-generating expressive portrait videos with audio2video diffusion model under weak conditions
Zhu et al.	2018	Arbitrary talking face generation via attentional audio-visual coherence learning
Liu et al.	2023	Moda: Mapping-once audio-driven portrait animation with dual attentions
Sinha et al.	2022	Emotion-controllable generalized talking face generation
Gururani et al.	2023	Space: Speech-driven portrait animation with controllable expression
Ye et al.	2023	Geneface++: Generalized and stable real-time audio-driven 3d talking face generation
Hajarolasvadi et al.	2020	Generative adversarial networks in human emotion synthesis: A review
Bigioi et al.	2024	Speech driven video editing via an audio-conditioned diffusion model
Zhou et al.	2012	An image-based visual speech animation system
Ma et al.	2023	Talkclip: Talking head generation with text-guided expressive speaking styles
Corona et al.	2024	VLOGGER: Multimodal diffusion for embodied avatar synthesis
Tan et al.	2024	Style2talker: High-resolution talking head generation with emotion style and art style
Liu et al.	2023	Talking face generation via facial anatomy
Hong et al.	2023	Dagan++: Depth-aware generative adversarial network for talking head video generation
Wang et al.	2024	StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
Wang et al.	2022	Talking faces: Audio-to-video face generation
Ji et al.	2024	Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network
Liu et al.	2024	OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions