Audio-Driven Talking Face Generation: A Review (Liu, 2023)
- Document ID
- 15721870832175795251
- Author
- Liu S
- Publication year
- 2023
- Publication venue
- Journal of the Audio Engineering Society
Snippet
Given a face image and a speech audio clip, talking face generation refers to synthesizing a video of that face speaking the given speech. It has wide applications in movie dubbing, teleconferencing, virtual assistants, etc. This paper gives an overview of research progress on talking face …
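The snippet defines the task only by its inputs and outputs: one reference face image plus a speech waveform in, a video of that face speaking out. As a purely illustrative sketch of that I/O contract, and not anything taken from Liu's review or the patent classifications below, the following hypothetical Python interface shows how such a generator could be shaped; the `TalkingFaceGenerator` class, its method names, and the frame-rate and sample-rate parameters are all assumptions.

```python
import numpy as np


class TalkingFaceGenerator:
    """Hypothetical interface for audio-driven talking face generation.

    Illustrates only the I/O contract described in the snippet:
    one reference face image + a speech waveform -> a sequence of
    video frames of that face articulating the speech.
    """

    def __init__(self, fps: int = 25, sample_rate: int = 16000):
        self.fps = fps
        self.sample_rate = sample_rate

    def generate(self, face_image: np.ndarray, speech: np.ndarray) -> np.ndarray:
        """Return video frames with shape (T, H, W, 3).

        T is tied to the audio duration: duration_sec * fps frames.
        A real model would predict lip and expression motion from the
        audio; this placeholder simply repeats the reference image.
        """
        duration_sec = len(speech) / self.sample_rate
        num_frames = max(1, int(round(duration_sec * self.fps)))
        return np.repeat(face_image[np.newaxis, ...], num_frames, axis=0)


# Example: 1 second of audio at 16 kHz drives 25 frames of a 256x256 face.
face = np.zeros((256, 256, 3), dtype=np.uint8)
audio = np.zeros(16000, dtype=np.float32)
frames = TalkingFaceGenerator().generate(face, audio)
print(frames.shape)  # (25, 256, 256, 3)
```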
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
Similar Documents
Publication | Title |
---|---|
Wang et al. | One-shot talking face generation from single-speaker audio-visual correlation learning |
Thies et al. | Neural voice puppetry: Audio-driven facial reenactment |
Lu et al. | Live speech portraits: real-time photorealistic talking-head animation |
Yi et al. | Generating holistic 3d human motion from speech |
Ginosar et al. | Learning individual styles of conversational gesture |
US11514634B2 (en) | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses |
Pham et al. | Generative adversarial talking head: Bringing portraits to life with a weakly supervised neural network |
Zhou et al. | An image-based visual speech animation system |
Rebol et al. | Passing a non-verbal turing test: Evaluating gesture animations generated from speech |
Stoll et al. | Signsynth: Data-driven sign language video generation |
Liu et al. | Talking face generation via facial anatomy |
Zeng et al. | Expression-tailored talking face generation with adaptive cross-modal weighting |
Song et al. | Audio-driven dubbing for user generated contents via style-aware semi-parametric synthesis |
Wang et al. | Talking faces: Audio-to-video face generation |
Liu | Audio-Driven Talking Face Generation: A Review |
Gowda et al. | From pixels to portraits: A comprehensive survey of talking head generation techniques and applications |
Nazarieh et al. | A Survey of Cross-Modal Visual Content Generation |
Meng et al. | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing |
Song et al. | Virtual Human Talking-Head Generation |
Lin et al. | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention |
Pham et al. | Style transfer for 2d talking head animation |
Li et al. | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding |
Lei et al. | A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights |
Feng et al. | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation |
Wang et al. | Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head |