Audio-Driven Talking Face Generation: A Review (Liu, 2023)
- Document ID
- 15721870832175795251
- Author
- Liu S
- Publication year
- 2023
- Publication venue
- Journal of the Audio Engineering Society
Snippet
Given a face image and a speech audio clip, talking face generation refers to synthesizing a video of that face speaking the given speech. It has wide applications in movie dubbing, teleconferencing, virtual assistants, etc. This paper gives an overview of research progress on talking face …
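The snippet defines the task only by its inputs and outputs: one reference face image plus a speech waveform in, a video of that face speaking out. As a purely illustrative sketch of that I/O contract, and not anything taken from Liu's review or the patent classifications below, the following hypothetical Python interface shows how such a generator could be shaped; the `TalkingFaceGenerator` class, its method names, and the frame-rate and sample-rate parameters are all assumptions.

```python
import numpy as np


class TalkingFaceGenerator:
    """Hypothetical interface for audio-driven talking face generation.

    Illustrates only the I/O contract described in the snippet:
    one reference face image + a speech waveform -> a sequence of
    video frames of that face articulating the speech.
    """

    def __init__(self, fps: int = 25, sample_rate: int = 16000):
        self.fps = fps
        self.sample_rate = sample_rate

    def generate(self, face_image: np.ndarray, speech: np.ndarray) -> np.ndarray:
        """Return video frames with shape (T, H, W, 3).

        T is tied to the audio duration: duration_sec * fps frames.
        A real model would predict lip and expression motion from the
        audio; this placeholder simply repeats the reference image.
        """
        duration_sec = len(speech) / self.sample_rate
        num_frames = max(1, int(round(duration_sec * self.fps)))
        return np.repeat(face_image[np.newaxis, ...], num_frames, axis=0)


# Example: 1 second of audio at 16 kHz drives 25 frames of a 256x256 face.
face = np.zeros((256, 256, 3), dtype=np.uint8)
audio = np.zeros(16000, dtype=np.float32)
frames = TalkingFaceGenerator().generate(face, audio)
print(frames.shape)  # (25, 256, 256, 3)
```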
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
Similar Documents
Publication | Title |
---|---|
Wang et al. | One-shot talking face generation from single-speaker audio-visual correlation learning |
Thies et al. | Neural voice puppetry: Audio-driven facial reenactment |
Lu et al. | Live speech portraits: real-time photorealistic talking-head animation |
Yi et al. | Generating holistic 3d human motion from speech |
Ginosar et al. | Learning individual styles of conversational gesture |
US11514634B2 (en) | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses |
Pham et al. | Generative adversarial talking head: Bringing portraits to life with a weakly supervised neural network |
Zhou et al. | An image-based visual speech animation system |
Rebol et al. | Passing a non-verbal turing test: Evaluating gesture animations generated from speech |
Stoll et al. | Signsynth: Data-driven sign language video generation |
Liu et al. | Talking face generation via facial anatomy |
Zeng et al. | Expression-tailored talking face generation with adaptive cross-modal weighting |
Song et al. | Audio-driven dubbing for user generated contents via style-aware semi-parametric synthesis |
Wang et al. | Talking faces: Audio-to-video face generation |
Liu | Audio-Driven Talking Face Generation: A Review |
Gowda et al. | From pixels to portraits: A comprehensive survey of talking head generation techniques and applications |
Nazarieh et al. | A Survey of Cross-Modal Visual Content Generation |
Meng et al. | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing |
Song et al. | Virtual Human Talking-Head Generation |
Lin et al. | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention |
Pham et al. | Style transfer for 2d talking head animation |
Li et al. | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding |
Lei et al. | A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights |
Feng et al. | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation |
Wang et al. | Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head |