Liu, 2023 - Google Patents

Audio-Driven Talking Face Generation: A Review

Document ID: 15721870832175795251
Author: Liu S
Publication year: 2023
Publication venue: Journal of the Audio Engineering Society

Snippet

Given a face image and a speech audio clip, talking face generation refers to synthesizing a video of the face speaking the given speech. It has wide applications in movie dubbing, teleconferencing, virtual assistants, etc. This paper gives an overview of research progress on talking face …
Continue reading at www.aes.org

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Similar Documents

Publication | Publication Date | Title
Wang et al. | | One-shot talking face generation from single-speaker audio-visual correlation learning
Thies et al. | | Neural voice puppetry: Audio-driven facial reenactment
Lu et al. | | Live speech portraits: real-time photorealistic talking-head animation
Yi et al. | | Generating holistic 3d human motion from speech
Ginosar et al. | | Learning individual styles of conversational gesture
US11514634B2 (en) | | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses
Pham et al. | | Generative adversarial talking head: Bringing portraits to life with a weakly supervised neural network
Zhou et al. | | An image-based visual speech animation system
Rebol et al. | | Passing a non-verbal turing test: Evaluating gesture animations generated from speech
Stoll et al. | | Signsynth: Data-driven sign language video generation
Liu et al. | | Talking face generation via facial anatomy
Zeng et al. | | Expression-tailored talking face generation with adaptive cross-modal weighting
Song et al. | | Audio-driven dubbing for user generated contents via style-aware semi-parametric synthesis
Wang et al. | | Talking faces: Audio-to-video face generation
Liu | | Audio-Driven Talking Face Generation: A Review
Gowda et al. | | From pixels to portraits: A comprehensive survey of talking head generation techniques and applications
Nazarieh et al. | | A Survey of Cross-Modal Visual Content Generation
Meng et al. | | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Song et al. | | Virtual Human Talking-Head Generation
Lin et al. | | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention
Pham et al. | | Style transfer for 2d talking head animation
Li et al. | | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
Lei et al. | | A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
Feng et al. | | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation
Wang et al. | | Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head