Speech driven video editing via an audio-conditioned diffusion model
Bigioi et al., 2024 - Google Patents
- Document ID
- 18057207411151414915
- Author
- Bigioi D
- Basak S
- Stypułkowski M
- Zieba M
- Jordan H
- McDonnell R
- Corcoran P
- Publication year
- 2024
- Publication venue
- Image and Vision Computing
Snippet
Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person, and a separate auditory speech …
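To make the idea in the snippet concrete, the sketch below illustrates how a denoising diffusion model can be conditioned on audio for frame generation: a clean frame is noised at a random timestep and a small network is trained to predict that noise given a per-frame audio embedding. This is a minimal, generic DDPM-style illustration, not the authors' architecture; the module name, feature dimensions, and the `audio_emb`/`alphas_cumprod` inputs are assumptions made for the example.

```python
# Illustrative sketch of audio-conditioned denoising diffusion training.
# NOT the paper's implementation: model structure and shapes are assumed.
import torch
import torch.nn as nn

class TinyAudioConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a video frame,
    conditioned on a per-frame audio embedding and the diffusion timestep."""
    def __init__(self, frame_channels=3, audio_dim=128, hidden=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.time_proj = nn.Linear(1, hidden)
        self.conv_in = nn.Conv2d(frame_channels, hidden, 3, padding=1)
        self.act = nn.SiLU()
        self.conv_out = nn.Conv2d(hidden, frame_channels, 3, padding=1)
        self.film = nn.Linear(hidden, hidden)  # conditioning via feature modulation

    def forward(self, noisy_frame, t, audio_emb):
        # Combine timestep and audio conditioning into a single bias term.
        cond = self.audio_proj(audio_emb) + self.time_proj(t.float().unsqueeze(-1))
        h = self.conv_in(noisy_frame)
        h = h + self.film(cond).unsqueeze(-1).unsqueeze(-1)  # broadcast over H, W
        return self.conv_out(self.act(h))

def training_step(model, clean_frame, audio_emb, alphas_cumprod, optimizer):
    """One DDPM-style step: noise a clean frame at a random timestep,
    then regress the added noise given the audio condition."""
    b = clean_frame.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(clean_frame)
    noisy = a_bar.sqrt() * clean_frame + (1 - a_bar).sqrt() * noise
    pred = model(noisy, t, audio_emb)
    loss = nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, the same audio conditioning would steer iterative denoising of the edited frames; how the paper handles full video context and identity preservation is beyond this toy example.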
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2207/00—Indexing scheme for image analysis or image enhancement
        - G06T2207/10—Image acquisition modality
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2200/00—Indexing scheme for image data processing or generation, in general
Similar Documents
Publication | Title
---|---
Bigioi et al. | Speech driven video editing via an audio-conditioned diffusion model
Lu et al. | Live speech portraits: real-time photorealistic talking-head animation
Zhang et al. | Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
Jamaludin et al. | You said that?: Synthesising talking faces from audio
Song et al. | Talking face generation by conditional recurrent adversarial network
Park et al. | Synctalkface: Talking face generation with precise lip-syncing via audio-lip memory
US20210390945A1 (en) | Text-driven video synthesis with phonetic dictionary
Hussen Abdelaziz et al. | Modality dropout for improved performance-driven talking faces
Peng et al. | Selftalk: A self-supervised commutative training diagram to comprehend 3d talking faces
Stoll et al. | Signsynth: Data-driven sign language video generation
Liu et al. | Moda: Mapping-once audio-driven portrait animation with dual attentions
Chen et al. | Transformer-s2a: Robust and efficient speech-to-animation
Medina et al. | Speech driven tongue animation
Zeng et al. | Expression-tailored talking face generation with adaptive cross-modal weighting
Yan et al. | Dialoguenerf: Towards realistic avatar face-to-face conversation video generation
Yang et al. | Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods and Applications
Huang et al. | Fine-grained talking face generation with video reinterpretation
Li et al. | Speech driven facial animation generation based on GAN
Wang et al. | Talking faces: Audio-to-video face generation
Jang et al. | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Bigioi et al. | Pose-aware speech driven facial landmark animation pipeline for automated dubbing
Hussen Abdelaziz et al. | Speaker-independent speech-driven visual speech synthesis using domain-adapted acoustic models
Nazarieh et al. | A Survey of Cross-Modal Visual Content Generation
Jha et al. | Cross-language speech dependent lip-synchronization
Li et al. | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding