Speech driven video editing via an audio-conditioned diffusion model
Bigioi et al., 2024 - Google Patents
- Document ID
- 18057207411151414915
- Author
- Bigioi D
- Basak S
- Stypułkowski M
- Zieba M
- Jordan H
- McDonnell R
- Corcoran P
- Publication year
- 2024
- Publication venue
- Image and Vision Computing
Snippet
Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person, and a separate auditory speech …
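To make the idea in the snippet concrete, the sketch below illustrates how a denoising diffusion model can be conditioned on audio for frame generation: a clean frame is noised at a random timestep and a small network is trained to predict that noise given a per-frame audio embedding. This is a minimal, generic DDPM-style illustration, not the authors' architecture; the module name, feature dimensions, and the `audio_emb`/`alphas_cumprod` inputs are assumptions made for the example.

```python
# Illustrative sketch of audio-conditioned denoising diffusion training.
# NOT the paper's implementation: model structure and shapes are assumed.
import torch
import torch.nn as nn

class TinyAudioConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a video frame,
    conditioned on a per-frame audio embedding and the diffusion timestep."""
    def __init__(self, frame_channels=3, audio_dim=128, hidden=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.time_proj = nn.Linear(1, hidden)
        self.conv_in = nn.Conv2d(frame_channels, hidden, 3, padding=1)
        self.act = nn.SiLU()
        self.conv_out = nn.Conv2d(hidden, frame_channels, 3, padding=1)
        self.film = nn.Linear(hidden, hidden)  # conditioning via feature modulation

    def forward(self, noisy_frame, t, audio_emb):
        # Combine timestep and audio conditioning into a single bias term.
        cond = self.audio_proj(audio_emb) + self.time_proj(t.float().unsqueeze(-1))
        h = self.conv_in(noisy_frame)
        h = h + self.film(cond).unsqueeze(-1).unsqueeze(-1)  # broadcast over H, W
        return self.conv_out(self.act(h))

def training_step(model, clean_frame, audio_emb, alphas_cumprod, optimizer):
    """One DDPM-style step: noise a clean frame at a random timestep,
    then regress the added noise given the audio condition."""
    b = clean_frame.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(clean_frame)
    noisy = a_bar.sqrt() * clean_frame + (1 - a_bar).sqrt() * noise
    pred = model(noisy, t, audio_emb)
    loss = nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, the same audio conditioning would steer iterative denoising of the edited frames; how the paper handles full video context and identity preservation is beyond this toy example.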
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2207/00—Indexing scheme for image analysis or image enhancement
        - G06T2207/10—Image acquisition modality
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T7/00—Image analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2200/00—Indexing scheme for image data processing or generation, in general
Similar Documents
Publication | Title
---|---
Bigioi et al. | Speech driven video editing via an audio-conditioned diffusion model
Lu et al. | Live speech portraits: real-time photorealistic talking-head animation
Zhang et al. | Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
Jamaludin et al. | You said that?: Synthesising talking faces from audio
Song et al. | Talking face generation by conditional recurrent adversarial network
Park et al. | Synctalkface: Talking face generation with precise lip-syncing via audio-lip memory
US20210390945A1 (en) | Text-driven video synthesis with phonetic dictionary
Hussen Abdelaziz et al. | Modality dropout for improved performance-driven talking faces
Peng et al. | Selftalk: A self-supervised commutative training diagram to comprehend 3d talking faces
Stoll et al. | Signsynth: Data-driven sign language video generation
Liu et al. | Moda: Mapping-once audio-driven portrait animation with dual attentions
Chen et al. | Transformer-s2a: Robust and efficient speech-to-animation
Medina et al. | Speech driven tongue animation
Zeng et al. | Expression-tailored talking face generation with adaptive cross-modal weighting
Yan et al. | Dialoguenerf: Towards realistic avatar face-to-face conversation video generation
Yang et al. | Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods and Applications
Huang et al. | Fine-grained talking face generation with video reinterpretation
Li et al. | Speech driven facial animation generation based on GAN
Wang et al. | Talking faces: Audio-to-video face generation
Jang et al. | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Bigioi et al. | Pose-aware speech driven facial landmark animation pipeline for automated dubbing
Hussen Abdelaziz et al. | Speaker-independent speech-driven visual speech synthesis using domain-adapted acoustic models
Nazarieh et al. | A Survey of Cross-Modal Visual Content Generation
Jha et al. | Cross-language speech dependent lip-synchronization
Li et al. | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding