Bigioi et al., 2024 - Google Patents

Speech driven video editing via an audio-conditioned diffusion model

Document ID
18057207411151414915
Author
Bigioi D
Basak S
Stypułkowski M
Zieba M
Jordan H
McDonnell R
Corcoran P
Publication year
2024
Publication venue
Image and Vision Computing

Snippet

Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person, and a separate auditory speech …

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general

Similar Documents

Publication Publication Date Title
Bigioi et al. Speech driven video editing via an audio-conditioned diffusion model
Lu et al. Live speech portraits: real-time photorealistic talking-head animation
Zhang et al. Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
Jamaludin et al. You said that?: Synthesising talking faces from audio
Song et al. Talking face generation by conditional recurrent adversarial network
Park et al. Synctalkface: Talking face generation with precise lip-syncing via audio-lip memory
US20210390945A1 (en) Text-driven video synthesis with phonetic dictionary
Hussen Abdelaziz et al. Modality dropout for improved performance-driven talking faces
Peng et al. Selftalk: A self-supervised commutative training diagram to comprehend 3d talking faces
Stoll et al. Signsynth: Data-driven sign language video generation
Liu et al. Moda: Mapping-once audio-driven portrait animation with dual attentions
Chen et al. Transformer-s2a: Robust and efficient speech-to-animation
Medina et al. Speech driven tongue animation
Zeng et al. Expression-tailored talking face generation with adaptive cross-modal weighting
Yan et al. Dialoguenerf: Towards realistic avatar face-to-face conversation video generation
Yang et al. Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Huang et al. Fine-grained talking face generation with video reinterpretation
Li et al. Speech driven facial animation generation based on GAN
Wang et al. Talking faces: Audio-to-video face generation
Jang et al. Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Bigioi et al. Pose-aware speech driven facial landmark animation pipeline for automated dubbing
Hussen Abdelaziz et al. Speaker-independent speech-driven visual speech synthesis using domain-adapted acoustic models
Nazarieh et al. A Survey of Cross-Modal Visual Content Generation
Jha et al. Cross-language speech dependent lip-synchronization
Li et al. KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding