Official PyTorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)

Python 227 24 Updated Jul 31, 2024

jjunak-yun / FLowHigh_code

[ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"

Python 72 7 Updated Jan 17, 2025

jishengpeng / TextrolSpeech

[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Python 173 6 Updated Nov 22, 2024

bytedance / MegaTTS3

Python 5,591 418 Updated May 11, 2025

harry0703 / MoneyPrinterTurbo

Generate high-definition short videos with one click using large AI models (LLMs).

Python 37,518 5,386 Updated Jun 11, 2025

Audio-WestlakeU / NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

Python 282 34 Updated Jan 1, 2025

Zyphra / Zonos

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python 6,805 758 Updated Mar 5, 2025

jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 1,929 206 Updated May 17, 2025

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 5,167 581 Updated Jun 4, 2025

FireRedTeam / FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,106 80 Updated Mar 27, 2025

Naozumi520 / g2pW-Cantonese

Cantonese Grapheme-to-Phoneme Converter based on GitYCC/g2pW

Python 13 3 Updated Dec 10, 2024

deepseek-ai / DeepSeek-R1

90,364 11,658 Updated Jun 27, 2025

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,760 1,438 Updated Jun 30, 2025

emo-box / EmoBox

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 255 12 Updated Mar 31, 2025

MahmoudAshraf97 / ctc-forced-aligner

Text to speech alignment using CTC forced alignment

Python 307 59 Updated Mar 24, 2025

ZFTurbo / Music-Source-Separation-Training

Repository for training models for music source separation.

Python 805 110 Updated Jun 20, 2025

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 12,512 1,803 Updated Jul 2, 2025

donghaiyw

Stars

LAION-AI / emotion-annotations

JusperLee / SPMamba

ZeyueT / AudioX

TEN-framework / ten-vad

ace-step / ACE-Step

hubertsiuzdak / snac

nari-labs / dia

MoonshotAI / Kimi-Audio

SandAI-org / MAGI-1

WeichenFan / CFG-Zero-star

jzq2000 / MoonCast

mindverse / Second-Me

Lakonik / GMFlow

hayeong0 / DDDM-VC