Stars
Codename's RVC fork, version 3, based on Applio.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
Simple and lightweight Zero-shot Text-to-Speech (TTS) synthesis model
Batch files for setting up environments and training DiffSinger models
Speaker-disentangled quantizer for the linguistic content of speech
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
GPT-4o-level, real-time spoken dialogue system.
J-Moshi: A Japanese Full-duplex Spoken Dialogue System
Ultimate Vocal Remover 5 with Gradio UI. Separate an audio file into various stems, using multiple models
VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching (ICASSP '25)
[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
Implementation of RIFT-SVC, a singing voice conversion model based on Rectified Flow Transformer.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
A frontier Japanese speech generation network