- gerzz.inc
- Shanghai
- dubbing-ai.com dubbingai.io
Sonic1 Public
Forked from jixiaozhong/Sonic
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
Python · Other license · Updated May 6, 2025
sonic Public
Forked from waywardgeek/sonic
Simple library to speed up or slow down speech
C · Apache License 2.0 · Updated May 6, 2025
TTS.cpp Public
Forked from mmwillet/TTS.cpp
TTS support with GGML
C++ · MIT License · Updated May 6, 2025
Orpheus-TTS Public
Forked from canopyai/Orpheus-TTS
TTS Towards Human-Sounding Speech
Python · Apache License 2.0 · Updated May 6, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice
LLM-based TTS model providing full-stack inference, training, and deployment capabilities.
Python · Apache License 2.0 · Updated May 6, 2025
ACE-Step Public
Forked from ace-step/ACE-Step
ACE-Step: A Step Towards Music Generation Foundation Model
Python · Apache License 2.0 · Updated May 6, 2025
VoxBox Public
Forked from SparkAudio/VoxBox
A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.
Python · Other license · Updated May 5, 2025
InspireMusic Public
Forked from FunAudioLLM/InspireMusic
InspireMusic: A fundamental toolkit for music, song and audio generation.
Python · Apache License 2.0 · Updated May 2, 2025
Whisper-Sidecar Public
Forked from LingweiMeng/Whisper-Sidecar
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
Python · MIT License · Updated May 2, 2025
Qwen2.5-Omni Public
Forked from QwenLM/Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
Jupyter Notebook · Apache License 2.0 · Updated May 1, 2025
EmoVoice Public
Forked from yanghaha0908/EmoVoice
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
Python · Updated May 1, 2025
OSUM Public
Forked from ASLP-lab/OSUM
Official repository of the OSUM project from the ASLP Lab at Northwestern Polytechnical University
Python · Apache License 2.0 · Updated Apr 30, 2025
kokoro-rust Public
Forked from mzdk100/kokoro
A Rust inference implementation of Kokoro TTS
C · Apache License 2.0 · Updated Apr 30, 2025
dia Public
Forked from nari-labs/dia
A TTS model capable of generating ultra-realistic dialogue in one pass.
Python · Apache License 2.0 · Updated Apr 29, 2025
CycleDiffusion Public
Forked from hpjang/CycleDiffusion
This repository provides the source code associated with the paper "CycleDiffusion: Voice Conversion Using Cycle-Consistent Diffusion Models."
Python · MIT License · Updated Apr 29, 2025
NeMo Public
Forked from NVIDIA/NeMo
NeMo: a toolkit for conversational AI
Marco-o1 Public
Forked from AIDC-AI/Marco-o1
An Open Large Reasoning Model for Real-World Solutions
Python · Other license · Updated Apr 28, 2025
ClearerVoice-Studio Public
Forked from modelscope/ClearerVoice-Studio
ClearVoice
Python · Apache License 2.0 · Updated Apr 28, 2025
NeMo_VoiceTextBlender Public
Forked from pyf98/NeMo_VoiceTextBlender
Code for our NAACL 2025 paper: "VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning"
Python · Apache License 2.0 · Updated Apr 28, 2025
tts_impl Public
Forked from uthree/tts_impl
Implementations of text-to-speech models
Python · MIT License · Updated Apr 28, 2025
Kimi-Audio Public
Forked from MoonshotAI/Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Python · Updated Apr 28, 2025
onnxruntime Public
Forked from microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance scoring engine for ML models
C++ · MIT License · Updated Apr 27, 2025
StyleTTS2-lite Public
Forked from dangtr0408/StyleTTS2-lite
A lightweight, efficient variation of the StyleTTS 2 text-to-speech model.
Python · MIT License · Updated Apr 26, 2025