TythonLee / Starred · GitHub

TythonLee

3 followers · 14 following

Starred repositories

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 1,669 185 Updated May 8, 2025

Stability-AI / stable-audio-tools

Generative models for conditional audio generation

Python 3,347 356 Updated Jun 2, 2025

Stability-AI / stable-codec

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 375 23 Updated May 30, 2025

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 3,157 294 Updated Jun 17, 2025

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 5,170 583 Updated Jun 4, 2025

inclusionAI / Ming

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Python 363 24 Updated Jul 3, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 281 33 Updated Jun 15, 2025

BytedanceSpeech / seed-tts-eval

Python 1,352 120 Updated Jun 14, 2024

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 9,944 1,054 Updated Apr 9, 2025

qi-hua / async_cosyvoice

Accelerating CosyVoice2 inference with vLLM

Jupyter Notebook 363 44 Updated Apr 26, 2025

ajd12342 / paraspeechcaps

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 125 4 Updated Mar 24, 2025

SuperKogito / SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML 347 45 Updated Sep 30, 2024

stepfun-ai / Step-Audio

Python 4,392 358 Updated Jun 12, 2025

FunAudioLLM / InspireMusic

InspireMusic: A toolkit designed for music, song, and audio generation

Python 1,128 107 Updated May 20, 2025

deepseek-ai / DeepSeek-R1

90,386 11,660 Updated Jun 27, 2025

ScottishFold007 / TTSAudioNormalizer

TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 99 15 Updated Dec 20, 2024

hexisyztem / CosyVoice

Forked from FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 20 2 Updated Apr 16, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 14,942 1,581 Updated Jun 29, 2025

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,945 198 Updated May 19, 2025

lifeiteng / Aligner-SUPERB

Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark

Python 28 3 Updated May 7, 2025

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 338 45 Updated Jun 16, 2025

lifeiteng / OmniSenseVoice

Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯

Python 854 34 Updated Mar 7, 2025

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,025 243 Updated Jul 2, 2025

KeSpeech / KeSpeech

The repo provides information about the KeSpeech dataset.

146 11 Updated Oct 13, 2022

v3ucn / CosyVoice_For_Windows

A version of CosyVoice for use on Windows

Python 699 100 Updated Nov 19, 2024

Rikorose / DeepFilterNet

Noise suppression using deep filtering

Python 3,159 297 Updated Oct 17, 2024

resemble-ai / resemble-enhance

AI powered speech denoising and enhancement

Python 1,863 221 Updated Dec 3, 2024

metame-ai / awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation

391 17 Updated Jun 23, 2025

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 37,023 4,012 Updated May 23, 2025

bytedance / SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

1,275 100 Updated Jun 20, 2025

Starred topics

gender-recognition-by-voice

multimodal-deep-learning

speech-emotion-recognition

multimodal-sentiment-analysis