TythonLee / Starred · GitHub

Starred repositories

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 1,669 185 Updated May 8, 2025

Generative models for conditional audio generation

Python 3,347 356 Updated Jun 2, 2025

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 375 23 Updated May 30, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 3,157 294 Updated Jun 17, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 5,170 583 Updated Jun 4, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Python 363 24 Updated Jul 3, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 281 33 Updated Jun 15, 2025

Spark-TTS Inference Code

Python 9,944 1,054 Updated Apr 9, 2025

Accelerating CosyVoice2 inference with vLLM

Jupyter Notebook 363 44 Updated Apr 26, 2025

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 125 4 Updated Mar 24, 2025

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML 347 45 Updated Sep 30, 2024

Python 4,392 358 Updated Jun 12, 2025

InspireMusic: A toolkit designed for music, song, and audio generation

Python 1,128 107 Updated May 20, 2025

TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 99 15 Updated Dec 20, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 20 2 Updated Apr 16, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 14,942 1,581 Updated Jun 29, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,945 198 Updated May 19, 2025

Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark

Python 28 3 Updated May 7, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 338 45 Updated Jun 16, 2025

Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯

Python 854 34 Updated Mar 7, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,025 243 Updated Jul 2, 2025

The repo provides information about KeSpeech dataset.

146 11 Updated Oct 13, 2022

A version of CosyVoice for use in Windows environments

Python 699 100 Updated Nov 19, 2024

Noise suppression using deep filtering

Python 3,159 297 Updated Oct 17, 2024

AI powered speech denoising and enhancement

Python 1,863 221 Updated Dec 3, 2024

Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation

391 17 Updated Jun 23, 2025

A generative speech model for daily dialogue.

Python 37,023 4,012 Updated May 23, 2025

SALMONN family: A suite of advanced multi-modal LLMs

1,275 100 Updated Jun 20, 2025