Starred repositories
The Hugging Face Course on Transformers for Audio
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"
[SIGGRAPH 2025] LAM: Large Avatar Model for One-shot Animatable Gaussian Head
An AI-powered interactive avatar engine using Live2D, LLM, ASR, TTS, and RVC. Ideal for VTubing, streaming, and virtual assistant applications.
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
PainterEngine is an application/game engine with a software renderer. It can be ported to any platform that supports C
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis
💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
A TTS model capable of generating ultra-realistic dialogue in one pass.
A service to convert audio to facial blendshapes for lipsyncing and facial performances.
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
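This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and real-world deployment of large-model applications)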
Helpful tools and examples for working with flex-attention
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
Speech, Language, Audio, Music Processing with Large Language Model
A toolkit for making real world machine learning and data analysis applications in C++