Starred repositories
🎥 Python and OpenCV-based scene cut/transition detection program & library.
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
Train high-quality text-to-image diffusion models in a data & compute efficient manner
Official PyTorch implementation of "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis"
Chinese voice corpus with clearer, more natural speech. Includes 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.
ACE-Step: A Step Towards Music Generation Foundation Model
OneDiff: An out-of-the-box acceleration library for diffusion models.
[ICML 2025] Differentiable Solver Search for Fast Diffusion Sampling
Applying audio FX with text descriptors
Variable Bitrate Residual Vector Quantization for Audio Coding
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
Implementation of SoundStorm built upon SpeechTokenizer.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Official repository for Aria-MIDI: a MIDI dataset of 1,186,253 transcribed solo-piano recordings.
High-performance Image Tokenizers for VAR and AR
Official repository of SepReformer for speech separation
Accompanying repository for the paper "DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions"
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
A TTS model capable of generating ultra-realistic dialogue in one pass.
Self-supervised Generative LM-based Voice Conversion
MAGI-1: Autoregressive Video Generation at Scale