Stars
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Official Implementation of LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models.
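Super cheat sheets - cheat sheets for programming languages, frameworks, and development tools, with everything you need to know in a single file ⚡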
Directional sparse filtering for blind speech separation
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
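AIInfra (AI infrastructure) covers the AI system stack, from low-level hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.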
Ola: Pushing the Frontiers of Omni-Modal Language Model
A high-throughput and memory-efficient inference and serving engine for LLMs
Minimal reproduction of DeepSeek R1-Zero
Awesome speech/audio LLMs, representation learning, and codec models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-lite, ONNX, and real-time audio processing support.
DCCRN with various loss functions
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Weighted Spatial Covariance Matrix Estimation for MUSIC-based TDOA Estimation of Speech Source
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support