Starred repositories
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
SincNet is a neural architecture for efficiently processing raw audio samples.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Noise suppression plugin based on Xiph's RNNoise
Robust Speech Recognition via Large-Scale Weak Supervision
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Simple Speech Keyword Detection with Depthwise Separable Convolutions (DLology)
A PyTorch Library for Multi-Task Learning
Query-by-Example (QbE) Keyword Spotting System based on ASR
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Production-First and Production-Ready End-to-End Keyword Spotting Toolkit
Large, modern dataset for speech recognition
Instant voice cloning by MIT and MyShell. Audio foundation model.
On-device wake word detection powered by deep learning
Conferencing Speech Challenge
Implementation of Wave-U-Net in PyTorch, adapted for speech enhancement.
This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.
Notes on machine learning algorithm derivations