Stars
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
ACE-Step: A Step Towards Music Generation Foundation Model
[CVPR'25] Official Implementation of MambaIC: State Space Models for High-Performance Learned Image Compression
A final sanity checklist to help your CS paper get accepted, not desk rejected.
A benchmark to evaluate full-duplex spoken dialogue models on pause handling, backchanneling, turn-taking, and user interruptions.
A dynamic library tweak for WeChat macOS - the first WeChat macOS client tweak for blocking message recall and running multiple instances 🔨
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Collection of leaked system prompts
anan235 / dia-multilingual
Forked from nari-labs/dia
A TTS model capable of generating ultra-realistic dialogue in one pass.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
MAGI-1: Autoregressive Video Generation at Scale
A TTS model capable of generating ultra-realistic dialogue in one pass.
Implementing DeepSeek R1's GRPO algorithm from scratch
NdLinear by Ensemble is a drop-in PyTorch module that shrinks your models with no accuracy loss. It powers the Ensemble Platform—upload any model and get back a smaller, faster version, ready to de…
Elegant reading of real-time and trending news
Lightweight coding agent that runs in your terminal
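🚀 A one-stop solution for creating a digital avatar from chat history 💡 Fine-tune a large language model on your chat logs so that it picks up your distinctive style, then bind it to a chatbot to create your own digital avatar. Digital clone / digital avatar / digital immortality / LLM / chatbot / LoRA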
This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025.
A very simple GRPO implementation for reproducing R1-like LLM thinking.
A lightweight audio codec based on a single quantizer