Stars
[Support 0.49.x](Reset Cursor AI MachineID & Bypass Higher Token Limit) Cursor AI, automatically reset the machine ID, upgrade to Pro features for free: You've reached your trial request limit. / Too many free trial accounts used on this machi…
🛠 "Watt Toolkit" is an open-source, cross-platform, multi-purpose Steam toolbox.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Unofficial PyTorch implementation of "Keyword Transformer: A Self-Attention Model for Keyword Spotting", Berg et al. 2021.
A generative speech model for daily dialogue.
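🤖 wukong-robot is a simple, flexible, and elegant Chinese voice-dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction.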
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
FSA/FST algorithms, differentiable, with PyTorch compatibility.
KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
fay is an MCP framework that helps digital humans (2.5D, 3D, mobile, PC, web) or large language models (OpenAI-compatible, DeepSeek) connect to business systems.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
PyTorch implementation of the U-Net for image semantic segmentation with high quality images
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
PyTorch reimplementation of "Gradient Surgery for Multi-Task Learning"
A PyTorch Library for Multi-Task Learning
Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, Auxiliary Tasks in Multi-task Learning
ChatGLM-6B: An Open Bilingual Dialogue Language Model
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Stable Diffusion web UI