Starred repositories
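Code implementations for "Professional CUDA C Programming" (CUDA C 编程权威指南), covering most of the code from Chapters 2 through 8 along with the author's notes. Everything was implemented by hand by the author, so errors are likely; please use it with caution, and corrections are very welcome. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!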
In-car multi-channel speech transcription system of AISHELL-5.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)
🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Generative models for conditional audio generation
Monolingual wordlists with pronunciation information in IPA
A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
This is a balanced dataset for English homograph disambiguation (HD), generated with Meta's Llama 2-Chat 70B model.
Clean and modernized implementation of FastSpeech2/LightSpeech using IPA
Added vLLM support to IndexTTS for faster inference.
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.
Fine-tuning Moshi/J-Moshi on your own spoken dialogue data
Simple and lightweight Zero-shot Text-to-Speech (TTS) synthesis model
Grapheme-to-phoneme tool for corpus conversion, where phonemes match Phoible inventories
A method that directly addresses the modality gap by aligning speech tokens with the corresponding text transcriptions during the tokenization stage.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate
VoiceBench: Benchmarking LLM-Based Voice Assistants
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
High quality text-to-speech based on StyleTTS 2.
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
Codebase for Iterative DPO Using Rule-based Rewards
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
A high-throughput and memory-efficient inference and serving engine for LLMs