Chinese University of Hong Kong
- Hong Kong SAR, China
- https://hhguo.github.io/
Starred repositories
verl: Volcano Engine Reinforcement Learning for LLMs
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
A TTS model capable of generating ultra-realistic dialogue in one pass.
A neural full-band audio codec for general audio sampled at 48 kHz, operating at 7.5 kbps or 4.5 kbps.
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
A Conversational Speech Generation Model
PodAgent: A Comprehensive Framework for Podcast Generation
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
A PyTorch native platform for training generative AI models
Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
Modern builds for the 90s/00s DECtalk text-to-speech application.
OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
pyMUSHRA is a Python web application that hosts webMUSHRA experiments and collects the data with Python.
We are committed to open-sourcing quantitative knowledge and localizing it into Chinese, aiming to bridge the information gap between the domestic and international quantitative finance industries.
Speech, Language, Audio, Music Processing with Large Language Model
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
Zero-shot voice conversion and singing voice conversion, with real-time support
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".
Awesome speech/audio LLMs, representation learning, and codec models