hhguo (Haohan Guo) / Starred · GitHub

Haohan Guo hhguo

PhD student @ CUHK

132 followers · 45 following

Chinese University of Hong Kong
Hong Kong SAR, China
https://hhguo.github.io/

Starred repositories

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,064 938 Updated May 16, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,584 225 Updated May 8, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 15,648 1,227 Updated May 15, 2025

jzq2000 / MoonCast

Python 126 12 Updated Apr 11, 2025

facebookresearch / FlowDec

A neural full-band audio codec for general audio sampled at 48 kHz, operating at 7.5 kbps or 4.5 kbps.

Python 144 13 Updated Mar 22, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 209 13 Updated Apr 20, 2025

SesameAILabs / csm

A Conversational Speech Generation Model

Python 13,239 1,257 Updated Mar 27, 2025

yujxx / PodAgent

PodAgent: A Comprehensive Framework for Podcast Generation

Python 83 10 Updated May 16, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 9,380 980 Updated Apr 9, 2025

FireRedTeam / FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 970 75 Updated Mar 27, 2025

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 3,813 368 Updated May 16, 2025

scholarly-python-package / scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!

Python 1,619 331 Updated Apr 28, 2025

dectalk / dectalk

Modern builds for the 90s/00s DECtalk text-to-speech application.

PostScript 330 34 Updated Mar 26, 2025

ASLP-lab / OSUM

OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.

Python 363 25 Updated May 13, 2025

tensorzero / tensorzero

TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.

Rust 4,154 275 Updated May 16, 2025

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,972 544 Updated May 15, 2025

nils-werner / pymushra

pyMUSHRA is a Python web application that hosts webMUSHRA experiments and collects the data with Python.

Python 45 8 Updated Apr 18, 2025

LLMQuant / quant-wiki

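We are committed to open-sourcing quantitative knowledge, aiming to bridge the information gap between the domestic and international quantitative finance industries. We are committed to open-sourcing quantitative knowledge and translating it into Chinese, breaking down the domestic and international quantitative finance industries'…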

1,426 108 Updated May 15, 2025

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 806 77 Updated Apr 24, 2025

Wataru-Nakata / miipher

Unofficial implementation of miipher

Python 123 16 Updated Apr 19, 2024

jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 1,870 203 Updated Mar 26, 2025

deepseek-ai / DeepSeek-V3

Python 96,799 15,737 Updated Apr 9, 2025

ScottishFold007 / TTSAudioNormalizer

TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 97 16 Updated Dec 20, 2024

Plachtaa / seed-vc

zero-shot voice conversion & singing voice conversion, with real-time support

Python 2,465 280 Updated Apr 20, 2025

kyutai-labs / yomikomi

A small rust-based data loader

Rust 24 Updated Dec 10, 2024

NX-AI / xlstm

Official repository of the xLSTM.

Python 1,858 144 Updated Apr 7, 2025

sarulab-speech / UTMOSv2

UTokyo-SaruLab MOS Prediction System

Python 178 17 Updated Apr 3, 2025

haoheliu / SemantiCodec-inference

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 198 15 Updated Mar 7, 2025

imxtx / awesome-controllable-speech-synthesis

This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".

141 7 Updated May 7, 2025

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

992 60 Updated Apr 25, 2025

Starred topics

Natural language processing

0