hhguo (Haohan Guo) / Starred · GitHub

Starred repositories

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,064 938 Updated May 16, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,584 225 Updated May 8, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 15,648 1,227 Updated May 15, 2025
Python 126 12 Updated Apr 11, 2025

A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.

Python 144 13 Updated Mar 22, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 209 13 Updated Apr 20, 2025

A Conversational Speech Generation Model

Python 13,239 1,257 Updated Mar 27, 2025

PodAgent: A Comprehensive Framework for Podcast Generation

Python 83 10 Updated May 16, 2025

Spark-TTS Inference Code

Python 9,380 980 Updated Apr 9, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 970 75 Updated Mar 27, 2025

A PyTorch native platform for training generative AI models

Python 3,813 368 Updated May 16, 2025

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!

Python 1,619 331 Updated Apr 28, 2025

Modern builds for the 90s/00s DECtalk text-to-speech application.

PostScript 330 34 Updated Mar 26, 2025

OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.

Python 363 25 Updated May 13, 2025

TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.

Rust 4,154 275 Updated May 16, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,972 544 Updated May 15, 2025

pyMUSHRA is a Python web application that hosts webMUSHRA experiments and collects the data with Python.

Python 45 8 Updated Apr 18, 2025

We are committed to open-sourcing quantitative knowledge and translating it into Chinese, aiming to bridge the information gap between the domestic and international quantitative finance industries…

1,426 108 Updated May 15, 2025

Speech, Language, Audio, Music Processing with Large Language Model

Python 806 77 Updated Apr 24, 2025

Unofficial implementation of miipher

Python 123 16 Updated Apr 19, 2024

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 1,870 203 Updated Mar 26, 2025

TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 97 16 Updated Dec 20, 2024

zero-shot voice conversion & singing voice conversion, with real-time support

Python 2,465 280 Updated Apr 20, 2025

A small rust-based data loader

Rust 24 Updated Dec 10, 2024

Official repository of the xLSTM.

Python 1,858 144 Updated Apr 7, 2025

UTokyo-SaruLab MOS Prediction System

Python 178 17 Updated Apr 3, 2025

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 198 15 Updated Mar 7, 2025

This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".

141 7 Updated May 7, 2025

Awesome speech/audio LLMs, representation learning, and codec models

992 60 Updated Apr 25, 2025