yl4579 (Aaron (Yinghao) Li) / Starred · GitHub

Aaron (Yinghao) Li yl4579

361 followers · 7 following

Columbia University
New York, US

Stars

SesameAILabs / csm

A Conversational Speech Generation Model

Python · 13,053 stars · 1,219 forks · Updated Mar 27, 2025

facebookresearch / audiobox-aesthetics

Unified automatic quality assessment for speech, music, and sound.

Python · 475 stars · 32 forks · Updated May 1, 2025

zhenye234 / LLaSA_training

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python · 550 stars · 38 forks · Updated Apr 8, 2025

deepseek-ai / DeepSeek-R1

89,083 stars · 11,521 forks · Updated Apr 9, 2025

facebookresearch / large_concept_model

Large Concept Models: Language modeling in a sentence representation space

Python · 2,127 stars · 193 forks · Updated Jan 29, 2025

naver-ai / usdm

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python · 87 stars · 3 forks · Updated Dec 3, 2024

Hannibal046 / Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

23,072 stars · 1,922 forks · Updated May 5, 2025

alessandroragano / scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python · 71 stars · 4 forks · Updated Jan 24, 2025

fishaudio / fish-speech

SOTA Open Source TTS

Python · 20,965 stars · 1,676 forks · Updated Apr 12, 2025

mini-sora / minisora

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python · 1,267 stars · 150 forks · Updated Feb 18, 2025

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python · 11,702 stars · 1,646 forks · Updated May 5, 2025

bytedance / SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Python · 1,218 stars · 100 forks · Updated Mar 4, 2025

FireRedTeam / FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

Python · 695 stars · 56 forks · Updated Apr 15, 2025

tencent-ailab / MuCodec

Python · 86 stars · 5 forks · Updated Nov 22, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

33,368 stars · 1,822 forks · Updated Aug 1, 2024

haidog-yaqub / EzAudio

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python · 268 stars · 11 forks · Updated Apr 15, 2025

SonyCSLParis / music2latent

Encode and decode audio samples to/from compressed latent representations!

Python · 203 stars · 12 forks · Updated Feb 24, 2025

Aria-K-Alethia / BigCodec

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python · 158 stars · 13 forks · Updated Sep 19, 2024

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 8,162 stars · 677 forks · Updated May 6, 2025

yangdongchao / SimpleSpeech

The open source code for SimpleSpeech series

Python · 138 stars · 8 forks · Updated Oct 8, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python · 2,907 stars · 197 forks · Updated Apr 19, 2025

supertone-inc / super-monotonic-align

Python · 143 stars · 9 forks · Updated Sep 19, 2024

WangHelin1997 / SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python · 130 stars · 13 forks · Updated Jan 1, 2025

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python · 2,267 stars · 169 forks · Updated Mar 28, 2025

keonlee9420 / evaluate-zero-shot-tts

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python · 78 stars · 10 forks · Updated Mar 12, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python · 206 stars · 13 forks · Updated Apr 20, 2025

AudioLLMs / Awesome-Audio-LLM

Audio Large Language Models

Python · 514 stars · 30 forks · Updated Mar 9, 2025

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,717 stars · 129 forks · Updated Apr 21, 2025

showlab / Show-o

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 1,378 stars · 58 forks · Updated Apr 28, 2025

AI-S2-Lab / GPT-Talker

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

33 stars · 2 forks · Updated Oct 28, 2024
