yl4579 (Aaron (Yinghao) Li) / Starred · GitHub
  • Columbia University
  • New York, US


A Conversational Speech Generation Model

Python 13,053 1,219 Updated Mar 27, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 475 32 Updated May 1, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 550 38 Updated Apr 8, 2025

Large Concept Models: Language modeling in a sentence representation space

Python 2,127 193 Updated Jan 29, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 87 3 Updated Dec 3, 2024

Awesome-LLM: a curated list of Large Language Models

23,072 1,922 Updated May 5, 2025

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python 71 4 Updated Jan 24, 2025

SOTA Open Source TTS

Python 20,965 1,676 Updated Apr 12, 2025

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,267 150 Updated Feb 18, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 11,702 1,646 Updated May 5, 2025

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,218 100 Updated Mar 4, 2025

An Open-Sourced LLM-empowered Foundation TTS System

Python 695 56 Updated Apr 15, 2025
Python 86 5 Updated Nov 22, 2024

LLM101n: Let's build a Storyteller

33,368 1,822 Updated Aug 1, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 268 11 Updated Apr 15, 2025

Encode and decode audio samples to/from compressed latent representations!

Python 203 12 Updated Feb 24, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 158 13 Updated Sep 19, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,162 677 Updated May 6, 2025

The open source code for SimpleSpeech series

Python 138 8 Updated Oct 8, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,907 197 Updated Apr 19, 2025

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 130 13 Updated Jan 1, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,267 169 Updated Mar 28, 2025

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 78 10 Updated Mar 12, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 206 13 Updated Apr 20, 2025

Audio Large Language Models

Python 514 30 Updated Mar 9, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,717 129 Updated Apr 21, 2025

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,378 58 Updated Apr 28, 2025

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

33 2 Updated Oct 28, 2024