Jiaxin-Ye (Jiaxin Ye) / Starred · GitHub
💭
Keep Improving


A curated list of Video to Audio Generation

41 2 Updated Apr 15, 2025

Huggingface Implementation of AV-HuBERT on the MuAViC Dataset

Python 8 Updated Mar 6, 2025
Python 351 30 Updated May 6, 2025

Famous Vision Language Models and Their Architectures

Markdown 834 42 Updated Feb 24, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,352 1,407 Updated May 16, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,615 226 Updated May 19, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,059 166 Updated May 14, 2025

ICML 2024 "From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation"

6 Updated Oct 13, 2024
Python 11 3 Updated May 19, 2025
Python 24 Updated Apr 7, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,482 48 Updated May 15, 2025

[CVPR 2025] The First Investigation of CoT Reasoning in Image Generation

Python 677 20 Updated May 7, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 486 31 Updated May 1, 2025

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Python 76 4 Updated Nov 9, 2024

The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.

Python 129 2 Updated Apr 14, 2025
C 64 10 Updated Sep 13, 2022

Official Pytorch Implementation of MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation (ICLR 2025)

Python 4 Updated Feb 11, 2025

Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset

Jupyter Notebook 426 52 Updated Jan 10, 2025

Code, Dataset, Samples for the NeurIPS paper “Tell What You Hear From What You See -- Video to Audio Generation Through Text”

Python 7 Updated Feb 24, 2025

Official Code for "Rethinking Diffusion Model in High Dimension"

HTML 14 Updated May 10, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,049 86 Updated Apr 3, 2025

A set of functions for supervised feature learning/classification of mental states from EEG based on "EEG images" idea.

Python 731 222 Updated Jul 2, 2020

Awesome Gesture Generation

196 7 Updated Jan 25, 2025

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook 413 53 Updated Aug 27, 2024

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

576 15 Updated May 9, 2025

Pytorch port of Google Research's VGGish model used for extracting audio features.

Python 388 71 Updated Nov 3, 2021

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

137 2 Updated Jun 13, 2024

MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

Python 604 23 Updated May 19, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python 585 20 Updated Mar 18, 2025

The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.

Python 217 20 Updated May 15, 2025