- Shanghai, China
- https://jiaxin-ye.github.io/
Stars
A curated list of video-to-audio generation resources
Hugging Face implementation of AV-HuBERT on the MuAViC dataset
Famous Vision Language Models and Their Architectures
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
MAGI-1: Autoregressive Video Generation at Scale
ICML 2024 "From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation"
📰 Must-read papers and blogs on LLM-based long-context modeling 🔥
[CVPR 2025] The First Investigation of CoT Reasoning in Image Generation
Unified automatic quality assessment for speech, music, and sound.
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Official PyTorch implementation of MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation (ICLR 2025)
Ego4D dataset repository: download the dataset, visualize it, extract features, and see example usage
Code, dataset, and samples for the NeurIPS paper “Tell What You Hear From What You See -- Video to Audio Generation Through Text”
Official Code for "Rethinking Diffusion Model in High Dimension"
A very simple GRPO implementation for reproducing R1-like LLM thinking.
A set of functions for supervised feature learning/classification of mental states from EEG, based on the "EEG images" idea.
An optimized speech-to-text pipeline for the Whisper model supporting multiple inference engines
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
PyTorch port of Google Research's VGGish model, used for extracting audio features.
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
Explore the Multimodal “Aha Moment” on a 2B Model
The first Large Audio Language Model that enables native in-depth thinking, trained on large-scale audio Chain-of-Thought data.