980202006


32 followers · 438 following

Starred repositories

ydqmkkx / ShallowFlowMatching-TTS

Official implementation of the paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

Python · 32 stars · 4 forks · Updated Jun 25, 2025

ttsds / ttsds

The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these factors with real speech and noise datasets.

Python · 45 stars · 2 forks · Updated Jun 21, 2025

Shy-98 / MELLE

Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"

Python · 19 stars · 2 forks · Updated Jun 27, 2025

AIDC-AI / Awesome-Unified-Multimodal-Models

Awesome Unified Multimodal Models

355 stars · 10 forks · Updated Jun 27, 2025

VectorSpaceLab / OmniGen2

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook · 2,019 stars · 146 forks · Updated Jun 27, 2025

zcli-charlie / ZIQI-Eval

ZIQI-Eval: A Music Evaluation Benchmark for Large Language Models

Python · 12 stars · 1 fork · Updated Jul 23, 2024

Pliploop / SLAP

Official repository for the paper - SLAP: Siamese Language-Audio Pretraining without negative samples for Music Understanding

Python · 16 stars · 1 fork · Updated Jun 21, 2025

ilaria-manco / muscall

Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)

Python · 118 stars · 11 forks · Updated Dec 5, 2024

LLMBook-zh / LLMBook-zh.github.io

Large Language Models (《大语言模型》). Authors: Zhao Xin, Li Junyi, Zhou Kun, Tang Tianyi, Wen Jirong

Python · 3,732 stars · 274 forks · Updated Mar 31, 2025

magenta / magenta-realtime

Python · 519 stars · 57 forks · Updated Jun 25, 2025

k2-fsa / ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python · 181 stars · 13 forks · Updated Jun 26, 2025

MeiGen-AI / MultiTalk

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python · 821 stars · 77 forks · Updated Jun 25, 2025

MattShannon / mcd

Mel cepstral distortion (MCD) computations in Python.

Python · 224 stars · 35 forks · Updated Jun 13, 2017

huutuongtu / Lightvoc

LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform

Jupyter Notebook · 17 stars · 2 forks · Updated May 17, 2024

lucadellalib / focalcodec

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation

Python · 92 stars · 11 forks · Updated Feb 12, 2025

kaistmm / fregrad

Python · 33 stars · 4 forks · Updated May 13, 2024

KdaiP / JiOu-LLM

JiOu-LLM: an odd/even number classification model based on Llama 2

Python · 5 stars · 1 fork · Updated Mar 11, 2024

yangjackie / Topics-on-diffusion-generative-models

TeX · 26 stars · 1 fork · Updated Apr 20, 2025

Cypress-Yang / SongBloom

Python · 55 stars · 5 forks · Updated Jun 22, 2025

fluxions-ai / vui

Python · 588 stars · 57 forks · Updated Jun 25, 2025

zihaod / MusiLingo

Python · 47 stars · 4 forks · Updated Aug 27, 2024

wuzhiyue111 / MLLM-paper-reading

Multimodal paper reading (visual, audio)

11 stars · Updated Jun 22, 2025

CNChTu / FCBE

Python · 3 stars · Updated Feb 9, 2025

EdisonLeeeee / Awesome-Masked-Autoencoders

A collection of literature published after or concurrently with Masked Autoencoder (MAE) (Kaiming He et al.).

836 stars · 53 forks · Updated Jul 10, 2024

Diaoxiaozhang / Ximalaya-Downloader

One-click downloader for Ximalaya album audio

JavaScript · 1,145 stars · 155 forks · Updated Feb 15, 2025

qiuqiangkong / materials_for_students

13 stars · Updated Jan 16, 2024

haidog-yaqub / MeanFlow

Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.

Python · 496 stars · 30 forks · Updated Jun 14, 2025

nicolaus625 / CMI-bench

Python · 12 stars · Updated Jun 24, 2025

kyutai-labs / delayed-streams-modeling

Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.