donghaiyw / Starred · GitHub
Python 39 4 Updated Jun 28, 2025
Python 172 23 Updated Dec 5, 2024

Repository of AudioX

Python 1,020 106 Updated Apr 30, 2025

Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight

C 842 75 Updated Jul 3, 2025

ACE-Step: A Step Towards Music Generation Foundation Model

Python 2,612 263 Updated Jun 27, 2025

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 620 32 Updated Nov 19, 2024

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 17,283 1,421 Updated Jun 28, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,907 261 Updated Jun 21, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,342 193 Updated Jun 17, 2025

Official repo for CFG-Zero*

Python 618 21 Updated May 2, 2025
Python 265 31 Updated Apr 11, 2025

Train your AI self, amplify you, bridge the world

Python 13,023 943 Updated Jul 3, 2025

[ICML 2025] Gaussian Mixture Flow Matching Models (GMFlow)

Python 106 3 Updated May 28, 2025

Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)

Python 227 24 Updated Jul 31, 2024

[ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"

Python 72 7 Updated Jan 17, 2025

[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

Python 173 6 Updated Nov 22, 2024
Python 5,591 418 Updated May 11, 2025

Generate high-definition short videos with one click using large AI models.

Python 37,518 5,386 Updated Jun 11, 2025

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

Python 282 34 Updated Jan 1, 2025

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python 6,805 758 Updated Mar 5, 2025

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 1,929 206 Updated May 17, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 5,167 581 Updated Jun 4, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,106 80 Updated Mar 27, 2025

Cantonese Grapheme-to-Phoneme Converter based on GitYCC/g2pW

Python 13 3 Updated Dec 10, 2024

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,760 1,438 Updated Jun 30, 2025

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Python 255 12 Updated Mar 31, 2025

Text to speech alignment using CTC forced alignment

Python 307 59 Updated Mar 24, 2025

Repository for training models for music source separation.

Python 805 110 Updated Jun 20, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 12,512 1,803 Updated Jul 2, 2025