🚩
Focusing
Stay hungry, Stay foolish
- BUPT->Tsinghua->Tencent
- Shenzhen, China
- https://zyzisyz.github.io/
Stars
speech
7 repositories
Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code
Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Audio Codec Speech processing Universal PERformance Benchmark
Silero VAD: pre-trained enterprise-grade Voice Activity Detector