KAIST
- Daejeon, Korea
- https://signofthefour.github.io
Stars
Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization (IROS 2024)
Predictive Coding for Decision Transformer (IROS 2024)
[CVPR'25] SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Official implementation of Inductive Moment Matching
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
PyTorch Implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
A TTS model that makes a speaker speak new languages
VoiceLDM: Text-to-Speech with Environmental Context
Resumes generated using GitHub information
Architecture decision record (ADR) examples for software planning, IT leadership, and template documentation
PyTorch Implementation of Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis