Stars
MARS5 speech model (TTS) from CAMB.AI
An easy way to fine-tune Wav2Vec 2.0 for low-resource languages.
Training an n-gram-based language model using the KenLM toolkit for Deep Speech 2
Robust Speech Recognition via Large-Scale Weak Supervision
Faster Whisper transcription with CTranslate2
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
🐸 - A general purpose model trainer, as flexible as it gets
A Non-Autoregressive Transformer-based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, …
Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available to every project without licensing struggles.
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
A summary on our attempts at using Deep Learning approaches for Emotional Text to Speech
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
MOS score prediction with a fine-tuned wav2vec 2.0 model
Objective metrics used in several text-to-speech (TTS) papers.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Official implementation of "Separate Anything You Describe"
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper.
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained …
This is the GitHub page for publicly available emotional speech data.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Stable diffusion for real-time music generation
A collection of resources and papers on Diffusion Models