Stars
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A lightweight end-to-end text-to-speech model
The official implementation of CATT Arabic diacritization models.
A cross-platform inference engine for neural TTS models.
[CVPRW'24] SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)
A semi-automated way to generate annotations for football matches
jpgallegoar / Spanish-F5
Forked from SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Based on the official code of "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". This work uses phoneme-level forced alignment to stabilize the generation process.
A comprehensive collection of Quran resources
This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
Voice Assistant for FM-Clinic: A multilingual AI-powered voice assistant for booking doctor appointments, leveraging advanced speech-to-text, text-to-speech, and large language models for seamless,…
OpenMMLab Detection Toolbox and Benchmark
Ancient Egyptian hieroglyphic Unicode in OpenType
A course offered by Justin Johnson from the University of Michigan
This is the repo of our graduation project. All code and notebooks should be added here.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Voice activity detector (VAD) for the browser with a simple API
An autoregressive character-level language model for making more things
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Notes on Deep Learning textbook by Ian Goodfellow, Yoshua Bengio and Aaron Courville
CoTracker is a model for tracking any point (pixel) on a video.
Text to speech alignment using CTC forced alignment
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
DSPy: The framework for programming—not prompting—language models
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Foundational Models for State-of-the-Art Speech and Text Translation
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.