Stars
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal is…
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Fast and memory-efficient exact attention
Digital Signal Processing - Theory and Computational Examples
Production First and Production Ready End-to-End Speech Recognition Toolkit
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Instant voice cloning by MIT and MyShell. Audio foundation model.
An 'AI-based, personalized' mock interview app for computer science majors
A latent text-to-image diffusion model
Deep Speaker: an End-to-End Neural Speaker Embedding System.
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Faster Whisper transcription with CTranslate2
A list of Korean speech recognition (STT) APIs, with performance benchmarks for each.
Official Code for DragGAN (SIGGRAPH 2023)
This repository is intended to help machine learning beginners and anyone preparing for a machine learning study group.
final-project-level2-cv-04 created by GitHub Classroom
level2_objectdetection_cv-level2-cv-04 created by GitHub Classroom