Stars
Noise suppression using deep filtering
SwarmUI (formerly StableSwarmUI), A Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility.
Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
Inference and training library for high-quality TTS models.
Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing.
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The …
AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, however supports a variety of advanced features, such as a settings page, low VRAM support, D…
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
A multi-voice TTS system trained with an emphasis on quality
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Ukrainian TTS (text-to-speech) using ESPNET
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Industry leading face manipulation platform
A simple, high-quality voice conversion tool focused on ease of use and performance.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Clips AI is an open-source Python library that automatically converts long videos into clips.
Text-Prompted Generative Audio Model
Automatically detect and skip intro/credit sequences in Jellyfin
Unsupervised detection of opening / closing credits, recaps, and previews in video files
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset