X-Labs AI
- https://t.me/varfolomeefff
Stars
Open Source framework for voice and multimodal conversational AI
All the materials you need for the "Best Python Course"
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Generative Models by Stability AI
🖼 A collection of high-quality anime faces.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Transparent proxy server that works as a poor man's VPN. Forwards over SSH. Doesn't require admin. Works with Linux and macOS. Supports DNS tunneling.
VoiceBench: Benchmarking LLM-Based Voice Assistants
Foundational Model for Speech Recognition Tasks
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound
Open TTS models, built for streaming on the edge
Atomic CSS toolkit with Sass and ergonomics for creating styles of any complexity
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages [ACL 2025]
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
PyTorch implementation of an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture (hFT-Transformer).
[NeurIPS 2024] Image Understanding Makes for A Good Tokenizer for Image Generation
YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open