Stars
Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model, with CPU (ONNX) and NVIDIA GPU (PyTorch) support, request handling, and audio auto-stitching. A minimal call sketch follows this entry.
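Since the wrapper serves an OpenAI-style speech endpoint, calling it from Python might look like the sketch below. The port, route, model id, and voice name are assumptions for illustration, not verified against the repo.

import requests

# Hypothetical request to the wrapper's OpenAI-style /v1/audio/speech route.
# Port 8880, model id "kokoro", and voice "af_bella" are assumptions.
resp = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={"model": "kokoro", "input": "Hello from Kokoro!", "voice": "af_bella"},
    timeout=60,
)
resp.raise_for_status()
with open("speech.mp3", "wb") as f:
    f.write(resp.content)  # raw audio bytes returned by the server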
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible)
Large datasets for conversational AI
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Codec for the paper "LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis"
FlashMLA: Efficient MLA decoding kernels
Pretraining code for a large-scale depth-recurrent language model
NickLucche / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
A generative world for general-purpose robotics & embodied AI learning.
Python tool for converting files and office documents to Markdown.
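For reference, MarkItDown's documented entry point is a single convert call; a minimal sketch (the file path is illustrative):

from markitdown import MarkItDown

# Convert an Office document (also handles PDF, HTML, images, ...) to Markdown.
md = MarkItDown()
result = md.convert("report.docx")  # "report.docx" is an illustrative path
print(result.text_content)          # the document rendered as Markdown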
First base model for full-duplex conversational audio
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built on Llama-3.1-8B-Instruct, aiming for GPT-4o-level speech capabilities.
Vision infrastructure to turn complex documents into RAG/LLM-ready data
catie-aq / flash-attention
Forked from Dao-AILab/flash-attention. Fast and memory-efficient exact attention
LLM101n: Let's build a Storyteller
Gemma 2B with 10M context length using Infini-attention.