Stars
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
MSGQ: A lock-free single-producer, multi-consumer message queue
Training pipeline for end-to-end self-driving with Comma AI's Openpilot. WIP
[ICCV 2025] GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
[CVPR 2024] SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
[ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
[ICCV 2023] OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
ROS package to find a rigid-body transformation between a LiDAR and a camera for "LiDAR-Camera Calibration using 3D-3D Point correspondences"
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
An open protocol enabling communication and interoperability between opaque agentic applications.
A TTS model capable of generating ultra-realistic dialogue in one pass.
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Unofficial PyTorch implementation of Google AI's VoiceFilter system
It allows you to download a website from the Internet to a local directory, recursively building all directories and fetching HTML, images, and other files from the server to your computer.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
kaldi-asr/kaldi is the official location of the Kaldi project.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
A programmer's guide to cooking at home (Simplified Chinese only).
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Write any JavaScript with 6 Characters: []()!+
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Hiragana/Katakana speed-reading quiz for the command line! 😎
Build Conversational AI in minutes ⚡️