Stars
DLRover: An Automatic Distributed Deep Learning System
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
No fortress, purely open ground. OpenManus is Coming.
A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source data extraction tool.
MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka.
R1-onevision, a visual language model capable of deep CoT reasoning.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Democratizing Reinforcement Learning for LLMs
Codebase for Iterative DPO Using Rule-based Rewards
[CVPR 2025] A unified framework for Scene Coordinate Regression-based visual localization
Witness the aha moment of a VLM for less than $3.
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Fully open reproduction of DeepSeek-R1
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
Code for "MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training", arXiv 2025.
Open source impl of **MV-DUSt3R+ Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds** from Meta Reality Labs. Project page https://mv-dust3rp.github.io/
CUDA-accelerated rasterization of Gaussian splatting
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
Application for camera and sensor data logging (iOS)
Algorithms and Publications on 3D Object Tracking
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Free to use online tool for labelling photos. https://makesense.ai
Open-source and strong foundation image recognition models.
Gaussian Splatting from VGGSfM and Mast3r, and their comparison
Code of the paper: 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model