Stars
[CVPR 2025 Highlight] MonSter: Marry Monodepth to Stereo Unleashes Power
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
An open protocol enabling communication and interoperability between opaque agentic applications.
🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
A generative world for general-purpose robotics & embodied AI learning.
No fortress, purely open ground. OpenManus is Coming.
Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"
Cost-efficient and pluggable Infrastructure components for GenAI inference
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample and compute efficient than reinforcement learning methods…
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
R1-onevision, a visual language model capable of deep CoT reasoning.
Official repo of the paper "KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction". In the paper, we propose KnowCoder, the most powerful large language model so far for…
The first large-scale multimodal dialogue dataset focusing on Synthetic Aperture Radar (SAR) imagery.
Deep learning-based task-oriented and unified multi-task semantic communications
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
[CVPR 2025] The First Investigation of CoT Reasoning in Image Generation
The code of our work "Golden Noise for Diffusion Models: A Learning Framework".
Minimal reproduction of DeepSeek R1-Zero
Fully open data curation for reasoning models
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
SGLang is a fast serving framework for large language models and vision language models.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
Fully open reproduction of DeepSeek-R1