- University of Hong Kong
- Hong Kong
Stars
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"
A Python parametric CAD scripting framework based on OCCT
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
[CVPR 25] G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
This repo contains the code for a 1D tokenizer and generator
[ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
An extremely fast Python package and project manager, written in Rust.
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
Build resilient language agents as graphs.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
[ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance