Stars
Official repository of the article "M3DMap: Object-aware Multimodal 3D Mapping for Dynamic Environments"
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official code of the paper LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms accepted at MICCAI 2023.
[NeurIPS2024] Multiview Scene Graph (topologically representing a scene from unposed images by interconnected place and object nodes)
[CVPR 2025] Official PyTorch implementation of MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views
[ECCV 2024] This is the official implementation of HRMapNet, maintaining and utilizing a low-cost global rasterized map to enhance online vectorized map perception.
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks.
[CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation
[NeurIPS 2024] The official implementation of HairFastGAN. A framework for virtual hairstyle fitting.
[CVPR 2023] VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
A curated list of awesome transformer models.
A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
1 million FPS multi-agent driving simulator
Cutting-edge Python library designed for generative image augmentation!
Hybrid ML + physics model of the Earth's atmosphere