Stars
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
[AAAI 2025] This repo contains evaluation code for the paper “UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios”
Official implementation of the paper "Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance" (WACV 2025)
[IROS 2024] BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis
[IEEE JSTARS 2024] CV-Cities: Advancing Cross-view Geo-localization in Global Cities
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Soft Masked Mamba Diffusion Model for CT to MRI Conversion (Official PyTorch Implementation)
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
[CVPR 2024] BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
[ICLR'24] GTA: A Geometry-Aware Attention Mechanism for Multi-view Transformers
[IEEE T-PAMI 2023] Awesome BEV perception research and cookbook for audiences of all levels in autonomous driving
[TIP 2024] PyTorch implementation of the paper 'CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity'
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020)
Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images. http://panoptic-bev.cs.uni-freiburg.de
Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
Official PyTorch code for 'Translating Images Into Maps' ICRA 2022 (Outstanding Paper Award)
Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021 Oral)
Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official code for StegoGAN: Leveraging Steganography for Non-bijective Image-to-Image Translation
Source Code for View Consistent Purification for Accurate Cross-View Localization, ICCV 2023
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model