Stars
[CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
[ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
We write your reusable computer vision tools. 💜
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models arXiv 2023 / CVPR 2024
Official repository of Agent Attention (ECCV2024)
Repository of "Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning" (NeurIPS 2023 Spotlight)
Official code of the paper "Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL"
[NeurIPS 2023] Rank-DETR for High Quality Object Detection
[T-PAMI'25] PyTorch Implementation of GDRNPP, winner (most of the awards) of the BOP Challenge 2022 at ECCV'22
[ICCV 2023] Adaptive Rotated Convolution for Rotated Object Detection
✨✨Latest Advances on Multimodal Large Language Models
A curated list of papers, datasets and resources pertaining to open vocabulary object detection.
(TPAMI 2024) A Survey on Open Vocabulary Learning
Emu Series: Generative Multimodal Models from BAAI
Repository of Vision Transformer with Deformable Attention (CVPR2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
PyTorch implementation of DAPrompt: https://arxiv.org/abs/2202.06687
Official implementation of A Mixture of Surprises for Unsupervised Reinforcement Learning
[arXiv] Cross-Modal Adapter for Text-Video Retrieval