Stars
Task-Customized Mixture of Adapters for General Image Fusion (CVPR 2024)
Python scripts for the Segment Anything 2 (SAM2) model in ONNX
BoPR: Body-aware Part Regressor for Human Shape and Pose Estimation
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models (IJCAI 2023) | The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 (ICMI 2023, Reproducibility A…
🔥(ICME 2024) ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
PyTorch Implementation of "SMITE: Segment Me In TimE" (ICLR 2025)
OBBDetection is an oriented object detection library based on MMDetection.
[ECCV 2024] Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
[TNNLS 2025] TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Segment-Anything-2 (SAM 2) fine-tuning with COCO data
ICCV'2023 | CTVIS: Consistent Training for Online Video Instance Segmentation
The official implementation of the ICASSP'2023 paper Global-context aware generative protein design.
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
The official implementation of the CVPR'22 paper SimVP: Simpler Yet Better Video Prediction.
The official implementation of the CVPR'2022 paper Hyperspherical Consistency Regularization.
wsumel / CVPR2022-Paper-Code-Interpretation
Forked from extreme-assistant/CVPR2024-Paper-Code-Interpretation
cvpr2021/cvpr2020/cvpr2019/cvpr2018/cvpr2017 collection of papers, code, interpretations, and livestreams, compiled by the 极市 (Jishi) team
[ICCV2021] Learning to Track Objects from Unlabeled Videos
The official implementation of the ACM MM'21 paper Co-learning: Learning from noisy labels with self-supervision.
Image augmentation library in Python for machine learning.
OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
A web demo for correcting fitness posture.