Starred repositories
🔥🔥 First-ever hour-scale video understanding models
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
UniMD: Towards Unifying Moment retrieval and temporal action Detection
This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
[NeurIPS 2022 Spotlight] VideoMAE for Action Detection
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
[ICCV 2023] Efficient Video Action Detection with Token Dropout and Context Refinement
[CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
[TIP 2022] End-to-end Temporal Action Detection with Transformer
[CVPR 2022] An Empirical Study of End-to-end Temporal Action Detection
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
The simplest, fastest repository for training/finetuning small-sized VLMs.
Python scripts for performing optical flow estimation using the RAFT model in ONNX
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Official PyTorch implementation for TCSVT 23 "Detect Any Shadow: Segment Anything for Video Shadow Detection"
Instance Shadow Detection with A Single-Stage Detector [SSIS & SSISv2] (CVPR 2021 Oral & TPAMI 2022)