Stars
PyTorch code and models for VJEPA2 self-supervised learning from video.
[CVPR2025] KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Official repository for the paper "CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models"
The simplest, fastest repository for training/finetuning small-sized VLMs.
A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
Falcon: A Remote Sensing Vision-Language Foundation Model
The Python library for real-time communication
📄 A curated list of awesome .cursorrules files
Real-time pose estimation pipeline with 🤗 Transformers
Inference and fine-tuning examples for vision models from 🤗 Transformers
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding cap…
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
Official code for "SRFormer: Permuted Self-Attention for Single Image Super-Resolution" (ICCV 2023) and SRFormerV2
Open and efficient video watermarking
(NeurIPS2023) CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data