Stars
A PyTorch implementation of U-shaped architecture benchmarks for medical image segmentation
[MICCAI 2024] HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
Merging Context Clustering with Visual State Space Models for Medical Image Segmentation
Paper list, dataset, and tools for radiology report generation
Unofficial Implementation of Animate Anyone
Character Animation (AnimateAnyone, Face Reenactment)
Open-Sora: Democratizing Efficient Video Production for All
[arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
Official repository of Polarity-aware Linear Attention for Vision Transformers (ICLR 2025)
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Official PyTorch implementation of the paper "RWKV-UNet: Improving UNet with Long-Range Cooperation for Effective Medical Image Segmentation"
This repository collects papers on VLLM applications. New papers are added at irregular intervals.
A collection and survey of the latest vision-language model papers and model GitHub repositories
Official Repository of 'Multi-Scale Temporal Mamba for Efficient Temporal Action Detection'
GLCONet: Learning Multisource Perception Representation for Camouflaged Object Detection (2024, TNNLS)
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection (2024, ECCV)
HINT: High-quality INpainting Transformer with Enhanced Attention and Mask-aware Encoding
Code for Depth-aware Endoscopic Video Inpainting
We present a novel few-shot generative residual image inpainting method that produces high-quality inpainting results.
(CVPR 2024) Official code for paper "Towards Language-Driven Video Inpainting via Multimodal Large Language Models"
A paper list of recent Mamba-based CV work.
Official PyTorch implementation of "Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?"