Stars
[CVPR 2025 Highlight] Official implementation of "Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters"
🚀🚀 "Large Model": Train a small 26M-parameter GPT completely from scratch in just 2 hours!
Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Falcon: A Remote Sensing Vision-Language Foundation Model
(CVPR2025) Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
Paper list about hyperbolic embeddings, hyperbolic models, and hyperbolic applications
beneroth13 / dinov2
Forked from facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method, with a custom dataset and adapted training.
Official repo for "Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling"
SSL4EO-S12: a large-scale dataset for self-supervised learning in Earth observation
Official Implementation of LADS (Latent Augmentation using Domain descriptionS)
(ACM MM2024) HICEScore: A Hierarchical Metric for Image Captioning Evaluation
An open source implementation of CLIP.
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiase…
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
A curated list of prompt-based papers in computer vision and vision-language learning.
A Python Library for Deep Probabilistic Models
Language Models Can See: Plugging Visual Controls in Text Generation