ViT
This is a collection of our NAS and Vision Transformer work.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
EVA Series: Visual Representation Fantasies from BAAI
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
Code for CRATE (Coding RAte reduction TransformEr).
Associating Objects with Transformers for Video Object Segmentation
[ECCVW 2022] Code for "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Efficient vision foundation models for high-resolution generation and perception.
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
PyTorch code and models for V-JEPA self-supervised learning from video.
VMamba: Visual State Space Models; code is based on Mamba
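Common to most of the Vision Transformer projects above is the first step of the pipeline: splitting an image into non-overlapping patches that become the token sequence fed to the transformer encoder. A minimal sketch of that patchify step, using NumPy and hypothetical sizes (224×224 image, 16×16 patches; not taken from any specific repo listed here):

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an (N, patch*patch*C) array of patch tokens, where
    N = (H // patch) * (W // patch).
    """
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image must divide evenly into patches"
    # (H//p, p, W//p, p, C) -> (H//p, W//p, p, p, C) -> (N, p*p*C)
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return x

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))   # hypothetical ViT-Base input size
tokens = patchify(img, 16)
print(tokens.shape)  # (196, 768): 14*14 patches, each 16*16*3 values
```

In the full models, each flattened patch is then linearly projected to the embedding dimension and a learned position embedding is added before the sequence enters the encoder.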