Transformer
Implementation of Slot Attention from GoogleAI
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Fast and memory-efficient exact attention
Taming Transformers for High-Resolution Image Synthesis
Official PyTorch implementation of SegFormer
[ECCV 2022] EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Code for "RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer"
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
This is a collection of our NAS and Vision Transformer work.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
Code for CRATE (Coding RAte reduction TransformEr).
Associating Objects with Transformers for Video Object Segmentation
[ECCVW 2022] Code for "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers. (ICCV 2021 Oral)
Flash Attention in ~100 lines of CUDA (forward pass only)
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
FlashMLA: Efficient MLA decoding kernels
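Every project listed above builds on the same scaled dot-product attention primitive. For reference only, here is a minimal sketch of that primitive in PyTorch; the function name, tensor shapes, and the random-tensor usage example are illustrative assumptions and are not taken from any of the listed repositories.

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim) -- assumed layout for illustration.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Usage with random tensors.
q = k = v = torch.randn(2, 8, 16, 64)   # (batch, heads, tokens, head_dim)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 16, 64])

The kernel-level projects above (FlashAttention, FlashMLA, Native Sparse Attention) compute the same result as this sketch but restructure the computation for memory efficiency and speed.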