This repo contains the implementation for ToST (Token Statistics Transformer), a linear-time architecture derived via algorithmic unrolling.
- [02/05/25] Code for ToST on vision and language tasks is released!
- [01/22/25] Accepted to ICLR 2025!
We have organized the implementations for vision and language tasks into the respective `tost_vision` and `tost_lang` directories. Please follow the instructions within them. We recommend using separate environments for these two implementations.
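Before diving into either directory, here is a rough illustration of the idea named in the title: attention cost that scales linearly with the number of tokens by using token statistics instead of pairwise similarities. This is a minimal sketch under our own assumptions, not the paper's exact TSSA operator — the function name, projection `W`, and the particular second-moment reweighting are placeholders for illustration only.

```python
import numpy as np

def stat_attention_sketch(Z, W, eps=1e-6):
    """Illustrative sketch (NOT the exact ToST operator).

    Re-weights each token using feature-wise second-moment statistics of
    the projected tokens. Cost is O(n * d): no n x n attention matrix is
    ever formed, unlike standard softmax attention.
    """
    X = Z @ W                                  # (n, d) projected tokens
    second_moment = (X ** 2).mean(axis=0)      # per-feature statistic, shape (d,)
    weights = (X ** 2) / (second_moment + eps) # per-token, per-feature weights
    return weights * X                         # element-wise re-weighting

# Example: 8 tokens with 16-dimensional features
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 16))
out = stat_attention_sketch(Z, W)  # same shape as the input tokens, (8, 16)
```

The point of the sketch is the complexity contrast: every quantity computed above is at most `n x d`, whereas softmax attention materializes an `n x n` matrix. See the paper for the actual variational rate-reduction derivation.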
If you find this project helpful for your research and applications, please consider citing our work:
@article{wu2024token,
title={Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction},
author={Wu, Ziyang and Ding, Tianjiao and Lu, Yifu and Pai, Druv and Zhang, Jingyuan and Wang, Weida and Yu, Yaodong and Ma, Yi and Haeffele, Benjamin D},
journal={arXiv preprint arXiv:2412.17810},
year={2024}
}
- XCiT (Cross-Covariance Image Transformers): the code for vision is largely based on this repo.
- nanoGPT: the code for language is mostly based on this repo.