This repo contains the implementation for ToST (Token Statistics Transformer), a linear-time architecture derived via algorithmic unrolling.
- [02/05/25] Code for ToST on vision and language tasks is released!
- [01/22/25] Accepted to ICLR 2025!
We have organized the implementations for vision and language tasks into the respective `tost_vision` and `tost_lang` directories. Please follow the instructions within them. We recommend using separate environments for these two implementations.
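Before diving into either directory, here is a rough illustration of the idea named in the title: attention cost that scales linearly with the number of tokens by using token statistics instead of pairwise similarities. This is a minimal sketch under our own assumptions, not the paper's exact TSSA operator — the function name, projection `W`, and the particular second-moment reweighting are placeholders for illustration only.

```python
import numpy as np

def stat_attention_sketch(Z, W, eps=1e-6):
    """Illustrative sketch (NOT the exact ToST operator).

    Re-weights each token using feature-wise second-moment statistics of
    the projected tokens. Cost is O(n * d): no n x n attention matrix is
    ever formed, unlike standard softmax attention.
    """
    X = Z @ W                                  # (n, d) projected tokens
    second_moment = (X ** 2).mean(axis=0)      # per-feature statistic, shape (d,)
    weights = (X ** 2) / (second_moment + eps) # per-token, per-feature weights
    return weights * X                         # element-wise re-weighting

# Example: 8 tokens with 16-dimensional features
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 16))
out = stat_attention_sketch(Z, W)  # same shape as the input tokens, (8, 16)
```

The point of the sketch is the complexity contrast: every quantity computed above is at most `n x d`, whereas softmax attention materializes an `n x n` matrix. See the paper for the actual variational rate-reduction derivation.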
If you find this project helpful for your research and applications, please consider citing our work:
@article{wu2024token,
title={Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction},
author={Wu, Ziyang and Ding, Tianjiao and Lu, Yifu and Pai, Druv and Zhang, Jingyuan and Wang, Weida and Yu, Yaodong and Ma, Yi and Haeffele, Benjamin D},
journal={arXiv preprint arXiv:2412.17810},
year={2024}
}
- XCiT (Cross-Covariance Image Transformers): the code for vision is largely based on this repo.
- nanoGPT: the code for language is mostly based on this repo.