softpick-attention

From the paper: Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

This repository contains implementations of attention with the softpick function: a naive implementation and a modified FlashAttention-2 kernel. We do NOT recommend using the Triton kernels here directly, since they are taken from the flash-linear-attention repository and are untested outside of that context. The code here is meant as a reference for those who want to implement softpick in their own kernels.
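For orientation, below is a minimal sketch of a naive softpick attention forward pass in plain PyTorch. It assumes the softpick formulation relu(exp(x) - 1) / (sum_j |exp(x_j) - 1| + eps), computed with the usual max-subtraction trick, and it assumes masked positions are simply excluded from both numerator and denominator; treat it as an illustration, not a substitute for the reference implementations and kernels in this repository.

```python
# Sketch of naive softpick attention (assumed formulation, not this repo's code).
import torch
import torch.nn.functional as F
from typing import Optional


def softpick(x: torch.Tensor, dim: int = -1, eps: float = 1e-8,
             mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Subtract the row max so exp() cannot overflow; the "1" becomes exp(-m).
    m = x.max(dim=dim, keepdim=True).values
    shifted = torch.exp(x - m) - torch.exp(-m)
    num = F.relu(shifted)          # rectified numerator: negative terms become 0
    den = shifted.abs()            # denominator uses absolute values
    if mask is not None:           # mask is True where attention is allowed
        num = num.masked_fill(~mask, 0.0)
        den = den.masked_fill(~mask, 0.0)
    # Rows may sum to less than 1, unlike softmax.
    return num / (den.sum(dim=dim, keepdim=True) + eps)


def naive_softpick_attention(q, k, v, causal: bool = True):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    mask = None
    if causal:
        t = scores.shape[-1]
        mask = torch.ones(t, t, dtype=torch.bool, device=scores.device).tril()
    probs = softpick(scores, dim=-1, mask=mask)
    return torch.matmul(probs, v)
```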

For the training code that we used in the paper, see: https://github.com/zaydzuhri/flame/tree/softpick-attention

All trained models and checkpoints are on my Hugging Face profile: https://huggingface.co/zaydzuhri
