This repository is a quick-and-dirty modification of Apple/SigmoidAttention that computes the exponential function instead of the sigmoid. That parent repository is in turn a modification of Dao-AILab/FlashAttention2.
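The change can be illustrated conceptually with a minimal NumPy sketch: sigmoid attention applies an element-wise sigmoid to the scaled scores (with no softmax row normalization), and this fork swaps that activation for `exp`. The function names, shapes, and the `bias` parameter below are illustrative assumptions, not the repo's actual CUDA kernel API:

```python
import numpy as np

def sigmoid_attention(q, k, v, bias=0.0):
    # Sigmoid attention (hypothetical reference version): element-wise
    # sigmoid on scaled scores, with no softmax row normalization.
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    weights = 1.0 / (1.0 + np.exp(-scores))
    return weights @ v

def exp_attention(q, k, v, bias=0.0):
    # The modification in this repo, conceptually: exp replaces sigmoid,
    # still without the softmax row normalization.
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    weights = np.exp(scores)
    return weights @ v
```

This is only a readability sketch; the actual repository implements the swap inside the fused FlashAttention-style kernels.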
See the FlashSigmoid documentation for instructions on how to use this repo.