# My Kernels Collection

I write kernels here.

## Fused Self Attention (forward)

Check out the guide to understanding it: https://alexdremov.me/understanding-flash-attention-writing-the-algorithm-from-scratch-in-triton/

## Streaming Attention (forward, backward, pt2 compliant)

Detailed description in the docstring. A quick sanity-check sketch is at the end of this README.

TBD: guide to why it is cool (nudge me if interested)
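## Quick sanity check

The exact entry points live in the source, so here is a minimal, hedged sketch of how one of these kernels could be checked against a plain PyTorch reference. The import path and the name `streaming_attention` are assumptions for illustration, not this repo's actual API.

```python
import torch

# Hypothetical entry point -- the real name and signature live in this
# repo's source; adjust the import to match.
# from kernels.streaming_attention import streaming_attention


def reference_attention(q, k, v):
    """Plain PyTorch reference: softmax(q @ k^T / sqrt(d)) @ v."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(torch.softmax(scores, dim=-1), v)


if __name__ == "__main__":
    # (batch, heads, seq_len, head_dim); fp16 on GPU is the usual setup
    # for Triton attention kernels.
    q, k, v = (
        torch.randn(1, 4, 256, 64, device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    expected = reference_attention(q, k, v)
    # out = streaming_attention(q, k, v)            # hypothetical call
    # torch.testing.assert_close(out, expected, rtol=2e-2, atol=2e-2)
```

Since the streaming kernel is advertised as pt2 compliant, the same call should also survive being wrapped in `torch.compile`.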