When the attn kernel reads the KV cache, prefill uses LinearIter while decode uses BlockIter. What is the reasoning behind this design? #2518
Time-Limit started this conversation in General
For attention, BlockIter performs slightly better than LinearIter on short contexts, but noticeably worse on long contexts.
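To make the two iterator names concrete, here is a minimal Python sketch of the address arithmetic each style implies, assuming a hypothetical layout: a linear cache stores token `t` at offset `t * head_dim`, while a block (paged) cache maps each logical block to a physical block through a `block_table`. All names here (`linear_offset`, `block_offset`, `block_table`, `block_size`, `build_paged_cache`) are illustrative assumptions, not the kernel's actual API. The sketch does not model GPU memory behavior, so it does not by itself explain the measured speed difference.

```python
from typing import List


def linear_offset(token: int, head_dim: int) -> int:
    # Hypothetical contiguous layout: token t's vector starts right after token t-1's.
    return token * head_dim


def block_offset(token: int, head_dim: int, block_size: int,
                 block_table: List[int]) -> int:
    # Hypothetical paged layout: find the logical block and the slot inside it,
    # then translate the logical block to a physical block via the table.
    logical_block, slot = divmod(token, block_size)
    physical_block = block_table[logical_block]  # extra indirect lookup
    return (physical_block * block_size + slot) * head_dim


def build_paged_cache(keys: List[List[float]], head_dim: int, block_size: int,
                      block_table: List[int], num_physical_blocks: int) -> List[float]:
    # Scatter each token's key vector into its physical block.
    cache = [0.0] * (num_physical_blocks * block_size * head_dim)
    for t, k in enumerate(keys):
        o = block_offset(t, head_dim, block_size, block_table)
        cache[o:o + head_dim] = k
    return cache


def read_all(cache: List[float], offsets: List[int], head_dim: int) -> List[List[float]]:
    # Gather key vectors given each token's starting offset.
    return [cache[o:o + head_dim] for o in offsets]


# Usage: both layouts return identical keys; only the address arithmetic differs.
head_dim, block_size = 2, 2
block_table = [3, 0]  # logical block 0 -> physical 3, logical block 1 -> physical 0
keys = [[10.0 * t, 10.0 * t + 1.0] for t in range(4)]
linear_cache = [x for k in keys for x in k]
paged_cache = build_paged_cache(keys, head_dim, block_size, block_table,
                                num_physical_blocks=4)

from_linear = read_all(linear_cache,
                       [linear_offset(t, head_dim) for t in range(4)], head_dim)
from_paged = read_all(paged_cache,
                      [block_offset(t, head_dim, block_size, block_table) for t in range(4)],
                      head_dim)
assert from_linear == from_paged == keys
```

The only structural difference is the `block_table` lookup each time the paged read crosses into a new block; a linear read is a single strided walk. How that difference plays out on a GPU (per-tile pointer loads, coalescing, prefetching) depends on the real kernel and is not captured here.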