allow hooking whisper model cross-attention #2963

jnnnnn · 2025-05-22T09:18:00Z

Candle's Whisper model is great.

To get good timestamps from the model, we need the hottest cross-attention cache values for each layer.

I'm not sure whether making the fields pub is the right approach (it's definitely simpler), or whether it would be better to add several layers of accessor methods to reach through TextDecoder → ResidualAttentionBlock → MultiHeadAttention → the key and value Linear layers.
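To make the trade-off concrete, here is a rough sketch of the two options side by side. The types below are simplified stand-ins that only borrow the names mentioned above; the field names (`blocks`, `cross_attn`, `key`, `value`), the `Vec<f32>` placeholder inside `Linear`, and the `cross_attn_kv` helper are assumptions for illustration, not Candle's actual definitions.

```rust
/// Placeholder for a linear layer; a real one would hold tensors.
#[derive(Debug, Clone, PartialEq)]
pub struct Linear {
    pub weight: Vec<f32>,
}

/// Option A: make the fields `pub` so callers can reach straight through.
pub mod public_fields {
    use super::Linear;

    pub struct MultiHeadAttention {
        pub key: Linear,
        pub value: Linear,
    }
    pub struct ResidualAttentionBlock {
        // Hypothetical field name; `None` for blocks without cross-attention.
        pub cross_attn: Option<MultiHeadAttention>,
    }
    pub struct TextDecoder {
        pub blocks: Vec<ResidualAttentionBlock>,
    }
}

/// Option B: keep the fields private and add one accessor per layer.
pub mod accessors {
    use super::Linear;

    pub struct MultiHeadAttention {
        key: Linear,
        value: Linear,
    }
    impl MultiHeadAttention {
        pub fn new(key: Linear, value: Linear) -> Self {
            Self { key, value }
        }
        pub fn key(&self) -> &Linear {
            &self.key
        }
        pub fn value(&self) -> &Linear {
            &self.value
        }
    }

    pub struct ResidualAttentionBlock {
        cross_attn: Option<MultiHeadAttention>,
    }
    impl ResidualAttentionBlock {
        pub fn new(cross_attn: Option<MultiHeadAttention>) -> Self {
            Self { cross_attn }
        }
        pub fn cross_attn(&self) -> Option<&MultiHeadAttention> {
            self.cross_attn.as_ref()
        }
    }

    pub struct TextDecoder {
        blocks: Vec<ResidualAttentionBlock>,
    }
    impl TextDecoder {
        pub fn new(blocks: Vec<ResidualAttentionBlock>) -> Self {
            Self { blocks }
        }
        /// Hypothetical helper: (key, value) projections of every block
        /// that has cross-attention, in block order.
        pub fn cross_attn_kv(&self) -> Vec<(&Linear, &Linear)> {
            self.blocks
                .iter()
                .filter_map(|b| b.cross_attn())
                .map(|a| (a.key(), a.value()))
                .collect()
        }
    }
}
```

With option A a caller writes something like `decoder.blocks[i].cross_attn`, which is a very small diff but turns the struct layout into public API. Option B needs more boilerplate at each level, yet leaves room to change the internals later without breaking callers.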

allow hooking whisper model cross-attention

fe1368b

jnnnnn force-pushed the whisper-hooked branch from cda25f7 to fe1368b May 22, 2025 21:10

jnnnnn closed this May 22, 2025

jnnnnn deleted the whisper-hooked branch May 22, 2025 21:25

jnnnnn restored the whisper-hooked branch May 22, 2025 21:25
