Student - reference attentions seem to be flipped twice in `gpt149.py` · Issue #4 · stanford-cs149/cs149gpt

Open
IonMich opened this issue Oct 27, 2024 · 0 comments
IonMich commented Oct 27, 2024

In the definition of CustomAttention there is an attribute named isRef that presumably stands for is_reference. This variable is (correctly) set to True for the attention imported from module_ref, and to False for the attention from module.cpp. However, in the four attention methods of gpt149.py, self.isRef == True runs the student attention, while self.isRef == False runs the reference attention. This still ends up giving the correct results, because the two flips cancel out later in the same file in the following statements:

    testTemplate(attentionModuleStudent.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION")
    ...
    testTemplate(attentionModuleReference.myUnfusedAttention, params, "STUDENT - NAIVE ATTENTION")

But this is quite confusing. To make the code easier to follow, one could make the following changes:

  • Import module_ref as mr and the student module built from module.cpp as ms; all references to ms and mr in gpt149.py should then be flipped accordingly.
  • The conditionals in parts 1 to 4 should be changed from if self.isRef: to if not self.isRef:.
  • testTemplate should be called with consistent arguments, e.g. testTemplate(attentionModuleReference.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION"), etc., as in the sketch after this list.
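
Taken together, the proposal could look roughly like the sketch below. This is only illustrative: the import names, the CustomAttention constructor, and the kernel signatures are assumptions standing in for whatever gpt149.py actually does.

    # Illustrative sketch only; import names, the CustomAttention constructor, and
    # the kernel signatures are assumptions, not the assignment's actual code.
    import module_ref as mr   # C++ reference kernels
    import module as ms       # student kernels built from module.cpp (assumed import name)

    class CustomAttention:
        def __init__(self, isRef: bool):
            self.isRef = isRef

        def myUnfusedAttention(self, *args):
            # With this dispatch, isRef == True really does mean "run the reference".
            if not self.isRef:
                return ms.myUnfusedAttention(*args)   # student kernel from module.cpp
            return mr.myUnfusedAttention(*args)       # reference kernel from module_ref

    attentionModuleReference = CustomAttention(isRef=True)
    attentionModuleStudent = CustomAttention(isRef=False)

    # testTemplate calls then read consistently, e.g.:
    # testTemplate(attentionModuleReference.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION")
    # testTemplate(attentionModuleStudent.myUnfusedAttention, params, "STUDENT - NAIVE ATTENTION")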

Also, a couple of other minor changes:

  • The --model command-line argument should also accept the option shakes256 (currently not listed), while the option kayvon is not available.
  • The help text for the -N command-line option is incorrect.
  • The image of the FlashAttention algorithm that appears in Part 4 of the README should say K_0,...,K_Tc in line 8, and similarly in line 9. The matrix PV is initialized in the code but is not referenced in the image. A note could also be added that the vector l needs to be reset for each (b, h) pair before l_i is loaded (see the sketch after this list).
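
For that last point, the loop structure might look roughly like the sketch below (placeholder sizes and comments instead of real tensor code), which shows where the vector l is zeroed:

    # Illustrative only; sizes, tiling, and tensor handling are placeholders, not
    # the assignment's actual starter code.
    B, H, N, Tc, Tr = 1, 2, 8, 2, 2          # placeholder dimensions for illustration
    for b in range(B):
        for h in range(H):
            l = [0.0] * N                    # reset the row-sum vector for every (b, h) pair
            for j in range(Tc):              # iterate over the K and V blocks
                # load K_j and V_j
                for i in range(Tr):          # iterate over the Q blocks
                    # load Q_i, O_i, and the slice l_i of l
                    # S_ij  = Q_i @ K_j^T
                    # P_ij  = exp(S_ij)
                    # PV_ij = P_ij @ V_j     (the PV buffer that is initialized in the code)
                    # accumulate PV_ij into O_i, row-sums of P_ij into l_i, then write back
                    pass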