Student - reference attentions seem to be flipped twice in `gpt149.py` · Issue #4 · stanford-cs149/cs149gpt
In the definition of `CustomAttention` there is an attribute named `isRef` that probably stands for `is_reference`. This variable is (correctly) set to `True` for the attention imported from `module_ref`, and `False` for the attention from `module.cpp`. However, in the four attention methods of `gpt149.py`, `self.isRef == True` runs the student attention, while `self.isRef == False` runs the reference attention. This still ends up giving the correct order, because the module aliases later in the same file are swapped as well, so the two flips cancel out (sketched below).
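For illustration, here is a minimal runnable sketch of the double flip, with stub objects standing in for the compiled extensions; the names and call sites below are assumptions, not copied from `gpt149.py`:

```python
# Illustrative sketch of the "double flip" -- not copied from gpt149.py.
# _Stub stands in for the compiled C++ extensions so the example runs on its own.
class _Stub:
    def __init__(self, label):
        self.label = label
    def myUnfusedAttention(self, *args):
        return self.label          # the real modules would return an output tensor

ms = _Stub("reference kernel")     # flip 2: module_ref ends up behind the student-looking alias
mr = _Stub("student kernel")       # flip 2: module.cpp ends up behind the reference-looking alias

class CustomAttention:
    def __init__(self, isRef=False):
        self.isRef = isRef         # correctly True for the reference attention, False for the student one

    def myUnfusedAttention(self, *args):
        if self.isRef:                             # flip 1: the reference branch calls the "student" alias ...
            return ms.myUnfusedAttention(*args)
        return mr.myUnfusedAttention(*args)        # ... and the student branch calls the "reference" alias

# The two flips cancel, so each instance still runs the intended kernel:
print(CustomAttention(isRef=True).myUnfusedAttention())    # reference kernel
print(CustomAttention(isRef=False).myUnfusedAttention())   # student kernel
```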
But this is quite confusing. To make the code easier to follow, one could make the following changes:
1. Import `module_ref` as `mr`, and `module.cpp` as `ms`. All references to `ms` and `mr` should be flipped accordingly in `gpt149.py`.
2. The conditionals in parts 1 to 4 should be changed from `if self.isRef:` to `if not self.isRef:`.
3. `testTemplate` should be called with consistent args, e.g. `testTemplate(attentionModuleReference.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION")`, etc. (A sketch of the resulting wiring follows this list.)
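Put together, the cleaned-up wiring might look roughly like the following. This is illustrative only, not a drop-in patch: the stubs again replace the compiled extensions, and `testTemplate`/`params` are reduced to trivial placeholders.

```python
# Sketch of the proposed cleanup: aliases match the module names (change 1),
# the conditional is negated (change 2), and the testTemplate labels match the
# instance being exercised (change 3). Names are placeholders where noted.
class _Stub:
    def __init__(self, label):
        self.label = label
    def myUnfusedAttention(self, *args):
        return self.label

mr = _Stub("reference kernel")     # module_ref behind a reference-looking alias
ms = _Stub("student kernel")       # module.cpp behind a student-looking alias

class CustomAttention:
    def __init__(self, isRef=False):
        self.isRef = isRef

    def myUnfusedAttention(self, *args):
        if not self.isRef:                         # change 2: negated conditional
            return ms.myUnfusedAttention(*args)    # student branch -> student module
        return mr.myUnfusedAttention(*args)        # reference branch -> reference module

def testTemplate(fn, params, label):               # placeholder for the real helper
    print(label, "->", fn(*params))

attentionModuleReference = CustomAttention(isRef=True)
attentionModuleStudent = CustomAttention(isRef=False)
params = ()

# change 3: labels consistent with the instance being exercised
testTemplate(attentionModuleReference.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION")
testTemplate(attentionModuleStudent.myUnfusedAttention, params, "STUDENT - NAIVE ATTENTION")
```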
Also, a couple of other minor changes:
- The `--model` command-line argument should have another option, `shakes256`, which is currently not listed. The option `kayvon` is not available.
- The help text for the command-line option `-N` is incorrect.
- The image of the FlashAttention algorithm that appears in Part 4 of the README should say `K_0, ..., K_Tc` in line 8, and similarly in line 9. The matrix `PV` is initialized in the code, but it is not referenced in the image. A note could be added that the vector `l` needs to be reset for each `(b, h)` pair, before `li` is loaded (see the sketch below).
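On that last point, here is a minimal dense-PyTorch sketch of the blocked accumulation, assuming the simplified no-running-max variant shown in the handout; tile sizes, shapes, and variable names are illustrative, not the assignment's exact ones.

```python
import torch

# Sketch only: shows where the running row sums l must be zeroed.
# Without the reset, the denominators from the previous (b, h) pair would
# leak into the next head's softmax.
def flash_attention_sketch(Q, K, V, Br=16, Bc=16):
    B, H, N, d = Q.shape
    O = torch.zeros_like(Q)
    for b in range(B):
        for h in range(H):
            l = torch.zeros(N, dtype=Q.dtype, device=Q.device)  # reset l for every (b, h) pair
            for j in range(0, N, Bc):              # tiles K_0, ..., K_Tc and V_0, ..., V_Tc
                Kj = K[b, h, j:j + Bc]
                Vj = V[b, h, j:j + Bc]
                for i in range(0, N, Br):          # tiles Q_i, O_i
                    Qi = Q[b, h, i:i + Br]
                    li = l[i:i + Br]               # li is loaded only after the reset above
                    P = torch.exp(Qi @ Kj.T)       # exp(Q_i K_j^T); P @ Vj is the PV block
                    lnew = li + P.sum(dim=1)
                    O[b, h, i:i + Br] = (O[b, h, i:i + Br] * li.unsqueeze(1) + P @ Vj) / lnew.unsqueeze(1)
                    l[i:i + Br] = lnew
    return O
```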