Student - reference attentions seem to be flipped twice in `gpt149.py` · Issue #4 · stanford-cs149/cs149gpt

Open
IonMich opened this issue Oct 27, 2024 · 0 comments
IonMich commented Oct 27, 2024

In the definition of CustomAttention there is an attribute named isRef that presumably stands for is_reference. This variable is (correctly) set to True for the attention imported from module_ref, and to False for the attention from module.cpp. However, in the four attention methods of gpt149.py, self.isRef == True runs the student attention, while self.isRef == False runs the reference attention. This still ends up giving the correct results, because the two flips cancel out later in the same file in the following statements:

    testTemplate(attentionModuleStudent.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION")
    ...
    testTemplate(attentionModuleReference.myUnfusedAttention, params, "STUDENT - NAIVE ATTENTION")

But this is quite confusing. To make the code easier to follow, one could make the following changes:

  • Import module_ref as mr and the student module built from module.cpp as ms; all references to ms and mr in gpt149.py should then be flipped accordingly.
  • The conditionals in parts 1 to 4 should be changed from if self.isRef: to if not self.isRef:.
  • testTemplate should be called with consistent arguments, e.g. testTemplate(attentionModuleReference.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION"), etc., as in the sketch after this list.
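
Taken together, the proposal could look roughly like the sketch below. This is only illustrative: the import names, the CustomAttention constructor, and the kernel signatures are assumptions standing in for whatever gpt149.py actually does.

    # Illustrative sketch only; import names, the CustomAttention constructor, and
    # the kernel signatures are assumptions, not the assignment's actual code.
    import module_ref as mr   # C++ reference kernels
    import module as ms       # student kernels built from module.cpp (assumed import name)

    class CustomAttention:
        def __init__(self, isRef: bool):
            self.isRef = isRef

        def myUnfusedAttention(self, *args):
            # With this dispatch, isRef == True really does mean "run the reference".
            if not self.isRef:
                return ms.myUnfusedAttention(*args)   # student kernel from module.cpp
            return mr.myUnfusedAttention(*args)       # reference kernel from module_ref

    attentionModuleReference = CustomAttention(isRef=True)
    attentionModuleStudent = CustomAttention(isRef=False)

    # testTemplate calls then read consistently, e.g.:
    # testTemplate(attentionModuleReference.myUnfusedAttention, params, "REFERENCE - NAIVE ATTENTION")
    # testTemplate(attentionModuleStudent.myUnfusedAttention, params, "STUDENT - NAIVE ATTENTION")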

Also, a couple of other minor changes:

  • The --model command-line argument should also accept the option shakes256 (currently not listed), while the option kayvon is not available.
  • The help text for the -N command-line option is incorrect.
  • The image of the FlashAttention algorithm that appears in Part 4 of the README should say K_0,...,K_Tc in line 8, and similarly in line 9. The matrix PV is initialized in the code but is not referenced in the image. A note could also be added that the vector l needs to be reset for each (b, h) pair before l_i is loaded (see the sketch after this list).
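
For that last point, the loop structure might look roughly like the sketch below (placeholder sizes and comments instead of real tensor code), which shows where the vector l is zeroed:

    # Illustrative only; sizes, tiling, and tensor handling are placeholders, not
    # the assignment's actual starter code.
    B, H, N, Tc, Tr = 1, 2, 8, 2, 2          # placeholder dimensions for illustration
    for b in range(B):
        for h in range(H):
            l = [0.0] * N                    # reset the row-sum vector for every (b, h) pair
            for j in range(Tc):              # iterate over the K and V blocks
                # load K_j and V_j
                for i in range(Tr):          # iterate over the Q blocks
                    # load Q_i, O_i, and the slice l_i of l
                    # S_ij  = Q_i @ K_j^T
                    # P_ij  = exp(S_ij)
                    # PV_ij = P_ij @ V_j     (the PV buffer that is initialized in the code)
                    # accumulate PV_ij into O_i, row-sums of P_ij into l_i, then write back
                    pass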