Incorporate Rotary positional embedding into LlamaAttention
#106
Conversation
separate src and dst tensors. Some operations are not yet general and require input and output tensors of specific shape.
Added a Python test for rope; it checks the correctness of the elements' rotation, NNTile vs NumPy. The current version does not work with CUDA yet.
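For reference, a minimal NumPy sketch of the pairwise rotation such a test can compare against (the (head_size, seq_len) layout, the interleaved pairing, and the base 10000 are assumptions for illustration, not NNTile's actual conventions):

import numpy as np

def rope_reference(x, positions):
    """Reference RoPE: rotate consecutive element pairs (x_{2j}, x_{2j+1}).

    x: (head_size, seq_len) with an even head_size; positions: (seq_len,).
    """
    half = x.shape[0] // 2
    # Standard RoPE frequencies: theta_j = 10000 ** (-2 * j / head_size)
    inv_freq = 10000.0 ** (-np.arange(half) / half)
    angles = np.outer(inv_freq, positions)          # (half, seq_len)
    cos, sin = np.cos(angles), np.sin(angles)

    y = np.empty_like(x)
    even, odd = x[0::2], x[1::2]
    y[0::2] = cos * even - sin * odd                # 2D rotation of each pair
    y[1::2] = sin * even + cos * odd
    return y

A test along these lines would fill an NNTile tensor with the same random data, run the NNTile rope operation, and compare the gathered result against rope_reference with np.testing.assert_allclose.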
RoPE operation. Introduced a Python test to check the correctness of NNTile RoPE versus Llama RoPE.
Get rid of unused old primary version - rope3.
Work in progress.
RoPE backward has been embedded in the Llama attention layer. The test for the W_Q gradient is working; the test for the W_K gradient is NOT. Minor: a small fix at the rope tensor level to ensure it works with both 4- and 5-dimensional tensors.
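For intuition about what such a gradient test verifies, here is a PyTorch-only sketch (not the NNTile layer or its test; the shapes, the interleaved pair layout, and the base 10000 are assumptions) that pushes a scalar loss through RoPE-rotated Q and K projections and checks the W_Q and W_K gradients with torch.autograd.gradcheck:

import torch

def apply_rope(x, positions):
    """Rotate consecutive feature pairs of x (seq_len, head_size) by RoPE angles."""
    half = x.shape[-1] // 2
    inv_freq = 10000.0 ** (-torch.arange(half, dtype=x.dtype) / half)
    angles = positions[:, None] * inv_freq[None, :]          # (seq_len, half)
    cos, sin = torch.cos(angles), torch.sin(angles)
    even, odd = x[..., 0::2], x[..., 1::2]
    # Interleave the rotated pairs back into the last dimension
    return torch.stack((cos * even - sin * odd,
                        sin * even + cos * odd), dim=-1).flatten(-2)

def qk_loss(w_q, w_k, x, positions):
    """Scalar loss that exercises both W_Q and W_K through RoPE."""
    q = apply_rope(x @ w_q, positions)
    k = apply_rope(x @ w_k, positions)
    return (q @ k.transpose(-1, -2)).sum()

seq_len, emb_size, head_size = 5, 8, 4
x = torch.randn(seq_len, emb_size, dtype=torch.double)
positions = torch.arange(seq_len, dtype=torch.double)
w_q = torch.randn(emb_size, head_size, dtype=torch.double, requires_grad=True)
w_k = torch.randn(emb_size, head_size, dtype=torch.double, requires_grad=True)

# gradcheck compares the analytic W_Q and W_K gradients against finite differences
assert torch.autograd.gradcheck(
    lambda a, b: qk_loss(a, b, x, positions), (w_q, w_k))

A layer-level test would compare the NNTile gradients against the corresponding torch_layer gradients in the same spirit.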
Please do not leave commented code or dead code.
if(src.shape[0] != 2*sin.shape[0])
{
    throw std::runtime_error("src.shape[0] != 2*sin.shape[0]");
}
I'm not sure that it is a good idea to check some properties of the input twice. Specifically, we check the leading dimension of the input tile and the sine tile, which duplicates the same checks in nntile/tensor/rope.cc. This results in code duplication and confusion when debugging the wrong if-condition. Are tile operations part of the public C++ API?
nntile/tile/rope.cc is not used by nntile/tensor/rope.cc at all. The shape checks are indeed the same in both files, but the tensor version also checks basetile_shape. Tile operations are not exposed via the Python interface.
        params: LlamaAttentionTestParams, bias: bool):
    torch_layer, nntile_layer, _, _ = generate_inputs(dtype, params, bias)
    torch_layer, nntile_layer, _, _, _ = \
            generate_inputs(dtype, params, bias)
FYI, you can unpack into a variable-length dummy argument instead of enumerating a dummy variable per element of the result tuple, as follows:
torch_layer, nntile_layer, *_ = \
    generate_inputs(dtype, params, bias)
Please also check your indentation configuration. It seems that the indent width (8 spaces) is twice the standard indent (4 spaces).
LlamaAttention
Added a kernel test for rope and rope_backward, so far for CPU only.
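For context on what a CPU-only kernel test for rope_backward can check: the backward pass applies the rotation with negated angles to the incoming gradient, so a simple adjoint identity holds. A NumPy sketch under the same interleaved-pair assumption as above (not the NNTile kernel API):

import numpy as np

def rope_forward(x, cos, sin):
    """Rotate interleaved pairs (x_{2j}, x_{2j+1}) by the given angles."""
    y = np.empty_like(x)
    even, odd = x[0::2], x[1::2]
    y[0::2] = cos * even - sin * odd
    y[1::2] = sin * even + cos * odd
    return y

def rope_backward(grad_y, cos, sin):
    """Gradient w.r.t. x: apply the transposed (inverse) rotation."""
    return rope_forward(grad_y, cos, -sin)

head_size, seq_len = 8, 16
half = head_size // 2
rng = np.random.default_rng(0)
x = rng.standard_normal((head_size, seq_len))
grad_y = rng.standard_normal((head_size, seq_len))
angles = np.outer(10000.0 ** (-np.arange(half) / half), np.arange(seq_len))
cos, sin = np.cos(angles), np.sin(angles)

# The per-pair rotation is orthogonal: <rope(x), g> == <x, rope_backward(g)>
lhs = np.sum(rope_forward(x, cos, sin) * grad_y)
rhs = np.sum(x * rope_backward(grad_y, cos, sin))
np.testing.assert_allclose(lhs, rhs)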