Incorporate Rotary positional embedding into `LlamaAttention` by Muxas · Pull Request #106 · nntile/nntile · GitHub

Incorporate Rotary positional embedding into LlamaAttention #106


Merged
@Muxas merged 34 commits into main from gkarpov/rope on Jul 20, 2024

Conversation

@Muxas (Member) commented Jul 17, 2024

Incorporate Rotary Positional Embedding into LlamaAttention

glkarpov and others added 22 commits June 25, 2024 10:24
Separate src and dst tensors. Some operations are not yet general and require input and output tensors of specific shape.
Added Python test for rope; it checks correctness of elements' rotation, NNTile vs NumPy. The current version does not work with CUDA yet.
RoPE operation.
Introduced Python test to check correctness of NNTile RoPE versus Llama RoPE.
Get rid of unused old primary version - rope3.
RoPE backward has been embedded in the Llama attention layer.
Test for W_Q grad is working; test for W_K grad is NOT.

Minor: small fix at the rope tensor level to ensure it works with both 4- and 5-dimensional tensors.
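
The commit messages above reference a Python test that checks NNTile's RoPE rotation against a NumPy reference, plus a backward pass embedded into the attention layer. Below is a minimal NumPy sketch of such a reference, not NNTile code; the adjacent-pair convention and the trailing feature axis are assumptions for illustration only.

import numpy as np

def rope_reference(x, sin, cos):
    # Rotate adjacent feature pairs (x[..., 2i], x[..., 2i+1]) by per-pair angles.
    # x: float array of shape (..., 2*k); sin, cos: broadcastable to (..., k).
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def rope_backward_reference(grad_out, sin, cos):
    # The per-pair rotation is orthogonal, so the input gradient is the same
    # rotation applied with the angle negated (sin -> -sin).
    return rope_reference(grad_out, -sin, cos)

A W_Q/W_K gradient test, as mentioned in the commit messages, would chain such a backward rotation through the attention projections and compare against autograd results.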
@Muxas requested a review from @daskol on July 17, 2024, 21:44
@daskol (Member) left a comment

Please do not leave commented code or dead code.

if(src.shape[0] != 2*sin.shape[0])
{
    throw std::runtime_error("src.shape[0] != 2*sin.shape[0]");
}
Member

I'm not sure that it is a good idea to check some properties of the input twice. Specifically, we check the leading dimension of the input tile and the sine tile, which duplicates the same checks in nntile/tensor/rope.cc. This results in code duplication and confusion when debugging the wrong if-condition. Are tile operations part of the public C++ API?

Member Author

nntile/tile/rope.cc is not used by nntile/tensor/rope.cc at all. The shape checks are indeed the same in both files, but the tensor version also checks basetile_shape. Tile operations are not exposed via the Python interface.
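
To make the layering under discussion concrete, here is a runnable Python-flavored sketch; the actual code is C++ in nntile/tile/rope.cc and nntile/tensor/rope.cc, and all names and shapes below are hypothetical stand-ins (shape metadata passed as plain lists).

def tensor_level_check(src_shape, src_basetile, sin_shape, sin_basetile):
    # The tensor-level operation checks the logical shape AND the base tile shape.
    if src_shape[0] != 2 * sin_shape[0]:
        raise ValueError("src.shape[0] != 2*sin.shape[0]")
    if src_basetile[0] != 2 * sin_basetile[0]:
        raise ValueError("src.basetile_shape[0] != 2*sin.basetile_shape[0]")

def tile_level_check(src_shape, sin_shape):
    # The tile-level operation repeats the logical-shape check; this repetition
    # is the duplication questioned in the review comment above.
    if src_shape[0] != 2 * sin_shape[0]:
        raise ValueError("src.shape[0] != 2*sin.shape[0]")

# Example: a leading dimension of 128 features pairs with 64 sine/cosine angles.
tensor_level_check([128, 8], [32, 8], [64, 8], [16, 8])
tile_level_check([32, 8], [16, 8])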

        params: LlamaAttentionTestParams, bias: bool):
-    torch_layer, nntile_layer, _, _ = generate_inputs(dtype, params, bias)
+    torch_layer, nntile_layer, _, _, _ = \
+            generate_inputs(dtype, params, bias)
Member

FYI, you can use starred unpacking into a variable-length dummy instead of enumerating a dummy variable per element of the result tuple, as follows.

torch_layer, nntile_layer, *_ = \
    generate_inputs(dtype, params, bias)

Please check your indentation configuration. It seems that the indent width (8 spaces) is twice the standard indent (4 spaces).
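
A self-contained illustration of the starred-unpacking suggestion; generate_inputs_stub below is a hypothetical stand-in for the real generate_inputs helper. The starred target absorbs however many trailing values the callee returns, so the call site keeps working when an extra output is added.

def generate_inputs_stub():
    # Stand-in returning five values, like the updated generate_inputs.
    return "torch_layer", "nntile_layer", "x", "y", "y_grad"

torch_layer, nntile_layer, *_ = generate_inputs_stub()
assert torch_layer == "torch_layer" and len(_) == 3  # three trailing values absorbed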

@daskol changed the title from Gkarpov/rope to Incorporate Rotary positional embedding into LlamaAttention on Jul 18, 2024
@daskol added the enhancement (New feature or request) and api (Public programming interfaces) labels on Jul 18, 2024
glkarpov and others added 2 commits July 18, 2024 17:37
Added kernel test for rope and rope_backward, so far for CPU only
@Muxas merged commit fee8d38 into main on Jul 20, 2024
5 checks passed
@Muxas deleted the gkarpov/rope branch on July 20, 2024, 13:47