Incorporate Rotary positional embedding into LlamaAttention
#106
Conversation
separate src and dst tensors. Some operations are not yet general and require input and output tensors of specific shape.
Added a Python test for rope; it checks the correctness of the elements' rotation, NNTile vs NumPy. The current version does not work with CUDA yet.
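For reference, a minimal NumPy sketch of the pairwise rotation such a test can compare against (the (head_size, seq_len) layout, the interleaved pairing, and the base 10000 are assumptions for illustration, not NNTile's actual conventions):

import numpy as np

def rope_reference(x, positions):
    """Reference RoPE: rotate consecutive element pairs (x_{2j}, x_{2j+1}).

    x: (head_size, seq_len) with an even head_size; positions: (seq_len,).
    """
    half = x.shape[0] // 2
    # Standard RoPE frequencies: theta_j = 10000 ** (-2 * j / head_size)
    inv_freq = 10000.0 ** (-np.arange(half) / half)
    angles = np.outer(inv_freq, positions)          # (half, seq_len)
    cos, sin = np.cos(angles), np.sin(angles)

    y = np.empty_like(x)
    even, odd = x[0::2], x[1::2]
    y[0::2] = cos * even - sin * odd                # 2D rotation of each pair
    y[1::2] = sin * even + cos * odd
    return y

A test along these lines would fill an NNTile tensor with the same random data, run the NNTile rope operation, and compare the gathered result against rope_reference with np.testing.assert_allclose.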
RoPE operation. Introduced a Python test to check the correctness of NNTile RoPE versus Llama RoPE.
Get rid of unused old primary version - rope3.
Work in progress.
RoPE backward has been embedded in the Llama attention layer. The test for the W_Q gradient is working; the test for the W_K gradient is NOT. Minor: a small fix at the rope tensor level to ensure it works with both 4- and 5-dimensional tensors.
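For intuition about what such a gradient test verifies, here is a PyTorch-only sketch (not the NNTile layer or its test; the shapes, the interleaved pair layout, and the base 10000 are assumptions) that pushes a scalar loss through RoPE-rotated Q and K projections and checks the W_Q and W_K gradients with torch.autograd.gradcheck:

import torch

def apply_rope(x, positions):
    """Rotate consecutive feature pairs of x (seq_len, head_size) by RoPE angles."""
    half = x.shape[-1] // 2
    inv_freq = 10000.0 ** (-torch.arange(half, dtype=x.dtype) / half)
    angles = positions[:, None] * inv_freq[None, :]          # (seq_len, half)
    cos, sin = torch.cos(angles), torch.sin(angles)
    even, odd = x[..., 0::2], x[..., 1::2]
    # Interleave the rotated pairs back into the last dimension
    return torch.stack((cos * even - sin * odd,
                        sin * even + cos * odd), dim=-1).flatten(-2)

def qk_loss(w_q, w_k, x, positions):
    """Scalar loss that exercises both W_Q and W_K through RoPE."""
    q = apply_rope(x @ w_q, positions)
    k = apply_rope(x @ w_k, positions)
    return (q @ k.transpose(-1, -2)).sum()

seq_len, emb_size, head_size = 5, 8, 4
x = torch.randn(seq_len, emb_size, dtype=torch.double)
positions = torch.arange(seq_len, dtype=torch.double)
w_q = torch.randn(emb_size, head_size, dtype=torch.double, requires_grad=True)
w_k = torch.randn(emb_size, head_size, dtype=torch.double, requires_grad=True)

# gradcheck compares the analytic W_Q and W_K gradients against finite differences
assert torch.autograd.gradcheck(
    lambda a, b: qk_loss(a, b, x, positions), (w_q, w_k))

A layer-level test would compare the NNTile gradients against the corresponding torch_layer gradients in the same spirit.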
Please do not leave commented code or dead code.
if(src.shape[0] != 2*sin.shape[0])
{
    throw std::runtime_error("src.shape[0] != 2*sin.shape[0]");
}
I'm not sure that it is a good idea to check some properties of the input twice. Specifically, we check the leading dimension of the input tile and the sine tile, which duplicates the same checks in nntile/tensor/rope.cc. This results in code duplication and confusion when debugging the wrong if-condition. Are tile operations part of the public C++ API?
nntile/tile/rope.cc is not used by nntile/tensor/rope.cc at all. The shape checks are indeed the same in both files, but the tensor version also checks basetile_shape. Tile operations are not exposed via the Python interface.
        params: LlamaAttentionTestParams, bias: bool):
    torch_layer, nntile_layer, _, _ = generate_inputs(dtype, params, bias)
    torch_layer, nntile_layer, _, _, _ = \
            generate_inputs(dtype, params, bias)
FYI, you can unpack into a variable-length dummy argument instead of enumerating a dummy variable per element of the result tuple, as follows:
torch_layer, nntile_layer, *_ = \
    generate_inputs(dtype, params, bias)
Please also check your indentation configuration. It seems that the indent width (8 spaces) is twice the standard indent (4 spaces).
LlamaAttention
Added a kernel test for rope and rope_backward, so far for CPU only.
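For context on what a CPU-only kernel test for rope_backward can check: the backward pass applies the rotation with negated angles to the incoming gradient, so a simple adjoint identity holds. A NumPy sketch under the same interleaved-pair assumption as above (not the NNTile kernel API):

import numpy as np

def rope_forward(x, cos, sin):
    """Rotate interleaved pairs (x_{2j}, x_{2j+1}) by the given angles."""
    y = np.empty_like(x)
    even, odd = x[0::2], x[1::2]
    y[0::2] = cos * even - sin * odd
    y[1::2] = sin * even + cos * odd
    return y

def rope_backward(grad_y, cos, sin):
    """Gradient w.r.t. x: apply the transposed (inverse) rotation."""
    return rope_forward(grad_y, cos, -sin)

head_size, seq_len = 8, 16
half = head_size // 2
rng = np.random.default_rng(0)
x = rng.standard_normal((head_size, seq_len))
grad_y = rng.standard_normal((head_size, seq_len))
angles = np.outer(10000.0 ** (-np.arange(half) / half), np.arange(seq_len))
cos, sin = np.cos(angles), np.sin(angles)

# The per-pair rotation is orthogonal: <rope(x), g> == <x, rope_backward(g)>
lhs = np.sum(rope_forward(x, cos, sin) * grad_y)
rhs = np.sum(x * rope_backward(grad_y, cos, sin))
np.testing.assert_allclose(lhs, rhs)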