Add Llama model #111
Conversation
This file seems to be still in progress. What are the next steps to be done?
redux=0)
self.x.wont_use()
self.y.grad.wont_use()
self.w.grad.wont_use()

def to_torch(self):
Added #112 to add tests for new functions
wrappers/python/nntile/layer/prod.py (outdated)
self.y.grad.wont_use()
self.res.grad.wont_use()

def unregister(self):
This unregister is unnecessary. Tensor self.res is neither a parameter of the layer nor a temporary tensor, but an activation. Activations are cleared by the base model, not by the layer itself.
Since this unregister was just copied from the add layer, it should be removed from add as well.
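A minimal sketch of what removing it would leave behind, assuming the layer owns no parameters or temporary tensors of its own (the body below is illustrative, not the actual NNTile code):

def unregister(self):
    # self.res is an activation, so its lifetime is managed by the base
    # model; the layer must not unregister it here. If the layer owns no
    # parameters or temporaries either, nothing is left to free.
    pass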
@@ -171,3 +175,30 @@ def backward_async(self):
self.tmp_y_value.invalidate_submit()
# dX can be offloaded from GPU
self.x.grad.wont_use()

@staticmethod
def from_torch(torch_rmsnorm, x: TensorMoments,
Add #113 to test new functionality
If this file tests LlamaMLP, then it shall be deleted, because test_llama_mlp.py is already implemented.
return torch_model, nntile_model, x_torch, pos_ids, y_grad_torch


@pytest.mark.parametrize("params", TEST_PARAMS)
Added #114 to simplify testing parameters
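As a hedged illustration of what simplified parameters might look like (the case names and values below are invented for this sketch, not taken from the PR), one option is to group related values in a small dataclass and give each case an id:

from dataclasses import dataclass

import pytest


@dataclass
class LlamaTestCase:
    # Illustrative fields only; the real tests define their own parameters.
    hidden_size: int
    n_head: int
    seq_len: int


TEST_PARAMS = [
    pytest.param(LlamaTestCase(hidden_size=128, n_head=4, seq_len=32), id="small"),
    pytest.param(LlamaTestCase(hidden_size=256, n_head=8, seq_len=64), id="medium"),
]


@pytest.mark.parametrize("params", TEST_PARAMS)
def test_case_is_well_formed(params):
    # The real test would build the torch and nntile models from `params`
    # and compare their outputs; this sketch only checks the case itself.
    assert params.hidden_size % params.n_head == 0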
import nntile
from nntile.model.llama_config import LlamaConfigNNTile
from nntile.model.llama_decoder import LlamaDecoder as LlamaDecoder_nntile
# from nntile.model.llama import LlamaConfigNNTile
Remove this commented-out import.
return torch_layer, nntile_layer, x_torch, y_grad_torch, pos_ids, mask


@pytest.mark.parametrize("params", TEST_PARAMS)
#115 is to simplify parameters of this test
return torch_layer, nntile_layer, x_torch, y_grad_torch


@pytest.mark.parametrize("params", TEST_PARAMS)
#116 is to simplify parameters of this test
from typing import Dict


class LlamaConfigNNTile(Dict):
Using dict in favor of dataclass does not make much sense if all attributes are known a priori:

from dataclasses import asdict, dataclass

@dataclass
class Config: ...

config = Config(...)
value = asdict(config)['key']['subkey']['subsubkey']
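As a hedged sketch of how that could look for the config here (the field names below are invented for illustration and are not the actual LlamaConfigNNTile attributes):

from dataclasses import asdict, dataclass


@dataclass
class LlamaConfigNNTile:
    # Illustrative fields only; the real config defines its own attributes.
    hidden_size: int
    num_attention_heads: int
    rms_norm_eps: float = 1e-6


config = LlamaConfigNNTile(hidden_size=4096, num_attention_heads=32)
# Dict-style access stays available via asdict when it is really needed:
assert asdict(config)["hidden_size"] == 4096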
): # -> Self: does not work with Python 3.10
layer, _ = __class__.generate_simple(
layer, next_tag = __class__.generate_simple(
This is exactly the use case for classmethod:

class Attention:
    @classmethod
    def from_torch(cls, ...):
        layer, next_tag = cls.generate_simple(...)
        ...
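Beyond readability, cls is bound to whatever class from_torch is actually invoked on, so a subclass that overrides generate_simple would be picked up automatically, whereas __class__ always refers to the defining class.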
): # -> Self: does not work with Python 3.10
layer, _ = __class__.generate_simple(
layer, next_tag = __class__.generate_simple(
This is exactly the use case for classmethod:

class Attention:
    @classmethod
    def from_torch(cls, ...):
        layer, next_tag = cls.generate_simple(...)
        ...
): # -> Self: does not work with Python 3.10
layer, _ = __class__.generate_simple(
layer, next_tag = __class__.generate_simple(
This is exactly the use case for classmethod:

class Attention:
    @classmethod
    def from_torch(cls, ...):
        layer, next_tag = cls.generate_simple(...)
        ...