Muxas/gqa by Muxas · Pull Request #101 · nntile/nntile · GitHub

Muxas/gqa #101

Merged
14 commits merged into main from muxas/gqa on Jul 15, 2024

Conversation

Muxas
Member
@Muxas Muxas commented Jul 14, 2024

This PR implements LlamaAttention without RotaryEmbedding. It is checked against LlamaAttention from transformers by providing a zeroed position_ids argument. @daskol, please check whether the test can be further improved for readability and pytest idioms.
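For context, a rough sketch of how such a reference comparison can look (illustrative only: the exact LlamaAttention constructor and forward signature depend on the transformers version, and the sizes below are made up except for n_head=16 and n_head_kv=4). With all-zero position_ids the rotary embedding rotates by angle zero (cos = 1, sin = 0), i.e. it is effectively an identity, so the transformers layer serves as a reference for attention without rotary.

# Illustrative sketch, not the test as written in this PR
import torch
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaAttention

config = LlamaConfig(hidden_size=128, num_attention_heads=16,
                     num_key_value_heads=4)
attn = LlamaAttention(config, layer_idx=0).eval()

x = torch.randn(2, 10, config.hidden_size)           # (batch, seq, hidden)
position_ids = torch.zeros(2, 10, dtype=torch.long)  # all zeros: no rotation

with torch.no_grad():
    out, *_ = attn(x, position_ids=position_ids)
print(out.shape)  # reference output to compare NNTile results against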

@Muxas Muxas requested review from amkatrutsa and daskol July 14, 2024 11:59
@Muxas Muxas linked an issue Jul 14, 2024 that may be closed by this pull request
Member
@daskol daskol left a comment

A general comment: let's stick to numpy.testing rather than torch.testing. NumPy provides a richer set of assertions. Its core idea is a drop-in replacement for the built-in assert statement with pretty printing and numerical precision controls, i.e.

from numpy.testing import assert_allclose, assert_equal, ...
assert_equal(lhs, rhs)

Mixing different APIs could be confusing for a reader. Also, interop in PyTorch is quite lame.
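For example, a minimal sketch with made-up arrays standing in for the transformers reference and the NNTile output:

import numpy as np
from numpy.testing import assert_allclose, assert_equal

# Stand-ins for the reference and candidate outputs
reference = np.random.rand(2, 10, 128).astype(np.float32)
candidate = reference + np.float32(1e-7)  # tiny numerical noise

assert_equal(reference.shape, candidate.shape)              # exact check
assert_allclose(candidate, reference, rtol=1e-5, atol=1e-6)  # tolerant check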

n_head=16,
n_head_tile=8,
n_head_kv=4,
dtype=np.dtypes.Float32DType(),
Member

Just np.float32.
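(As a quick sanity check, a one-liner sketch, assuming NumPy 1.25+ for the np.dtypes module:)

import numpy as np

# Both spellings resolve to the same NumPy dtype
assert np.dtypes.Float32DType() == np.dtype(np.float32)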

Member Author

I just repeated how it is done in the batch norm test. We have to either use Torch dtypes or rely on our own dtypes. Supporting bf16, fp16, fp8 and quantised formats is out of NumPy's scope.

Member Author

#102 is meant to resolve this issue in the future.

Member

> Supporting bf16, fp16, fp8 and quantised formats is out of NumPy's scope.

Absolutely not. There are widely adopted custom types (e.g. in pandas). Also, NumPy facilitates extensions with NEP-42. Thanks to the JAX team, we have ml-dtypes, which provides common low-bit floating-point types in a framework-agnostic way (a usage sketch follows the references).

[1] NEP-42
[2] numpy/numpy-user-dtypes
[3] jax-ml/ml-dtypes
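A minimal usage sketch, assuming the ml-dtypes package is installed:

import numpy as np
import ml_dtypes

# bfloat16 behaves like a regular NumPy dtype once ml_dtypes is imported
x = np.array([0.1, 0.5, 1.5], dtype=ml_dtypes.bfloat16)
y = x.astype(np.float32)  # upcast for comparison with a full-precision reference
print(x.dtype, y)         # e.g. bfloat16 [0.10009766 0.5 1.5]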

Member Author

NEP-42 is still under construction, as the link says, but in the future it will allow us to add our own quantised formats. JAX's ml-dtypes can be used for now, as it covers bfloat16 and float8 types. We need these only for testing purposes, as interoperation between NNTile and NumPy goes through upcasting NNTile data into float for all floats of 32 bits or fewer and into double for 64-bit floats.

Member
@daskol daskol Jul 15, 2024

> We need these only for testing purposes, as interoperation between NNTile and NumPy goes through upcasting NNTile data into float for all floats of 32 bits or fewer and into double for 64-bit floats.

Then we do not actually need dtypes. Just define an upcast() routine and expose it to Python with limited visibility. That's all.
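A minimal sketch of that idea; the function name, dtype strings and buffer-based signature below are hypothetical, not the NNTile API:

import numpy as np
import ml_dtypes

def upcast(buffer: bytes, dtype_name: str, shape: tuple) -> np.ndarray:
    """Reinterpret raw low-precision data and upcast it for testing only."""
    # Hypothetical mapping from dtype names to storage dtypes
    storage = {
        "bf16": ml_dtypes.bfloat16,
        "fp16": np.float16,
        "fp32": np.float32,
        "fp64": np.float64,
    }
    src = np.frombuffer(buffer, dtype=storage[dtype_name]).reshape(shape)
    # 32-bit and narrower floats compare in float32, 64-bit ones in float64
    return src.astype(np.float32 if src.itemsize <= 4 else np.float64)

raw = np.arange(4, dtype=np.float32).astype(ml_dtypes.bfloat16).tobytes()
print(upcast(raw, "bf16", (4,)))  # float32 array: [0. 1. 2. 3.]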

Member

This is a distinct topic; I don't understand why you brought it up now.

The original comment was about replacing the internal numpy.dtypes.* in favor of numpy.float32.

Member Author

I just made it a string. It looks much simpler now.

self.w_q.value.wont_use()
# Apply bias if needed
if self.in_proj_bias_q is not None:
    # batched add_fiber (head_size, batch=(kv_group_size, n_head_kv))
Contributor

What does the internal tuple batch=() mean? What shape should be returned in this case?

Member Author

This is just a virtual union of several dimensions into a single batch dimension; the parameter batch_ndim=2 is set on the next line.
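A plain-NumPy illustration (shapes simplified, not the actual NNTile tensor layout): the bias fiber of length head_size is added across the two trailing axes, which batch_ndim=2 treats as a single flattened batch dimension.

import numpy as np

head_size, kv_group_size, n_head_kv = 8, 2, 4
q = np.random.rand(head_size, kv_group_size, n_head_kv)
bias = np.random.rand(head_size)

# batch_ndim=2: the last two axes act as one batch of size kv_group_size * n_head_kv
q_biased = q + bias[:, None, None]
assert q_biased.shape == q.shape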

@Muxas Muxas merged commit ccbb219 into main Jul 15, 2024
2 of 5 checks passed
@Muxas Muxas deleted the muxas/gqa branch July 15, 2024 11:30

Successfully merging this pull request may close these issues.

Operation: Group Query Attention