-
Notifications
You must be signed in to change notification settings - Fork 704
[Codegen][GPU] Add placeholder op for buffer casts on tensors #20589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing something: could you explain why we need this and also why it isn't folded into the "can we use buffers for this" checking pass?
Four main reasons for the op:
I might be able to think of more reasons, but I'm quite sure we want the op based on what I've seen Triton do with buffer instructions.
Good q, I wasn't quite sure how best to do that yet, and because I was hand coding the op in my use case I kind of cheated with a separate pass. Rolling the DropResourceCasts pass into the checking pass makes sense after I just reviewed the code there. |
9037460
to
a79751b
Compare
@krzysz00 still missing some more tests, but this should be what we discussed. Lot's of room for improvement in the future too, but PTAL whenever you have a moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks reasonable - I'm much happier with the design here
compiler/src/iree/compiler/Codegen/Common/GPU/GPUBubbleResourceCasts.cpp
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUBubbleResourceCasts.cpp
Show resolved
Hide resolved
a79751b
to
4c9081b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - is there a followup patch where we start using this more?
if |input| bufferizes to `storage_buffer` memory space. If |input| resolves | ||
to any other memory space this op is silently dropped and has no effect. | ||
|
||
If |cache_swizzle_stride| is present, there is verification before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why're you using |param|
? Is that a style thing I'm unaware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty common through the rest of the codebase, e.g.
iree/runtime/src/iree/vm/buffer.h
Lines 76 to 82 in 72f1157
// |data| will be freed with |allocator| when the buffer is deinitialized. // If the data is not owned then iree_allocator_null can be used to no-op the // free. // // |access| can be used to control who (guest, host, etc) and how (read/write) // the buffer may be accessed. If the allocation being wrapped has its own // access requirements (read-only, etc) the caller must specify those flags. Aligns |value| up to the given power-of-two |alignment| if required.
but isn't formal style as far as I am aware, I just like it over backticks or quotes.
Implemented? No :P Yes in the semi near future though. I need to work out/benchmark how to pick the right cache swizzle value before we get benefit of on-by-default, but the hope is it should be easy to generate after operand promotion (maybe earlier before workgroup tiling). |
Introduces iree_gpu.buffer_resource_cast to represent where to insert amdgpu.fat_raw_buffer_cast ops when bufferizing. It also supports taking a cache_swizzle stride. To avoid colliding the the existing plumbing of amdgpu buffer resources, there is a pass that walks producers of these cast ops and either drops the casts or drops the annotation on the binding. Right now the verification for cache_swizzle is simply checking for a chain of single-use producers, but in the future we can add more involved verification that checks that all users specify the same cache swizzle value.
8cd8e0c
to
34f9f7b
Compare
All non-bazel tests passed last night. Going to merge through remaining ONNX tests. |
Introduces iree_gpu.buffer_resource_cast to represent where to insert amdgpu.fat_raw_buffer_cast ops when bufferizing. It also supports taking a cache_swizzle stride. To avoid colliding the the existing plumbing of amdgpu buffer resources, there is a pass that walks producers of these cast ops and either drops the casts or drops the annotation on the binding. Right now the verification for cache_swizzle is simply checking for a chain of single-use producers, but in the future we can add more involved verification that checks that all users specify the same cache swizzle value.
…rg#20589) Introduces iree_gpu.buffer_resource_cast to represent where to insert amdgpu.fat_raw_buffer_cast ops when bufferizing. It also supports taking a cache_swizzle stride. To avoid colliding the the existing plumbing of amdgpu buffer resources, there is a pass that walks producers of these cast ops and either drops the casts or drops the annotation on the binding. Right now the verification for cache_swizzle is simply checking for a chain of single-use producers, but in the future we can add more involved verification that checks that all users specify the same cache swizzle value.
Introduces iree_gpu.buffer_resource_cast to represent where to insert amdgpu.fat_raw_buffer_cast ops when bufferizing. It also supports taking a cache_swizzle stride. To avoid colliding the the existing plumbing of amdgpu buffer resources, there is a pass that walks producers of these cast ops and either drops the casts or drops the annotation on the binding.
Right now the verification for cache_swizzle is simply checking for a chain of single-use producers, but in the future we can add more involved verification that checks that all users specify the same cache swizzle value.