Add AMDGPU dialect ops for scaled fp conversions #20890

Open
krzysz00 opened this issue May 22, 2025 · 1 comment

@krzysz00
Contributor

There are a bunch of intrinsics in the rocdl dialect for doing scaled conversions to/from the fp4/fp6/fp8 types - they're all the ones with scale32 in their name (though not the sr ones - those do stochastic rounding, which we don't use). However, they use different intrinsics for different types and have somewhat funky calling conventions.

In the AMDGPU dialect, we currently have operations like amdgpu.ext_packed_fp8 and amdgpu.packed_trunc_2xfp8 for regular (unscaled) conversions to/from fp8, which use the types of the input and output to distinguish the operation being performed.
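
For reference, the existing unscaled ops look roughly like the sketch below; the exact assembly syntax is written from memory and may differ slightly from what's upstream, so treat it as illustrative rather than authoritative.

```mlir
// Existing (unscaled) packed fp8 conversions in the AMDGPU dialect.
// Extend the fp8 value at index 0 of the packed source to f32.
%ext = amdgpu.ext_packed_fp8 %packed[0] : vector<4xf8E4M3FNUZ> to f32

// Truncate two f32 values to fp8 and store them into word 0 of the result,
// leaving the remaining bytes undefined.
%trunc = amdgpu.packed_trunc_2xfp8 %a, %b into undef[word 0]
    : f32 to vector<4xf8E4M3FNUZ>
```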

We should add wrapper operations around those intrinsics for the scaled cases, which also have the implicit "pad with undef" semantics. For scaling_extf, we can just take up to 32 elements for the 6-bit types, 4 for the 8-bit types, or 8 for the 4-bit types, plus a selector index that picks out the relevant byte (in all but the 6-bit case). We might need a special operation for the scaling f8 => f16 conversions, which have a tied input, unlike the other extf-likes.
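
To make this concrete, here's a hypothetical sketch of what the extension-side wrapper could look like; the op name (scaled_ext_packed), operand order, and assembly format are all invented for illustration, not a settled design.

```mlir
// Hypothetical: scaled extension of packed 8-bit floats. The index selects
// which part of the packed input to convert; %scale is the f32 scale value.
%x = amdgpu.scaled_ext_packed %packed[1], %scale
    : vector<4xf8E4M3FN> to vector<2xf32>

// Hypothetical: the 6-bit types convert a full 32-element block and take no
// selector index.
%y = amdgpu.scaled_ext_packed %fp6, %scale
    : vector<32xf6E2M3FN> to vector<32xf32>
```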

For the truncation operation, we'll likely want one operation for all the tied-input cases (where you select a byte of the output to place the result into), and one for the 6-bit cases, where it just does the truncation.
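
Similarly, a hypothetical sketch of the truncation-side wrappers (again, the name packed_scaled_trunc and the assembly below are made up for illustration):

```mlir
// Hypothetical: scaled truncation with a tied input; the converted values are
// written into word 0 of %dest and the other bytes are passed through.
%z = amdgpu.packed_scaled_trunc %a, %b into %dest[word 0], %scale
    : f32 to vector<4xf8E4M3FN>

// Hypothetical: the 6-bit case has no tied input and just truncates a block.
%w = amdgpu.packed_scaled_trunc %block, %scale
    : vector<32xf32> to vector<32xf6E2M3FN>
```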

These ops should lower to the ROCDL intrinsics following the existing fp8 operation patterns.

@tgymnich
Contributor

llvm/llvm-project#141554
