8000 Add environment variables to override vector width by Pennycook · Pull Request #1900 · pocl/pocl · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add environment variables to override vector width #1900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 5, 2025

Conversation

Pennycook
Copy link
Collaborator

The changes in this pull request introduce two new environment variables:

  • POCL_FORCE_VECTOR_WIDTH
    This is equivalent to LLVM's force-vector-width option, and forces the LLVM loop vectorizer to use the same vector width for all loops.

  • POCL_PREFER_VECTOR_WIDTH
    This is equivalent to clang's prefer-vector-width option, and overrides clang's preference for 256- or 512-bit vectors on certain x86 targets (e.g., processors previously codenamed Skylake).

It might be a little confusing that one of these environment variables expects the width in loop iterations while the other expects the width in terms of bits. However, that's how force-vector-width and prefer-vector-width work, and it would probably be even more confusing to try and come up with some PoCL-specific convention that didn't match LLVM. I've tried to draw attention to this discrepancy in the documentation.

These changes were motivated by an application I've been working with, where overriding LLVM's heuristics with POCL_FORCE_VECTOR_WIDTH=16 and POCL_PREFER_VECTOR_WIDTH=512 results in a 2x speedup.

@Pennycook Pennycook requested a review from pjaaskel April 25, 2025 10:35
@pjaaskel
Copy link
Member

Thanks! Could you add an LLVM IR check case where vectorization to the given length occurs (only) if the env is given?

@pjaaskel
Copy link
Member

Also a mention in releasenotes would be nice. If you get this done within the next week or so we can pull it to 7.0.

@pjaaskel pjaaskel added this to the 7.0 milestone Apr 25, 2025
@@ -407,6 +407,29 @@ pocl.
When set to 1, prints out remarks produced by the loop vectorizer of LLVM
during kernel compilation.

- **POCL_FORCE_VECTOR_WIDTH**

Forces the LLVM loop vectorizer to use the specified vector width, overriding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "vector width" measured in number of elements or bytes (since the other env variable measures in bits)? Perhaps this should be clarified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a number of loop iterations. I've tried to clarify that here: ff3299d

@Pennycook
Copy link
Collaborator Author
Pennycook commented Apr 28, 2025

Thanks! Could you add an LLVM IR check case where vectorization to the given length occurs (only) if the env is given?

@pjaaskel - I've never used FileCheck before, but I've had a go. Note that the two *.cl files are currently the same, because I didn't want to change your existing test infrastructure too much. There's no reason that they have to be the same, really -- only the force-vector-width test actually looks at the instructions generated for the kernel.

Also, please let me know how you would prefer me to resolve the merge conflict. I know some projects prefer a merge commit while others prefer a rebase.

Copy link
Member
@pjaaskel pjaaskel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please do a rebase instead of a merge commit.

- **POCL_PREFER_VECTOR_WIDTH**

Override the preferred vector width (expressed as a number of **bits**) for
x86 targets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this x86-specific or might it work for other targets such as ARM or RISC-V?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it's x86-specific. I've only ever seen the prefer-vector-width=256 attribute being set for x86 targets, and only for the subset of x86 architectures where the clang/gcc community decided that 256-bit was a better default.

Setting POCL_FORCE_VECTOR_WIDTH allows a user to request that a specific vector
width is used for all loops.
Setting POCL_PREFER_VECTOR_WIDTH allows a user to set the prefer-vector-width
attribute on x86 targets. The primary use case for this environment variable
is to enable the use of 512-bit SIMD on processors where clang otherwise
defaults to 256-bit SIMD.
Checks that the LLVM IR is vectorized with the expected width (4).
Checks that the LLVM IR contains the expected prefer-vector-width attribute.
Whether the attribute does anything is target-dependent, so harder to test.
@Pennycook Pennycook force-pushed the override-vector-width branch from ea54d1c to 936f488 Compare April 28, 2025 16:14
@franz
Copy link
Contributor
franz commented Apr 29, 2025

I have only one objection: since these variables are very LLVM specific, I suggest we rename them to POCL_LLVM_FORCE_VECTOR_WIDTH and POCL_LLVM_PREFER_VECTOR_WIDTH.

The current names make it seem like these variables 1) somehow affect the CL_DEVICE_PREFERRED_VECTOR_WIDTH properties of clGetDeviceInfo, and 2) apply to all devices supported by PoCL - neither of which is true, which IMO makes them somewhat misleading.

@pjaaskel
Copy link
Member

I have only one objection: since these variables are very LLVM specific, I suggest we rename them to POCL_LLVM_FORCE_VECTOR_WIDTH and POCL_LLVM_PREFER_VECTOR_WIDTH.

The current names make it seem like these variables 1) somehow affect the CL_DEVICE_PREFERRED_VECTOR_WIDTH properties of clGetDeviceInfo, and 2) apply to all devices supported by PoCL - neither of which is true, which IMO makes them somewhat misleading.

Or how about POCL_VECTORIZER_{PREFER,FORCE}_VECTOR_WIDTH, as there's also POCL_VECTORIZER_REMARKS? We could start a "subcategory" of settings that affect the vectorization process.

@Pennycook
Copy link
Collaborator Author

@pjaaskel - Should I basically ignore all the tests that passed, here? I can't really tell if the tests I added are always failing and some tests are irrelevant, or whether the tests I added sometimes pass.

@pjaaskel
Copy link
Member

@pjaaskel - Should I basically ignore all the tests that passed, here? I can't really tell if the tests I added are always failing and some tests are irrelevant, or whether the tests I added sometimes pass.

Some of the CIs do not run the internal tests, this is a bit confusing (@franz is fixing this). Seems there's some genuine problem there in the failing ones as it doesn't find your check string.

@Pennycook
Copy link
Collaborator Author

Some of the CIs do not run the internal tests, this is a bit confusing (@franz is fixing this). Seems there's some genuine problem there in the failing ones as it doesn't find your check string.

Hm. Okay, I'll take a look at this tomorrow. I don't understand why it seems to be accepting the options but silently ignoring them. I would have expected things to blow up more spectacularly than this.

...and of course, things run on my machine. 😆

@Pennycook
Copy link
Collaborator Author

I think I fixed the basic tests: the issue wasn't the tests themselves, but that the environment variable wasn't being set correctly by the testing infrastructure. The new failures don't seem to be related to anything I've changed.

Let me know what you decide regarding the renaming and I'll make that change.

@pjaaskel
Copy link
Member
pjaaskel commented May 2, 2025

Please use POCL_VECTORIZER_{PREFER,FORCE}_VECTOR_WIDTH, perhaps a bit more descriptive and we can expand with other vectorizer-affecting properties, some of those properties might be implemented in LLVM upstream, some in our downstream PoCL-passes, so I think adding 'LLVM' to the name is not good.

- POCL_FORCE_VECTOR_WIDTH => POCL_VECTORIZER_FORCE_VECTOR_WIDTH
- POCL_PREFER_VECTOR_WIDTH => POCL_VECTORIZER_PREFER_VECTOR_WIDTH
@Pennycook
Copy link
Collaborator Author

Please us 8000 e POCL_VECTORIZER_{PREFER,FORCE}_VECTOR_WIDTH, perhaps a bit more descriptive and we can expand with other vectorizer-affecting properties, some of those properties might be implemented in LLVM upstream, some in our downstream PoCL-passes, so I think adding 'LLVM' to the name is not good.

Makes sense to me. I've changed the names in fa75cc1.

@Pennycook
Copy link
Collaborator Author

It doesn't look like the failing test is related to anything I've changed.

@franz franz merged commit 56a71ee into pocl:main May 5, 2025
59 of 60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0