-
8000
-
Notifications
You must be signed in to change notification settings - Fork 267
Add environment variables to override vector width #1900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks! Could you add an LLVM IR check case where vectorization to the given length occurs (only) if the env is given? |
Also a mention in releasenotes would be nice. If you get this done within the next week or so we can pull it to 7.0. |
doc/sphinx/source/using.rst
Outdated
@@ -407,6 +407,29 @@ pocl. | |||
When set to 1, prints out remarks produced by the loop vectorizer of LLVM | |||
during kernel compilation. | |||
|
|||
- **POCL_FORCE_VECTOR_WIDTH** | |||
|
|||
Forces the LLVM loop vectorizer to use the specified vector width, overriding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "vector width" measured in number of elements or bytes (since the other env variable measures in bits)? Perhaps this should be clarified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a number of loop iterations. I've tried to clarify that here: ff3299d
@pjaaskel - I've never used FileCheck before, but I've had a go. Note that the two *.cl files are currently the same, because I didn't want to change your existing test infrastructure too much. There's no reason that they have to be the same, really -- only the force-vector-width test actually looks at the instructions generated for the kernel. Also, please let me know how you would prefer me to resolve the merge conflict. I know some projects prefer a merge commit while others prefer a rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please do a rebase instead of a merge commit.
- **POCL_PREFER_VECTOR_WIDTH** | ||
|
||
Override the preferred vector width (expressed as a number of **bits**) for | ||
x86 targets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this x86-specific or might it work for other targets such as ARM or RISC-V?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that it's x86-specific. I've only ever seen the prefer-vector-width=256 attribute being set for x86 targets, and only for the subset of x86 architectures where the clang/gcc community decided that 256-bit was a better default.
Setting POCL_FORCE_VECTOR_WIDTH allows a user to request that a specific vector width is used for all loops.
Setting POCL_PREFER_VECTOR_WIDTH allows a user to set the prefer-vector-width attribute on x86 targets. The primary use case for this environment variable is to enable the use of 512-bit SIMD on processors where clang otherwise defaults to 256-bit SIMD.
Checks that the LLVM IR is vectorized with the expected width (4).
Checks that the LLVM IR contains the expected prefer-vector-width attribute. Whether the attribute does anything is target-dependent, so harder to test.
ea54d1c
to
936f488
Compare
I have only one objection: since these variables are very LLVM specific, I suggest we rename them to The current names make it seem like these variables 1) somehow affect the |
Or how about POCL_VECTORIZER_{PREFER,FORCE}_VECTOR_WIDTH, as there's also POCL_VECTORIZER_REMARKS? We could start a "subcategory" of settings that affect the vectorization process. |
@pjaaskel - Should I basically ignore all the tests that passed, here? I can't really tell if the tests I added are always failing and some tests are irrelevant, or whether the tests I added sometimes pass. |
Some of the CIs do not run the internal tests, this is a bit confusing (@franz is fixing this). Seems there's some genuine problem there in the failing ones as it doesn't find your check string. |
Hm. Okay, I'll take a look at this tomorrow. I don't understand why it seems to be accepting the options but silently ignoring them. I would have expected things to blow up more spectacularly than this. ...and of course, things run on my machine. 😆 |
I think I fixed the basic tests: the issue wasn't the tests themselves, but that the environment variable wasn't being set correctly by the testing infrastructure. The new failures don't seem to be related to anything I've changed. Let me know what you decide regarding the renaming and I'll make that change. |
Please use POCL_VECTORIZER_{PREFER,FORCE}_VECTOR_WIDTH, perhaps a bit more descriptive and we can expand with other vectorizer-affecting properties, some of those properties might be implemented in LLVM upstream, some in our downstream PoCL-passes, so I think adding 'LLVM' to the name is not good. |
- POCL_FORCE_VECTOR_WIDTH => POCL_VECTORIZER_FORCE_VECTOR_WIDTH - POCL_PREFER_VECTOR_WIDTH => POCL_VECTORIZER_PREFER_VECTOR_WIDTH
Makes sense to me. I've changed the names in fa75cc1. |
It doesn't look like the failing test is related to anything I've changed. |
The changes in this pull request introduce two new environment variables:
POCL_FORCE_VECTOR_WIDTH
This is equivalent to LLVM's
force-vector-width
option, and forces the LLVM loop vectorizer to use the same vector width for all loops.POCL_PREFER_VECTOR_WIDTH
This is equivalent to clang's
prefer-vector-width
option, and overrides clang's preference for 256- or 512-bit vectors on certain x86 targets (e.g., processors previously codenamed Skylake).It might be a little confusing that one of these environment variables expects the width in loop iterations while the other expects the width in terms of bits. However, that's how
force-vector-width
andprefer-vector-width
work, and it would probably be even more confusing to try and come up with some PoCL-specific convention that didn't match LLVM. I've tried to draw attention to this discrepancy in the documentation.These changes were motivated by an application I've been working with, where overriding LLVM's heuristics with
POCL_FORCE_VECTOR_WIDTH=16
andPOCL_PREFER_VECTOR_WIDTH=512
results in a 2x speedup.