-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2 #4248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`_bzhi_u64` is available only for 64-bit builds, while `BIT_getLowerBits` expects `nbBits` to be less than `BIT_MASK_SIZE` (`BIT_MASK_SIZE` is 32)
lib/common/bitstream.h
Outdated
@@ -162,7 +162,11 @@ MEM_STATIC size_t BIT_initCStream(BIT_CStream_t* bitC, | |||
FORCE_INLINE_TEMPLATE size_t BIT_getLowerBits(size_t bitContainer, U32 const nbBits) | |||
{ | |||
#if defined(STATIC_BMI2) && STATIC_BMI2 == 1 && !defined(ZSTD_NO_INTRINSICS) | |||
return _bzhi_u64(bitContainer, nbBits); | |||
#if defined(__x86_64__) || defined(_M_X64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: indentation:
we would use :
# if ...
# else
# endif
(notice the space between #
and the keyword)
to distinguish the indentation level,
thus making it visually clear that it's inside the upper #if ...
.
Here also, if there were some benchmark numbers to illustrate the potential performance change introduced by this patch, this would be great for future reference.
My understanding is that the change is only expected to impact 32-bit mode.
I'm not expecting huge changes to be fair, but even just ensuring that performance is not negatively impacted would be a good start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
Try to benchmark the change for documentation purpose.
Also, the discussion opened a Pandora Box of additional questions, sometimes quite subtle, so I'll take the follow ups from there.
I do 64 build builds, this change won't affect those. But I also do 32-bit builds for correctness check and got an error when building for 32bit with |
And i presume this error gets fixed after this PR ? If that's the case, it might be worth extending the CI test suite to cover this scenario, which has been missing so far. |
which is expected to be quite rare, but nonetheless possible. This test is initially expected to fail, before integration of #4248 fix
Something I'm currently wondering : I was wondering: why is it not automatically set, on detection of Well, looking at the code where this variable is automatically set, it indeed requires the compilation mode to be And now I'm wondering: is it only set in 64-bit mode because otherwise compilation fails, precisely because this fix was missing ? In which case, it would be a good reason to change the way |
which is expected to be quite rare, but nonetheless possible. This test is initially expected to fail, before integration of #4248 fix
this was previously no triggered in x86 32-bit mode, due to a limitation in `bitstream.h`, that was fixed in #4248. Now, `bmi2` will be automatically detected and triggered at compilation time, if the corresponding instruction set is enabled, even in 32-bit mode. Also: updated library documentation, to feature STATIC_BMI2 build variable
I set -DSTATIC_BMI2=1 for compilation.
it starts with
|
I could understand why But I do not understand why it comes first in the dependency chain. It looks to me that This would solve the issue that This code likely requires some refactor. |
I now see this comment in
That would explain why it doesn't include But to be frank, it looks to me that it's Anyway, a refactor exercise isn't going to be trivial, and requires some attention to be done properly. |
IMO, the should be merged into a single file to avoid the mess (all macro from compiler.h should be moved to portability_macros.h) |
_bzhi_u64
is available only for 64-bit builds, whileBIT_getLowerBits
expectsnbBits
to be less thanBIT_MASK_SIZE
(BIT_MASK_SIZE
is 32)