8000 Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2 by pps83 · Pull Request #4248 · facebook/zstd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Use _bzhi_u32 for 32-bit builds when building with STATIC_BMI2 #4248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 19, 2025

Conversation

pps83
Copy link
Contributor
@pps83 pps83 commented Jan 18, 2025

_bzhi_u64 is available only for 64-bit builds, while BIT_getLowerBits expects nbBits to be less than BIT_MASK_SIZE (BIT_MASK_SIZE is 32)

`_bzhi_u64` is available only for 64-bit builds, while `BIT_getLowerBits` expects `nbBits` to be less than `BIT_MASK_SIZE` (`BIT_MASK_SIZE` is 32)
@@ -162,7 +162,11 @@ MEM_STATIC size_t BIT_initCStream(BIT_CStream_t* bitC,
FORCE_INLINE_TEMPLATE size_t BIT_getLowerBits(size_t bitContainer, U32 const nbBits)
{
#if defined(STATIC_BMI2) && STATIC_BMI2 == 1 && !defined(ZSTD_NO_INTRINSICS)
return _bzhi_u64(bitContainer, nbBits);
#if defined(__x86_64__) || defined(_M_X64)
Copy link
Contributor
@Cyan4973 Cyan4973 Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: indentation:
we would use :

# if ...
# else
# endif

(notice the space between # and the keyword)
to distinguish the indentation level,
thus making it visually clear that it's inside the upper #if ....

@Cyan4973
Copy link
Contributor

Here also, if there were some benchmark numbers to illustrate the potential performance change introduced by this patch, this would be great for future reference.
My understanding is that the change is only expected to impact 32-bit mode.
I'm not expecting huge changes to be fair, but even just ensuring that performance is not negatively impacted would be a good start.

Copy link
Contributor
@Cyan4973 Cyan4973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

Try to benchmark the change for documentation purpose.

Also, the discussion opened a Pandora Box of additional questions, sometimes quite subtle, so I'll take the follow ups from there.

@pps83
Copy link
Contributor Author
pps83 commented Jan 19, 2025

Try to benchmark the change for documentation purpose.

I do 64 build builds, this change won't affect those. But I also do 32-bit builds for correctness check and got an error when building for 32bit with -DSTATIC_BMI2

@Cyan4973
Copy link
Contributor

And i presume this error gets fixed after this PR ?

If that's the case, it might be worth extending the CI test suite to cover this scenario, which has been missing so far.

Cyan4973 added a commit that referenced this pull request Jan 19, 2025
which is expected to be quite rare, but nonetheless possible.

This test is initially expected to fail, before integration of #4248 fix
@Cyan4973
Copy link
Contributor

Something I'm currently wondering :
in order to make the compilation fail, one must set explicitly -DSTATIC_BMI2=1.

I was wondering: why is it not automatically set, on detection of AVX2 typically ?

Well, looking at the code where this variable is automatically set, it indeed requires the compilation mode to be __x86_64__ .
But why ? Shouldn't just the detection of AVX2 or even BMI2 be enough to set STATIC_BMI2 to 1 ?

And now I'm wondering: is it only set in 64-bit mode because otherwise compilation fails, precisely because this fix was missing ?

In which case, it would be a good reason to change the way STATIC_BMI2 is set after this patch is merged.

@Cyan4973 Cyan4973 merged commit a469e7c into facebook:dev Jan 19, 2025
94 checks passed
Cyan4973 added a commit that referenced this pull request Jan 19, 2025
which is expected to be quite rare, but nonetheless possible.

This test is initially expected to fail, before integration of #4248 fix
Cyan4973 added a commit that referenced this pull request Jan 19, 2025
this was previously no triggered in x86 32-bit mode,
due to a limitation in `bitstream.h`, that was fixed in #4248.

Now, `bmi2` will be automatically detected and triggered
at compilation time, if the corresponding instruction set is enabled,
even in 32-bit mode.

Also: updated library documentation, to feature STATIC_BMI2 build variable
@pps83
Copy link
Contributor Author
pps83 commented Jan 20, 2025

Something I'm currently wondering : in order to make the compilation fail, one must set explicitly -DSTATIC_BMI2=1.

I was wondering: why is it not automatically set, on detection of AVX2 typically ?

I set -DSTATIC_BMI2=1 for compilation.
There is a weird issue with zstd. It has compiler.h and portability_macros.h (compiler includes portability_macros). Then, entire STATIC_BMI2 block inside compiler.h is weird:

/*Like DYNAMIC_BMI2 but for compile time determination of BMI2 support*/
#ifndef STATIC_BMI2
#  if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_I86))
#    ifdef __AVX2__  //MSVC does not have a BMI2 specific flag, but every CPU that supports AVX2 also supports BMI2
#       define STATIC_BMI2 1
#    endif
#  elif defined(__BMI2__) && defined(__x86_64__) && defined(__GNUC__)
#    define STATIC_BMI2 1
#  endif
#endif

#ifndef STATIC_BMI2
    #define STATIC_BMI2 0
#endif

it starts with #ifndef STATIC_BMI2 meaning that it could be declared from outside (that's what I did). Then, STATIC_BMI2 gets declared for msvc for 32 and 64 bit builds. but only for 64bit builds with gcc. That's weird. Then, if you search entire code base, __BMI2__ is still used in the code in a couple of places, while STATIC_BMI2 should have been used instead imo.

DYNAMIC_BMI2 clearly should have been declared rigth there, but it's inside portability_macros.h which gets processed before compiler.h. Also, DYNAMIC_BMI2 should be 1 only when STATIC_BMI2 is 0, but STATIC_BMI2 gets defined before it. Quite convoluted mess imo.

@pps83 pps83 deleted the dev-bzhi32 branch January 20, 2025 01:00
@Cyan4973
Copy link
Contributor

I could understand why DYNAMIC_BMI2 ends up into portability_macros.h:
because it's a runtime capability which is target specific,
whereas STATIC_BMI2 is a fixed decision selected for the entire compilation session.

But I do not understand why it comes first in the dependency chain.

It looks to me that portability_macros.h should depend on compiler.h, and not the other way round.

This would solve the issue that DYNAMIC_BMI2 should only be enabled when STATIC_BMI2 isn't.

This code likely requires some refactor.

@Cyan4973
Copy link
Contributor

I now see this comment in portability_macros.h

This header is shared between C and ASM code, so it MUST only contain macro definitions. It MUST not contain any C code.

That would explain why it doesn't include comp A7AB iler.h,
because compiler.h does include some prototype definitions, not just macros.

But to be frank, it looks to me that it's compiler.h which has been over-extended and does way too much.

Anyway, a refactor exercise isn't going to be trivial, and requires some attention to be done properly.

@pps83
Copy link
Contributor Author
pps83 commented Jan 20, 2025

IMO, the should be merged into a single file to avoid the mess (all macro from compiler.h should be moved to portability_macros.h)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0