We currently detect BMI2 instructions at runtime, but users can only benefit from AVX2 if they compile with -march=haswell. It would be nice to provide AVX2 support to users who are compiling with default options.
This issue is motivated specifically by this loop in ZSTD_copyCDictTableIntoCCtx() which was added as part of my short cache PR. Overall extDict compression speed at level 1 is 2-3% slower if that loop is compiled to SSE2 instructions vs AVX2 instructions.
There may be other functions which can be tagged for AVX2 dispatch in the future. I expect this issue would be closed after tagging ZSTD_copyCDictTableIntoCCtx(), and we can tag additional functions gradually.
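For illustration, tagging a function for AVX2 dispatch could look roughly like the sketch below. This is not zstd's actual code: `copy_untagged`, `TAG_BITS`, and `HAVE_AVX2_DISPATCH` are hypothetical names, and the loop body is only a stand-in for the one in ZSTD_copyCDictTableIntoCCtx(). It assumes GCC or Clang on x86, relying on the `target("avx2")` function attribute and `__builtin_cpu_supports("avx2")`; any other toolchain takes the default path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag width, chosen for illustration only. */
#define TAG_BITS 8

/* Assumption: only GCC/Clang on x86 get the AVX2 variant. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  define HAVE_AVX2_DISPATCH 1
#else
#  define HAVE_AVX2_DISPATCH 0
#endif

/* Baseline variant: compiled with the translation unit's default flags
 * (SSE2 on a default x86-64 build). */
static void copy_untagged_default(uint32_t* dst, const uint32_t* src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) dst[i] = src[i] >> TAG_BITS;
}

#if HAVE_AVX2_DISPATCH
/* Same source loop, but the compiler may use AVX2 when vectorizing it,
 * even though the rest of the file is built without -march=haswell. */
__attribute__((target("avx2")))
static void copy_untagged_avx2(uint32_t* dst, const uint32_t* src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) dst[i] = src[i] >> TAG_BITS;
}
#endif

/* Entry point: pick the variant at runtime based on CPU support. */
void copy_untagged(uint32_t* dst, const uint32_t* src, size_t n)
{
#if HAVE_AVX2_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        copy_untagged_avx2(dst, src, n);
        return;
    }
#endif
    copy_untagged_default(dst, src, n);
}
```

Both variants share the same scalar source, so the results are identical and only the generated instructions differ. In practice the CPU check would more likely be done once and cached, the way the existing BMI2 detection is, rather than on every call.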
@ValZapod At least on Linux, x86 feature levels have become a thing. Some distributions, such as CachyOS, already offer repositories compiled for x86-64-v3, which is very close to -march=haswell.