Enable compile-time optimisation for prebuilt wheels #19
Vectorisation is crucial for good performance in si4ti, as Eigen relies heavily on it. In tests on an x86_64 platform (Intel Xeon Platinum 8462Y+, using 8 cores), enabling target-specific optimisation for `x86-64-v3`[1] reduced the runtime by roughly 37% (from 60 s to 38 s wall-clock time). Enabling newer architectures or `-march=native` did not improve performance further in our tests.

This impacts the usability of the wheels, as they only work on CPUs that support the instruction set they were built for. The wheels should still work on all reasonably recent CPUs, since `x86-64-v3` corresponds to common CPUs with AVX2, such as Intel Haswell (released 2013), AMD Excavator (released 2015), and newer "big cores"[2]. We expect people running si4ti to use powerful, reasonably recent machines and to appreciate the performance gains. Users of arm64 Macs should be unaffected, since we do not enable any vectorisation features beyond those activated by default.
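The per-architecture policy described above (opt into `x86-64-v3` on x86_64, leave arm64 at the compiler's defaults) could be expressed in a build script roughly as follows. This is a hypothetical sketch, not the mechanism this PR uses: the helper name `extra_compile_args` is made up, it assumes a GCC- or Clang-style compiler (`-march=x86-64-v3` is accepted by GCC 11+ and Clang 12+), it assumes Linux/macOS report the architecture as `x86_64`, and it does not cover MSVC.

```python
from __future__ import annotations

import platform

# Hypothetical helper: architecture names that should get the x86-64-v3 target.
# Assumption: Linux and macOS report x86_64 machines as "x86_64".
X86_64_NAMES = {"x86_64"}


def extra_compile_args(machine: str | None = None) -> list[str]:
    """Return extra GCC/Clang flags for building the wheel on `machine`.

    If `machine` is None, the architecture of the current interpreter is used.
    """
    arch = (machine or platform.machine()).lower()
    if arch in X86_64_NAMES:
        # Target AVX2-capable CPUs (Haswell / Excavator and newer).
        return ["-march=x86-64-v3"]
    # arm64 and anything else: keep the compiler's default target.
    return []


if __name__ == "__main__":
    print(extra_compile_args())
```

Keeping arm64 on the default target mirrors the statement that arm64 Mac users are unaffected; only x86_64 builds trade portability for speed.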
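Users unsure whether their Linux machine can run `x86-64-v3` wheels could compare the CPU flags in `/proc/cpuinfo` against the features that level requires. The sketch below is an illustration rather than part of this PR: the helper names are hypothetical, and the mapping from x86-64 psABI feature names to Linux kernel flag names is an assumption (SSE3 is reported as `pni`, LZCNT via `abm`, and OSXSAVE is approximated by `xsave`). On arm64 there is no `flags` line, so every feature is reported as missing, which is the expected answer.

```python
"""Rough check whether a Linux x86_64 CPU meets the x86-64-v3 feature level."""
from pathlib import Path

# Assumed mapping of x86-64-v2/v3 requirements to /proc/cpuinfo flag names.
X86_64_V2_FLAGS = frozenset(
    {"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"}
)
X86_64_V3_FLAGS = X86_64_V2_FLAGS | frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def parse_cpu_flags(cpuinfo_text: str) -> frozenset:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return frozenset(value.split())
    # No x86 'flags' line (e.g. arm64 kernels print 'Features' instead).
    return frozenset()


def missing_x86_64_v3_flags(flags: frozenset) -> frozenset:
    """Return required flags that are absent; empty means v3 looks supported."""
    return X86_64_V3_FLAGS - flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        print("No /proc/cpuinfo found; this check only works on Linux.")
    else:
        missing = missing_x86_64_v3_flags(parse_cpu_flags(cpuinfo.read_text()))
        if missing:
            print("x86-64-v3 not supported, missing:", ", ".join(sorted(missing)))
        else:
            print("CPU appears to support x86-64-v3.")
```

Because the flag mapping is approximate, a result from this script is a hint rather than a guarantee; importing the wheel is the definitive test.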