Enable compile-time optimisation for prebuilt wheels #19

ajaust · 2025-05-12T10:36:47Z

Vectorisation is crucial for good performance in si4ti, as Eigen relies heavily on it. In tests on an x86_64 platform (Intel Xeon Platinum 8462Y+ using 8 cores), enabling the target-specific optimisation level `x86-64-v3` [1] reduced the runtime by ~40% (from 60s to 38s wall-clock time). Enabling newer architectures or `-march=native` did not improve performance in our tests.

This impacts the usability of the wheels, as they only work on devices that support the instruction set used. Most reasonably new CPUs should still be supported, as `x86-64-v3` refers to common CPUs with AVX2 such as Intel Haswell (released 2013), AMD Excavator (released 2015), and newer "big cores" [2]. We expect that people running the si4ti application are on powerful, reasonably new machines and will appreciate the performance gains. Users on arm64 Macs should be unaffected, since we do not turn on any vectorisation features besides the ones that are activated by default.
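For anyone unsure whether a machine can run the optimised wheels, a rough check along these lines may help. It is only a sketch and not part of this PR: the flag list is an assumption based on the features that `x86-64-v3` adds on top of `x86-64-v2`, written with the names Linux uses in `/proc/cpuinfo`, and the helper names `missing_v3_flags` and `read_cpu_flags` are hypothetical.

```python
from pathlib import Path

# Assumption: features added at the x86-64-v3 level, using Linux
# /proc/cpuinfo flag names ("abm" is how Linux reports LZCNT).
X86_64_V3_FLAGS = frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def missing_v3_flags(cpu_flags):
    """Return the assumed x86-64-v3 flags that are absent from cpu_flags."""
    return sorted(X86_64_V3_FLAGS - set(cpu_flags))


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Read the CPU flags of the first core; empty set if unavailable."""
    path = Path(cpuinfo_path)
    if not path.exists():
        return set()
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    if not flags:
        print("No x86 flags found (not Linux, or not an x86 CPU).")
    else:
        missing = missing_v3_flags(flags)
        print("x86-64-v3 looks supported" if not missing else f"Missing: {missing}")
```

The check only applies to x86_64 Linux hosts; on other platforms it reports that no flags were found rather than guessing.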

achaikou

Looks good!
You've convinced me that we should sacrifice compatibility for performance.

Compile-time optimisation is crucial for good performance in si4ti, most likely because of Eigen. In tests on an x86_64 platform (Intel Xeon Platinum 8462Y+ using 8 cores), a careful choice of compile-time optimisation decreased the runtime by about 40%.

Each case was run 5 times. We report the lowest, average, and median runtime recorded, in seconds. The data set is a sample data set from the domain expert (2 files, one baseline and one monitor cube). The input data set has the dimensions 361 x 221 x 376 (number of rows x number of columns x number of layers) as `xtgeo.Cube`.

| Target    | Min [s] | Avg [s] | Median [s] |
| --------- | ------- | ------- | ---------- |
| generic   | 63.97   | 64.06   | 64.03      |
| x86-64-v3 | 39.44   | 39.95   | 40.06      |
| native    | 51.04   | 51.40   | 51.36      |

The table shows that the target `x86-64-v3` [1] reduced the runtime by roughly 40% (average runtime from about 64 s to about 40 s wall-clock time). Enabling newer architectures or `-march=native` did not improve on `x86-64-v3` in our tests.

The `compute_impedance` function was called with the following parameters:

    _a, _b = si4ti.compute_impedance(
        cubes,
        segments=1,
        max_iter=100,
        damping_3D=0.001,
        damping_4D=0.001,
        latsmooth_3D=0.01,
        latsmooth_4D=0.05,
    )

Enabling compile-time optimisations impacts the portability of the wheels, as they only work on devices that support the instruction set used. Most reasonably new CPUs should still be supported, as `x86-64-v3` refers to common CPUs with AVX2 support such as Intel Haswell (released 2013), AMD Excavator (released 2015), and newer "big cores" [2]. We expect that people running the si4ti application are on powerful, reasonably new machines and will appreciate the performance gains. Users on arm64 Macs should be unaffected, since we do not turn on any additional vectorisation features besides the ones that are activated by default.

[1]: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
[2]: https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
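The measurement procedure described above (several runs per build, reporting min, average, and median) could be reproduced with a small harness like the following. This is a sketch under assumptions, not the script used for the reported numbers: `benchmark` and `relative_reduction` are hypothetical helpers, and `workload` stands in for a zero-argument callable wrapping the `si4ti.compute_impedance` call.

```python
import statistics
import time


def benchmark(workload, runs=5):
    """Time a zero-argument callable and summarise the wall-clock runtimes."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "avg": statistics.mean(timings),
        "median": statistics.median(timings),
    }


def relative_reduction(baseline, optimised):
    """Fraction of the baseline runtime that the optimised build saves."""
    return (baseline - optimised) / baseline


if __name__ == "__main__":
    # Placeholder workload; replace with a call into the bindings under test.
    print(benchmark(lambda: sum(range(100_000))))
    # Average runtimes of the generic and x86-64-v3 builds from the table.
    print(f"{relative_reduction(64.06, 39.95):.1%}")
```

Running the same harness once per installed wheel variant keeps the comparison between targets consistent.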

Tagging with the new version allows us and users to specifically install the bindings with vectorisation enabled. This helps with testing. A new release shall be published after merging this change.

ajaust force-pushed the impr/add-wheels-vectorisation branch 5 times, most recently from 02895e4 to ab89607 Compare May 13, 2025 06:06

ajaust changed the title ~~Enable vectorisation for prebuilt wheels~~ Enable compile-time optimisation for prebuilt wheels May 13, 2025

ajaust marked this pull request as ready for review May 13, 2025 07:32

achaikou approved these changes May 13, 2025

ajaust added 2 commits May 13, 2025 10:25

Bump project version number

0a614e1

ajaust force-pushed the impr/add-wheels-vectorisation branch from cd0a410 to 0a614e1 Compare May 13, 2025 08:25

ajaust merged commit f2db721 into equinor:main May 13, 2025
10 checks passed
