Enable compile-time optimisation for prebuilt wheels by ajaust · Pull Request #19 · equinor/si4ti · GitHub

Enable compile-time optimisation for prebuilt wheels #19

Merged: ajaust merged 2 commits into equinor:main from impr/add-wheels-vectorisation on May 13, 2025

Conversation

@ajaust (Contributor) commented May 12, 2025

Vectorisation is crucial for good performance in si4ti as Eigen relies heavily on vectorisation. In tests on an x86_64 platform (Intel Xeon Platinum 8462Y+ using 8 cores), enabling the target-specific optimisation level x86-64-v3 [1] reduced the runtime by ~40% (from 60s to 38s wallclock time). Enabling newer architectures or -march=native did not improve performance further in our tests.

This impacts usability of the wheels as they only work on devices that support the instruction set used. This should still hold for all reasonably new CPUs, as x86-64-v3 refers to common CPUs with AVX2 such as Intel Haswell (released 2013), AMD Excavator (released 2015), and newer "big cores" [2]. It is expected that people running the si4ti application are on powerful, reasonably new machines, and appreciate the performance gains. Mac users on arm64 Macs should be unaffected since we do not turn on any vectorisation features besides the ones that are activated by default.
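
To make the compatibility trade-off concrete, here is a minimal sketch of how a user on Linux might check whether a machine offers the CPU features that x86-64-v3 adds on top of x86-64-v2. The helper names `missing_v3_flags` and `read_cpu_flags` are hypothetical and not part of si4ti. The flag names follow the spelling used in `/proc/cpuinfo` (LZCNT is reported there as `abm`), and the check covers only the v3 additions, not the older baseline features.

```python
from pathlib import Path

# Features added by x86-64-v3, spelled as in /proc/cpuinfo on Linux.
# LZCNT is reported as "abm" there.
X86_64_V3_FLAGS = frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def missing_v3_flags(cpu_flags):
    """Return the x86-64-v3 flags that are absent from cpu_flags, sorted."""
    return sorted(X86_64_V3_FLAGS - set(cpu_flags))


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Parse the first 'flags' line of a Linux cpuinfo file into a set."""
    for line in Path(cpuinfo_path).read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux machine, `missing_v3_flags(read_cpu_flags())` returns an empty list when all v3 additions are present; any names it does return are features the CPU lacks.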

@ajaust ajaust force-pushed the impr/add-wheels-vectorisation branch 5 times, most recently from 02895e4 to ab89607 Compare May 13, 2025 06:06
@ajaust ajaust changed the title Enable vectorisation for prebuilt wheels Enable compile-time optimisation for prebuilt wheels May 13, 2025
@ajaust ajaust marked this pull request as ready for review May 13, 2025 07:32
@achaikou (Contributor) left a comment

Looks good!
You've convinced me that we should sacrifice compatibility for performance.

ajaust added 2 commits May 13, 2025 10:25
Compile-time optimisation is crucial for good performance in si4ti,
most likely because of Eigen. In tests on an x86_64 platform (Intel Xeon
Platinum 8462Y+ using 8 cores), careful choice of compile-time optimisation
could decrease the runtime by about 40%.

Number of runs per case was 5. We report the lowest, average, and median
runtime recorded in seconds. The data set is a sample data set from the
domain expert (2 files, one baseline and one monitor cube). The input
data set has the following dimensions 361 x 221 x 376 (number of rows x
number of columns x number of layers) as xtgeo.Cube.

	| Target    |   Min |   Avg | Median |
	| --------- | ----- | ----- | ------ |
	| generic   | 63.97 | 64.06 |  64.03 |
	| x86-64-v3 | 39.44 | 39.95 |  40.06 |
	| native    | 51.04 | 51.40 |  51.36 |

The table shows that the target `x86-64-v3` [1] reduces the runtime by
roughly 37% (from about 64 s to about 40 s wallclock time). Enabling
newer architectures or `-march=native` did not improve performance
beyond `x86-64-v3` during our tests.
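
As a rough cross-check of the numbers, the following sketch computes the relative runtime reduction from the median values in the table above. The helper names `summarise` and `relative_reduction` are illustrative only and are not taken from the benchmark scripts used for this change.

```python
import statistics


def summarise(runtimes):
    """Return (min, mean, median) of a list of runtimes in seconds."""
    return min(runtimes), statistics.mean(runtimes), statistics.median(runtimes)


def relative_reduction(baseline, optimised):
    """Fraction by which the optimised runtime is below the baseline."""
    return (baseline - optimised) / baseline


# Median runtimes in seconds, copied from the table above.
medians = {"generic": 64.03, "x86-64-v3": 40.06, "native": 51.36}
for target, median in medians.items():
    reduction = relative_reduction(medians["generic"], median)
    print(f"{target}: {reduction:.1%} faster than generic")
```

With these medians, `x86-64-v3` comes out at about 37% below the generic build and `native` at about 20%.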

The `compute_impedance` function was called with the following parameters:

    _a, _b = si4ti.compute_impedance(
        cubes,
        segments = 1,
        max_iter = 100,
        damping_3D = 0.001,
        damping_4D = 0.001,
        latsmooth_3D = 0.01,
        latsmooth_4D = 0.05,
    )

Enabling compile-time optimisations impacts portability of the wheels
as the wheels only work on devices that support the instruction set
used. This should still hold for all reasonably new CPUs, as
`x86-64-v3` refers to common CPUs with AVX2 support such as Intel
Haswell (released 2013), AMD Excavator (released 2015), and newer "big
cores" [2]. It is expected that people running the si4ti application are
on powerful, reasonably new machines, and appreciate the performance
gains. Mac users on arm64 Macs should be unaffected since we do not turn
on any additional vectorisation features besides the ones that are
activated by default.
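
The per-platform behaviour described above could be sketched as follows. This is an illustration only, not the project's actual build configuration: `optimisation_flags` is a hypothetical helper, and the flag spelling assumes a GCC or Clang toolchain recent enough to accept `-march=x86-64-v3`.

```python
import platform


def optimisation_flags(machine=None):
    """Return extra compiler flags for a machine string (hypothetical helper).

    x86_64 builds get the x86-64-v3 target; every other architecture,
    including arm64 Macs, keeps the compiler defaults.
    """
    machine = (machine or platform.machine()).lower()
    if machine in {"x86_64", "amd64"}:
        return ["-march=x86-64-v3"]
    return []
```

Calling `optimisation_flags()` without arguments uses the machine type reported by `platform.machine()` for the build host.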

[1]: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
[2]: https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

Tagging with the new version allows us and users to specifically install
the bindings with vectorisation enabled. This helps with testing. A new
version shall be published after merging this change.
@ajaust ajaust force-pushed the impr/add-wheels-vectorisation branch from cd0a410 to 0a614e1 Compare May 13, 2025 08:25
@ajaust ajaust merged commit f2db721 into equinor:main May 13, 2025
10 checks passed