[BUG] Enabling AVX2 makes gather slower

Describe the bug

eve::gather is about 5 times slower (1.01 ns versus 0.198 ns per iteration in the benchmark below) when AVX2 instructions are enabled with -mavx2.

To Reproduce

Set up a Google Benchmark project
Disable CPU frequency scaling with sudo cpupower frequency-set --governor performance
Test the following code with -O3 and -O3 -mavx2:

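#include <benchmark/benchmark.h>
#include <eve/module/core.hpp>
#include <eve/wide.hpp>

void run(benchmark::State& state) {
    float data[4] = {1, 2, 3, 4};
    for (auto _ : state) {
        // Gather data[2], data[3], data[0], data[1] into a 4-lane float register.
        eve::wide<float, eve::fixed<4>> vec = eve::gather(data, eve::wide<unsigned char, eve::fixed<4>>{2, 3, 0, 1});
        benchmark::DoNotOptimize(vec);
    }
}

BENCHMARK(run);
BENCHMARK_MAIN();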

Without -mavx2:

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
run             0.198 ns        0.198 ns   3315565533

With -mavx2:

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
run              1.01 ns         1.01 ns    648857193

Setup:

Compiler: g++ 14.2.1, clang++ 19.1.7
OS: Gentoo Linux
CPU: Ryzen 9 7940HS
Instruction sets used: SSE, AVX2
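
For reference, here is a minimal scalar sketch of what the gather in the benchmark computes: lane i of the result is data[idx[i]]. The helper name scalar_gather4 and its size parameter are hypothetical and not part of EVE. It only illustrates the expected result and says nothing about how EVE implements the operation with or without -mavx2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of EVE: scalar reference for a 4-lane gather.
// Lane i of the result is data[idx[i]].
inline std::array<float, 4> scalar_gather4(const float* data,
                                           const std::array<unsigned char, 4>& idx,
                                           std::size_t size) {
    std::array<float, 4> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        assert(idx[i] < size);  // every index must stay inside the source array
        out[i] = data[idx[i]];
    }
    return out;
}
```

With data = {1, 2, 3, 4} and indices {2, 3, 0, 1}, as in the benchmark, this returns {3, 4, 1, 2}, which can serve as a correctness check when comparing the two builds.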
