RFC for vector length agnostic SVE Vectorized class #153471

Open
Ryo-not-rio opened this issue May 13, 2025 · 5 comments
Labels
enhancement (Not as big of a feature, but technically not a bug. Should be easy to fix)
module: arm (Related to ARM architecture builds of PyTorch. Includes Apple M1)
module: vectorization (Related to SIMD vectorization, e.g., Vec256)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

Ryo-not-rio (Collaborator) commented May 13, 2025

🚀 The feature, motivation and pitch

This is a proposal for a vector length agnostic Vectorized class to replace the current SVE Vectorized class. It would allow us to support all SVE vector lengths without duplicating code and without regressions relative to the current implementation.
The main idea is to replace the vector member of the Vectorized class with an array, which each op loads from and stores to. By relying on compiler optimizations to elide these intermediate loads and stores, the Vectorized class becomes vector length agnostic without any regressions.
The drawback of this approach is that size() can no longer be constexpr, since SVE vector lengths are not known at compile time. This requires rewriting a number of existing functions, but should not introduce any performance regressions.
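
A minimal sketch of the idea, with illustrative names rather than the actual at::vec API. Since SVE register types are sizeless and cannot be class data members, one workable layout (an assumption here, sized for the architectural maximum of 2048 bits) backs each value with a plain array and round-trips every op through it:

```cpp
// Sketch only: illustrative VLA Vectorized<float> using ACLE SVE intrinsics.
#include <arm_sve.h>

struct VectorizedFloatSVE {
  // Backing store sized for the largest SVE implementation (2048 bits);
  // the active vector length is discovered at runtime via svcntw().
  alignas(16) float values[2048 / 8 / sizeof(float)];

  // Number of active float lanes: a runtime value, so it cannot be constexpr.
  static int size() { return static_cast<int>(svcntw()); }

  static VectorizedFloatSVE loadu(const float* ptr) {
    VectorizedFloatSVE out;
    svbool_t pg = svptrue_b32();
    svst1_f32(pg, out.values, svld1_f32(pg, ptr));
    return out;
  }

  void store(float* ptr) const {
    svbool_t pg = svptrue_b32();
    svst1_f32(pg, ptr, svld1_f32(pg, values));
  }

  VectorizedFloatSVE operator+(const VectorizedFloatSVE& other) const {
    // Each op loads from the arrays, computes, and stores back; with
    // optimizations enabled the intermediate stores/loads should be elided.
    VectorizedFloatSVE out;
    svbool_t pg = svptrue_b32();
    svst1_f32(pg, out.values,
              svadd_f32_x(pg, svld1_f32(pg, values),
                              svld1_f32(pg, other.values)));
    return out;
  }
};
```

With optimizations enabled, the compiler can typically forward the stored register to the following load, so chained ops compile down to register arithmetic rather than real memory traffic.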

Additional context

An RFC has been created at pytorch/rfcs#73, which goes into more detail.

cc @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01

malfet added the triaged, enhancement, module: vectorization, and module: arm labels May 14, 2025
malfet (Contributor) commented May 14, 2025

The size() function, which returns the number of elements in the Vectorized class, cannot be constexpr in our implementation because SVE vector lengths are unknown at compile time. We propose changing it to be const instead of constexpr.

constexpr size() allows for numerous optimizations on the compiler side, and making it dynamic will likely result in a significant slowdown. To accept a change like that, which will affect all supported CPU architectures, a rigorous test plan is necessary, as well as a list of hardware it must be run against.
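
To illustrate the concern, here is a simplified kernel (not the actual ATen code) of the shape that benefits from a compile-time size(): with a constexpr trip count the compiler can fully unroll the loop and size stack buffers statically, neither of which works once size() becomes a runtime value.

```cpp
// Simplified illustration, not the actual ATen code.
#include <cstdint>

template <typename Vec>
void add_arrays(float* out, const float* a, const float* b, int64_t n) {
  constexpr int64_t kWidth = Vec::size();  // compile-time today
  // Main loop: with kWidth known at compile time, the compiler can unroll
  // it and keep everything in registers. Tail handling omitted for brevity.
  for (int64_t i = 0; i + kWidth <= n; i += kWidth) {
    Vec va = Vec::loadu(a + i);
    Vec vb = Vec::loadu(b + i);
    (va + vb).store(out + i);
  }
  // Patterns like `float buf[Vec::size()]` also stop compiling once size()
  // is merely const, which is part of what forces the rewrite being discussed.
}
```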

Ryo-not-rio (Collaborator, Author) commented May 15, 2025

@malfet What do you think of the following alternatives?

  1. Keep size() constexpr for the non-SVE Vectorized classes and change the affected code only for SVE. In this case we would only have to benchmark aarch64 machines.
  2. Use a hybrid approach where we decide size() at compile time. This keeps size() constexpr but cannot take advantage of SVE's runtime vector length detection (see the sketch after this list).
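
A hypothetical sketch of alternative 2. It leans on the standard __ARM_FEATURE_SVE_BITS macro, which compilers define when building with -msve-vector-bits=<N>; all other names here are illustrative:

```cpp
// Sketch only: fix the SVE vector width at compile time so size() stays
// constexpr, at the cost of SVE's runtime vector length detection.
#if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
  #define VEC_SVE_BITS __ARM_FEATURE_SVE_BITS
#else
  #define VEC_SVE_BITS 256  // fall back to the current SVE256 assumption
#endif

struct VectorizedFloatSVEFixed {
  static constexpr int size() { return VEC_SVE_BITS / 32; }
  // Ops as today; the resulting binary assumes this one vector length, so
  // hardware with a different length needs a rebuild (or a separately
  // compiled kernel selected at dispatch time).
};
```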

malfet (Contributor) commented May 15, 2025
  1. Keep size() constexpr for the non-SVE Vectorized classes and change the affected code only for SVE. In this case we would only have to benchmark aarch64 machines.

If the changes can be constrained to just veclib, that sounds fine, but if the entire codebase needs to be sprinkled with #ifndef __aarch64__ then this does not sound like a good idea.

But one still needs to define benchmarks to make sure this will not cause regressions on the aarch64 platform.

  2. Use a hybrid approach where we decide size() at compile time. This keeps size() constexpr but cannot take advantage of SVE's runtime vector length detection.

Sure, we can do that, but for practical purposes it feels like it would leave us roughly where we are right now, with NEON and SVE256. But it would be great if this could be auto-detected when torch.compile is invoked.

Ryo-not-rio (Collaborator, Author) commented May 15, 2025

Sure, we can do that, but for practical purposes it feels like it would leave us roughly where we are right now, with NEON and SVE256.

The code size will stay basically the same as now, apart from adding vector length detection for 128- and 512-bit. It will offer significant performance boosts to any functions using the exponential function, as exp is accelerated on SVE but not on NEON.
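
For context on the exp speedup, here is a sketch of how an SVE path can dispatch to SLEEF's vector-length-agnostic kernels. This assumes SLEEF's Sleef_expfx_u10sve (its VLA single-precision exp); the loop structure is illustrative, not PyTorch's actual kernel:

```cpp
// Sketch only: VLA exp over a buffer via SLEEF's SVE kernel.
#include <arm_sve.h>
#include <sleef.h>
#include <cstdint>

void exp_inplace(float* data, int64_t n) {
  const int64_t step = svcntw();  // float lanes per iteration, at runtime
  svbool_t pg = svptrue_b32();
  int64_t i = 0;
  for (; i + step <= n; i += step) {
    svst1_f32(pg, data + i, Sleef_expfx_u10sve(svld1_f32(pg, data + i)));
  }
  // Predicated tail: process the remaining n - i elements in one pass.
  svbool_t tail = svwhilelt_b32(static_cast<int32_t>(i), static_cast<int32_t>(n));
  svst1_f32(tail, data + i, Sleef_expfx_u10sve(svld1_f32(tail, data + i)));
}
```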

aditew01 (Collaborator) commented

cc: @maajidkhann
