-
-
Notifications
You must be signed in to change notification settings - Fork 30
perf: Consider something like pow2k
#23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I tried out implementing it as a simple loop of the Square body, manually inlining |
Point compression/decompression involve one Invert, and 1 SqrtRatio respectively. While it's true that those things aren't repeatedly called, a lot of the actual uses do involve doing both (eg: ref10-style Ed25519 signature verification). |
As another concrete example of where this would be nice would be Benchmarking my (unpublished) edwards25519 based version, Invert + Pow22523 account for ~23.5% of the function's runtime (I can reduce this by shaving off 2 decompressions by checking point canonicity on the encoded buffer, but it still will be a decent fraction). |
Use https://github.com/oasisprotocol/curve25519-voi/ which provides faster BytesMontgomery conversion due to use of Pow2k in the Element.Invert. ``` goos: linux goarch: amd64 pkg: github.com/AlexanderYastrebov/wireguard-vanity-key cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz │ main │ HEAD │ │ sec/op │ sec/op vs base │ FindPointParallel-8 889.2n ± 6% 798.9n ± 13% -10.15% (p=0.019 n=10) │ main │ HEAD │ │ B/op │ B/op vs base │ FindPointParallel-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ main │ HEAD │ │ allocs/op │ allocs/op vs base │ FindPointParallel-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ``` See also FiloSottile/edwards25519#23
This seems to be tricky as one needs 10 registers to hold intermediate results so there is not enough registers to store element limbs: https://github.com/oasisprotocol/curve25519-voi/blob/1f23a7beb09a352354cbd7277c6b7f0170cf6d5d/internal/asm/amd64/field_u64.go#L331 |
Use https://github.com/oasisprotocol/curve25519-voi/ which provides faster BytesMontgomery conversion due to use of Pow2k in the Element.Invert. ``` goos: linux goarch: amd64 pkg: github.com/AlexanderYastrebov/wireguard-vanity-key cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz │ main │ HEAD │ │ sec/op │ sec/op vs base │ FindPointParallel-8 889.2n ± 6% 798.9n ± 13% -10.15% (p=0.019 n=10) │ main │ HEAD │ │ B/op │ B/op vs base │ FindPointParallel-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ main │ HEAD │ │ allocs/op │ allocs/op vs base │ FindPointParallel-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ``` See also FiloSottile/edwards25519#23
Both
Invert
andPow22523
repeatedly square in a loop. The overhead of repeatedly calling Square (and having to shuffling data in/out of registers) adds up to a decent chunk of execution time.Doing something like
func (v *Element) pow2k(x *Element, k uint)
with the precondition thatk >= 1
, dramatically improves performance of the two operations like thus:Numbers taken with
purego
, but the amd64 assembly implementation will also benefit (and can be written without having to spill to the stack at all).The text was updated successfully, but these errors were encountered: