8000 perf: Consider something like `pow2k` · Issue #23 · FiloSottile/edwards25519 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

perf: Consider something like pow2k #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Yawning opened this issue Aug 8, 2021 · 4 comments
Open

perf: Consider something like pow2k #23

Yawning opened this issue Aug 8, 2021 · 4 comments

Comments

@Yawning
Copy link
Yawning commented Aug 8, 2021

Both Invert and Pow22523 repeatedly square in a loop. The overhead of repeatedly calling Square (and having to shuffling data in/out of registers) adds up to a decent chunk of execution time.

Doing something like func (v *Element) pow2k(x *Element, k uint) with the precondition that k >= 1, dramatically improves performance of the two operations like thus:

Invert-4    11.3µs ± 0%   7.1µs ± 0%  -37.33%
Pow22523-4  11.1µs ± 0%   7.0µs ± 0%  -37.32%

Numbers taken with purego, but the amd64 assembly implementation will also benefit (and can be written without having to spill to the stack at all).

@FiloSottile
Copy link
Owner

I tried out implementing it as a simple loop of the Square body, manually inlining carryPropagate, and I don't see any improvement on high-level group functions. Are we missing some useful benchmark, or is this for an application that relies specifically on Invert and Pow22523?

@Yawning
Copy link
Author
Yawning commented Aug 9, 2021

I tried out implementing it as a simple loop of the Square body, manually inlining carryPropagate, and I don't see any improvement on high-level group functions. Are we missing some useful benchmark, or is this for an application that relies specifically on Invert and Pow22523?

Point compression/decompression involve one Invert, and 1 SqrtRatio respectively. While it's true that those things aren't repeatedly called, a lot of the actual uses do involve doing both (eg: ref10-style Ed25519 signature verification).

@Yawning
Copy link
Author
Yawning commented Aug 19, 2021

As another concrete example of where this would be nice would be ECVRF_prove from the VRF draft.

Benchmarking my (unpublished) edwards25519 based version, Invert + Pow22523 account for ~23.5% of the function's runtime (I can reduce this by shaving off 2 decompressions by checking point canonicity on the encoded buffer, but it still will be a decent fraction).

AlexanderYastrebov added a commit to AlexanderYastrebov/wireguard-vanity-key that referenced this issue Feb 2, 2025
Use https://github.com/oasisprotocol/curve25519-voi/
which provides faster BytesMontgomery conversion due to use of
Pow2k in the Element.Invert.

```
goos: linux
goarch: amd64
pkg: github.com/AlexanderYastrebov/wireguard-vanity-key
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                    │    main     │                 HEAD                 │
                    │   sec/op    │    sec/op     vs base                │
FindPointParallel-8   889.2n ± 6%   798.9n ± 13%  -10.15% (p=0.019 n=10)

                    │    main    │              HEAD              │
                    │    B/op    │    B/op     vs base            │
FindPointParallel-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                    │    main    │              HEAD              │
                    │ allocs/op  │ allocs/op   vs base            │
FindPointParallel-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

See also FiloSottile/edwards25519#23
@AlexanderYastrebov
Copy link
Contributor

(and can be written without having to spill to the stack at all)

This seems to be tricky as one needs 10 registers to hold intermediate results so there is not enough registers to store element limbs: https://github.com/oasisprotocol/curve25519-voi/blob/1f23a7beb09a352354cbd7277c6b7f0170cf6d5d/internal/asm/amd64/field_u64.go#L331

AlexanderYastrebov added a commit to AlexanderYastrebov/wireguard-vanity-key that referenced this issue Feb 3, 2025
Use https://github.com/oasisprotocol/curve25519-voi/
which provides faster BytesMontgomery conversion due to use of
Pow2k in the Element.Invert.

```
goos: linux
goarch: amd64
pkg: github.com/AlexanderYastrebov/wireguard-vanity-key
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                    │    main     │                 HEAD                 │
                    │   sec/op    │    sec/op     vs base                │
FindPointParallel-8   889.2n ± 6%   798.9n ± 13%  -10.15% (p=0.019 n=10)

                    │    main    │              HEAD              │
                    │    B/op    │    B/op     vs base            │
FindPointParallel-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

                    │    main    │              HEAD              │
                    │ allocs/op  │ allocs/op   vs base            │
FindPointParallel-8   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```

See also FiloSottile/edwards25519#23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0