8000 crypto/merkle: innerHash is unnecessarily inefficient by creating an entire fresh byteslice, copying just to invoke tmhash.Sum once yet it cloud invoke sha256.Hash.Write multiple times · Issue #1881 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

crypto/merkle: innerHash is unnecessarily inefficient by creating an entire fresh byteslice, copying just to invoke tmhash.Sum once yet it cloud invoke sha256.Hash.Write multiple times #1881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
odeke-em opened this issue Dec 23, 2023 · 2 comments · Fixed by #1921
Labels
bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team.
Milestone

Comments

@odeke-em
Copy link
Contributor

Bug Report

Noticed while in a security audit, if we look at this code

func innerHash(left []byte, right []byte) []byte {
data := make([]byte, len(innerPrefix)+len(left)+len(right))
n := copy(data, innerPrefix)
n += copy(data[n:], left)
copy(data[n:], right)
return tmhash.Sum(data)
}

  • creates a fresh byteslice with make(...)
  • copies 3 byte slices in
  • invokes tmhash.Sum(data)

meanwhile that code could simply be self contained and much simpler and more efficient just by using crypto/sha256.New then .Write on each byte slice like this

Suggested change

// returns tmhash(0x01 || left || right).
func innerHash(left []byte, right []byte) []byte {
        h := sha256.New()
        h.Write(innerPrefix)
        h.Write(left)
        h.Write(right)
        return h.Sum(nil)
}

or as a diff

diff --git a/crypto/merkle/hash.go b/crypto/merkle/hash.go
index 2acbb1ef3..523064842 100644
--- a/crypto/merkle/hash.go
+++ b/crypto/merkle/hash.go
@@ -1,6 +1,7 @@
 package merkle
 
 import (
+	"crypto/sha256"
 	"hash"
 
 	"github.com/cometbft/cometbft/crypto/tmhash"
@@ -32,11 +33,11 @@ func leafHashOpt(s hash.Hash, leaf []byte) []byte {
 
 // returns tmhash(0x01 || left || right).
 func innerHash(left []byte, right []byte) []byte {
-	data := make([]byte, len(innerPrefix)+len(left)+len(right))
-	n := copy(data, innerPrefix)
-	n += copy(data[n:], left)
-	copy(data[n:], right)
-	return tmhash.Sum(data)
+	h := sha256.New()
+	h.Write(innerPrefix)
+	h.Write(left)
+	h.Write(right)
+	return h.Sum(nil)
 }
 
 func innerHashOpt(s hash.Hash, left []byte, right []byte) []byte {

Exhibit

Benchmarking code

package merkle

import (
	"crypto/sha256"
	"strings"
	"testing"
)

var sink any = nil

type innerHashTest struct {
	left, right string
}

var innerHashTests = []*innerHashTest{
	{"aaaaaaaaaaaaaaa", "                    "},
	{"", ""},
	{"                        ", "a    ff     b    f1    a"},
	{"ffff122fff", "ffff122fff"},
	{"😎💡✅alalalalalalalalalallalallaallalaallalalalalalalalaallalalalalalala", "😎💡✅alalalalalalalalalallalallaallalaallalalalalalalalaallalalalalalalaffff122fff"},
	{strings.Repeat("ff", 1<<10), strings.Repeat("00af", 4<<10)},
	{strings.Repeat("f", sha256.Size), strings.Repeat("00af", 10<<10)},
	{"aaaaaaaaaaaaaaaaaaaaaaaaaaaffff122fffaaaaaaaaa", "aaaaaaaaaffff1aaaaaaaaaaaaaaaaaa22fffaaaaaaaaa"},
}

func BenchmarkInnerHash(b *testing.B) {
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		for _, tt := range innerHashTests {
			got := innerHash([]byte(tt.left), []byte(tt.right))
			if g, w := len(got), sha256.Size; g != w {
				b.
8000
Fatalf("size discrepancy: got %d, want %d", g, w)
			}
			sink = got
		}
	}

	if sink == nil {
		b.Fatal("Benchmark did not run!")
	}
}

Benchmark results

$ benchstat before.txt after.txt 
name         old time/op    new time/op    delta
InnerHash-8     357µs ± 3%     353µs ± 2%     ~     (p=0.129 n=9+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=9+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)

Have you tried the latest version: yes

/cc @elias-orijtech @thanethomson @lasarojc

@odeke-em odeke-em added bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team. labels Dec 23, 2023
@ValarDragon
Copy link
Contributor

This would be appreciated imo, though we should probably make a hasher API on tmhash right?

@odeke-em
Copy link
Contributor Author
odeke-em commented Jan 2, 2024

Yeah I think that could work @ValarDragon but I was trying to avoid any API surface changes as always that would require lots of input but sure this addition is simple enough as

// SumMany takes at least 1 byteslice along with a variadic
// number of other byteslices and produces the SHA256 sum from 
// hashing them as if they were 1 joined slice.
// Its purpose is to avoid programmers from thinking of inefficient patterns
// like manually joining byteslices before hashing them altogether.
func SumMany(data []byte, rest ...[]byte) []byte {
        h := sha256.New()
        h.Write(data)
        for _, data := range rest {
                h.Write(data)
        }               
        return h.Sum(nil)
}

Let me send a PR for it.

odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 2, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.
10000
1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 2, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 2, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 3, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 3, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 3, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 3, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
odeke-em added a commit to orijtech/cometbft that referenced this issue Jan 5, 2024
…of multiple byteslices

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes cometbft#1881
github-merge-queue bot pushed a commit that referenced this issue Jan 5, 2024
…of multiple byteslices (#1921)

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes #1881
mergify bot pushed a commit that referenced this issue Jan 5, 2024
…of multiple byteslices (#1921)

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes #1881

(cherry picked from commit 8acc13c)
mergify bot pushed a commit that referenced this issue Jan 5, 2024
…of multiple byteslices (#1921)

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes #1881

(cherry picked from commit 8acc13c)
melekes added a commit that referenced this issue Jan 5, 2024
…of multiple byteslices (backport #1921) (#1968)

* perf(crypto/merkle, crypto/tmhash): simplify+optimize SHA256 hashing of multiple byteslices (#1921)

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes #1881

(cherry picked from commit 8acc13c)

* format code

* make linter happy

---------

Co-authored-by: Emmanuel T Odeke <emm.odeke@gmail.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
melekes added a commit that referenced this issue Jan 5, 2024
…of multiple byteslices (backport #1921) (#1969)

* perf(crypto/merkle, crypto/tmhash): simplify+optimize SHA256 hashing of multiple byteslices (#1921)

This change adds a more efficient API in tmhash with "SumMany" whose
job is to produce the SHA256 sum of multiple byteslices. It is used
inside crypto/merkle.innerHash which used a naive and inefficient
way of hashing multiple byteslices. Benchmark results reflect these
improvements:

```shell
$ benchstat before.txt after.txt
name         old time/op    new time/op    delta
InnerHash-8     161µs ± 1%     160µs ± 5%     ~     (p=0.143 n=10+10)

name         old alloc/op   new alloc/op   delta
InnerHash-8    69.1kB ± 0%    60.1kB ± 0%  -12.98%  (p=0.000 n=10+9)

name         old allocs/op  new allocs/op  delta
InnerHash-8      24.0 ± 0%      23.0 ± 0%   -4.17%  (p=0.000 n=10+10)
```

Fixes #1881

(cherry picked from commit 8acc13c)

* format code

* make linter happy

---------

Co-authored-by: Emmanuel T Odeke <emm.odeke@gmail.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
@adizere adizere added this to the 2024-Q1 milestone Jan 9, 2024
@adizere adizere added this to CometBFT Jan 15, 2024
@github-project-automation github-project-automation bot moved this to Todo in CometBFT Jan 15, 2024
@adizere adizere moved this from Todo to Done in CometBFT Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team.
Projects
No open projects
Status: Done
3 participants
0