
perf(blockstore): Add LRU caches to blockstore operations used in consensus (backport #3003) #3082


Merged
merged 3 commits into v0.38.x from mergify/bp/v0.38.x/pr-3003 on May 15, 2024

Conversation

mergify[bot] (Contributor) commented May 15, 2024

Closes #2844

We are seeing that the blockstore loading operations get used in hot loops within the gossip and queryMaj23 routines. This PR reduces that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU cache is 9x faster than without (before even adding in DB overheads), due to the proto unmarshalling saved. We could imagine a setup where we avoid the lock there entirely, but I don't think that is worth it right now, since the new code is already 9x faster and these loads mostly appear in catchup code, which should not be highly contended across peers at the same time.
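
For illustration, here is a minimal sketch of this caching pattern in Go. It assumes hashicorp/golang-lru/v2 for the cache; the wrapper type, the loadFromDB stand-in, and the cache size are hypothetical and not the exact code from #3003.

```go
package store

import (
	lru "github.com/hashicorp/golang-lru/v2"

	"github.com/cometbft/cometbft/types"
)

// blockStoreCache is an illustrative wrapper that consults a fixed-size LRU
// cache before falling back to the underlying DB read + proto unmarshal.
// (Hypothetical names, not the exact types introduced by the PR.)
type blockStoreCache struct {
	seenCommits *lru.Cache[int64, *types.Commit] // height -> seen commit
	loadFromDB  func(height int64) *types.Commit // stand-in for the existing load path
}

func newBlockStoreCache(loadFromDB func(int64) *types.Commit) (*blockStoreCache, error) {
	c, err := lru.New[int64, *types.Commit](100) // cache size is illustrative
	if err != nil {
		return nil, err
	}
	return &blockStoreCache{seenCommits: c, loadFromDB: loadFromDB}, nil
}

// LoadSeenCommit serves the commit from the cache when possible. A hit only
// pays the cache's internal lock; the DB read and proto unmarshal are skipped.
func (bs *blockStoreCache) LoadSeenCommit(height int64) *types.Commit {
	if commit, ok := bs.seenCommits.Get(height); ok {
		return commit
	}
	commit := bs.loadFromDB(height)
	if commit != nil {
		bs.seenCommits.Add(height, commit)
	}
	return commit
}
```

The fallback path stores whatever the DB load returned, so repeated loads of the same height only unmarshal once.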

With the new benchmark I added:
OLD:

BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op

NEW:

BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op

It turns out these gossip routines don't need mutative copies, so we could optimize out the large allocation in the future if we want.
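
For reference, a sketch of what a benchmark of this shape could look like, reusing the illustrative blockStoreCache above with a stubbed loader (this is not the store/bench_test.go added by the PR):

```go
package store

import (
	"testing"

	"github.com/cometbft/cometbft/types"
)

// BenchmarkRepeatedLoadSeenCommit_sketch loads the same height repeatedly, so
// after the first iteration every load is an LRU hit and the stubbed DB
// read + unmarshal path is skipped.
func BenchmarkRepeatedLoadSeenCommit_sketch(b *testing.B) {
	loadFromDB := func(height int64) *types.Commit {
		// Stand-in for the real DB read + proto unmarshal.
		return &types.Commit{Height: height}
	}
	cache, err := newBlockStoreCache(loadFromDB) // illustrative wrapper from the sketch above
	if err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if commit := cache.LoadSeenCommit(1); commit == nil {
			b.Fatal("expected a commit at height 1")
		}
	}
}
```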

A 1-hour CPU profile that shows this appearing in prod:
![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

For context, the state machine execution time here is 92 seconds, so this overhead adds up in system load (and in GC: the GC load is 52GB out of the 200GB total trace, with other parts being optimized down in recent PRs).


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

This is an automatic backport of pull request #3003 done by [Mergify](https://mergify.com).

perf(blockstore): Add LRU caches to blockstore operations used in consensus (#3003)

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 46e2484)

# Conflicts:
#	.golangci.yml
#	go.mod
#	go.sum
#	store/store.go
@mergify mergify bot requested a review from a team as a code owner May 15, 2024 05:45
@mergify mergify bot added the conflicts label May 15, 2024
mergify bot (Contributor, Author) commented May 15, 2024

Cherry-pick of 46e2484 has failed:

On branch mergify/bp/v0.38.x/pr-3003
Your branch is up to date with 'origin/v0.38.x'.

You are currently cherry-picking commit 46e24848f.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .changelog/unreleased/improvements/3003-use-lru-caches-in-blockstore.md
	new file:   store/bench_test.go
	modified:   store/store_test.go
	modified:   types/block.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   .golangci.yml
	both modified:   go.mod
	both modified:   go.sum
	both modified:   store/store.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@melekes merged commit f97bee9 into v0.38.x on May 15, 2024
21 checks passed
@melekes deleted the mergify/bp/v0.38.x/pr-3003 branch on May 15, 2024 09:31