
perf(blockstore): Add LRU caches to blockstore operations used in consensus (backport #3003) #3082


Merged
merged 3 commits into v0.38.x from mergify/bp/v0.38.x/pr-3003 on May 15, 2024

Conversation

mergify[bot] (Contributor) commented May 15, 2024

Closes #2844

We are seeing that the blockstore loading operations get used in hot loops within the gossip and queryMaj23 routines. This PR reduces that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU cache is 9x faster than without (before even adding in DB overheads), due to the proto unmarshalling saved. We could imagine a setup where we avoid the lock there entirely, but I don't think that is worth it right now, since the new code is already 9x faster and these loads mostly appear in catchup code, which should not be highly contended across peers at the same time.
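
For illustration, here is a minimal sketch of this caching pattern in Go. It assumes hashicorp/golang-lru/v2 for the cache; the wrapper type, the loadFromDB stand-in, and the cache size are hypothetical and not the exact code from #3003.

```go
package store

import (
	lru "github.com/hashicorp/golang-lru/v2"

	"github.com/cometbft/cometbft/types"
)

// blockStoreCache is an illustrative wrapper that consults a fixed-size LRU
// cache before falling back to the underlying DB read + proto unmarshal.
// (Hypothetical names, not the exact types introduced by the PR.)
type blockStoreCache struct {
	seenCommits *lru.Cache[int64, *types.Commit] // height -> seen commit
	loadFromDB  func(height int64) *types.Commit // stand-in for the existing load path
}

func newBlockStoreCache(loadFromDB func(int64) *types.Commit) (*blockStoreCache, error) {
	c, err := lru.New[int64, *types.Commit](100) // cache size is illustrative
	if err != nil {
		return nil, err
	}
	return &blockStoreCache{seenCommits: c, loadFromDB: loadFromDB}, nil
}

// LoadSeenCommit serves the commit from the cache when possible. A hit only
// pays the cache's internal lock; the DB read and proto unmarshal are skipped.
func (bs *blockStoreCache) LoadSeenCommit(height int64) *types.Commit {
	if commit, ok := bs.seenCommits.Get(height); ok {
		return commit
	}
	commit := bs.loadFromDB(height)
	if commit != nil {
		bs.seenCommits.Add(height, commit)
	}
	return commit
}
```

The fallback path stores whatever the DB load returned, so repeated loads of the same height only unmarshal once.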

With the new benchmark I added:
OLD:

BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op

NEW:

BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op

It turns out these gossip routines don't need mutative copies, so we could optimize out the large allocation in the future if we want.
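
For reference, a sketch of what a benchmark of this shape could look like, reusing the illustrative blockStoreCache above with a stubbed loader (this is not the store/bench_test.go added by the PR):

```go
package store

import (
	"testing"

	"github.com/cometbft/cometbft/types"
)

// BenchmarkRepeatedLoadSeenCommit_sketch loads the same height repeatedly, so
// after the first iteration every load is an LRU hit and the stubbed DB
// read + unmarshal path is skipped.
func BenchmarkRepeatedLoadSeenCommit_sketch(b *testing.B) {
	loadFromDB := func(height int64) *types.Commit {
		// Stand-in for the real DB read + proto unmarshal.
		return &types.Commit{Height: height}
	}
	cache, err := newBlockStoreCache(loadFromDB) // illustrative wrapper from the sketch above
	if err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if commit := cache.LoadSeenCommit(1); commit == nil {
			b.Fatal("expected a commit at height 1")
		}
	}
}
```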

A 1-hour CPU profile that shows this appearing in prod:
![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

For context, the state machine execution time here is 92 seconds, so this overhead adds up in system load (and in GC: the GC load is 52GB out of the 200GB total trace, with other parts being optimized down in recent PRs).


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

This is an automatic backport of pull request #3003 done by [Mergify](https://mergify.com).

perf(blockstore): Add LRU caches to blockstore operations used in consensus (#3003)

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 46e2484)

# Conflicts:
#	.golangci.yml
#	go.mod
#	go.sum
#	store/store.go
@mergify mergify bot requested a review from a team as a code owner May 15, 2024 05:45
@mergify mergify bot added the conflicts label May 15, 2024
mergify bot (Contributor, Author) commented May 15, 2024

Cherry-pick of 46e2484 has failed:

On branch mergify/bp/v0.38.x/pr-3003
Your branch is up to date with 'origin/v0.38.x'.

You are currently cherry-picking commit 46e24848f.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .changelog/unreleased/improvements/3003-use-lru-caches-in-blockstore.md
	new file:   store/bench_test.go
	modified:   store/store_test.go
	modified:   types/block.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   .golangci.yml
	both modified:   go.mod
	both modified:   go.sum
	both modified:   store/store.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@melekes merged commit f97bee9 into v0.38.x on May 15, 2024
21 checks passed
@melekes deleted the mergify/bp/v0.38.x/pr-3003 branch on May 15, 2024 09:31