-
Notifications
You must be signed in to change notification settings - Fork 645
perf(consensus): Use TrySend for hasVote/HasBlockPart messages #3407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Going to test this in osmosis prod to see the scheduler improvement, mostly because I'm curious if it reflects my understanding. The test I added tests TryBroadcast to same extent Broadcast is tested within switch. (I didn't see any other test) |
I fixed the old Go Benchmark (It hasn't been working for months).
So a significant CPU cost reduction! |
This appears to have been an empiric 20% reduction in scheduler time! (At consensus state and mempool times being roughly equal) And a 5-10% GC time reduction! |
I will take a detailed look maybe tomorrow, but conceptually I agree, in particular because missing those two messages is not a problem: they are optimizations, preventing duplication, they are not core messages of the protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Please fix the changelog before committing.
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
use TrySend instead of Send. This saves notable amounts of performance, | ||
as TrySend just drops sending when the channel is full, instead of creating | ||
a timer / new channel. | ||
([\#3342](https://github.com/cometbft/cometbft/issues/3342)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this link right? #3342 is something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
Co-authored-by: Daniel <daniel.cason@informal.systems>
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ValarDragon ❤️
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
|
I'm at conferences this week, I can work on debugging on weekend! Sorry for delay on replies |
This error is odd, are you sure it was introduced by this PR? |
I mean, |
not sure |
Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> (cherry picked from commit 55493e0) # Conflicts: # internal/consensus/reactor.go # p2p/switch.go # p2p/switch_test.go
…ort #3407) (#3466) Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3407 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ort cometbft#3407) (cometbft#3466) Closes cometbft#3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3407 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Closes #3151
The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the
NewTimer
call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following.We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is
CheckTimers
in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce totalmallocgc
time here by 5% by removing NewTimer, which should hopefully reduce GC load as well.PR checklist
.changelog
(we use unclog to manage our changelog)docs/
orspec/
) and code comments