-
Notifications
You must be signed in to change notification settings - Fork 645
Change broadcastHasVote and broadcastHasBlockpart to use TrySend not Send #3151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
After looking at #3182, I think we should just change Broadcast to use TrySend everywhere, not Send. And adapt all relevant channel buffers accordingly. |
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Yes, I partially agree. The problem is that But the |
The easiest solution here is to clone |
Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> (cherry picked from commit 55493e0) # Conflicts: # internal/consensus/reactor.go # p2p/switch.go # p2p/switch_test.go
…ort #3407) (#3466) Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3407 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> (cherry picked from commit e731a3f) # Conflicts: # p2p/switch.go # p2p/switch_test.go
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3182 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ometbft#3480) Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what cometbft#3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in cometbft#3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3182 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ometbft#3480) (#134) Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what cometbft#3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in cometbft#3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3182 done by [Mergify](https://mergify.com). --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ort cometbft#3407) (cometbft#3466) Closes cometbft#3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3407 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Feature Request
Summary
Currently both BroadcastHasVote and BroadcastHasBlockpart use Broadcast, which does a Send message to every peer.
While its not clear that BroadcastHasVote is needed, it definitely should be using TrySend not Send! TrySend makes it ignore the channel buffering capacity, and depend on time.Sleep's for cancelling. This increases the overhead for the go scheduler, which does actually take a lot of the time for the system. (Just creating the timers takes 30s of CPU time across a 1hour cpu profile, independent of the scheduler overhead)
Theres a number of things worth revisting about this
BroadcastHas*
, but it seems like we should be using TrySend everytime not Send. The channel capacity for hasVote is sent to 1, so were not really respecting that right now with the correct behavior. Its likely to me this is an accident in implementation, since the Broadcast method already existed, and used Send, with no corresponding trySend method.Problem Definition
Reduce our CPU and scheduling overheads, and respect the channel capacities, so we don't "overgossip" from
Has*
channels.Proposal
For now, just make a BroadcastTrySend method, and switch the consensus gossip routines to using that!
If for some reason we really wanted Send (which doesn't make sense to me) then we should raise the channel capacities.
The text was updated successfully, but these errors were encountered: