-
Notifications
You must be signed in to change notification settings - Fork 636
perf(p2p)!: Remove PeerSendBytesTotal metric #3184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ValarDragon ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the docs need to be updated too (p2p_peer_send_bytes_total)
https://github.com/cometbft/cometbft/blob/main/docs/explanation/core/metrics.md
Concerning the fact that this is going to be removed and not available in the future, it would be interesting to ensure this is not used in our QA or Infra tests, I briefly looked at the logic there and couldn't find references to this particular metric, but @hvanz or @sergio-mena might have more insights. If we find out that some logic critically depends on this metric, probably a better approach would be to make it a configurable option if possible (e.g. disable it through a config value). |
ah thanks, didn't think to search the docs with underscores! (Explains why I didn't find it earlier haha) |
Thoughts on merging, and then reverting if we discover some metric consumer used this? |
perf(p2p)!: Remove PeerSendBytesTotal metric cometbft#3184
…/pr-105 perf(p2p)!: Remove PeerSendBytesTotal metric cometbft#3184 (backport #105)
That's a bad dev UX 😅
please no more config options. IMO we have too many config options |
Even if some external software depends on this metric, it shouldn't prevent us from removing it. CometBFT's mission and what it should do best in the world is fast and reliable consensus. And if some metric is significantly slowing CometBFT down, we should eliminate it! I.e., metrics are secondary; consensus is primary. |
Ive tested removing it, and the prometheus time here went down ~70%! TBH that still justifies us removing the other metric or doing that more "asynchronously", e.g. via the |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Follow-up to #3184 Refs #2840 --- #### PR checklist - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
@mergify backport v1.x |
✅ Backports have been created
|
Component of #2840. As we discussed in the issue, we want to deprecate the peer send bytes total metric because: - This is a costly metric (see pprof) - This increases the cardinality of labels in Prometheus, which can slow things down in Prometheus significantly. - Its not really useful at this level of abstraction. We don't have reported usecases of this being used yet. If we want to do it in the future, we should more likely architect it as updating a live go variable, and then sending to our metrics instance on a polling ticker (ala PeerSendQueue metric). The right levels that are interesting for outbound traffic could be the consensus gossip routines or blocksync. To see its costly, 21% of all CPU time for live osmosis nodes in the latest Osmosis patch release is in peer.Send. over half of that time is in these metrics:  Please advise for how we should go about deprecating / removing this metric? Feel free to close this PR / say you want to handle it separately. --- #### PR checklist - [X] Tests written/updated - N/A - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I didn't find any relevant docs - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Andy Nogueira <me@andynogueira.dev> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> (cherry picked from commit 80114b2)
Follow-up to #3184 Refs #2840 --- #### PR checklist - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec (cherry picked from commit a2decd8) # Conflicts: # .changelog/unreleased/breaking-changes/3184-remove-PeerSendBytesTotal-metric.md # docs/explanation/core/metrics.md # p2p/metrics.gen.go # p2p/metrics.go
Component of #2840. As we discussed in the issue, we want to deprecate the peer send bytes total metric because: - This is a costly metric (see pprof) - This increases the cardinality of labels in Prometheus, which can slow things down in Prometheus significantly. - Its not really useful at this level of abstraction. We don't have reported usecases of this being used yet. If we want to do it in the future, we should more likely architect it as updating a live go variable, and then sending to our metrics instance on a polling ticker (ala PeerSendQueue metric). The right levels that are interesting for outbound traffic could be the consensus gossip routines or blocksync. To see its costly, 21% of all CPU time for live osmosis nodes in the latest Osmosis patch release is in peer.Send. over half of that time is in these metrics:  Please advise for how we should go about deprecating / removing this metric? Feel free to close this PR / say you want to handle it separately. --- #### PR checklist - [X] Tests written/updated - N/A - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I didn't find any relevant docs - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3184 done by [Mergify](https://mergify.com). Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Follow-up to #3184 Refs #2840 --- #### PR checklist - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3298 done by [Mergify](https://mergify.com). --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Component of #2840. As we discussed in the issue, we want to deprecate the peer send bytes total metric because:
To see its costly, 21% of all CPU time for live osmosis nodes in the latest Osmosis patch release is in peer.Send. over half of that time is in these metrics:

Please advise for how we should go about deprecating / removing this metric? Feel free to close this PR / say you want to handle it separately.
PR checklist
.changelog
(we use unclog to manage our changelog)docs/
orspec/
) and code comments - I didn't find any relevant docs