8000 perf(p2p)!: Remove PeerSendBytesTotal metric by ValarDragon · Pull Request #3184 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

perf(p2p)!: Remove PeerSendBytesTotal metric #3184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 18, 2024

Conversation

ValarDragon
Copy link
Contributor
@ValarDragon ValarDragon commented Jun 4, 2024

Component of #2840. As we discussed in the issue, we want to deprecate the peer send bytes total metric because:

  • This is a costly metric (see pprof)
  • This increases the cardinality of labels in Prometheus, which can slow things down in Prometheus significantly.
  • Its not really useful at this level of abstraction. We don't have reported usecases of this being used yet. If we want to do it in the future, we should more likely architect it as updating a live go variable, and then sending to our metrics instance on a polling ticker (ala PeerSendQueue metric). The right levels that are interesting for outbound traffic could be the consensus gossip routines or blocksync.

To see its costly, 21% of all CPU time for live osmosis nodes in the latest Osmosis patch release is in peer.Send. over half of that time is in these metrics:
image

Please advise for how we should go about deprecating / removing this metric? Feel free to close this PR / say you want to handle it separately.


PR checklist

  • Tests written/updated - N/A
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments - I didn't find any relevant docs
  • Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested review from a team as code owners June 4, 2024 15:40
Copy link
Contributor
@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ValarDragon ❤️

andynog
andynog previously requested changes Jun 5, 2024
Copy link
Contributor
@andynog andynog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the docs need to be updated too (p2p_peer_send_bytes_total)
https://github.com/cometbft/cometbft/blob/main/docs/explanation/core/metrics.md

@andynog
Copy link
Contributor
andynog commented Jun 5, 2024

Concerning the fact that this is going to be removed and not available in the future, it would be interesting to ensure this is not used in our QA or Infra tests, I briefly looked at the logic there and couldn't find references to this particular metric, but @hvanz or @sergio-mena might have more insights.

If we find out that some logic critically depends on this metric, probably a better approach would be to make it a configurable option if possible (e.g. disable it through a config value).

@ValarDragon
Copy link
Contributor Author

ah thanks, didn't think to search the docs with underscores! (Explains why I didn't find it earlier haha)

@ValarDragon ValarDragon requested a review from andynog June 5, 2024 18:27
@ValarDragon
Copy link
Contributor Author

Thoughts on merging, and then reverting if we discover some metric consumer used this?

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 5, 2024
 perf(p2p)!: Remove PeerSendBytesTotal metric cometbft#3184
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 5, 2024
…/pr-105

 perf(p2p)!: Remove PeerSendBytesTotal metric cometbft#3184  (backport #105)
@melekes
Copy link
Contributor
melekes commented Jun 7, 2024

Thoughts on merging, and then reverting if we discover some metric consumer used this?

That's a bad dev UX 😅

If we find out that some logic critically depends on this metric, probably a better approach would be to make it a configurable option if possible (e.g. disable it through a config value).

please no more config options. IMO we have too many config options

@melekes
Copy link
Contributor
melekes commented Jun 7, 2024

If we find out that some logic critically depends on this metric

Even if some external software depends on this metric, it shouldn't prevent us from removing it. CometBFT's mission and what it should do best in the world is fast and reliable consensus. And if some metric is significantly slowing CometBFT down, we should eliminate it! I.e., metrics are secondary; consensus is primary.

@ValarDragon
Copy link
Contributor Author

Ive tested removing it, and the prometheus time here went down ~70%! TBH that still justifies us removing the other metric or doing that more "asynchronously", e.g. via the updateStatsRoutine

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Jun 18, 2024
@melekes melekes enabled auto-merge June 18, 2024 04:51
@melekes melekes dismissed andynog’s stale review June 18, 2024 04:52

docs were updated

@melekes melekes added this pull request to the merge queue Jun 18, 2024
Merged via the queue into main with commit 80114b2 Jun 18, 2024
37 of 38 checks passed
@melekes melekes deleted the dev/deprecate_per_peer_metric branch June 18, 2024 05:03
melekes added a commit that referenced this pull request Jun 18, 2024
MessageReceiveBytesTotal and MessageSendBytesTotal

Follow-up to #3184
Refs #2840
github-merge-queue bot pushed a commit that referenced this pull request Jun 19, 2024
Follow-up to #3184
Refs #2840

---

#### PR checklist

- [ ] ~~Tests written/updated~~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
@melekes
Copy link
Contributor
melekes commented Jul 10, 2024

@mergify backport v1.x

Copy link
Contributor
mergify bot commented Jul 10, 2024

backport v1.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jul 10, 2024
Component of #2840. As we discussed in the issue, we want to deprecate
the peer send bytes total metric because:
- This is a costly metric (see pprof)
- This increases the cardinality of labels in Prometheus, which can slow
things down in Prometheus significantly.
- Its not really useful at this level of abstraction. We don't have
reported usecases of this being used yet. If we want to do it in the
future, we should more likely architect it as updating a live go
variable, and then sending to our metrics instance on a polling ticker
(ala PeerSendQueue metric). The right levels that are interesting for
outbound traffic could be the consensus gossip routines or blocksync.

To see its costly, 21% of all CPU time for live osmosis nodes in the
latest Osmosis patch release is in peer.Send. over half of that time is
in these metrics:

![image](https://github.com/cometbft/cometbft/assets/6440154/efba6e22-da43-4b97-ad1b-600a13738b5f)

Please advise for how we should go about deprecating / removing this
metric? Feel free to close this PR / say you want to handle it
separately.

---

#### PR checklist

- [X] Tests written/updated - N/A
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I didn't find any relevant docs
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit 80114b2)
mergify bot pushed a commit that referenced this pull request Jul 10, 2024
Follow-up to #3184
Refs #2840

---

#### PR checklist

- [ ] ~~Tests written/updated~~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit a2decd8)

# Conflicts:
#	.changelog/unreleased/breaking-changes/3184-remove-PeerSendBytesTotal-metric.md
#	docs/explanation/core/metrics.md
#	p2p/metrics.gen.go
#	p2p/metrics.go
melekes pushed a commit that referenced this pull request Jul 10, 2024
Component of #2840. As we discussed in the issue, we want to deprecate
the peer send bytes total metric because:
- This is a costly metric (see pprof)
- This increases the cardinality of labels in Prometheus, which can slow
things down in Prometheus significantly.
- Its not really useful at this level of abstraction. We don't have
reported usecases of this being used yet. If we want to do it in the
future, we should more likely architect it as updating a live go
variable, and then sending to our metrics instance on a polling ticker
(ala PeerSendQueue metric). The right levels that are interesting for
outbound traffic could be the consensus gossip routines or blocksync.


To see its costly, 21% of all CPU time for live osmosis nodes in the
latest Osmosis patch release is in peer.Send. over half of that time is
in these metrics:

![image](https://github.com/cometbft/cometbft/assets/6440154/efba6e22-da43-4b97-ad1b-600a13738b5f)

Please advise for how we should go about deprecating / removing this
metric? Feel free to close this PR / say you want to handle it
separately.

---

#### PR checklist

- [X] Tests written/updated - N/A
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I didn't find any relevant docs
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3184 done by
[Mergify](https://mergify.com).

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
melekes added a commit that referenced this pull request Jul 10, 2024
Follow-up to #3184
Refs #2840

---

#### PR checklist

- [ ] ~~Tests written/updated~~
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3298 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
@sergio-mena sergio-mena removed the stale For use by stalebot label Jul 23, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0