
Establish and implement the relevant metrics to understand storage workloads #46


Closed
3 tasks done
Tracked by #44 ...
lasarojc opened this issue Dec 23, 2022 · 2 comments · Fixed by #1974
Labels: metrics, P:storage-optimization (Priority: Give operators greater control over storage and storage optimization), storage

Comments

lasarojc (Contributor) commented Dec 23, 2022

This issue was originally tendermint/tendermint#9773.

We need to identify the set of metrics to understand the storage workloads of CometBFT. The metrics should help us identify:

  • The access patterns (sequential / random access)
  • How often the data is read/written
  • Who reads/writes the data
  • Whether the data is accessed by multiple components or just one
  • How much of the total height time is spent in storage, on a small network vs. a big network? (Is storage a bottleneck?)

Open question: Do we want information on which CometBFT BlockStore / StateStore method call triggered the access, or are read/write/delete counts and timings enough?
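For illustration only, here is a minimal sketch (not the actual CometBFT implementation) of how such a metric could be exposed with prometheus/client_golang: a single histogram labeled by method and operation gives read/write/delete counts and timings, and also records which BlockStore/StateStore method triggered the access, which would answer the open question above. The metric name, labels, buckets, and helper function are assumptions for the sake of the example.

```go
package store

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// storageAccessSeconds is a hypothetical histogram: duration of storage
// accesses, labeled by the store method that triggered the access and by the
// low-level operation (read, write, delete). Per-label counts come for free
// from the histogram's _count series.
var storageAccessSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Namespace: "cometbft",
		Subsystem: "storage",
		Name:      "access_duration_seconds",
		Help:      "Duration of accesses to the underlying DB, by method and operation.",
		Buckets:   prometheus.ExponentialBuckets(0.0001, 2, 15), // ~100µs up to ~1.6s
	},
	[]string{"method", "operation"},
)

func init() {
	prometheus.MustRegister(storageAccessSeconds)
}

// observeAccess wraps a single storage operation with timing, e.g.
// observeAccess("save_block", "write", func() error { return db.SetSync(k, v) }).
func observeAccess(method, operation string, op func() error) error {
	start := time.Now()
	err := op()
	storageAccessSeconds.WithLabelValues(method, operation).Observe(time.Since(start).Seconds())
	return err
}
```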

Draft implementation: tendermint/tendermint#9774

DoD (Definition of Done)

jmalicevic (Contributor) commented
For Q4 2023, we will focus on:

  • Storage access time (last point in the issue description)
  • Which data is most frequently read
  • The amount of storage taken by different workloads with and without pruning
  • The throughput with and without pruning - This was derived from a call with Injective, whose users' latency was heavily impacted when pruning was enabled.

This list might be altered after an in-person team meeting deciding on the exact measurements that will help us achieve our Q4 goals.

The main results of this testing should be:

  • Showing that the impact of pruning has improved (i.e., that compaction works)
  • Database access times have decreased
  • Impact of the new pruning mechanism on throughput
  • Impact of different key representation + new pruning mechanism on throughput.

melekes (Contributor) commented Nov 28, 2023

> The throughput with and without pruning - This was derived from a call with Injective, whose users' latency was heavily impacted when pruning was enabled.

The throughput can be hard to measure as it depends on many factors. Is it better to measure the duration of pruning (if possible)? This is similar to how the Golang core team optimised their garbage collector by looking at the duration of GC pauses.
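As a rough sketch of that suggestion (with assumed names, not CometBFT's actual pruning code): each pruning pass can be timed with a plain histogram, analogous to observing GC pause durations.

```go
package store

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// pruningDurationSeconds is a hypothetical metric for the duration of a
// single pruning pass, the storage analogue of a GC pause.
var pruningDurationSeconds = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: "cometbft",
	Subsystem: "storage",
	Name:      "pruning_duration_seconds",
	Help:      "Time spent in one block/state pruning pass.",
	Buckets:   prometheus.ExponentialBuckets(0.001, 2, 16), // ~1ms up to ~32s
})

func init() {
	prometheus.MustRegister(pruningDurationSeconds)
}

// pruneBlocks is a stand-in for the real pruning entry point; the timing
// wrapper is the point of this sketch, not the deletion logic.
func pruneBlocks(retainHeight int64) (pruned uint64, err error) {
	start := time.Now()
	defer func() { pruningDurationSeconds.Observe(time.Since(start).Seconds()) }()

	// ... delete blocks below retainHeight and trigger compaction here ...
	return pruned, nil
}
```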

@adizere adizere modified the milestones: 2023-Q4, 2024-Q1 Jan 10, 2024
@adizere adizere added this to CometBFT Jan 10, 2024
@github-project-automation github-project-automation bot moved this to Todo in CometBFT Jan 10, 2024
@jmalicevic jmalicevic moved this from Todo to In Progress in CometBFT Jan 17, 2024
@jmalicevic jmalicevic self-assigned this Jan 17, 2024
@jmalicevic jmalicevic moved this from In Progress to Ready for Review in CometBFT Jan 30, 2024
github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2024
This PR supersedes #79, with some adjustments to the segments of code timed as well as to the bucket sizes. The majority of the code was done by William in #79. I tried to fine-tune the measurements to exclude proto marshalling/unmarshalling where I thought it made sense.

Closes #46

Blocked on benchmarking to confirm it is measuring what we want.
(Follow-up) The metrics gave us nice and realistic measurements in our benchmarks.



Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
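To illustrate the adjustment described in the commit message above (a sketch under assumed names, not the code merged in #1974): only the raw DB read sits inside the timed span, proto unmarshalling happens outside it, and the histogram uses explicit buckets rather than the defaults.

```go
package store

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// kvReader is a minimal stand-in for the DB interface the block store uses.
type kvReader interface {
	Get(key []byte) ([]byte, error)
}

// loadDurationSeconds is a hypothetical histogram with explicit buckets tuned
// for DB reads ranging from well under a millisecond to a few seconds.
var loadDurationSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "blockstore_access_duration_seconds",
		Help:    "Time spent on raw DB reads, excluding proto (un)marshalling.",
		Buckets: []float64{0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5},
	},
	[]string{"method"},
)

// loadRaw times only the DB access; the caller unmarshals the returned bytes
// outside the measured span.
func loadRaw(db kvReader, method string, key []byte) ([]byte, error) {
	start := time.Now()
	bz, err := db.Get(key)
	loadDurationSeconds.WithLabelValues(method).Observe(time.Since(start).Seconds())
	return bz, err
}
```

A caller, say a hypothetical LoadBlockMeta, would invoke loadRaw(db, "load_block_meta", key) and unmarshal the returned bytes afterwards, so serialization cost never inflates the storage timing.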
@github-project-automation github-project-automation bot moved this from Ready for Review to Done in CometBFT Feb 13, 2024
mergify bot pushed a commit that referenced this issue Feb 22, 2024
(cherry picked from commit dfd3f6c)

# Conflicts:
#	internal/store/store.go