
db: consider separating mvcc garbage values into separate blob files #4424


Open
jbowens opened this issue Mar 26, 2025 · 3 comments

Comments

@jbowens
Collaborator
jbowens commented Mar 26, 2025

The current (in-progress) design of value separation (#112) writes all the values of a sstable to a single blob file during compactions that write blob files. If there exists significant MVCC garbage, the MVCC garbage can reduce locality of access for reads at recent MVCC timestamps. We could consider instead storing these MVCC garbage values in a separate blob file.

This has the added advantage that in CockroachDB, MVCC garbage is expected to be deleted once the GC ttl elapses. That means we expect all the values within a garbage blob file to eventually be deleted. We may be able to delete the blob file even before a relevant blob-rewriting compaction is scheduled if non-blob-rewriting compactions compact the tombstones written by MVCC GC first.
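
A minimal sketch of the routing idea, not Pebble's actual blob-writing code. It assumes a compaction sees versions sorted by key and then newest-first, and that every version after the first for a given key can be treated as MVCC garbage. The `version` and `blobFile` types and the `splitValues` function are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// version is a hypothetical, simplified MVCC entry. Real Pebble keys are
// opaque byte slices that a Comparer splits into prefix and suffix.
type version struct {
	key   string
	ts    int
	value string
}

// blobFile is a hypothetical stand-in for a blob file being written.
type blobFile struct {
	name   string
	values []string
}

// splitValues routes each value to one of two blob files. The input must be
// sorted by key ascending, then timestamp descending (newest first).
// The first version seen for a key goes to the "live" file; every older
// version of the same key goes to the "garbage" file.
func splitValues(sorted []version) (live, garbage *blobFile) {
	live = &blobFile{name: "live"}
	garbage = &blobFile{name: "garbage"}
	prevKey, havePrev := "", false
	for _, v := range sorted {
		if havePrev && v.key == prevKey {
			garbage.values = append(garbage.values, v.value)
		} else {
			live.values = append(live.values, v.value)
		}
		prevKey, havePrev = v.key, true
	}
	return live, garbage
}

func main() {
	live, garbage := splitValues([]version{
		{"a", 3, "a3"}, {"a", 1, "a1"}, {"b", 2, "b2"},
	})
	fmt.Println(live.name, live.values)
	fmt.Println(garbage.name, garbage.values)
}
```

In this sketch only the values are split; the keys would all stay in the same sstable, so reads still see every version and only the value locality changes.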

Jira issue: PEBBLE-363

@jbowens jbowens changed the title db: consider separating mvcc garbage values in distinct blob files db: consider separating mvcc garbage values into separate blob files Mar 26, 2025
@petermattis
Collaborator

There is similarity to #847. That issue was thinking about obsolete records pinned by snapshots, but the same idea could be done for historic MVCC keys. We already record the MVCC bounds in the sstable properties to do table-level filtering on reads. Separating out the historic MVCC keys into separate sstables has the nice property that you don't even have to skip over the keys during reads.
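
To illustrate the table-level filtering mentioned above, here is a rough sketch under simplified assumptions. It models each sstable as carrying a `[minTS, maxTS]` timestamp interval and a time-bounded read that only cares about versions in `[lo, hi]`. The `tableProps` type and `tablesToRead` function are hypothetical and are not Pebble's property or filter API.

```go
package main

import "fmt"

// tableProps is a hypothetical stand-in for the MVCC timestamp bounds
// recorded in an sstable's properties.
type tableProps struct {
	name         string
	minTS, maxTS int
}

// overlaps reports whether the table could hold a version whose timestamp
// falls in the closed interval [lo, hi].
func (t tableProps) overlaps(lo, hi int) bool {
	return t.minTS <= hi && lo <= t.maxTS
}

// tablesToRead returns the names of the tables a time-bounded read over
// [lo, hi] cannot skip.
func tablesToRead(tables []tableProps, lo, hi int) []string {
	var out []string
	for _, t := range tables {
		if t.overlaps(lo, hi) {
			out = append(out, t.name)
		}
	}
	return out
}

func main() {
	tables := []tableProps{
		{"recent.sst", 90, 100},
		{"historic.sst", 10, 40},
	}
	fmt.Println(tablesToRead(tables, 80, 100)) // prints [recent.sst]
}
```

If historic versions lived in their own sstables, their bounds would be tight and old, so a filter like this could exclude the whole table rather than stepping over individual keys.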

@sumeerbhola
Collaborator

> Separating out the historic MVCC keys into separate sstables has the nice property that you don't even have to skip over the keys during reads.

I suspect @jbowens was narrowly thinking of writing to two blob files for the separated values in a sstable -- that sounds like a good idea.
Regarding skipping over keys, I think this is similar to #1170, which is now closed following the implementation of value blocks in sstables. The reason we did not separate the keys is that, when writing a sstable in a compaction, we have no way of knowing which keys represent committed and resolved values. If we separate what we think are older keys and values, and the latest one is actually not committed and resolved, the MVCC get/scan will be incorrect.
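
A toy illustration of that failure mode, under heavily simplified assumptions. Here a reader that finds the newest version uncommitted simply falls back to the next older committed version; real CockroachDB intent handling is more involved. The `mvccVersion` type and `get` function are hypothetical names for this sketch only.

```go
package main

import "fmt"

// mvccVersion is a hypothetical, simplified version of a single key.
type mvccVersion struct {
	ts        int
	value     string
	committed bool // false models a write that is not yet committed and resolved
}

// get returns the newest committed version with ts <= readTS.
// versions must be sorted newest first.
func get(versions []mvccVersion, readTS int) (string, bool) {
	for _, v := range versions {
		if v.ts <= readTS && v.committed {
			return v.value, true
		}
	}
	return "", false
}

func main() {
	all := []mvccVersion{
		{ts: 10, value: "new", committed: false},
		{ts: 5, value: "old", committed: true},
	}
	// With every version visible, the read falls back to the committed value.
	fmt.Println(get(all, 20))

	// If the older key had been moved elsewhere and the reader consulted only
	// the "latest" version, the committed value would be missed.
	latestOnly := all[:1]
	fmt.Println(get(latestOnly, 20))
}
```

The first call finds "old"; the second finds nothing, which is the kind of incorrect MVCC get/scan result described above.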

@petermattis
Collaborator

Ack that a compaction can't know if the latest MVCC version is committed or not, but isn't it true that if there are two MVCC versions for a key, the second (older) version must be committed?
