db: consider separating mvcc garbage values into separate blob files · Issue #4424 · cockroachdb/pebble · GitHub
The current (in-progress) design of value separation (#112) writes all the values of a sstable to a single blob file during compactions that write blob files. If there exists significant MVCC garbage, the MVCC garbage can reduce locality of access for reads at recent MVCC timestamps. We could consider instead storing these MVCC garbage values in a separate blob file.
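A minimal sketch of the routing idea, assuming compaction output is ordered by user key ascending and, within a user key, by timestamp descending (newest first). The `kv` and `blobWriter` types and `routeValues` are hypothetical illustrations, not Pebble's API. The newest version of each user key goes to a "live" blob file, and every older (shadowed) version goes to a "garbage" blob file:

```go
package main

import "bytes"

// kv is a hypothetical representation of one MVCC version emitted by a
// compaction: a user key, its MVCC timestamp, and the value bytes.
type kv struct {
	userKey []byte
	ts      uint64
	value   []byte
}

// blobWriter is a hypothetical stand-in for a blob file writer; it only
// accumulates the values written to it.
type blobWriter struct {
	values [][]byte
}

func (w *blobWriter) add(v []byte) {
	w.values = append(w.values, v)
}

// routeValues assumes kvs are sorted by user key ascending and, within a
// user key, by timestamp descending. The first version seen for each user
// key is the newest one in this output and goes to the live blob file;
// every subsequent (older, shadowed) version goes to the garbage blob file.
func routeValues(kvs []kv, live, garbage *blobWriter) {
	var prevKey []byte
	for i, e := range kvs {
		// i == 0 is checked explicitly so an empty first user key is not
		// mistaken for a repeat of the nil prevKey.
		newest := i == 0 || !bytes.Equal(e.userKey, prevKey)
		if newest {
			live.add(e.value)
		} else {
			garbage.add(e.value)
		}
		prevKey = e.userKey
	}
}
```

Note that "newest within this compaction's output" is a conservative test: a newer version may exist in a higher level, in which case a garbage value lands in the live file. The reverse mistake cannot happen.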
This has the added advantage that in CockroachDB, MVCC garbage is expected to be deleted once the GC TTL elapses. That means we expect all the values within a garbage blob file to eventually be deleted. We may be able to delete the blob file even before a relevant blob-rewriting compaction is scheduled if non-blob-rewriting compactions compact the tombstones written by MVCC GC first.
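One way to picture early deletion is reference counting. The following is a minimal sketch that assumes each blob file tracks how many referencing values remain in live sstables. `blobFileRefs`, `addRefs`, and `dropRefs` are hypothetical names, not Pebble's actual bookkeeping. Once ordinary compactions have dropped every value deleted by MVCC GC tombstones, the count reaches zero and the file can be removed without a blob-rewriting compaction:

```go
package main

// blobFileRefs is a hypothetical tracker mapping a blob file number to the
// number of values in live sstables that still reference it.
type blobFileRefs struct {
	refs map[uint64]int
}

func newBlobFileRefs() *blobFileRefs {
	return &blobFileRefs{refs: make(map[uint64]int)}
}

// addRefs records n values in blob file fileNum referenced by a new sstable.
func (r *blobFileRefs) addRefs(fileNum uint64, n int) {
	r.refs[fileNum] += n
}

// dropRefs is called when a compaction drops n referencing values (e.g.
// because MVCC GC tombstones deleted them). It reports whether the blob
// file has no remaining references and can therefore be deleted outright.
func (r *blobFileRefs) dropRefs(fileNum uint64, n int) bool {
	r.refs[fileNum] -= n
	if r.refs[fileNum] <= 0 {
		delete(r.refs, fileNum)
		return true
	}
	return false
}
```

A garbage blob file is the best case for this scheme because every value in it is expected to be GC'd. A mixed file, by contrast, keeps its live values referenced indefinitely.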
jbowens changed the title from "db: consider separating mvcc garbage values in distinct blob files" to "db: consider separating mvcc garbage values into separate blob files" on Mar 26, 2025.
There is similarity to #847. That issue was thinking about obsolete records pinned by snapshots, but the same idea could be done for historic MVCC keys. We already record the MVCC bounds in the sstable properties to do table-level filtering on reads. Separating out the historic MVCC keys into separate sstables has the nice property that you don't even have to skip over the keys during reads.
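For readers unfamiliar with table-level filtering, one common form is a time-bound read. The sketch below assumes each sstable records inclusive `[minTS, maxTS]` bounds over its MVCC keys and that a read only wants versions in the half-open interval `(startTS, endTS]`. `tableProps` and `mayContain` are hypothetical names, not Pebble's property or filter API. Any table whose bounds cannot intersect the interval is skipped without being read:

```go
package main

// tableProps is a hypothetical view of the MVCC timestamp bounds recorded
// in an sstable's properties (both bounds inclusive).
type tableProps struct {
	minTS, maxTS uint64
}

// mayContain reports whether the table can hold any version with a
// timestamp t satisfying startTS < t <= endTS. An empty interval never
// matches, and a false result means the table can be skipped entirely.
func mayContain(p tableProps, startTS, endTS uint64) bool {
	return startTS < endTS && p.maxTS > startTS && p.minTS <= endTS
}
```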
> Separating out the historic MVCC keys into separate sstables has the nice property that you don't even have to skip over the keys during reads.
I suspect @jbowens was narrowly thinking of writing to two blob files for the separated values in a sstable -- that sounds like a good idea.
Regarding skipping over keys, I think this is similar to #1170, which is now closed following the implementation of value blocks in sstables. The reason we did not separate the keys is that, when writing a sstable during a compaction, we have no way of knowing which keys represent committed and resolved values. If we separate what we think are older keys and values, and the latest one is actually not committed and resolved, the MVCC get/scan will be incorrect.
Ack that a compaction can't know whether the latest MVCC version is committed, but isn't it true that if there are two MVCC versions for a key, the second (older) version must be committed?
Jira issue: PEBBLE-363