-
Notifications
You must be signed in to change notification settings - Fork 3.9k
tscache: introduce skiplist-backed TimestampCache #19508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tscache: introduce skiplist-backed TimestampCache #19508
Conversation
Very exciting work, @nvanbenschoten! Does the PS I think you may have accidentally amended one of Andy's commits ( Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 6 of 6 files at r3. pkg/storage/helpers_test.go, line 335 at r3 (raw file):
Such a satisfying diff. Comments from Reviewable |
Very cool. Did you come up with a solution for keeping entries in the cache for |
I haven't verified this, but they should allow for roughly the same number of entries because the currently used I also agree that
@spencerkimball and I were discussing this last night. I was thinking that there could be some form of a dynamic |
Heh, that's exactly what I was thinking. We'd have to maintain the max timestamp per fixedCache, but that seems relatively straightforward. |
How would the lookup work in that scenario? You'd check the first cache, and if you fall through to the low watermark but you'd rather have something smaller, you check the next level, until you bottom out? That doesn't seem unreasonable, given that requests near the write front wouldn't usually have to check anything but the first level. |
Actually, that's exactly what https://github.com/andy-kimball/tscache already does! |
Finally got around to looking at the implementation. Very neat! It is a bit wasteful that every read/write span for request gets a copy of the timestamp and the transaction ID, especially given that the data is immutable once it is stored in the arena. For a given request, it would be a significant space savings if we could store the timestamp/txnID once in the arena and then have each node store the offset. I think this could be done, though there are definite challenges. Certainly not worthwhile exploring in the near term. |
@petermattis that's a good idea which could certainly be done, and it may even be something we'd want to look into sooner rather than later. In the case where all nodes have a key and a gap value, it would result in a ~22% space saving |
This change adds a retention policy to `intervalSkl` that it should never evict pages that contain intervals within a certain window of recency. This will allow the structure to exhibit the `MinRetentionWindow` semantics that `tscache.Cache` implementations require. We do this by replacing the `earlier` and `later` `sklPages` with a linked list of `sklPages` that can dynamically grow when necessary. See cockroachdb#19508 (comment) for more discussion.
This change adds a retention policy to `intervalSkl` that it should never evict pages that contain intervals within a certain window of recency. This will allow the structure to exhibit the `MinRetentionWindow` semantics that `tscache.Cache` implementations require. We do this by replacing the `earlier` and `later` `sklPages` with a linked list of `sklPages` that can dynamically grow when necessary. See cockroachdb#19508 (comment) for more discussion.
cc4d267
to
2686d91
Compare
I've rebased this on top of #19974, so it's now using the @petermattis I did some testing with this using |
@nvanbenschoten Are you waiting on review of this or is there still more work to do here? Review status: 0 of 11 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. Comments from Reviewable |
No, I'm not waiting on a review at the moment. Before pushing this forward I want to improve the randomized testing we perform across |
2686d91
to
e702f49
Compare
This has been rebased off of #20087 and is ready to be reviewed once that gets resolved. |
1985b71
to
587d44d
Compare
@bdarnell would it be possible to stress this (using the |
We're not set up to easily run multiple configurations of the jepsen tests, but you could temporarily add a commit to this PR to default that variable to true and then run a one-off build on this branch. |
@bdarnell aren't there current failures in Jepsen that might be confusing to sort through while testing this PR? |
Yes, but I'm assuming we don't want to wait for all of those to be resolved before testing this change. You'll just have to look at the logs (I can help with this) to see whether it's an inconsistency or some other automation/glue problem (All the current problems are the latter). |
8f38921
to
0860410
Compare
0860410
to
cf06ace
Compare
This has been rebased off master and is now ready for review. It got most of the way through Jepsen before an SSH failure. I'm running the two tests that didn't get to finish again. |
9289a7a
to
cf53406
Compare
Reviewed 9 of 10 files at r5, 3 of 3 files at r6. build/jepsen-common.sh, line 54 at r6 (raw file):
Just leaving a comment to double-check that this is reverted before merge. pkg/storage/tscache/cache_test.go, line 515 at r6 (raw file):
Timestampts Comments from Reviewable |
This change uses the `intervalSkl` structure to create a new `sklImpl` type that implements the `Cache` interface.
cf53406
to
45ad7d5
Compare
TFTR! I think you mistook a new test as being the same as an existing one though. Review status: 8 of 10 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. pkg/storage/tscache/cache_test.go, line 515 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
This comment was copied from build/jepsen-common.sh, line 54 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. Comments from Reviewable |
Reviewed 2 of 9 files at r4, 4 of 4 files at r7. Comments from Reviewable |
This is still a major work in progress, but I wanted to publish it in-case others wanted
to play around with it as well. I'm going to continue experimenting with this over the
next few days.
The change creates a new implementation of
tscache.Cache
that's backed by a concurrentskiplist. It uses a fork of https://github.com/andy-kimball/tscache, which contains a similar
timestamp cache built on top of an arena-based skiplist (https://github.com/andy-kimball/arenaskl).
A couple of important modifications were performed to make the structure usable in CockroachDB, which are located over in https://github.com/nvanbenschoten/tscache. The actual changes here
are minimal, and amount to creating a wrapper type around the library and ripping out now-unnecessary locking.
Nothing here is polished, but that wasn't the intention for now. The important part is that all tests pass with the new
tscache.Cache
implementation plugged in!Preliminary benchmarking shows impressive results. Here, the effect of the arena allocator is obvious. The properties of the skiplist also allow us to avoid a
request
interval map like we have in the current code, which I predict is also contributing to speedup. The following lists some (hastily gathered) performance diffs:kv, 100% reads
kv --read-percent 100 --concurrency 16 --splits 100 --write-seq 100000
kv, 80% reads
kv --read-percent 80 --concurrency 16 --splits 100 --write-seq 100000
tpcc
default settings
cc. @petermattis @spencerkimball @andy-kimball