8000 tscache: switch to skiplist implementation by default, remove Request interface by nvanbenschoten · Pull Request #20232 · cockroachdb/cockroach · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

tscache: switch to skiplist implementation by default, remove Request interface #20232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nvanbenschoten
Copy link
Contributor
@nvanbenschoten nvanbenschoten commented Nov 23, 2017

Follow up to #19508.
Closes #4768.
Closes #6190.
Closes #14518.
Closes #16901.
Closes #16796.
Closes #20123.
Part of #17172.

This change switches the default tscache.Cache implementation to sklImpl. In doing so, it performs a series of cleanup steps that improve the external interface of tscache.Cache and improve sklImpl's performance moderately. This comes at the cost of removing some optimizations built into the treeImpl. Since we're no longer going to use that outside of testing, its performance matters less, so the tradeoff is worth it. The change is broken into a series of steps so that each one is self-contained:

  1. Switch the tscache package to create sklImpl instances by default.
  2. Make treeImpl thread-safe by adding internal locking.
  3. Remove the ThreadSafe method from tscache.Cache. Remove the tsCacheMu lock from Store.
  4. Remove the requests tree optimization from treeImpl.
  5. Remove the Request type and {Add,Expand}Request methods from the tscache package. tscache.Cache now exports its Add method, which is used by Replica directly.

Before merging this, I want to see at least a few successful runs of Jepsen with the change enabled.

I'll also add some concrete benchmarks from before and after this change.

@nvanbenschoten nvanbenschoten requested a review from a team November 23, 2017 00:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten requested a review from a team November 23, 2017 08:34
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/sklImplCleanup branch 5 times, most recently from 8af3889 to f5f7c2c Compare November 27, 2017 16:50
@nvanbenschoten
Copy link
Contributor Author
nvanbenschoten commented Nov 28, 2017

This change just passed Jepsen: https://teamcity.cockroachdb.com/viewLog.html?buildId=421110. I'll run it a few times, but at this point, I don't expect anything to change.

Also, to confirm that the change will produce benchmark numbers comparable to what we saw in the initial investigation, I ran ./kv --read-percent=95 --splits=100 --concurrency=48 --write-seq 100000 across a 3 node cluster with and without the change. The numbers looked close to what we expected:

BEFORE:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.0s        0        3849972        12833.1      3.7      1.2     23.1     48.2    285.2

BenchmarkLoadgenKV/readPercent=95/splits=100/concurrency=48/duration=5m0s	 3849972	     77923 ns/op

AFTER:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.0s        0        4589871        15299.5      3.1      1.2     18.9     46.1    268.4

BenchmarkLoadgenKV/readPercent=95/splits=100/concurrency=48/duration=5m0s	 4589871	     65361 ns/op

Throughput increased by 19%
50th percentile latency dropped by 16%
95th percentile latency dropped by 19%
99th percentile latency dropped by 4%

The only surprising part is that my testing here didn't seem to affect 99th percentile latency. I think that's because the initial testing was using a single node cluster and the testing here is on a three node cluster, so there are other dominant factors that are now influencing the 99th percentile latency.

@tbg
Copy link
Member
tbg commented Nov 30, 2017

:lgtm:! So good to see this coming to a close.


Reviewed 4 of 4 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 9 of 9 files at r4, 4 of 4 files at r5, 10 of 10 files at r6.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/storage/replica.go, line 1995 at r6 (raw file):

func (r *Replica) updateTimestampCache(ba *roachpb.BatchRequest, br *roachpb.BatchResponse) {
	// Create booleans to be passed as tsCache.Add's readCache argument.
	updateWriteCache := false

I got confused by these names. Do you need both of these variables or can you get away with a single variable? I don't see any mutations of updateWriteCache. Maybe reduce to one and call it readOnlyUseReadCache?


Comments from Reviewable

The new `sklImpl` tscache structure includes a large number of
preemption points when race tested. This change adds a new
`util.EnableRacePreemptionPoints` function so that these preemption
points are only used in specific tests.
This change makes `treeImpl` thread-safe, as is reflected by its
`ThreadSafe` method. This makes the structure slightly less efficient
because it will result in locking at a finer-granularity. Now that it's
not used by default, that's no a huge deal.
Because all `Cache` implementation must now be thread-safe, this change
removes the `ThreadSafe` method entirely. It then removes the `tsCacheMu`
lock on `Store`.
This change removes the `requests` btree from treeImpl. Since `treeImpl`
is no longer used by default, this optimization is no longer important.

This will allow us to remove `tscache.Request` completely.
Now that all `tscache.Cache` implementations are thread safe and do nothing in
their `ExpandRequests` methods, we can remove the idea of a `tscache.Request`
entirely. We do this by removing the associated methods and type and instead
exporting the `Add` method to be used directly. This avoids all associated
overhead of the `Request` struct, which is a big improvement for `sklImpl`.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/sklImplCleanup branch from 5b1f93c to 0cf73ec Compare November 30, 2017 18:06
@nvanbenschoten
Copy link
Contributor Author

TFTR!


Review status: 3 of 17 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/storage/replica.go, line 1995 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I got confused by these names. Do you need both of these variables or can you get away with a single variable? I don't see any mutations of updateWriteCache. Maybe reduce to one and call it readOnlyUseReadCache?

Done.


Comments from Reviewable

@nvanbenschoten nvanbenschoten merged commit 4b42b3c into cockroachdb:master Nov 30, 2017
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/sklImplCleanup branch November 30, 2017 18:43
@petermattis
Copy link
Collaborator

🎉 🎉 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0