perf: timestampCache causes GC-related slowdown #16796

Closed
petermattis opened this issue Jun 29, 2017 · 3 comments
Labels: C-performance (Perf of queries or internals. Solution not expected to change functional behavior.)

@petermattis
Collaborator

Running a read-only workload on a single node cluster shows performance degradation from ~19k ops/sec to a steady state of ~13.5k ops/sec. Investigation eventually pointed to the timestampCache. Specifically, if timestampCache.AddRequest is disabled, steady state performance is ~18k ops/sec. The workload under investigation is:

kv --read-percent 100 --concurrency 16 --splits 100 --write-seq 100000

Interestingly, the manipulation of the timestampCache.requests btree by AddRequest doesn't seem to be the problem. If AddRequest is tweaked to add and immediately delete the request, performance is good.
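For concreteness, the tweak looks roughly like the following sketch, which assumes a google/btree-backed requests tree and illustrative types (not the actual CockroachDB code):

```go
package storage

import "github.com/google/btree"

// requestItem is an illustrative stand-in for the entries kept in
// timestampCache.requests, ordered here by an insertion sequence number.
type requestItem struct {
	seq uint64
	// span, reads, timestamp, ... elided
}

func (r *requestItem) Less(than btree.Item) bool {
	return r.seq < than.(*requestItem).seq
}

type timestampCache struct {
	requests *btree.BTree
}

// AddRequest, tweaked for the experiment: insert the request and then delete
// it immediately, so the btree is still exercised on every call but nothing
// is retained afterwards.
func (tc *timestampCache) AddRequest(item *requestItem) {
	tc.requests.ReplaceOrInsert(item)
	tc.requests.Delete(item)
}
```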

Part of the problem seems to be the btree. Replacing the btree with a fixed-size ring buffer of approximately the same size results in steady state performance of ~15k ops/sec. The ring buffer experiment isn't a drop-in replacement; more work would be required to flesh out its functionality, and that additional functionality might end up negating the modest improvement it provides.
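A minimal sketch of the ring buffer variant, assuming requests simply overwrite the oldest slot once the buffer wraps (the real replacement would also need the lookup/expansion logic mentioned above):

```go
package storage

// cacheRequest stands in for the real type; fields elided.
type cacheRequest struct {
	// span, reads, timestamp, ... elided
}

// requestRing is a fixed-size ring buffer of requests: once full, each new
// request overwrites the oldest entry, so the memory footprint stays constant
// and no per-insert tree nodes are allocated.
type requestRing struct {
	buf  []cacheRequest
	next int  // slot the next request will be written to
	full bool // true once the buffer has wrapped at least once
}

func newRequestRing(size int) *requestRing {
	return &requestRing{buf: make([]cacheRequest, size)}
}

func (r *requestRing) add(req cacheRequest) {
	r.buf[r.next] = req
	r.next++
	if r.next == len(r.buf) {
		r.next = 0
		r.full = true
	}
}
```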

Another experiment was to zero the cacheRequest before inserting it. This brought steady state performance back up to ~17k ops/sec.
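In sketch form (again with stand-in types), this experiment amounts to clearing the request before the cache retains it; that throws away the spans the cache actually needs, so it only serves to isolate the GC cost of holding them:

```go
package storage

// Stand-ins for the real types; fields and the btree are elided.
type cacheRequest struct {
	// span, reads, timestamp, ... elided
}

type timestampCache struct {
	requests []cacheRequest // stand-in for the retained requests
}

func (tc *timestampCache) AddRequest(req cacheRequest) {
	tc.requests = append(tc.requests, req)
}

// addRequestZeroed models the experiment: retain a zeroed request so the
// cache holds no heap pointers back into request memory.
func (tc *timestampCache) addRequestZeroed(req cacheRequest) {
	req = cacheRequest{}
	tc.AddRequest(req)
}
```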

My current suspicion is that the non-zeroed cacheRequests are causing additional GC pressure. Enabling GODEBUG=gctrace=1 shows:

                          GC CPU
master                    13%
disabled timestampCache    7%
zeroed cacheRequest        8%

Note that there are two fields in cacheRequest that zeroing affects with this workload: cacheRequest.span and cacheRequest.reads. I'm not sure what to do here. We need the spans in order to later expand the requests into the interval tree. It is surprising to me that merely holding on to these keys is causing additional GC CPU usage. Suggestions are welcome.
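For reference, a simplified view of the two fields in question (illustrative; the real cacheRequest has more fields). Each RSpan/Span key is backed by a []byte, so every retained request keeps several small heap allocations alive for the GC to find and trace:

```go
package storage

import "github.com/cockroachdb/cockroach/pkg/roachpb"

// Simplified sketch of the pointer-carrying fields of cacheRequest.
type cacheRequest struct {
	span  roachpb.RSpan  // start/end key of the request: two []byte-backed keys
	reads []roachpb.Span // one span (two more []byte keys) per read in the batch
	// timestamp, txn details, ... elided
}
```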

@petermattis petermattis self-assigned this Jun 29, 2017
@bdarnell
Contributor

Try zeroing the span and reads fields independently. I would expect that zeroing span wouldn't make a difference for GC costs (since it just points to bytes), but zeroing reads would (since it's a slice containing pointers).

How much of the Go memory was used by the timestamp cache in these tests? It's usually the biggest single consumer of memory on the Go side, in which case it might not be surprising that eliminating it cuts GC costs nearly in half. If that's the case, our options are limited. We could allocate a big slab of memory, copy the keys into it, and store indexes into that slab instead of the pointers in roachpb.Span. Or we could move the whole timestamp cache into C++.
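A rough sketch of the slab idea, with illustrative names: keys are copied into one large []byte and addressed by offset/length, so the GC sees a single pointer-free buffer instead of one allocation per key.

```go
package storage

// keyRef locates a key inside the slab by offset and length instead of
// holding a pointer to its bytes.
type keyRef struct {
	off, length uint32
}

// keySlab is one large byte buffer that cached keys are copied into. The GC
// only has to trace the single buf slice, not an allocation per key.
type keySlab struct {
	buf []byte
}

// add copies key into the slab and returns a reference to it.
func (s *keySlab) add(key []byte) keyRef {
	off := uint32(len(s.buf))
	s.buf = append(s.buf, key...)
	return keyRef{off: off, length: uint32(len(key))}
}

// get returns the bytes of a previously added key.
func (s *keySlab) get(r keyRef) []byte {
	return s.buf[r.off : r.off+r.length]
}
```

Cached requests would then hold keyRefs rather than roachpb.Spans, trading a copy on insert (plus slab eviction/rotation logic) for far fewer pointers for the GC to chase.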

@petermattis
Collaborator Author
                  ops/sec   GC CPU
zero none         14909.6   12%
zero all          17503.2    8%
zero span,reads   16906.9    9%
zero reads        16705.9   10%
zero span         15345.6   12%

"zero all" corresponds to req = cacheRequest{}. Strange that doing so is different than req.span, req.reads = roachpb.RSpan{}, nil.

Go memory was in the 300-600 MB range during these tests. The timestamp cache was at capacity (i.e. using the full 64 MB), but it is a bit difficult to tell how much Go memory was being used by the timestamp cache. Presumably somewhat more than 64 MB due to the btree and other overhead that isn't accounted for. Moving the cache to C++ would get rid of the GC issues, but would introduce cgo overhead.

@petermattis
Collaborator Author

As a sanity check on the feasibility of moving the timestamp cache to C++: TimestampCache.AddRequest takes ~2us and the various TimestampCache calls in applyTimestampCache consume ~5us. The overhead of a cgo call is 60ns (i.e. 0.06us), roughly 3% of a single AddRequest, so the cgo overhead would be negligible here.
