Move NameRef and SymbolRef kind bits to end for better varint packing. #4012
Motivation
We serialize u4s using variable-length integers (varints) to reduce the size of the payload baked into Sorbet and stored on disk in the cache. Varints are really nice for small numbers (numbers up to 2^7-1 fit in 1 byte!) but have overhead for large numbers (values that need more than 28 bits take 5 bytes, since each byte stores only 7 bits of data).

We store SymbolRefs/NameRefs as u4s in the cache using their rawId, which encodes their kind and their index in that kind's table. Storing the kind bits in the upper bits of rawId ensures that ~all of these references are stored inefficiently in the cache (5 bytes instead of 1 for small IDs!). I hadn't thought of this downside when initially engineering them this way.
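To make the byte counts above concrete, here is a minimal sketch of a generic 7-bits-per-byte (LEB128-style) varint. The function names `encodeVarint` and `decodeVarint` are hypothetical, and Sorbet's actual serializer may differ in its details; the sketch only illustrates why a value with high bits set costs 5 bytes.

```cpp
#include <cstdint>
#include <vector>

// Assumption: a generic LEB128-style varint, not Sorbet's real serializer.
// Each output byte carries 7 data bits; the high bit marks "more bytes follow".
std::vector<uint8_t> encodeVarint(uint32_t value) {
    std::vector<uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
    return out;
}

// Inverse of encodeVarint: reassemble the 7-bit groups, lowest group first.
uint32_t decodeVarint(const std::vector<uint8_t> &bytes) {
    uint32_t value = 0;
    int shift = 0;
    for (uint8_t b : bytes) {
        value |= static_cast<uint32_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            break; // last byte of this varint
        }
        shift += 7;
    }
    return value;
}
```

Under this scheme, any value below 2^7 occupies one byte, while any value at or above 2^28 occupies five, regardless of how small the table index hidden inside it is.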
A secondary consequence of this inefficiency is that it's slower to deserialize and serialize an integer from 5 bytes rather than 1 or 2. I am very interested in optimizing deserialization for my compressed AST work, which is the primary motivation of this work.
Fortunately, the solution is simple: move the kind bits to the end of the rawId (its low-order bits), so that a small index yields a small integer. This solution shrinks the payload by ~4% (0.1MB) and Stripe's cache directory (which contains ASTs) by ~2%. I do not see any noticeable changes in end-to-end runtime.
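The effect of the two layouts can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `KIND_BITS = 3` is a hypothetical width, and `packKindHigh`, `packKindLow`, `kindOfLow`, `indexOfLow`, and `varintSize` are made-up helper names. The varint size calculation assumes a generic 7-bits-per-byte encoding.

```cpp
#include <cstdint>

// Assumption: 3 kind bits is a hypothetical width chosen for illustration.
constexpr uint32_t KIND_BITS = 3;
constexpr uint32_t INDEX_BITS = 32 - KIND_BITS;

// Old layout: kind in the upper bits, index in the lower bits.
constexpr uint32_t packKindHigh(uint32_t kind, uint32_t index) {
    return (kind << INDEX_BITS) | index;
}

// New layout: index shifted up, kind in the low-order bits.
constexpr uint32_t packKindLow(uint32_t kind, uint32_t index) {
    return (index << KIND_BITS) | kind;
}

// Accessors for the new layout.
constexpr uint32_t kindOfLow(uint32_t rawId) {
    return rawId & ((1u << KIND_BITS) - 1);
}
constexpr uint32_t indexOfLow(uint32_t rawId) {
    return rawId >> KIND_BITS;
}

// Bytes needed by a generic 7-bits-per-byte varint (assumed encoding).
constexpr int varintSize(uint32_t value) {
    int bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}
```

With these assumed widths, a reference with kind 1 and index 5 needs 5 bytes in the old layout and 1 byte in the new one. The trade-off is that the index now loses `KIND_BITS` of varint headroom: here only indices below 16 still fit in a single byte.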
Test plan
See included automated tests.