When performing a three-way merge, Use a patchBuffer to build the new primary index instead of a MutableMap. #9229
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
MutableMaps are designed for caching point modifications to a table. But during merge, the new primary index is computed sequentially. There's no benefit to using a MutableMap here.
MutableMaps are built on top of the
ApplyMutations
function, which takes a sequential stream of modifications (called a PatchBuffer) and applies them to a chunker. This has the added benefit of being parallelizable: the patches are produced in one goroutine and consumed in another.Instead of using the MutableMap, we can extract the underlying PatchBuffer and use it directly. This should be both more performant, and more correct as it avoids a failure case with schema merges where the MutableMap flushes changes to disk and writes a chunk containing rows with different schemas.