-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Optimize FSST decoding #16508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize FSST decoding #16508
Conversation
auto compressed_string_ptr = (unsigned char *)compressed_string; // NOLINT | ||
auto fsst_decoder = reinterpret_cast<duckdb_fsst_decoder_t *>(duckdb_fsst_decoder); | ||
auto compressed_string_ptr = reinterpret_cast<const unsigned char *>(compressed_string); | ||
auto fsst_decoder = static_cast<duckdb_fsst_decoder_t *>(duckdb_fsst_decoder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're at it, perhaps we can replace the use of void *
by forward-declaring duckdb_fsst_decoder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done on purpose by Sam >2 years ago. From fsst.h
:
/* Data structure needed for compressing strings - use duckdb_fsst_duplicate() to create thread-local copies. Use duckdb_fsst_destroy() to free. */
typedef void* duckdb_fsst_encoder_t; /* opaque type - it wraps around a rather large (~900KB) C++ object */
It was purposely forward-declared as an opaque type there, so I would rather not touch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so it's already a void*
underneath, but then I don't get why we are moving it around as a void*
and then casting it to duckdb_fsst_encoder_t *
in here, that's what I meant.
We should be able to use duckdb_fsst_encoder_t *
everywhere instead of void*
We still need to forward declare though because we don't want to include third_party headers in core headers
So then we need to repeat the typedef void* duckdb_fsst_encoder_t;
?
Hmm it's weird, maybe we leave it alone for now then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done on purpose by Sam >2 years ago. From
fsst.h
:/* Data structure needed for compressing strings - use duckdb_fsst_duplicate() to create thread-local copies. Use duckdb_fsst_destroy() to free. */ typedef void* duckdb_fsst_encoder_t; /* opaque type - it wraps around a rather large (~900KB) C++ object */It was purposely forward-declared as an opaque type there, so I would rather not touch this.
Actually, that's encoder
, not decoder
@@ -97,6 +97,10 @@ struct string_t { | |||
return value.inlined.length; | |||
} | |||
|
|||
void SetSize(uint32_t size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I like having a method for this.
If this is used incorrectly it will cause problems down the line.
For example if it was used to create a non-inlined string and someone thinks: "Ah I can make a substring out of this very easily with this" and the size would now indicate that it's inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can add an Unsafe
to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also do this:
new (&result.str) string_t(UnsafeNumericCast<uint32_t>(decompressed_string_size));
To achieve exactly the same, that saves us from creating the method at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps SetSizeAndFinalize
, which does both, and then also calls VerifyCharacters()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also do this:
new (&result.str) string_t(UnsafeNumericCast<uint32_t>(decompressed_string_size));To achieve exactly the same, that saves us from creating the method at all.
This looks scary, I like the SetSizeAndFinalize
more
src/include/duckdb/common/fsst.hpp
Outdated
const auto target_ptr = str_buffer.AllocateBuffer(max_uncompressed_length); | ||
const auto decompressed_string_size = duckdb_fsst_decompress( | ||
fsst_decoder, compressed_string_len, compressed_string_ptr, max_uncompressed_length, target_ptr); | ||
return str_buffer.FinalizeBuffer(target_ptr, max_uncompressed_length, decompressed_string_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about FinalizeBuffer
If I understand it correctly, it should only be used in this scenario, where we need to potentially shrink the buffer because we allocated too much?
Can this method live outside of the VectorStringBuffer?
Just so it doesn't get confused by someone in the future as a necessary step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having it live outside of FinalizeBuffer
would require making the StringHeap
in VectorStringBuffer
public
instead of private
, or am I missing something? I don't think that's worth it.
The comments above AllocateBuffer
and FinalizeBuffer
are very descriptive (i.e., saying that these should be used in conjunction) and should guide developers in how to use these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead make a PotentiallyShrinkableStringBuffer
(name needs work 😅 ):
struct PotentiallyShrinkableStringBuffer {
public:
static PotentiallyShrinkableStringBuffer Create(Allocator &allocator, idx_t alloc_len);
string_t Finalize(idx_t len);
private:
Allocator &allocator;
idx_t alloc_len;
data_ptr_t buffer;
bool finalized;
};
Just so we don't pollute VectorStringBuffer with AllocateBuffer
and FinalizeBuffer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also do friend class FSSTPrimitives;
and make the methods private
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will actually just rename to AllocateShrinkableBuffer
and FinalizeShrinkableBuffer
, this should avoid confusion
D_ASSERT(str_len <= alloc_len); | ||
bool is_not_inlined = str_len > string_t::INLINE_LENGTH; | ||
idx_t shrink_count = alloc_len - (str_len * is_not_inlined); | ||
allocator.ShrinkHead(shrink_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. 628C Learn more.
Can this also verify that the buffer is part of the head of the allocator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now 👍
Thanks! |
Optimize FSST decoding (duckdb/duckdb#16508)
Optimize FSST decoding (duckdb/duckdb#16508)
Optimize FSST decoding (duckdb/duckdb#16508)
Optimize FSST decoding (duckdb/duckdb#16508)
Optimize FSST decoding (duckdb/duckdb#16508)
Optimize FSST decoding (duckdb/duckdb#16508)
After profiling string reading a bit I noticed that a lot of time was spent decoding FSST, and then copying into a
StringVectorBuffer
, and realised that this should be decoded directly into theStringVectorBuffer
. I pinged @Mytherin, who had already spent some time on this and optimized it really well, but while this greatly improved decoding long strings, it caused a regression for inlined strings (<=12 bytes).This PR picks up where he left off and optimizes the code path for inlined strings. This code path is taken when we know, based on statistics, that all strings in a row group are inlined. In these cases, we can decode directly into the
string_t
. To make this efficient, we need some "overflow space", i.e., some room after thestring_t
such that FSST knows that it should always be able to decode the string without running out of space.I created a table from TPC-H SF100 and ran these queries:
The long strings were already much faster with Mark's changes, and with my changes on top, the short strings don't regress (and are even slightly faster).