8000 Optimize FSST decoding by lnkuiper · Pull Request #16508 · duckdb/duckdb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Optimize FSST decoding #16508

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Mar 5, 2025
Merged

Optimize FSST decoding #16508

merged 12 commits into from
Mar 5, 2025

Conversation

lnkuiper
Copy link
Contributor
@lnkuiper lnkuiper commented Mar 4, 2025

After profiling string reading a bit I noticed that a lot of time was spent decoding FSST, and then copying into a StringVectorBuffer, and realised that this should be decoded directly into the StringVectorBuffer. I pinged @Mytherin, who had already spent some time on this and optimized it really well, but while this greatly improved decoding long strings, it caused a regression for inlined strings (<=12 bytes).

This PR picks up where he left off and optimizes the code path for inlined strings. This code path is taken when we know, based on statistics, that all strings in a row group are inlined. In these cases, we can decode directly into the string_t. To make this efficient, we need some "overflow space", i.e., some room after the string_t such that FSST knows that it should always be able to decode the string without running out of space.

I created a table from TPC-H SF100 and ran these queries:

create table strings as select l_comment as long, l_comment[:12] as short from lineitem;
select any_value(long) from strings; -- 2.60s -> 1.53s
select any_value(short) from strings; -- 1.20s -> 1.03s 

The long strings were already much faster with Mark's changes, and with my changes on top, the short strings don't regress (and are even slightly faster).

auto compressed_string_ptr = (unsigned char *)compressed_string; // NOLINT
auto fsst_decoder = reinterpret_cast<duckdb_fsst_decoder_t *>(duckdb_fsst_decoder);
auto compressed_string_ptr = reinterpret_cast<const unsigned char *>(compressed_string);
auto fsst_decoder = static_cast<duckdb_fsst_decoder_t *>(duckdb_fsst_decoder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're at it, perhaps we can replace the use of void * by forward-declaring duckdb_fsst_decoder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done on purpose by Sam >2 years ago. From fsst.h:

/* Data structure needed for compressing strings - use duckdb_fsst_duplicate() to create thread-local copies. Use duckdb_fsst_destroy() to free. */
typedef void* duckdb_fsst_encoder_t; /* opaque type - it wraps around a rather large (~900KB) C++ object */

It was purposely forward-declared as an opaque type there, so I would rather not touch this.

Copy link
Contributor
@Tishj Tishj Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so it's already a void* underneath, but then I don't get why we are moving it around as a void* and then casting it to duckdb_fsst_encoder_t * in here, that's what I meant.

We should be able to use duckdb_fsst_encoder_t * everywhere instead of void*
We still need to forward declare though because we don't want to include third_party headers in core headers

So then we need to repeat the typedef void* duckdb_fsst_encoder_t; ?
Hmm it's weird, maybe we leave it alone for now then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done on purpose by Sam >2 years ago. From fsst.h:

/* Data structure needed for compressing strings - use duckdb_fsst_duplicate() to create thread-local copies. Use duckdb_fsst_destroy() to free. */
typedef void* duckdb_fsst_encoder_t; /* opaque type - it wraps around a rather large (~900KB) C++ object */

It was purposely forward-declared as an opaque type there, so I would rather not touch this.

Actually, that's encoder, not decoder

8000
@@ -97,6 +97,10 @@ struct string_t {
return value.inlined.length;
}

void SetSize(uint32_t size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I like having a method for this.
If this is used incorrectly it will cause problems down the line.

For example if it was used to create a non-inlined string and someone thinks: "Ah I can make a substring out of this very easily with this" and the size would now indicate that it's inlined

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can add an Unsafe to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do this:

new (&result.str) string_t(UnsafeNumericCast<uint32_t>(decompressed_string_size));

To achieve exactly the same, that saves us from creating the method at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps SetSizeAndFinalize, which does both, and then also calls VerifyCharacters()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do this:

new (&result.str) string_t(UnsafeNumericCast<uint32_t>(decompressed_string_size));

To achieve exactly the same, that saves us from creating the method at all.

This looks scary, I like the SetSizeAndFinalize more

const auto target_ptr = str_buffer.AllocateBuffer(max_uncompressed_length);
const auto decompressed_string_size = duckdb_fsst_decompress(
fsst_decoder, compressed_string_len, compressed_string_ptr, max_uncompressed_length, target_ptr);
return str_buffer.FinalizeBuffer(target_ptr, max_uncompressed_length, decompressed_string_size);
Copy link
Contributor
@Tishj Tishj Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about FinalizeBuffer
If I understand it correctly, it should only be used in this scenario, where we need to potentially shrink the buffer because we allocated too much?

Can this method live outside of the VectorStringBuffer?
Just so it doesn't get confused by someone in the future as a necessary step

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it live outside of FinalizeBuffer would require making the StringHeap in VectorStringBuffer public instead of private, or am I missing something? I don't think that's worth it.

The comments above AllocateBuffer and FinalizeBuffer are very descriptive (i.e., saying that these should be used in conjunction) and should guide developers in how to use these functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead make a PotentiallyShrinkableStringBuffer (name needs work 😅 ):

struct PotentiallyShrinkableStringBuffer {
public:
	static PotentiallyShrinkableStringBuffer Create(Allocator &allocator, idx_t alloc_len);
	string_t Finalize(idx_t len);
private:
	Allocator &allocator;
	idx_t alloc_len;
	data_ptr_t buffer;
	bool finalized;
};

Just so we don't pollute VectorStringBuffer with AllocateBuffer and FinalizeBuffer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do friend class FSSTPrimitives; and make the methods private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will actually just rename to AllocateShrinkableBuffer and FinalizeShrinkableBuffer, this should avoid confusion

D_ASSERT(str_len <= alloc_len);
bool is_not_inlined = str_len > string_t::INLINE_LENGTH;
idx_t shrink_count = alloc_len - (str_len * is_not_inlined);
allocator.ShrinkHead(shrink_count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. 628C Learn more.

Can this also verify that the buffer is part of the head of the allocator?

@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 5, 2025 11:28
@lnkuiper lnkuiper marked this pull request as ready for review March 5, 2025 11:28
@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 5, 2025 11:34
@lnkuiper lnkuiper marked this pull request as ready for review March 5, 2025 11:44
Copy link
Contributor
@Tishj Tishj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now 👍

@Mytherin Mytherin merged commit 3c5694a into duckdb:main Mar 5, 2025
51 checks passed
@Mytherin
Copy link
Collaborator
Mytherin commented Mar 5, 2025

Thanks!

@lnkuiper lnkuiper deleted the fsstscan branch April 14, 2025 09:10
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0