Optimize FSST decoding #16508

lnkuiper · 2025-03-04T14:02:21Z

After profiling string reading a bit, I noticed that a lot of time was spent decoding FSST and then copying the result into a VectorStringBuffer, and realised that we should decode directly into the VectorStringBuffer instead. I pinged @Mytherin, who had already spent some time on this and optimized it really well. However, while his changes greatly improved decoding of long strings, they caused a regression for inlined strings (<=12 bytes).

This PR picks up where he left off and optimizes the code path for inlined strings. This code path is taken when we know, based on statistics, that all strings in a row group are inlined. In these cases, we can decode directly into the string_t. To make this efficient, we need some "overflow space", i.e., some room after the string_t, so that FSST can always decode the string without running out of space.
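To illustrate why the overflow space helps, here is a minimal sketch with a toy symbol-table decoder. It is not the FSST API: `ToySymbol`, `ToyDecode`, `InlineTarget`, and the 7-byte slack are all assumptions made for the example. The idea is that a decoder which always stores a full 8-byte symbol (and only then advances by the symbol's real length) needs no per-symbol bounds check, as long as the target has a few spare bytes after the 12 inlined bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr size_t INLINE_LENGTH = 12;     // maximum length of an inlined string
constexpr size_t MAX_SYMBOL_LENGTH = 8;  // toy symbols are 1..8 bytes long
// The last store can start at byte 11 at the latest, so 7 spare bytes suffice.
constexpr size_t OVERFLOW_BYTES = MAX_SYMBOL_LENGTH - 1;

// Toy stand-in for a symbol table entry (hypothetical, not the FSST layout).
struct ToySymbol {
	char bytes[MAX_SYMBOL_LENGTH]; // symbol text, zero-padded to 8 bytes
	uint8_t length;                // number of meaningful bytes
};

inline ToySymbol MakeSymbol(const char *text) {
	ToySymbol sym {};
	const size_t len = std::strlen(text);
	assert(len >= 1 && len <= MAX_SYMBOL_LENGTH);
	std::memcpy(sym.bytes, text, len);
	sym.length = static_cast<uint8_t>(len);
	return sym;
}

// Toy stand-in for an inlined string with overflow space behind its 12 bytes.
struct InlineTarget {
	uint32_t length = 0;
	char data[INLINE_LENGTH + OVERFLOW_BYTES] = {};
};

// Decodes without bounds checks: every code triggers an unconditional 8-byte store.
inline size_t ToyDecode(const std::vector<ToySymbol> &table, const std::vector<uint8_t> &codes, char *out) {
	size_t pos = 0;
	for (uint8_t code : codes) {
		assert(code < table.size());
		const ToySymbol &sym = table[code];
		std::memcpy(out + pos, sym.bytes, MAX_SYMBOL_LENGTH);
		pos += sym.length;
	}
	return pos;
}

// Decodes straight into the inline target; the caller guarantees (e.g. via
// statistics) that the decoded string is at most INLINE_LENGTH bytes long.
inline std::string DecodeInlined(const std::vector<ToySymbol> &table, const std::vector<uint8_t> &codes) {
	InlineTarget target;
	const size_t len = ToyDecode(table, codes, target.data);
	assert(len <= INLINE_LENGTH);
	target.length = static_cast<uint32_t>(len);
	return std::string(target.data, target.length);
}
```

Without the spare bytes, the decoder would need a length check before each store, which is exactly the kind of per-value overhead this code path tries to avoid.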

I created a table from TPC-H SF100 and ran these queries:

create table strings as select l_comment as long, l_comment[:12] as short from lineitem;
select any_value(long) from strings; -- 2.60s -> 1.53s
select any_value(short) from strings; -- 1.20s -> 1.03s

The long strings were already much faster with Mark's changes, and with my changes on top, the short strings don't regress (and are even slightly faster).


Tishj · 2025-03-05T08:23:08Z

src/common/fsst.cpp

-	auto compressed_string_ptr = (unsigned char *)compressed_string; // NOLINT
-	auto fsst_decoder = reinterpret_cast<duckdb_fsst_decoder_t *>(duckdb_fsst_decoder);
+	auto compressed_string_ptr = reinterpret_cast<const unsigned char *>(compressed_string);
+	auto fsst_decoder = static_cast<duckdb_fsst_decoder_t *>(duckdb_fsst_decoder);


While we're at it, perhaps we can replace the use of void * by forward-declaring duckdb_fsst_decoder

This was done on purpose by Sam >2 years ago. From fsst.h:

/* Data structure needed for compressing strings - use duckdb_fsst_duplicate() to create thread-local copies. Use duckdb_fsst_destroy() to free. */
typedef void* duckdb_fsst_encoder_t; /* opaque type - it wraps around a rather large (~900KB) C++ object */

It was purposely forward-declared as an opaque type there, so I would rather not touch this.

Ah so it's already a void* underneath, but then I don't get why we are moving it around as a void* and then casting it to duckdb_fsst_encoder_t * in here, that's what I meant.

We should be able to use duckdb_fsst_encoder_t * everywhere instead of void*
We still need to forward declare though because we don't want to include third_party headers in core headers

So then we need to repeat the typedef void* duckdb_fsst_encoder_t; ?
Hmm it's weird, maybe we leave it alone for now then

Actually, that's encoder, not decoder
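For readers unfamiliar with the pattern under discussion, the following is a small sketch of an opaque `void *` handle and the two casts from the diff above. All names (`toy_decoder_t`, `ToyDecoderImpl`, `ToyChecksum`) are hypothetical and unrelated to the real fsst.h. It shows that `static_cast` is enough to get from `void *` back to the concrete type, while going from `const char *` to `const unsigned char *` needs `reinterpret_cast`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// What a public header would expose: an opaque handle, no third-party includes.
typedef void *toy_decoder_t;

// Implementation detail, visible only in the .cpp file.
namespace {
struct ToyDecoderImpl {
	uint32_t seed;
};
} // namespace

inline toy_decoder_t ToyDecoderCreate(uint32_t seed) {
	return new ToyDecoderImpl {seed};
}

inline void ToyDecoderDestroy(toy_decoder_t handle) {
	delete static_cast<ToyDecoderImpl *>(handle);
}

// Sums the input bytes as unsigned values, starting from the decoder's seed.
inline uint32_t ToyChecksum(toy_decoder_t handle, const char *data, size_t len) {
	// void * -> concrete pointer: static_cast is sufficient
	auto impl = static_cast<ToyDecoderImpl *>(handle);
	// const char * -> const unsigned char *: unrelated pointee types, so reinterpret_cast
	auto bytes = reinterpret_cast<const unsigned char *>(data);
	uint32_t result = impl->seed;
	for (size_t i = 0; i < len; i++) {
		result += bytes[i];
	}
	return result;
}
```

The trade-off is the one raised in the thread: the handle keeps third-party headers out of core headers, at the cost of losing type information wherever the `void *` is passed around.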

Tishj · 2025-03-05T08:34:31Z

src/include/duckdb/common/types/string_type.hpp

@@ -97,6 +97,10 @@ struct string_t {
 		return value.inlined.length;
 	}

+	void SetSize(uint32_t size) {


I don't know if I like having a method for this.
If this is used incorrectly it will cause problems down the line.

For example, if someone has a non-inlined string and thinks, "Ah, I can make a substring out of this very easily with this", the size would now indicate that the string is inlined, while the data is not.

Perhaps we can add an Unsafe to it?

We could also do this:

new (&result.str) string_t(UnsafeNumericCast<uint32_t>(decompressed_string_size));

To achieve exactly the same, that saves us from creating the method at all.

Perhaps SetSizeAndFinalize, which does both, and then also calls VerifyCharacters()?

The placement new looks scary, I like SetSizeAndFinalize more
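The hazard raised in this thread can be sketched with a simplified string type. `ToyString` below is an assumption made for illustration: it mimics the idea of a 4-byte length followed by either 12 inlined bytes or a 4-byte prefix plus a pointer, but it is not DuckDB's actual string_t, and `SetSizeUnsafe` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

struct ToyString {
	static constexpr uint32_t INLINE_LENGTH = 12;
	static_assert(sizeof(const char *) <= 8, "pointer must fit behind the prefix");

	uint32_t length;
	// Inlined: up to 12 characters. Otherwise: 4-byte prefix followed by a pointer.
	char payload[12];

	ToyString(const char *data, uint32_t len) : length(len) {
		std::memset(payload, 0, sizeof(payload));
		if (len <= INLINE_LENGTH) {
			std::memcpy(payload, data, len);
		} else {
			std::memcpy(payload, data, 4);
			std::memcpy(payload + 4, &data, sizeof(data));
		}
	}

	bool IsInlined() const {
		return length <= INLINE_LENGTH;
	}

	std::string GetString() const {
		if (IsInlined()) {
			return std::string(payload, length);
		}
		const char *ptr;
		std::memcpy(&ptr, payload + 4, sizeof(ptr));
		return std::string(ptr, length);
	}

	// Only changes the length; it does NOT convert between the two layouts.
	void SetSizeUnsafe(uint32_t new_length) {
		length = new_length;
	}
};
```

Shrinking a 22-byte `ToyString` to 8 bytes with `SetSizeUnsafe` flips `IsInlined()` to true, so the prefix and the raw pointer bytes are then read back as character data. Constructing a fresh `ToyString` from the first 8 bytes gives the intended substring.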

Tishj · 2025-03-05T08:56:19Z

src/include/duckdb/common/fsst.hpp

+		const auto target_ptr = str_buffer.AllocateBuffer(max_uncompressed_length);
+		const auto decompressed_string_size = duckdb_fsst_decompress(
+		    fsst_decoder, compressed_string_len, compressed_string_ptr, max_uncompressed_length, target_ptr);
+		return str_buffer.FinalizeBuffer(target_ptr, max_uncompressed_length, decompressed_string_size);


I'm not sure how I feel about FinalizeBuffer
If I understand it correctly, it should only be used in this scenario, where we need to potentially shrink the buffer because we allocated too much?

Can this method live outside of the VectorStringBuffer?
Just so it doesn't get confused by someone in the future as a necessary step

Having it live outside of VectorStringBuffer would require making the StringHeap in VectorStringBuffer public instead of private, or am I missing something? I don't think that's worth it.

The comments above AllocateBuffer and FinalizeBuffer are very descriptive (i.e., saying that these should be used in conjunction) and should guide developers in how to use these functions.

Can we instead make a PotentiallyShrinkableStringBuffer (name needs work 😅 ):

struct PotentiallyShrinkableStringBuffer {
public:
	static PotentiallyShrinkableStringBuffer Create(Allocator &allocator, idx_t alloc_len);
	string_t Finalize(idx_t len);

private:
	Allocator &allocator;
	idx_t alloc_len;
	data_ptr_t buffer;
	bool finalized;
};

Just so we don't pollute VectorStringBuffer with AllocateBuffer and FinalizeBuffer ?

We could also do friend class FSSTPrimitives; and make the methods private?

I will actually just rename to AllocateShrinkableBuffer and FinalizeShrinkableBuffer, this should avoid confusion
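As a rough sketch of the allocate-then-shrink pattern discussed in this thread, the wrapper type proposed above could look something like the following on top of a toy bump allocator. `ToyArena` and `ShrinkableBuffer` are hypothetical stand-ins, not DuckDB's ArenaAllocator or VectorStringBuffer, and `Finalize` returns a `std::string` copy only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Toy bump allocator: hands out bytes from one fixed block.
class ToyArena {
public:
	explicit ToyArena(size_t capacity) : storage(capacity), used(0) {
	}

	char *Allocate(size_t len) {
		assert(used + len <= storage.size());
		char *result = storage.data() + used;
		used += len;
		return result;
	}

	// Gives back the last `len` bytes that were allocated.
	void ShrinkHead(size_t len) {
		assert(len <= used);
		used -= len;
	}

	size_t Used() const {
		return used;
	}

private:
	std::vector<char> storage;
	size_t used;
};

// Reserves the worst-case size up front; Finalize returns the unused tail.
class ShrinkableBuffer {
public:
	ShrinkableBuffer(ToyArena &arena_p, size_t alloc_len_p)
	    : arena(arena_p), alloc_len(alloc_len_p), buffer(arena_p.Allocate(alloc_len_p)), finalized(false) {
	}

	char *Data() {
		return buffer;
	}

	std::string Finalize(size_t len) {
		assert(!finalized);
		assert(len <= alloc_len);
		// Only valid if nothing else was allocated from the arena in between.
		arena.ShrinkHead(alloc_len - len);
		finalized = true;
		return std::string(buffer, len);
	}

private:
	ToyArena &arena;
	size_t alloc_len;
	char *buffer;
	bool finalized;
};
```

Keeping the pair in one small type makes it harder to call one half without the other, which is the confusion the rename to AllocateShrinkableBuffer and FinalizeShrinkableBuffer is meant to avoid.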

Tishj · 2025-03-05T09:28:54Z

src/include/duckdb/common/types/vector_buffer.hpp

+		D_ASSERT(str_len <= alloc_len);
+		bool is_not_inlined = str_len > string_t::INLINE_LENGTH;
+		idx_t shrink_count = alloc_len - (str_len * is_not_inlined);
+		allocator.ShrinkHead(shrink_count);


Can this also verify that the buffer is part of the head of the allocator?
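The arithmetic in the snippet above, plus the kind of head check being asked for, can be sketched as follows. `ShrinkCount` and `IsAtHead` are hypothetical helper names, and the head check assumes a bump allocator whose head pointer marks the end of the most recent allocation. Multiplying by the boolean avoids a branch: an inlined string presumably needs none of the heap allocation, so all of it is returned, while a longer string keeps `str_len` bytes and returns only the tail.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t INLINE_LENGTH = 12;

// Number of bytes to hand back to the allocator after decoding.
inline uint64_t ShrinkCount(uint64_t alloc_len, uint64_t str_len) {
	assert(str_len <= alloc_len);
	const bool is_not_inlined = str_len > INLINE_LENGTH;
	// is_not_inlined converts to 0 or 1, so no branch is needed
	return alloc_len - str_len * static_cast<uint64_t>(is_not_inlined);
}

// True if `buffer` (of size alloc_len) is the most recent allocation, i.e. it
// ends exactly at the allocator's current head.
inline bool IsAtHead(const char *head, const char *buffer, uint64_t alloc_len) {
	return buffer + alloc_len == head;
}
```

Checking `IsAtHead` before shrinking would catch the case where another allocation happened between the allocate and finalize calls, in which case shrinking the head would release bytes that belong to someone else.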

Tishj

LGTM now 👍

Mytherin · 2025-03-05T16:32:49Z

Thanks!

Optimize FSST decoding (duckdb/duckdb#16508)

Mytherin and others added 10 commits January 15, 2025 10:29

FSST Scan: speed up scan by decompressing directly into the StringHeap

d3aa87b

Merge branch 'main' into fsstscan

574f335

Inline ArenaAllocator Allocate call, move AllocateNewBlock to a separate method

646f177

Avoid a branch in FinalizeBuffer

6fc353c

Merge branch 'fsstscan' of github.com:mytherin/duckdb into fsstscan

4687586

get stringbuffer once instead of for every value

a5dc2f8

removing more unnecessary per-value overheads

56174e5

fast path for inlined fsst

42690b1

Merge branch 'main' into fsstscan

51b26bc

don't just assert, but throw an exception

72effd8

Tishj reviewed Mar 5, 2025

implement PR feedback

a9ce4d3

duckdb-draftbot marked this pull request as draft March 5, 2025 11:28

lnkuiper marked this pull request as ready for review March 5, 2025 11:28

also rename functions here

e56984d

duckdb-draftbot marked this pull request as draft March 5, 2025 11:34

lnkuiper marked this pull request as ready for review March 5, 2025 11:44

Tishj approved these changes Mar 5, 2025

lnkuiper added the Ready To Merge label Mar 5, 2025

Mytherin merged commit 3c5694a into duckdb:main Mar 5, 2025
51 checks passed

lnkuiper deleted the fsstscan branch April 14, 2025 09:10

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025

vendor: Update vendored sources to duckdb/duckdb@3c5694a

13d756e

Optimize FSST decoding (duckdb/duckdb#16508)

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025

vendor: Update vendored sources to duckdb/duckdb@3c5694a

13fafda

Optimize FSST decoding (duckdb/duckdb#16508)

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025

vendor: Update vendored sources to duckdb/duckdb@3c5694a

b1c82e6

Optimize FSST decoding (duckdb/duckdb#16508)

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025

vendor: Update vendored sources to duckdb/duckdb@3c5694a

eb752e0

Optimize FSST decoding (duckdb/duckdb#16508)

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025

vendor: Update vendored sources to duckdb/duckdb@3c5694a

0e20d3b

Optimize FSST decoding (duckdb/duckdb#16508)

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025

vendor: Update vendored sources to duckdb/duckdb@3c5694a

22dc2bd

Optimize FSST decoding (duckdb/duckdb#16508)
