Efficient way to find approximate median in data range #11111

locker · 2025-02-07T08:30:01Z

The range-based sharding strategy requires the ability to quickly find a median in a data range so that it can split ranges for load balancing. With the memtx storage engine, this can be done efficiently using index:count(), which was recently optimized to have the logarithmic complexity, see #8204. In vinyl, however, index:count() is still prohibitively slow and it's unclear whether it's possible to optimize it at all.

The suggestion is to add a new API function that could be used for finding a median with a reasonable approximation. For example, we could probably take the deepest run file and find the median key there, assuming that updates done in more recent run files are unlikely to shift the median substantially.

The text was updated successfully, but these errors were encountered:

locker · 2025-02-25T13:10:21Z

RFC: Index percentile #11194

The new method will be used by the quantile() public index API function. The stub raises "method unsupported" error. Part of tarantool#11111 NO_DOC=internal NO_TEST=stub NO_CHANGELOG=internal

The new function takes three arguments: level, begin_key, and end_key. It returns the quantile point key from the target range corresponding to the given level. It may return nil if the range is too small, in particular, if there are no tuples in it. The new function works only with TREE indexes. Currently, it isn't implemented by any engine. Part of tarantool#11111 NO_DOC=added later NO_CHANGELOG=added later

The implementation is trivial: we simply return the key at the offset equal to level * (end - begin) using the offsets built into the tree. The only case when we return nil is when the target range is empty. Part of tarantool#11111 NO_DOC=added later NO_CHANGELOG=added later

@TarantoolBot

We find the quantile key assuming that all keys are stored at the deepest disk layer while newer layers contain incremental updates. First, we walk over all index ranges overlapping with the target range and estimate the number of tuples in them. Then we compute the offset of the quantile key and walk over the index ranges again to find the key. To achieve that, we need two helper functions: 1. vy_run_estimate_stmt_count() estimates the number of tuples in the given run within the specified range. Basically, it finds the number of pages falling in the range, then scales down the total number of rows stored in the run proportionally, assuming that each page stores approximately the same number of rows. 2. vy_run_estimate_key_at() finds a key at the given offset. Like vy_run_estimate_stmt_count(), this function assumes that each page stores approximately the same number of rows to find the page where the target key is located, then it returns the page min key, which should be a good enough approximation. None of the functions accesses disk (yields). Note that the target key may happen to be outside the target range, for example, if the range is so small that there's no page fitting in it. To follow the index:quantile() function protocol, we ignore such keys (return NULL). Also note that runs store extended keys so we need to strip extra parts before returning the found key to the user. Closes tarantool#11111 @TarantoolBot document Title: Document `index:quantile()` The new index method finds a quantile point in an indexed data range. It takes the quantile level (0 < L < 1) and optionally the target range boundaries and returns such a key that the ratio of tuples less than the key equals L. Example: ```lua local s = box.schema.space.create('test') s:create_index('pk') for i = 1, 100 do s:insert({i * i, i}) end -- Find the median in the whole index. s.index.pk:quantile(0.5) -- returns {2601} -- Find the 90th percentile among all keys < 1000 s.index.pk:quantile(0.9, {}, {1000}) -- returns {784} ``` The target range is defined as the intersection of the following read requests: ```lua s:select(begin_key, {iterator = 'ge'}) s:select(end_key, {iterator = 'lt'}) ``` This means that using the empty key `{}` or `nil` for both `begin_key` and `end_key` finds the quantile in the whole index. The function raises an error if there can't possibly be any tuples in the target range, i.e. if `key_begin` >= `key_end`. The function returns `nil` if there are no tuples in the target range or if it failed to find the quantile due to the limitations of the storage engine, which are described below. The function is implemented by memtx and vinyl tree indexes only. In memtx the function returns the quantile point exactly. It has the logarithmic complexity. It may return `nil` only if there are no tuples in the target range. There's guaranteed to be a tuple in the target range corresponding to the returned key. The vinyl implementation returns the quantile point with a reasonable degree of error for performance considerations. It has the logarithmic complexity as well. There may or may not be a tuple in the target range corresponding to the returned key. The function may return nil even if the target range is not empty, in particular, if there are no disk layers or if the range is smaller than the vinyl page size. The function doesn't yield. The function doesn't participate in transactions. Like other index methods, `index:quantile()` can be used through a space object as `space:quantile()`, in which case it works as a shortcut for the primary index. For more details, see the RFC document: https://github.com/orgs/tarantool/discussions/11194

The new function takes three arguments: level, begin_key, and end_key. It returns the quantile point key from the target range corresponding to the given level. It may return nil if the range is too small, in particular, if there are no tuples in it. The new function works only with TREE indexes. Currently, it isn't implemented by any engine. Part of tarantool#11111 NO_DOC=added later NO_CHANGELOG=added later

The implementation is trivial: we simply return the key at the offset equal to level * (end - begin) using the offsets built into the tree. The only case when we return nil is when the target range is empty. Part of tarantool#11111 NO_DOC=added later NO_CHANGELOG=added later

@TarantoolBot

We find the quantile key assuming that all keys are stored at the deepest disk layer while newer layers contain incremental updates. First, we walk over all index ranges overlapping with the target range and estimate the number of tuples in them. Then we compute the offset of the quantile key and walk over the index ranges again to find the key. To achieve that, we need two helper functions: 1. vy_run_estimate_stmt_count() estimates the number of tuples in the given run within the specified range. Basically, it finds the number of pages falling in the range, then scales down the total number of rows stored in the run proportionally, assuming that each page stores approximately the same number of rows. 2. vy_run_estimate_key_at() finds a key at the given offset. Like vy_run_estimate_stmt_count(), this function assumes that each page stores approximately the same number of rows to find the page where the target key is located, then it returns the page min key, which should be a good enough approximation. None of the functions accesses disk (yields). Note that the target key may happen to be outside the target range, for example, if the range is so small that there's no page fitting in it. To follow the index:quantile() function protocol, we ignore such keys (return NULL). Also note that runs store extended keys so we need to strip extra parts before returning the found key to the user. Closes tarantool#11111 @TarantoolBot document Title: Document `index:quantile()` The new index method finds a quantile point in an indexed data range. It takes the quantile level (0 < L < 1) and optionally the target range boundaries and returns such a key that the ratio of tuples less than the key equals L. Example: ```lua local s = box.schema.space.create('test') s:create_index('pk') for i = 1, 100 do s:insert({i * i, i}) end -- Find the median in the whole index. s.index.pk:quantile(0.5) -- returns {2601} -- Find the 90th percentile among all keys < 1000 s.index.pk:quantile(0.9, {}, {1000}) -- returns {784} ``` The target range is defined as the intersection of the following read requests: ```lua s:select(begin_key, {iterator = 'ge'}) s:select(end_key, {iterator = 'lt'}) ``` This means that using the empty key `{}` or `nil` for both `begin_key` and `end_key` finds the quantile in the whole index. The function raises an error if there can't possibly be any tuples in the target range, i.e. if `key_begin` >= `key_end`. The function returns `nil` if there are no tuples in the target range or if it failed to find the quantile due to the limitations of the storage engine, which are described below. The function is implemented by memtx and vinyl tree indexes only. In memtx the function returns the quantile point exactly. It has the logarithmic complexity. It may return `nil` only if there are no tuples in the target range. There's guaranteed to be a tuple in the target range corresponding to the returned key. The vinyl implementation returns the quantile point with a reasonable degree of error for performance considerations. It has the logarithmic complexity as well. There may or may not be a tuple in the target range corresponding to the returned key. The function may return nil even if the target range is not empty, in particular, if there are no disk layers or if the range is smaller than the vinyl page size. The function doesn't yield. The function doesn't participate in transactions. Like other index methods, `index:quantile()` can be used through a space object as `space:quantile()`, in which case it works as a shortcut for the primary index. For more details, see the RFC document: https://github.com/orgs/tarantool/discussions/11194

The new method will be used by the quantile() public index API function. The stub raises "method unsupported" error. Part of #11111 NO_DOC=internal NO_TEST=stub NO_CHANGELOG=internal

The new function takes three arguments: level, begin_key, and end_key. It returns the quantile point key from the target range corresponding to the given level. It may return nil if the range is too small, in particular, if there are no tuples in it. The new function works only with TREE indexes. Currently, it isn't implemented by any engine. Part of #11111 NO_DOC=added later NO_CHANGELOG=added later

The implementation is trivial: we simply return the key at the offset equal to level * (end - begin) using the offsets built into the tree. The only case when we return nil is when the target range is empty. Part of #11111 NO_DOC=added later NO_CHANGELOG=added later

@TarantoolBot

We find the quantile key assuming that all keys are stored at the deepest disk layer while newer layers contain incremental updates. First, we walk over all index ranges overlapping with the target range and estimate the number of tuples in them. Then we compute the offset of the quantile key and walk over the index ranges again to find the key. To achieve that, we need two helper functions: 1. vy_run_estimate_stmt_count() estimates the number of tuples in the given run within the specified range. Basically, it finds the number of pages falling in the range, then scales down the total number of rows stored in the run proportionally, assuming that each page stores approximately the same number of rows. 2. vy_run_estimate_key_at() finds a key at the given offset. Like vy_run_estimate_stmt_count(), this function assumes that each page stores approximately the same number of rows to find the page where the target key is located, then it returns the page min key, which should be a good enough approximation. None of the functions accesses disk (yields). Note that the target key may happen to be outside the target range, for example, if the range is so small that there's no page fitting in it. To follow the index:quantile() function protocol, we ignore such keys (return NULL). Also note that runs store extended keys so we need to strip extra parts before returning the found key to the user. Closes #11111 @TarantoolBot document Title: Document `index:quantile()` The new index method finds a quantile point in an indexed data range. It takes the quantile level (0 < L < 1) and optionally the target range boundaries and returns such a key that the ratio of tuples less than the key equals L. Example: ```lua local s = box.schema.space.create('test') s:create_index('pk') for i = 1, 100 do s:insert({i * i, i}) end -- Find the median in the whole index. s.index.pk:quantile(0.5) -- returns {2601} -- Find the 90th percentile among all keys < 1000 s.index.pk:quantile(0.9, {}, {1000}) -- returns {784} ``` The target range is defined as the intersection of the following read requests: ```lua s:select(begin_key, {iterator = 'ge'}) s:select(end_key, {iterator = 'lt'}) ``` This means that using the empty key `{}` or `nil` for both `begin_key` and `end_key` finds the quantile in the whole index. The function raises an error if there can't possibly be any tuples in the target range, i.e. if `key_begin` >= `key_end`. The function returns `nil` if there are no tuples in the target range or if it failed to find the quantile due to the limitations of the storage engine, which are described below. The function is implemented by memtx and vinyl tree indexes only. In memtx the function returns the quantile point exactly. It has the logarithmic complexity. It may return `nil` only if there are no tuples in the target range. There's guaranteed to be a tuple in the target range corresponding to the returned key. The vinyl implementation returns the quantile point with a reasonable degree of error for performance considerations. It has the logarithmic complexity as well. There may or may not be a tuple in the target range corresponding to the returned key. The function may return nil even if the target range is not empty, in particular, if there are no disk layers or if the range is smaller than the vinyl page size. The function doesn't yield. The function doesn't participate in transactions. Like other index methods, `index:quantile()` can be used through a space object as `space:quantile()`, in which case it works as a shortcut for the primary index. For more details, see the RFC document: https://github.com/orgs/tarantool/discussions/11194

locker added feature A new functionality vinyl labels Feb 7, 2025

locker self-assigned this Feb 17, 2025

locker mentioned this issue Mar 7, 2025

Introduce index:quantile() #11226

Merged

locker added a commit that referenced this issue Mar 10, 2025

index: add quantile method stub

85732bd

The new method will be used by the quantile() public index API function. The stub raises "method unsupported" error. Part of #11111 NO_DOC=internal NO_TEST=stub NO_CHANGELOG=internal

locker closed this as completed in #11226 Mar 10, 2025

locker mentioned this issue Mar 17, 2025

Efficient way to estimate size of data range #11263

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient way to find approximate median in data range #11111

Efficient way to find approximate median in data range #11111

Efficient way to find approximate median in data range #11111

Efficient way to find approximate median in data range #11111

Comments