[Feature] Change bucket number from physical partition level to materialized index level #59441
You can finish the necessary refactoring work in this PR and postpone the behavior change and the protocol change to a future PR.
That is, you can leave num_bucket unchanged for now and only refactor the table_reader and tablet_sink in the backend.
be/src/exec/tablet_info.h (outdated)
@@ -258,7 +257,7 @@ class OlapTablePartitionParam {
     // `invalid_row_index` stores index that chunk[index]
     // has been filtered out for not being able to find tablet.
     // it could be any row, becauset it's just for outputing error message for user to diagnose.
-    Status find_tablets(Chunk* chunk, std::vector<OlapTablePartition*>* partitions, std::vector<uint32_t>* indexes,
+    Status find_tablets(Chunk* chunk, std::vector<uint32_t>* hashes, std::vector<OlapTablePartition*>* partitions,
I suggest not changing the order of the parameters in this PR; it adds an extra burden for reviewers trying to follow what you are doing.
Keeping this a pure renaming change makes review much easier.
I updated the code to keep the parameter order of the function find_tablets.
However, I still refactored the parameters of the private functions _find_tablets_with_range_partition and _find_tablets_with_list_partition, because their original parameters did not make clear which were inputs and which were outputs. Even the original function comments were wrong: some input parameters were annotated as output parameters.
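A minimal sketch of the input/output convention described above, assuming simplified stand-in types rather than the real backend classes: inputs are taken by const reference and outputs by pointer, and, as in the new find_tablets signature, the function emits one hash value per row instead of a tablet index. The names Row, RangePartition, hash_bucket_key, and find_partitions_and_hashes are hypothetical, and FNV-1a is only a placeholder for whatever hash the backend actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins: a "chunk" is just a list of rows, each
// with a partition key and a distribution (bucket) key.
struct Row {
    int64_t partition_key;
    std::string bucket_key;
};

struct RangePartition {
    int64_t start;  // inclusive
    int64_t end;    // exclusive
    int64_t id;
};

// 32-bit FNV-1a, used here only as a placeholder hash function.
uint32_t hash_bucket_key(const std::string& key) {
    uint32_t h = 2166136261u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Inputs: rows, partitions (const references, never written).
// Outputs: row_partitions, hashes, invalid_row_indexes (pointers, always written).
// For each row, records the matching partition and one hash value; rows with
// no matching partition get nullptr and are listed in invalid_row_indexes.
void find_partitions_and_hashes(const std::vector<Row>& rows,
                                const std::vector<RangePartition>& partitions,
                                std::vector<const RangePartition*>* row_partitions,
                                std::vector<uint32_t>* hashes,
                                std::vector<uint32_t>* invalid_row_indexes) {
    assert(row_partitions != nullptr && hashes != nullptr && invalid_row_indexes != nullptr);
    row_partitions->assign(rows.size(), nullptr);
    hashes->assign(rows.size(), 0);  // stays 0 for rows without a partition
    invalid_row_indexes->clear();
    for (uint32_t i = 0; i < rows.size(); ++i) {
        // Linear scan for clarity; a real implementation would use a sorted lookup.
        for (const RangePartition& p : partitions) {
            if (rows[i].partition_key >= p.start && rows[i].partition_key < p.end) {
                (*row_partitions)[i] = &p;
                break;
            }
        }
        if ((*row_partitions)[i] == nullptr) {
            invalid_row_indexes->push_back(i);
            continue;
        }
        (*hashes)[i] = hash_bucket_key(rows[i].bucket_key);
    }
}
```

Separating inputs and outputs this way makes each call site self-documenting: every argument passed by address is something the function fills in.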
LGTM
[Java-Extensions Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[BE Incremental Coverage Report] ❌ fail: 18 / 41 (43.90%)
…ialized index level (StarRocks#59441)
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>
Why I'm doing:
This is preliminary work for tablet splitting and merging.
Previously, the bucket number was defined at the physical partition level: all materialized indexes in a physical partition had to have the same bucket number. After tablet splitting, however, different materialized indexes in the same physical partition may have different bucket numbers. We therefore need to move the bucket number from the physical partition level to the materialized index level.
What I'm doing:
Change the bucket number from the physical partition level to the materialized index level.
Because different materialized indexes in a physical partition may have different bucket numbers, the tablet sink can no longer calculate a single tablet index per row that is valid for every materialized index the row is distributed to.
To solve this, we refactor the tablet sink. It now calculates one hash value per row, shared by all materialized indexes; when the row is distributed to a materialized index, its tablet index is calculated from that hash value and the number of tablets in that materialized index.
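The routing described above can be sketched as follows. This is a minimal illustration, not the StarRocks implementation: IndexTablets, tablet_for_row, and route_rows are hypothetical names, and the sketch assumes the tablet is picked as the row's hash modulo the materialized index's tablet count.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified model: each materialized index of a physical
// partition owns its own list of tablet ids, and the lists may differ in size
// (for example, after one index's tablets have been split).
struct IndexTablets {
    int64_t index_id;
    std::vector<int64_t> tablets;
};

// Maps a row's shared hash value to a tablet of one materialized index.
// Assumption: tablet position = hash % tablets.size().
int64_t tablet_for_row(uint32_t hash, const IndexTablets& index) {
    assert(!index.tablets.empty());
    return index.tablets[hash % index.tablets.size()];
}

// Routes every row to one tablet in every materialized index. The hash is
// computed once per row (the `hashes` input); only the final modulo step
// depends on the index, so indexes with different tablet counts are fine.
// result[i][r] is the tablet id of row r in materialized index i.
std::vector<std::vector<int64_t>> route_rows(const std::vector<uint32_t>& hashes,
                                             const std::vector<IndexTablets>& indexes) {
    std::vector<std::vector<int64_t>> result;
    result.reserve(indexes.size());
    for (const IndexTablets& index : indexes) {
        std::vector<int64_t> per_row;
        per_row.reserve(hashes.size());
        for (uint32_t h : hashes) {
            per_row.push_back(tablet_for_row(h, index));
        }
        result.push_back(std::move(per_row));
    }
    return result;
}
```

For example, a row with hash 13 lands on the second tablet (13 % 4 == 1) of an index with 4 tablets and on the sixth tablet (13 % 8 == 5) of an index with 8 tablets, so no single precomputed tablet index could serve both indexes, while a shared hash can.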
This PR only refactors the code. The next PR will remove num_bucket in OlapTablePartition and use tablets.size() in each OlapTableIndexTablets instead.
Fixes #59134