Open
Description
Feature request
StarRocks uses a partitioning + bucketing two-tier data distribution strategy, to evenly distribute data across BEs or CNs. A well-designed data distribution strategy is crucial to performance, cost and stability.
However, it is difficult for users to design a good data distribution strategy for the following reasons:
- It is difficult to choose an appropriate bucket number when creating a new table or a new partition, even if the system automatically calculates it.
- It is costly to modify the bucket number of an existing table dynamically when it is found to be inappropriate.
- The most appropriate bucket number of a table is not fixed and will change with the development of business and the cluster specifications.
The dynamic tablet splitting and merging feature achieves the following goals:
- When creating tables or partitions, there is no need to estimate the final bucket number. Users or systems could provide an initial bucket number, then the number of tablets in a partition can grow or shrink dynamically. In most cases, users could not care about the bucket number. The initial number is calculated by system, and then the system can split and merge buckets automatically.
- Too large tablets and too many tablets with small sizes can be avoided.
- The performance should not degrade compared to the perfect data distribution manually.
- Prioritize support dynamic tablet splitting and merging in shared-data mode.