kvserver: limit the number of kv spans configurable by a tenant #70555
This is something we'll try tackling (or at least prototyping) in November. A brief plug for why it's important: After #67679, enabling the span configs infrastructure by default (something we're also looking to do over November) would mean that for every single schema object in a multi-tenant cluster, we would be inducing a range split. This is exactly what happens with the host tenant today where every table/index/partition gets its own range. Before #67679, for each tenant we started off with just a mega-range containing the entire tenant ID prefix.
[screenshot: range counts after #67679 + enabling the span configs infrastructure]
(+cc @shralex, @nvanbenschoten, @ajwerner, @arulajmani)
In the short term (22.1) all we'll do is maintain a counter for secondary tenants and, for every schema change, use a translator-like component to determine how many splits are being added/removed. We don't need to look at the corresponding zone config for the descriptor as that has no bearing here. We'll increment/decrement our counter accordingly, comparing it against a per-tenant limit exposed by the host -- perhaps using a "tenant read-only cluster setting", to use the verbiage from the multi-tenant cluster settings RFC. If we're over the limit, we'll abort the schema change txn. KV would of course still maintain its internal counter for the number of splits/etc. per tenant, and use that to reject span config updates if they go overboard. For unmodified tenants, we don't expect to run into these errors. This scheme, in contrast to the one described above, has the downside that it would serialize all schema changes. That feels reasonable for the short term, especially considering it only applies to secondary tenants. Incorporating leasing in any form would purely be a performance optimization -- to allow real schema change concurrency.
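As a rough sketch of what that short-term check could look like on the tenant side (the names here -- checkSpanCountLimit, splitDelta, readCounter, writeCounter -- are hypothetical stand-ins, not the actual implementation):

```
package spanconfiglimiter // hypothetical package for this sketch

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/sql/catalog"
	"github.com/cockroachdb/errors"
)

// checkSpanCountLimit sketches the scheme above: compute how many splits a
// schema change adds or removes, fold that into a per-tenant counter, and
// abort the schema change txn if the host-configured limit is exceeded.
func checkSpanCountLimit(
	ctx context.Context,
	txn *kv.Txn,
	committed, uncommitted catalog.TableDescriptor,
	limit int64, // per-tenant limit, e.g. a "tenant read-only cluster setting"
	splitDelta func(committed, uncommitted catalog.TableDescriptor) (int64, error),
	readCounter func(context.Context, *kv.Txn) (int64, error),
	writeCounter func(context.Context, *kv.Txn, int64) error,
) error {
	delta, err := splitDelta(committed, uncommitted) // the translator-like component
	if err != nil {
		return err
	}
	current, err := readCounter(ctx, txn)
	if err != nil {
		return err
	}
	updated := current + delta
	if updated > limit {
		// Rejecting here aborts the schema change txn.
		return errors.Newf("exceeded limit for number of table spans (%d > %d)", updated, limit)
	}
	return writeCounter(ctx, txn, updated)
}
```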
How would this work for internal APIs like `AdminSplit`/`AdminScatter`?
I don't think so -- I know we recently filed #74389; should we brainstorm there? It's not yet clear to me that we even want to carry forward AdminSplit/Scatter as an API, let alone extend it to secondary tenants. There was some KV-internal discussion here on the subject.
More internal discussion on [internal link].
75803: spanconfig: introduce spanconfig.Splitter r=irfansharif a=irfansharif

Splitter returns the set of all possible split points for the given table descriptor. It steps through every "unit" that we can apply configurations over (table, indexes, partitions and sub-partitions) and figures out the actual key boundaries that we may need to split over. For example:

```
CREATE TABLE db.parts(i INT PRIMARY KEY, j INT) PARTITION BY LIST (i) (
    PARTITION one_and_five VALUES IN (1, 5),
    PARTITION four_and_three VALUES IN (4, 3),
    PARTITION everything_else VALUES IN (6, default)
);
```

Assuming a table ID of 108, we'd generate:

```
/Table/108
/Table/108/1
/Table/108/1/1
/Table/108/1/2
/Table/108/1/3
/Table/108/1/4
/Table/108/1/5
/Table/108/1/6
/Table/108/1/7
/Table/108/2
```

This is going to serve as the underlying library for #70555. In a future commit we'll introduce two system tables, one for all tenants (both guest and host) and one on just the host. The first will be maintained by the SQL pod, using the library, as part of every schema change operation. We'll record in it a single row containing the total number of splits the pod is currently using up, by comparing the before/after of a descriptor operation to see if the commit would add or reduce the number of necessary splits. We'll abort the txn if it's over a pre-set limit (INF for the host tenant, and a KV-settable-only cluster setting for guests: #73857). The second table will be used only for KV's internal bookkeeping, sitting behind the KVAccessor interface. We'll use it to hard-reject RPCs installing span configs that induce more splits than a tenant is allowed to. With the SQL pods coordinating using the KV-set limits, this would only happen as a result of faulty SQL code. Put together, we'll have a co-operative form of per-tenant split limits.

Release note: None

76751: ci: build `workload` in CI r=rail a=rickystewart

This brings us to parity with the pre-Bazel `Compile Builds` job in CI.

Release note: None

76758: backup: include tenant_settings in cluster backup r=RaduBerinde a=RaduBerinde

We include the tenant_settings table in full cluster backups. This means that when we restore a cluster, all overrides should be in place. The overrides are treated as a host cluster property. When restoring only specific tenants, the overrides will not be carried over. In the future, we can add tenant-specific overrides to tenant metadata so restoring a single tenant restores the overrides as well.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
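The interface itself isn't quoted in the PR description above; going purely off that description, its shape might look like the following (a paraphrase, not necessarily the committed signature):

```
package spanconfig // sketch only; method name and signature are paraphrased

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/sql/catalog"
)

// Splitter, as described above, steps through every unit we can apply
// configurations over (table, indexes, partitions, sub-partitions) and
// returns the key boundaries we may need to split over.
type Splitter interface {
	Splits(ctx context.Context, table catalog.TableDescriptor) ([]roachpb.Key, error)
}
```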
77639: spanconfig: re-write spanconfig.Splitter r=irfansharif a=irfansharif

The earlier implementation relied on decoding keys in order to determine precise split points. When plugging this library into the component designed for tenant-side span config limiting (#77337, to address #70555), we realized it's not possible to grab a hold of type-hydrated table descriptors (needed for decoding). This is because today it's possible to GC type descriptors before GC-ing table descriptors that refer to them. Given the integration point for this library was around the GC job, we had to forego decoding routines. Instead of computing the precise split keys, we can however compute how many there are without having to look at keys at all. Consider our table hierarchy:

```
table -> index -> partition -> partition -> (...) -> partition
```

Each partition is either a PARTITION BY LIST kind (where it can then be further partitioned, or not), or a PARTITION BY RANGE kind (no further partitioning possible). We can classify each parent-child link into two types:

(a) Contiguous: {index, list partition} -> range partition
(b) Non-contiguous: table -> index, {index, list partition} -> list partition

- Contiguous links are the sort where each child span is contiguous with another, and the set of all child spans encompasses the parent's span. For an index that's partitioned by range:

```
CREATE TABLE db.range(i INT PRIMARY KEY, j INT) PARTITION BY RANGE (i) (
    PARTITION less_than_five VALUES FROM (minvalue) to (5),
    PARTITION between_five_and_ten VALUES FROM (5) to (10),
    PARTITION greater_than_ten VALUES FROM (10) to (maxvalue)
);
```

  With a table ID of 106, the parent index span is `/Table/106/{1-2}`. The child spans are `/Table/106/1{-/5}`, `/Table/106/1/{5-10}` and `/Table/106/{1/10-2}`. They're contiguous; put together they wholly encompass the parent span.

- Non-contiguous links, by contrast, are when child spans are neither contiguous with respect to one another, nor do they start and end at the parent span's boundaries. For a table with a secondary index:

```
CREATE TABLE db.t(i INT PRIMARY KEY, j INT);
CREATE INDEX idx ON db.t (j);
DROP INDEX db.t@idx;
CREATE INDEX idx ON db.t (j);
```

  With a table ID of 106, the parent table span is `/Table/10{6-7}`. The child spans are `/Table/106/{1-2}` and `/Table/106/{3-4}`. Compared to the parent span, we're missing `/Table/106{-/1}`, `/Table/106/{2-3}`, `/Table/10{6/4-7}`.

For N children:
- For a contiguous link, the number of splits equals the number of child elements (i.e. N).
- For a non-contiguous link, the number of splits equals N + 1 + N. For N children, there are N - 1 gaps. There are also 2 gaps at the start and end of the parent span. Summing that with the N child spans themselves, we get to the formula above. This assumes that the N child elements aren't further subdivided; if they are, we can compute it recursively, and the formula becomes N + 1 + Σ(grandchild spans).

Computing split counts this way does come with one downside: we might be overcounting. When comparing keys, we could for example recognize that partition-by-list values are adjacent, and therefore there's no "gap" between them. We could also do this by comparing encoded keys with one another. We just didn't, because it's more annoying code to write, and over-counting here is more than fine for our purposes.

Release justification: non-production code
Release note: None

77653: rfcs: update storage parameters for TTL r=rafiss a=otan

Release justification: non-production code change
Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
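The counting rule described in the PR above (N spans for a contiguous link; N + 1 extra gap spans for a non-contiguous one, recursing into subdivided children) lends itself to a small recursive computation. A self-contained, illustrative sketch -- not the library's actual code:

```
package main

import "fmt"

// unit models one configurable element in the hierarchy above: a table,
// index, or partition, along with its child elements (if any).
type unit struct {
	contiguous bool   // do the children wholly cover this unit's span?
	children   []unit // empty for leaf units
}

// splitCount applies the rule described above: sum the (recursively
// computed) child counts; for a non-contiguous link, additionally count
// the N-1 gaps between children plus the 2 gaps at the parent span's
// boundaries, i.e. N+1 extra spans.
func splitCount(u unit) int {
	if len(u.children) == 0 {
		return 1
	}
	total := 0
	for _, c := range u.children {
		total += splitCount(c)
	}
	if !u.contiguous {
		total += len(u.children) + 1
	}
	return total
}

func main() {
	// The secondary-index example above: a table span with two index spans
	// (a non-contiguous link), neither further partitioned.
	tbl := unit{contiguous: false, children: []unit{{}, {}}}
	fmt.Println(splitCount(tbl)) // 5: two index spans + three gaps
}
```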
Hi @irfansharif, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
77337: spanconfig: limit # of tenant span configs r=irfansharif a=irfansharif

Fixes #70555. In order to limit the number of span configs a tenant's able to install, we introduce a tenant-side spanconfig.Limiter. It presents the following interface:

```
// Limiter is used to limit the number of span configs installed by
// secondary tenants. It considers the committed and uncommitted
// state of a table descriptor and computes the "span" delta, each
// unit we can apply a configuration over. It uses these deltas to
// maintain an aggregate counter, informing the caller if exceeding
// the configured limit.
type Limiter interface {
	ShouldLimit(
		ctx context.Context, txn *kv.Txn,
		committed, uncommitted catalog.TableDescriptor,
	) (bool, error)
}
```

This limiter only applies to secondary tenants. The counter is maintained in a newly introduced (tenant-only) system table, using the following schema:

```
CREATE TABLE system.span_count (
    singleton  BOOL DEFAULT TRUE,
    span_count INT NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (singleton),
    CONSTRAINT single_row CHECK (singleton),
    FAMILY "primary" (singleton, span_count)
);
```

We need just two integration points for spanconfig.Limiter:
- Right above CheckTwoVersionInvariant, where we're able to hook into the committed and to-be-committed descriptor state before txn commit.
- In the GC job, when GC-ing table state. We decrement a table's split count when GC-ing the table for good.

The per-tenant span config limit used is controlled by a new tenant read-only cluster setting: spanconfig.tenant_limit. Multi-tenant cluster settings (#73857) provides the infrastructure for the host tenant to be able to control this setting cluster-wide, or to target a specific tenant at a time.

We also need a migration here, to start tracking span counts for clusters with pre-existing tenants. We introduce a migration that scans over all table descriptors and seeds system.span_count with the right value. Given cluster version gates disseminate asynchronously, we also need a preliminary version to start tracking incremental changes.

It's useful to introduce the notion of debt. This will be handy if/when we lower per-tenant limits, and also in the migration above, since it's possible for pre-existing tenants to have committed state in violation of the prescribed limit. When in debt, schema changes that add new splits will be rejected (dropping tables/indexes/partitions/etc. will work just fine).

When attempting a txn that goes over the configured limit, the UX is as follows:

```
> CREATE TABLE db.t2(i INT PRIMARY KEY);
pq: exceeded limit for number of table spans
```

Release note: None
Release justification: low risk, high benefit change

79462: colexecproj: break it down into two packages r=yuzefovich a=yuzefovich

**colexecproj: split up default cmp proj op file into two**

This commit splits up a single file containing two default comparison projection operators into two files. This is done in preparation of the following commit (which will move one of the operators to a different package).

Release note: None

**colexecproj: extract a new package for projection ops with const**

This commit extracts a new `colexecprojconst` package out of `colexecproj` that contains all projection operators with one constant argument. This will allow for faster build speeds since both packages contain tens of thousands of lines of code. Special care had to be taken for the default comparison operator because we need to generate two files in different packages based on a single template. I followed the precedent of `sort_partitioner.eg.go` which had to do the same.

Addresses: #79357.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Fixes #70555. In order to limit the number of span configs a tenant's able to install, we introduce a tenant-side spanconfig.Limiter. It presents the following interface:

```
// Limiter is used to limit the number of span configs installed by
// secondary tenants. It takes in a delta (typically the difference
// in span configs between the committed and uncommitted state in
// the txn), uses it to maintain an aggregate counter, and informs
// the caller if exceeding the prescribed limit.
type Limiter interface {
	ShouldLimit(ctx context.Context, txn *kv.Txn, delta int) (bool, error)
}
```

The delta is computed using a static helper, spanconfig.Delta:

```
// Delta considers both the committed and uncommitted state of a
// table descriptor and computes the difference in the number of
// spans we can apply a configuration over.
func Delta(
	ctx context.Context, s Splitter, committed, uncommitted catalog.TableDescriptor,
) (int, error)
```

This limiter only applies to secondary tenants. The counter is maintained in a newly introduced (tenant-only) system table, using the following schema:

```
CREATE TABLE system.span_count (
    singleton  BOOL DEFAULT TRUE,
    span_count INT NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (singleton),
    CONSTRAINT single_row CHECK (singleton),
    FAMILY "primary" (singleton, span_count)
);
```

We need just two integration points for spanconfig.Limiter:
- Right above CheckTwoVersionInvariant, where we're able to hook into the committed and to-be-committed descriptor state before txn commit;
- In the GC job, when GC-ing table state. We decrement a table's split count when GC-ing the table for good.

The per-tenant span config limit used is controlled by a new tenant read-only cluster setting: spanconfig.tenant_limit. Multi-tenant cluster settings (#73857) provides the infrastructure for the host tenant to be able to control this setting cluster-wide, or to target a specific tenant at a time.

We also need a migration here, to start tracking span counts for clusters with pre-existing tenants. We introduce a migration that scans over all table descriptors and seeds system.span_count with the right value. Given cluster version gates disseminate asynchronously, we also need a preliminary version to start tracking incremental changes.

It's useful to introduce the notion of debt. This will be handy if/when we lower per-tenant limits, and also in the migration above, since it's possible for pre-existing tenants to have committed state in violation of the prescribed limit. When in debt, schema changes that add new splits will be rejected (dropping tables/indexes/partitions/etc. will work just fine).

When attempting a txn that goes over the configured limit, the UX is as follows:

```
> CREATE TABLE db.t42(i INT PRIMARY KEY);
pq: exceeded limit for number of table spans
```

Release note: None
Release justification: low risk, high benefit change
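Wiring the two pieces together at the schema-change integration point would look roughly like the following (a hedged sketch; the wrapper name and surrounding plumbing are illustrative -- only Delta and ShouldLimit come from the text above):

```
package spanconfiglimiter // hypothetical home for this sketch

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/spanconfig"
	"github.com/cockroachdb/cockroach/pkg/sql/catalog"
	"github.com/cockroachdb/errors"
)

// maybeLimitSpanConfigs sketches the hook right before descriptor commit:
// compute the span delta between the committed and uncommitted descriptor
// and ask the Limiter whether the txn should be rejected.
func maybeLimitSpanConfigs(
	ctx context.Context,
	txn *kv.Txn,
	splitter spanconfig.Splitter,
	limiter spanconfig.Limiter,
	committed, uncommitted catalog.TableDescriptor,
) error {
	delta, err := spanconfig.Delta(ctx, splitter, committed, uncommitted)
	if err != nil {
		return err
	}
	shouldLimit, err := limiter.ShouldLimit(ctx, txn, delta)
	if err != nil {
		return err
	}
	if shouldLimit {
		return errors.Newf("exceeded limit for number of table spans")
	}
	return nil
}
```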
manually reviewed and brought up to date
#66348 outlined a scheme to support zone configs for secondary tenants. Since the unit of what we can configure in KV is a Range, tenants now being able to set zone configs grants them the ability to induce an arbitrary number of splits in KV -- something we want to put guardrails against (see RFC for more details).
There's some discussion elsewhere for how we could achieve this, copying over one in particular:
The RFC also captures a possible schema we could use on the host tenant to store these limits:
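That schema isn't reproduced in this issue; purely as an illustrative placeholder (not the RFC's actual definition), a host-side table holding per-tenant limits might look something like:

```
-- Hypothetical sketch only; see the RFC (#66348) for the real proposal.
CREATE TABLE system.span_config_limits (
    tenant_id  INT NOT NULL,
    span_limit INT NOT NULL,  -- max number of spans the tenant may configure
    CONSTRAINT "primary" PRIMARY KEY (tenant_id)
);
```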
The actual RPCs/interfaces are TBD. We probably also want the ability to configure these limits on a per-tenant basis.
Epic CRDB-10563
Jira issue: CRDB-10116