kvserver: limit the number of kv spans configurable by a tenant #70555

Closed
irfansharif opened this issue Sep 22, 2021 · 7 comments · Fixed by #77337
Labels
A-kv Anything in KV that doesn't belong in a more specific category. A-multitenancy Related to multi-tenancy A-zone-configs branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team

Comments

@irfansharif
Contributor
irfansharif commented Sep 22, 2021

#66348 outlined a scheme to support zone configs for secondary tenants. Since the unit of what we can configure in KV is a Range, tenants now being able to set zone configs grants them the ability to induce an arbitrary number of splits in KV -- something we want to put guardrails against (see RFC for more details).

There's some discussion elsewhere on how we could achieve this; copying over one proposal in particular:

We could have each pod lease out a local quota, counting against the tenant's global limit. For N pods of a tenant, and a limit L, we could either lease out partial quotas L/N, or have a single pod lease out L entirely and require all DDLs/zcfgs to go through that one pod. The scheme below describes the general case of each pod getting some fractional share of the total quota; a rough sketch of such a quota pool follows the list.

  • When each pod considers a DDL/zcfg txn, before committing it will attempt to acquire a quota from its local leased limit. The quota acquired is an over-estimate of how much the txn will cost (in terms of additional splits induced); we cannot under-estimate as KV could then later reject the committed zcfg.
    • If there's insufficient quota, we'll abort the txn and error out accordingly.
    • If there's sufficient quota, we'll commit the txn and rely on the eventual reconciliation to reliably succeed. We'll queue up a work item for the reconciliation job, marking alongside it the lease expiration of the quota it was accepted with.
  • When the local pool runs out of quota and has to request/lease more, it needs to ensure that all the txns that have been committed have been reconciled. We don't want a situation where a quota is leased and allocated to an operation in the work queue, that work item lingers past its lease expiration date, and we're then able to lease the "same" quota again and queue up a second work item that will now get rejected. We need to make sure that when we lease a limit at t, there are no unprocessed work items with leased tokens expiring before t.
  • Whenever the work item is processed, the amount we over-estimated by could either be returned to the local quota pool (with the same expiration ts), or we could simply drop it entirely and rely on the local quota pool to eventually re-acquire it from the global pool.
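
To make the leasing scheme concrete, here is a minimal, self-contained sketch of what a per-pod quota pool might look like. Everything below (the localQuota type, acquire/release, the error message) is hypothetical and illustrative -- it is not actual CockroachDB code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// localQuota models a per-pod pool of split tokens leased out of the tenant's
// global limit, per the scheme above. Names and structure are illustrative only.
type localQuota struct {
	remaining int       // tokens still available under the current lease
	expiry    time.Time // lease expiration for these tokens
}

var errInsufficientQuota = errors.New("insufficient span quota; aborting txn")

// acquire reserves an over-estimate of the splits a DDL/zcfg txn may induce.
// If the lease has expired or the pool is short, the caller would go back to
// the global pool to lease more (after ensuring committed txns are reconciled).
func (q *localQuota) acquire(estimate int, now time.Time) error {
	if now.After(q.expiry) || q.remaining < estimate {
		return errInsufficientQuota
	}
	q.remaining -= estimate
	return nil
}

// release returns the unused portion of an over-estimate once the committed
// zone config has been reconciled with KV.
func (q *localQuota) release(unused int) {
	q.remaining += unused
}

func main() {
	q := &localQuota{remaining: 10, expiry: time.Now().Add(time.Hour)}
	fmt.Println(q.acquire(4, time.Now())) // <nil>
	fmt.Println(q.acquire(8, time.Now())) // insufficient span quota; aborting txn
	q.release(2)                          // over-estimated by 2; returned after reconciliation
	fmt.Println(q.remaining)              // 8
}
```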

The RFC also captures a possible schema we could use on the host tenant to store these limits:

CREATE TABLE system.spans_configurations_per_tenant (
    tenant_id               INT,
    max_span_configurations INT
)

The actual RPCs/interfaces are TBD. We probably also want the ability to configure these limits on a per-tenant basis.
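
Purely as a hypothetical illustration (the issue leaves the actual RPCs/interfaces TBD), a host-side interface for these per-tenant limits might look something like the following; every name here is made up:

```go
package tenantlimits

import "context"

// SpanLimits is a hypothetical host-side interface for reading and writing
// per-tenant span-configuration limits; not an actual CockroachDB API.
type SpanLimits interface {
	// SetLimit sets the maximum number of span configurations for a tenant.
	SetLimit(ctx context.Context, tenantID uint64, maxSpanConfigs int64) error
	// GetLimit returns the limit currently configured for a tenant.
	GetLimit(ctx context.Context, tenantID uint64) (int64, error)
}
```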

Epic CRDB-10563

Jira issue: CRDB-10116

@blathers-crl blathers-crl bot added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 22, 2021
@irfansharif irfansharif added A-kv Anything in KV that doesn't belong in a more specific category. A-zone-configs A-multitenancy Related to multi-tenancy labels Sep 22, 2021
@exalate-issue-sync exalate-issue-sync bot added the T-kv KV Team label Oct 11, 2021
@irfansharif
Contributor Author
irfansharif commented Nov 3, 2021

This is something we'll try tackling (or at least prototyping) in November. A brief plug for why it's important:

After #67679, enabling the span configs infrastructure by default (something we're also looking to do over November) would mean that for every single schema object in a multi-tenant cluster, we would be inducing a range split. This is exactly what happens with the host tenant today where every table/index/partition gets its own range. Before #67679, for each tenant we started off with just a mega-range containing the entire tenant ID prefix.

--- Seeing range boundaries on the host tenant, with a single secondary tenant pre-67679. The last one is of interest.
> SELECT range_id, start_pretty, end_pretty from crdb_internal.ranges;
  range_id |         start_pretty          |          end_pretty
-----------+-------------------------------+--------------------------------
         1 | /Min                          | /System/NodeLiveness
         2 | /System/NodeLiveness          | /System/NodeLivenessMax
         3 | /System/NodeLivenessMax       | /System/tsd
         4 | /System/tsd                   | /System/"tse"
         5 | /System/"tse"                 | /Table/SystemConfigSpan/Start
         6 | /Table/SystemConfigSpan/Start | /Table/11
         7 | /Table/11                     | /Table/12
         8 | /Table/12                     | /Table/13
                                        ...
        42 | /Table/46                     | /Table/47
        43 | /Table/47                     | /Tenant/123
        44 | /Tenant/123                   | /Max

After #67679 + enabling the infra.

> SELECT range_id, start_pretty, end_pretty from crdb_internal.ranges;
  range_id |          start_pretty          |           end_pretty
-----------+--------------------------------+---------------------------------
                                        ...
        42 | /Table/46                      | /Table/47
        43 | /Table/47                      | /Tenant/123
        44 | /Tenant/123                    | /Tenant/123/Table/3
        45 | /Tenant/123/Table/3            | /Tenant/123/Table/4
        46 | /Tenant/123/Table/4            | /Tenant/123/Table/5
        47 | /Tenant/123/Table/5            | /Tenant/123/Table/6
                                        ...
        83 | /Tenant/123/Table/45           | /Tenant/123/Table/46
        84 | /Tenant/123/Table/46           | /Tenant/123/Table/47
        85 | /Tenant/123/Table/47           | /Max

Having O(descriptors across all tenants) ranges in the system is going to be difficult for us to work with, and can readily be used as an attack vector. We'd either need to impose a hard limit on the number of descriptor objects a tenant is allowed to have, or do what this issue suggests -- introducing a limit based on the number of individually configurable spans. This is still a pretty rudimentary safeguard, but fear not, we can do much better on the KV side to move away from always splitting on descriptor boundaries: #72389.

(+cc @shralex, @nvanbenschoten, @ajwerner, @arulajmani)

@irfansharif
Contributor Author

In the short-term (22.1) all we'll do is maintain a counter for secondary tenants and, for every schema change, use a translator-like component to determine how many splits are being added/removed. We don't need to look at the corresponding zone config for the descriptor as that has no bearing here. We'll inc/decrement our counter accordingly, comparing it against a per-tenant limit exposed by the host -- perhaps as a "tenant read-only cluster setting", to use the verbiage from the multi-tenant cluster settings RFC. If we're over the limit, we'll abort the schema change txn. KV of course would still maintain its internal counter for the number of splits/etc. per-tenant, and use that to reject span config updates if they go overboard. For unmodified tenants, we don't expect to run into these errors.

This scheme, in contrast to the one described above, has the downside that it would serialize all schema changes. That feels reasonable for the short term, especially considering it only applies to secondary tenants. Incorporating leasing in any form would purely be a performance optimization -- to have real schema change concurrency.
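
For illustration, here is a minimal, self-contained sketch of the counter-based check described above, with hypothetical names throughout (this is not the eventual implementation):

```go
package main

import (
	"errors"
	"fmt"
)

// spanCounter is a toy stand-in for the per-tenant counter described above:
// every schema change computes a delta of configurable spans, and the txn is
// aborted if the new total would exceed the host-set limit.
type spanCounter struct {
	used  int // spans currently accounted to this tenant
	limit int // per-tenant limit exposed by the host
}

var errOverLimit = errors.New("exceeded limit for number of table spans")

// apply accounts for a schema change adding (positive delta) or removing
// (negative delta) configurable spans; only additions over the limit fail.
func (c *spanCounter) apply(delta int) error {
	if delta > 0 && c.used+delta > c.limit {
		return errOverLimit
	}
	c.used += delta
	return nil
}

func main() {
	c := &spanCounter{used: 48, limit: 50}
	fmt.Println(c.apply(1))  // <nil>
	fmt.Println(c.apply(5))  // exceeded limit for number of table spans
	fmt.Println(c.apply(-3)) // <nil>: drops always free up quota
	fmt.Println(c.used)      // 46
}
```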

@ajstorm
Collaborator
ajstorm commented Jan 6, 2022

How would this work for internal APIs like AdminSplit? Would they get caught by this mechanism and tracked accordingly? Or would they enter KV below the "schema change"-level tracking? I'm asking because we may be able to enable AdminSplit directly if it gets tracked by this scheme.

@irfansharif
Contributor Author

we may be able to enable AdminSplit directly if it gets tracked by this scheme.

I don't think AdminSplit can directly leverage the mechanisms described above, at least not in this current form, and not without additional leg work. AdminSplits are issued directly to KV as non-transactional requests (follow code here). The SQL-side counter we're maintaining is maintained transactionally, in the tenant's own keyspace. Crucially, it's maintained without ever having to ask KV about its own trackers. To track an AdminSplit request (these typically have expiration timestamps) we'd have to maintain this SQL-side counter in tandem with issuing the request to KV, where it may or may not be rejected (some form of 2PC maybe?). In addition, once the AdminSplit request has expired, the SQL-side counter would probably want to reflect the change -- it would perhaps need to reach out to KV to reconcile what the current state of those requests is. This is all doable with some effort, and there might be simpler approaches, but I'm not sure it's quite as "drop-in" as everything else this issue is focused on.

I know we recently filed #74389 -- should we brainstorm there? It's not yet clear to me that we even want to carry forward AdminSplit/Scatter as an API, let alone extend it to secondary tenants. There was some KV internal discussion here on the subject.

@ajwerner
Contributor
ajwerner commented Jan 7, 2022

More internal discussion on AdminSplit and "advisory splits" here.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Feb 14, 2022
Splitter returns the set of all possible split points for the given
table descriptor. It steps through every "unit" that we can apply
configurations over (table, indexes, partitions and sub-partitions) and
figures out the actual key boundaries that we may need to split over.
For example:

  CREATE TABLE db.parts(i INT PRIMARY KEY, j INT) PARTITION BY LIST (i) (
    PARTITION one_and_five    VALUES IN (1, 5),
    PARTITION four_and_three  VALUES IN (4, 3),
    PARTITION everything_else VALUES IN (6, default)
  );

Assuming a table ID of 108, we'd generate:

  /Table/108
  /Table/108/1
  /Table/108/1/1
  /Table/108/1/2
  /Table/108/1/3
  /Table/108/1/4
  /Table/108/1/5
  /Table/108/1/6
  /Table/108/1/7
  /Table/108/2

---

This is going to serve as the underlying library for cockroachdb#70555. In a future
commit we'll introduce two system tables, one for all tenants (both
guest and host) and one on just the host.

The first will be maintained by the SQL pod, using the library, as part
of every schema change operation. We'll record in it a single row
containing the total number of splits the pod's currently using up by
comparing the before/after of a descriptor operation to see if the
commit would add or reduce the number of necessary splits. We'll abort
the txn if it's over a pre-set limit (INF for the host tenant, and using
a KV-settable-only cluster setting for guests: cockroachdb#73857).

The second table will be used only for KV's internal bookkeeping,
sitting behind the KVAccessor interface. We'll use it to hard-reject
RPCs installing span configs that induce more splits than a tenant is
allowed. With the SQL pods coordinating using the KV-set limits, this
would only happen as a result of faulty SQL code.

Put together, we'll have a co-operative form of per-tenant split limits.

Release note: None
craig bot pushed a commit that referenced this issue Feb 18, 2022
75803: spanconfig: introduce spanconfig.Splitter r=irfansharif a=irfansharif


76751: ci: build `workload` in CI r=rail a=rickystewart

This brings us to parity with the pre-Bazel `Compile Builds` job in CI.

Release note: None

76758: backup: include tenant_settings in cluster backup r=RaduBerinde a=RaduBerinde

We include the tenant_settings table in full cluster backups. This
means that when we restore a cluster, all overrides should be in
place.

The overrides are treated as a host cluster property. When restoring
only specific tenants, the overrides will not be carried over.

In the future, we can add tenant-specific overrides to tenant metadata
so restoring a single tenant restores the overrides as well.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Mar 11, 2022
77639: spanconfig: re-write spanconfig.Splitter r=irfansharif a=irfansharif

The earlier implementation relied on decoding keys in order to determine
precise split points. When plugging this library into the component
designed for tenant-side span config limiting (#77337, to address
#70555), we realized it's not possible to grab a hold of type-hydrated
table descriptors (needed for decoding). This is because today it's
possible to GC type descriptors before GC-ing table descriptors that
refer to them. Given the integration point for this library was around
the GC job, we had to forego decoding routines.

Instead of computing the precise split keys, we can however compute how
many there are without having to look at keys at all.
Consider our table hierarchy:

    table -> index -> partition -> partition -> (...) -> partition

Each partition is either a PARTITION BY LIST kind (where it can then be
further partitioned, or not), or a PARTITION BY RANGE kind (no further
partitioning possible). We can classify each parent-child link into two
types:

(a) Contiguous      {index, list partition} -> range partition
(b) Non-contiguous  table -> index, {index, list partition} -> list partition

- Contiguous links are the sort where each child span is contiguous with
  another, and the set of all child spans encompasses the parent's
  span. For an index that's partitioned by range:
```
    CREATE TABLE db.range(i INT PRIMARY KEY, j INT)
    PARTITION BY RANGE (i) (
      PARTITION less_than_five       VALUES FROM (minvalue) to (5),
      PARTITION between_five_and_ten VALUES FROM (5) to (10),
      PARTITION greater_than_ten     VALUES FROM (10) to (maxvalue)
    );
```
  With table ID as 106, the parent index span is `/Table/106/{1-2}`. The
  child spans are `/Table/106/1{-/5}`, `/Table/106/1/{5-10}` and
  `/Table/106/{1/10-2}`. They're contiguous; put together they wholly
  encompass the parent span.

- Non-contiguous links, by contrast, are when child spans are neither
  contiguous with respect to one another, nor do they start and end at
  the parent span's boundaries. For a table with a secondary index:
```
    CREATE TABLE db.t(i INT PRIMARY KEY, j INT);
    CREATE INDEX idx ON db.t (j);
    DROP INDEX db.t@idx;
    CREATE INDEX idx ON db.t (j);
```
  With table ID as 106, the parent table span is `/Table/10{6-7}`. The
  child spans are `/Table/106/{1-2}` and `/Table/106/{3-4}`. Compared to the
  parent span, we're missing `/Table/106{-/1}`, `/Table/106/{2-3}`,
  `/Table/10{6/4-7}`.

For N children:
- For a contiguous link, the number of splits equals the number of child
  elements (i.e. N).
- For a non-contiguous link, the number of splits equals N + 1 + N. For N
  children, there are N - 1 gaps between them, plus 2 gaps at the start and
  end of the parent span. Summing that with the N child spans themselves, we
  get to the formula above. This assumes that the N child elements aren't
  further subdivided; if they are, we can compute it recursively, and the
  formula becomes N + 1 + Σ(grand child spans). A rough sketch of this
  recursive count follows the list.
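
Here is a self-contained illustration of that recursive count. The node
type and splits helper are made up for this sketch and are not the actual
spanconfig.Splitter code:
```go
package main

import "fmt"

// node is a table, index, or partition; the link to its children is either
// contiguous (children wholly cover the parent span) or non-contiguous.
type node struct {
	contiguous bool   // zero value (false) means a non-contiguous link
	children   []node // empty for leaves (e.g. range partitions)
}

// splits counts the individually configurable spans under n.
func splits(n node) int {
	if len(n.children) == 0 {
		return 1
	}
	sum := 0
	for _, c := range n.children {
		sum += splits(c)
	}
	if n.contiguous {
		return sum // contiguous: children account for the whole parent span
	}
	// Non-contiguous: N-1 gaps between children plus 2 boundary gaps (= N+1),
	// plus the children's own spans.
	return len(n.children) + 1 + sum
}

func main() {
	// An index list-partitioned into three leaves (non-contiguous link).
	idx := node{children: []node{{}, {}, {}}}
	// A table with that single index (table -> index is also non-contiguous).
	tbl := node{children: []node{idx}}
	fmt.Println(splits(idx)) // 7 = 3 + 1 + 3
	fmt.Println(splits(tbl)) // 9 = 1 + 1 + 7
}
```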

Computing split count this way does come with one downside: we might be
over-counting. When comparing keys, we could for example recognize that
adjacent partition-by-list values have no "gap" between them; we could
also do this by comparing encoded keys with one another. We just didn't,
because it's more annoying code to write, and over-counting here is more
than fine for our purposes.

Release justification: non-production code
Release note: None

77653: rfcs: update storage parameters for TTL r=rafiss a=otan

Release justification: non-production code change

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 11, 2022
Fixes cockroachdb#70555. In order to limit the number of span configs a tenant's
able to install, we introduce a tenant-side spanconfig.Limiter. It
presents the following interface:

    // Limiter is used to limit the number of span configs installed by
    // secondary tenants. It considers the committed and uncommitted
    // state of a table descriptor and computes the "span" delta, each
    // unit we can apply a configuration over. It uses these deltas to
    // maintain an aggregate counter, informing the caller if exceeding
    // the configured limit.
    type Limiter interface {
      ShouldLimit(
        ctx context.Context, txn *kv.Txn,
        committed, uncommitted catalog.TableDescriptor,
      ) (bool, error)
    }

This limiter only applies to secondary tenants. The counter is
maintained in a newly introduced (tenant-only) system table, using the
following schema:

    CREATE TABLE system.span_count (
      singleton  BOOL DEFAULT TRUE,
      span_count INT NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (singleton),
      CONSTRAINT single_row CHECK (singleton),
      FAMILY "primary" (singleton, span_count)
    );

We need just two integration points for spanconfig.Limiter:
- Right above CheckTwoVersionInvariant, where we're able to hook into
  the committed and to-be-committed descriptor state before txn commit.
- In the GC job, when gc-ing table state. We decrement a table's split
  count when GC-ing the table for good.

The per-tenant span config limit used is controlled by a new tenant
read-only cluster setting: spanconfig.tenant_limit. Multi-tenant cluster
settings (cockroachdb#73857) provides the infrastructure for the host tenant to be
able to control this setting cluster wide, or to target a specific
tenant at a time.

We also need a migration here, to start tracking span counts for
clusters with pre-existing tenants. We introduce a migration that scans
over all table descriptors and seeds system.span_count with the right
value. Given cluster version gates disseminate asynchronously, we also
need a preliminary version to start tracking incremental changes.

It's useful to introduce the notion of debt. This will be handy if/when
we lower per-tenant limits, and also in the migration above since it's
possible for pre-existing tenants to have committed state in violation
of the prescribed limit. When in debt, schema changes that add new
splits will be rejected (dropping tables/indexes/partitions/etc. will
work just fine).

When attempting a txn that goes over the configured limit, the UX is as
follows:

    > CREATE TABLE db.t2(i INT PRIMARY KEY);
    pq: exceeded limit for number of table spans

Release note: None
Release justification: low risk, high benefit change
@blathers-crl
blathers-crl bot commented Mar 22, 2022

Hi @irfansharif, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@nvanbenschoten nvanbenschoten added the branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 label Mar 22, 2022
@irfansharif irfansharif added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Mar 22, 2022
@irfansharif irfansharif removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. blocks-22.1.0-beta.1 labels Mar 29, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Apr 4, 2022
Fixes cockroachdb#70555. In order to limit the number of span configs a tenant's
able to install, we introduce a tenant-side spanconfig.Limiter. It
presents the following interface:

    // Limiter is used to limit the number of span configs installed by
    // secondary tenants. It takes in a delta (typically the difference
    // in span configs between the committed and uncommitted state in
    // the txn), uses it to maintain an aggregate counter, and informs
    // the caller if exceeding the prescribed limit.
    type Limiter interface {
      ShouldLimit(
        ctx context.Context, txn *kv.Txn, delta int,
      ) (bool, error)
    }

The delta is computed using a static helper, spanconfig.Delta:

    // Delta considers both the committed and uncommitted state of a
    // table descriptor and computes the difference in the number of
    // spans we can apply a configuration over.
    func Delta(
      ctx context.Context, s Splitter,
      committed, uncommitted catalog.TableDescriptor,
    ) (int, error)

This limiter only applies to secondary tenants. The counter is
maintained in a newly introduced (tenant-only) system table, using the
following schema:

    CREATE TABLE system.span_count (
      singleton  BOOL DEFAULT TRUE,
      span_count INT NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (singleton),
      CONSTRAINT single_row CHECK (singleton),
      FAMILY "primary" (singleton, span_count)
    );

We need just two integration points for spanconfig.Limiter:
- Right above CheckTwoVersionInvariant, where we're able to hook into
  the committed and to-be-committed descriptor state before txn commit;
- In the GC job, when gc-ing table state. We decrement a table's split
  count when GC-ing the table for good.

The per-tenant span config limit used is controlled by a new tenant
read-only cluster setting: spanconfig.tenant_limit. Multi-tenant cluster
settings (cockroachdb#73857) provides the infrastructure for the host tenant to be
able to control this setting cluster wide, or to target a specific
tenant at a time.

We also need a migration here, to start tracking span counts for
clusters with pre-existing tenants. We introduce a migration that scans
over all table descriptors and seeds system.span_count with the right
value. Given cluster version gates disseminate asynchronously, we also
need a preliminary version to start tracking incremental changes.

It's useful to introduce the notion of debt. This will be handy if/when
we lower per-tenant limits, and also in the migration above since it's
possible for pre-existing tenants to have committed state in violation
of the prescribed limit. When in debt, schema changes that add new
splits will be rejected (dropping tables/indexes/partitions/etc. will
work just fine).

When attempting a txn that goes over the configured limit, the UX is as
follows:

    > CREATE TABLE db.t42(i INT PRIMARY KEY);
    pq: exceeded limit for number of table spans

Release note: None
Release justification: low risk, high benefit change

Release note: None
craig bot pushed a commit that referenced this issue Apr 7, 2022
77337: spanconfig: limit # of tenant span configs r=irfansharif a=irfansharif


79462: colexecproj: break it down into two packages r=yuzefovich a=yuzefovich

**colexecproj: split up default cmp proj op file into two**

This commit splits up a single file containing two default comparison
projection operators into two files. This is done in preparation of
the following commit (which will move one of the operators to a
different package).

Release note: None

**colexecproj: extract a new package for projection ops with const**

This commit extracts a new `colexecprojconst` package out of
`colexecproj` that contains all projection operators with one
constant argument. This will allow for faster build speeds since both
packages contain tens of thousands of lines of code.

Special care had to be taken for the default comparison operator because we
need to generate two files in different packages based on a single
template. I followed the precedent of `sort_partitioner.eg.go` which had
to do the same.

Addresses: #79357.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig craig bot closed this as completed in 90054c6 Apr 8, 2022
blathers-crl bot pushed a commit that referenced this issue Apr 8, 2022
@kenliu-crl
Contributor

manually reviewed and brought up to date
