release-22.1: spanconfig: limit # of tenant span configs by blathers-crl[bot] · Pull Request #79627 · cockroachdb/cockroach · GitHub

release-22.1: spanconfig: limit # of tenant span configs #79627

Merged
merged 1 commit into release-22.1 from blathers/backport-release-22.1-77337
Apr 8, 2022

Conversation

@blathers-crl blathers-crl bot commented Apr 8, 2022

Backport 1/1 commits from #77337 on behalf of @irfansharif.

/cc @cockroachdb/release


Fixes #70555. In order to limit the number of span configs a tenant is
able to install, we introduce a tenant-side spanconfig.Limiter. It
presents the following interface:

    // Limiter is used to limit the number of span configs installed by
    // secondary tenants. It considers the committed and uncommitted
    // state of a table descriptor and computes the delta in the number
    // of spans, i.e. the units we can apply a configuration over. It
    // uses these deltas to maintain an aggregate counter and informs
    // the caller when the configured limit would be exceeded.
    type Limiter interface {
      ShouldLimit(
        ctx context.Context, txn *kv.Txn,
        committed, uncommitted catalog.TableDescriptor,
      ) (bool, error)
    }

This limiter only applies to secondary tenants. The counter is
maintained in a newly introduced (tenant-only) system table, using the
following schema:

    CREATE TABLE system.span_count (
      singleton  BOOL DEFAULT TRUE,
      span_count INT NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (singleton),
      CONSTRAINT single_row CHECK (singleton),
      FAMILY "primary" (singleton, span_count)
    );

We need just two integration points for spanconfig.Limiter:

  • Right above CheckTwoVersionInvariant, where we're able to hook into
    the committed and to-be-committed descriptor state before txn commit
    (see the sketch below).
  • In the GC job, when GC-ing table state: we decrement a table's span
    count when GC-ing the table for good.
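
For concreteness, here's a minimal sketch of that commit-time hook, written
against the ShouldLimit signature quoted above; the function name, wiring,
and error text (borrowed from the UX example further down) are illustrative
rather than the PR's actual code:

    import (
      "context"

      "github.com/cockroachdb/cockroach/pkg/kv"
      "github.com/cockroachdb/cockroach/pkg/sql/catalog"
      "github.com/cockroachdb/errors"
    )

    // limiter matches the Limiter interface quoted above.
    type limiter interface {
      ShouldLimit(
        ctx context.Context, txn *kv.Txn,
        committed, uncommitted catalog.TableDescriptor,
      ) (bool, error)
    }

    // checkSpanCountLimit runs right before CheckTwoVersionInvariant, with
    // both the committed descriptor and its to-be-committed version from
    // the txn. If the limit would be exceeded, the schema change is
    // rejected before the txn commits.
    func checkSpanCountLimit(
      ctx context.Context, txn *kv.Txn, l limiter,
      committed, uncommitted catalog.TableDescriptor,
    ) error {
      shouldLimit, err := l.ShouldLimit(ctx, txn, committed, uncommitted)
      if err != nil {
        return err
      }
      if shouldLimit {
        return errors.New("exceeded limit for number of table spans")
      }
      return nil
    }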

The per-tenant span config limit is controlled by a new tenant
read-only cluster setting, spanconfig.tenant_limit. Multi-tenant cluster
settings (#73857) provide the infrastructure for the host tenant to
control this setting cluster-wide or to target one specific tenant at a
time.
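
As a rough sketch of what that registration might look like (assuming the
settings.TenantReadOnly class introduced by the multi-tenant settings work;
the description string and default value below are made up for
illustration):

    import "github.com/cockroachdb/cockroach/pkg/settings"

    // spanconfig.tenant_limit as a tenant read-only setting: the tenant can
    // read it, but only the host can change it, either cluster-wide or for
    // one tenant at a time.
    var tenantLimit = settings.RegisterIntSetting(
      settings.TenantReadOnly,
      "spanconfig.tenant_limit",
      "maximum number of span configs a secondary tenant is allowed to install",
      5000, // illustrative default
    )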

We also need a migration here to start tracking span counts for
clusters with pre-existing tenants. We introduce a migration that scans
over all table descriptors and seeds system.span_count with the right
value. Since cluster version gates disseminate asynchronously, we also
need a preliminary version to start tracking incremental changes.
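
A hypothetical sketch of that seeding step; countSpans and writeSpanCount
are stand-ins for whatever the migration actually uses to count a
descriptor's spans and to persist the total:

    import (
      "context"

      "github.com/cockroachdb/cockroach/pkg/kv"
      "github.com/cockroachdb/cockroach/pkg/sql/catalog"
    )

    // Stand-ins for the real helpers (hypothetical).
    var (
      countSpans     func(ctx context.Context, desc catalog.TableDescriptor) (int, error)
      writeSpanCount func(ctx context.Context, txn *kv.Txn, total int) error
    )

    // seedSpanCount walks every table descriptor in the tenant, sums up the
    // number of units a span config can be applied over, and records the
    // total in system.span_count, presumably with something along the lines
    // of UPSERT INTO system.span_count (singleton, span_count) VALUES (true, $1).
    // From then on the limiter maintains the counter incrementally.
    func seedSpanCount(
      ctx context.Context, txn *kv.Txn, descriptors []catalog.TableDescriptor,
    ) error {
      total := 0
      for _, desc := range descriptors {
        n, err := countSpans(ctx, desc)
        if err != nil {
          return err
        }
        total += n
      }
      return writeSpanCount(ctx, txn, total)
    }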

It's useful to introduce the notion of debt. This will be handy if/when
we lower per-tenant limits, and also in the migration above, since
pre-existing tenants may have committed state that already violates the
prescribed limit. When in debt, schema changes that add new splits are
rejected, while dropping tables/indexes/partitions/etc. continues to
work just fine.
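
A minimal sketch of those debt semantics, with plain integers standing in
for the system.span_count row and the configured limit (all names here are
illustrative):

    // shouldLimit mirrors the behavior described above: deltas that only
    // free up spans (drops) always go through, even when the tenant is
    // already in debt, while deltas that add spans are rejected once the
    // running total would exceed the configured limit.
    func shouldLimit(current, delta, limit int64) bool {
      if delta <= 0 {
        return false // dropping tables/indexes/partitions always succeeds
      }
      return current+delta > limit
    }

For example, with a limit of 10 and a current count of 12 (in debt after the
limit was lowered), shouldLimit(12, 1, 10) rejects a new table while
shouldLimit(12, -3, 10) lets a drop through.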

When a txn attempts to go over the configured limit, the UX is as
follows:

    > CREATE TABLE db.t2(i INT PRIMARY KEY);
    pq: exceeded limit for number of table spans

Release note: None
Release justification: low risk, high benefit change


Release justification:

Fixes #70555. In order to limit the number of span configs a tenant is
able to install, we introduce a tenant-side spanconfig.Limiter. It
presents the following interface:

    // Limiter is used to limit the number of span configs installed by
    // secondary tenants. It takes in a delta (typically the difference
    // in span configs between the committed and uncommitted state in
    // the txn), uses it to maintain an aggregate counter, and informs
    // the caller if exceeding the prescribed limit.
    type Limiter interface {
      ShouldLimit(
        ctx context.Context, txn *kv.Txn, delta int,
      ) (bool, error)
    }

The delta is computed using a static helper, spanconfig.Delta:

    // Delta considers both the committed and uncommitted state of a
    // table descriptor and computes the difference in the number of
    // spans we can apply a configuration over.
    func Delta(
      ctx context.Context, s Splitter,
      committed, uncommitted catalog.TableDescriptor,
    ) (int, error)

Release note: None
Release justification: low risk, high benefit change
@blathers-crl blathers-crl bot requested a review from a team as a code owner April 8, 2022 00:29
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-22.1-77337 branch from b14c0cc to 4823f8c April 8, 2022 00:29
@blathers-crl blathers-crl bot requested review from dt, ajwerner, arulajmani and irfansharif and removed request for a team April 8, 2022 00:29
@blathers-crl blathers-crl bot added the blathers-backport (This is a backport that Blathers created automatically.) and O-robot (Originated from a bot.) labels Apr 8, 2022
@blathers-crl
Author
blathers-crl bot commented Apr 8, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the following exceptional criteria are satisfied instead:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user who doesn’t know or care about this backport has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Member

This change is Reviewable

@irfansharif
Contributor

Has to merge after #79598.

@irfansharif irfansharif merged commit 9b4ab01 into release-22.1 Apr 8, 2022
@irfansharif irfansharif deleted the blathers/backport-release-22.1-77337 branch April 8, 2022 14:19
@ajwerner ajwerner (Contributor) left a comment

:lgtm:

Reviewed 36 of 70 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Labels: blathers-backport (This is a backport that Blathers created automatically.), O-robot (Originated from a bot.)