Make backfill batch selection exclude rows inserted or updated after backfill start #634
The `WithStateSchema` option allows specifying the schema in which the `pgroll` internal state is stored. This is useful when the internal state is stored in a schema other than the default `pgroll`.
Ensure that only rows having an `xmin` that logically precedes the `xid` of the transaction that started the backfill are included in the batch.
Add a `WithSchema` option to `backfill.Backfill` to allow specifying the schema in which the backfill will operate.
Ensure rows logically preceding the frozen transaction id for the table are included in the batch of rows to be backfilled.
Force-pushed from dbe8bdf to 4bed314
I tested this solution with the aid of the `xid_wraparound` test extension. Created a Postgres image with the extension installed:

```dockerfile
FROM gcc:latest AS builder
# Clone the Postgres 17 release branch at depth 1.
# The Postgres version we checkout here must match the version of Postgres run
# in the final image, otherwise there will be a mismatch between the extension
# and the server versions.
WORKDIR /tmp
RUN git clone --branch REL_17_2 --depth=1 https://github.com/postgres/postgres.git
# Install missing build-time dependencies for Postgres
RUN apt update && apt install -y \
make \
flex \
bison
# Run the 'configure' script
WORKDIR postgres
RUN ./configure
# Build the 'xid_wraparound' extension
WORKDIR src/test/modules/xid_wraparound
RUN make
# Build the final image
FROM postgres:17
# Install the extension files from the builder image
RUN mkdir -p /usr/lib/postgresql/17/modules
COPY --from=builder /tmp/postgres/src/test/modules/xid_wraparound/xid_wraparound.so /usr/lib/postgresql/17/lib/
COPY --from=builder /tmp/postgres/src/test/modules/xid_wraparound/xid_wraparound--*.sql /usr/share/postgresql/17/extension/
COPY --from=builder /tmp/postgres/src/test/modules/xid_wraparound/xid_wraparound.control /usr/share/postgresql/17/extension/
# init.sql ensures that the extension is loaded when the container is started
COPY init.sql /docker-entrypoint-initdb.d/
```
With this it's possible to approach transaction id wraparound quickly by consuming xids a billion at a time:

```sql
select consume_xids(1_000_000_000);
vacuum verbose <tablename>;
```

As expected, once transaction ids surpass 2^32-1, the …
Closing in favor of #652
…backfill start (#652)

Backfill only rows present at backfill start. This is the third approach to solving #583. The previous two are:

* #634
* #648

This is the most direct approach to solving the problem. At the same time as the up/down triggers are created to perform a backfill, a `_pgroll_needs_backfill` column is also created on the table to be backfilled. The column has a `DEFAULT` of `true`; the constant default ensures that this extra column can be added quickly without a lengthy `ACCESS_EXCLUSIVE` lock. The column is removed when the operation is rolled back or completed.

The up/down triggers are modified to set `_pgroll_needs_backfill` to `false` whenever they update a row. The backfill itself is updated to select only rows having `_pgroll_needs_backfill` set to `true`; this ensures that only rows created before the triggers were installed are updated by the backfill. The backfill process still needs to *read* every row in the table, including those inserted/updated after backfill start, but only those rows created before backfill start will be updated.

The main disadvantage of this approach is that backfill now requires an extra column to be created on the target table.
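A minimal sketch of the mechanism described above, assuming a table `items` with primary key `id` (the table, trigger, and function names other than `_pgroll_needs_backfill` are illustrative, not pgroll's actual generated DDL):

```sql
-- Added at the same time as the up/down triggers; the constant DEFAULT
-- avoids a table rewrite, so only a brief lock is needed.
ALTER TABLE items ADD COLUMN _pgroll_needs_backfill boolean DEFAULT true;

-- The trigger function marks any row it touches as no longer needing backfill.
CREATE OR REPLACE FUNCTION items_mark_backfilled() RETURNS trigger AS $$
BEGIN
  NEW._pgroll_needs_backfill := false;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_mark_backfilled
  BEFORE INSERT OR UPDATE ON items
  FOR EACH ROW EXECUTE FUNCTION items_mark_backfilled();

-- The backfill's no-op update then only touches rows that were present
-- before the triggers were installed.
UPDATE items SET id = id WHERE _pgroll_needs_backfill;
```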
Implement one possible solution to limit the backfill to touching only rows that existed in the table prior to backfill start, ignoring rows inserted or updated after backfill start.
I believe the solution presented here is flawed and should not be merged as is; the PR is open for discussion to see if the technique can be made to work or if we need to take a different approach.
In general, it is fine from a correctness POV if backfill updates a row that was already updated/inserted by a transaction that committed after backfill started; it's not OK from a performance POV however, as a backfill running at the same time as a high rate of `INSERT`s into the table will cause the backfill to never terminate (the issue is described in #583).

Proposed solution
Backfill works by touching rows in batches (the batch size is configurable). 'Touch' in this context means setting the row's PK to itself, causing the already-installed backfill trigger to fire for that row.
As of this PR, the per-batch query looks like:
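pgroll generates this query per table; as a rough sketch, assuming a table `items` with primary key `id` and a batch size of 1000, it has this shape:

```sql
WITH batch AS (
  -- Select and lock the next batch of rows to be touched
  SELECT id
  FROM items
  -- (the xid-based filter discussed below also appears here)
  ORDER BY id
  LIMIT 1000
  FOR NO KEY UPDATE
)
-- 'Touch' each row: a no-op PK update that fires the backfill trigger
UPDATE items
SET id = items.id
FROM batch
WHERE items.id = batch.id;
```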
The first CTE (`WITH batch AS ...`) is the relevant part here. The purpose of this CTE is to select the next batch of rows to be updated (and to lock those rows for update). The relevant part of the first CTE is this bit:
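A sketch of that filter, using the helper functions described below (the argument order, "does the second xid follow the first", is an assumption):

```sql
WHERE b_follows_a(xmin, '1234'::xid)                    -- tuple is older than the backfill xid
   OR b_follows_a(xmin, frozen_xid('items'::regclass))  -- tuple precedes relfrozenxid, i.e. is frozen
```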
This is where we attempt to filter out tuples that were created/updated after the backfill process started. `'1234'` represents the `xid` of when the backfill process started. The first part checks whether the tuple is older than the `xid` when the backfill started; if so, the tuple should be part of the batch.

`b_follows_a` implements an `xid`-wraparound-safe comparison of `xid`s: the transaction id space (0 to 2^32-1) is treated as a circle, and anything in the forward half of the circle is considered ahead of a given `xid`; anything else is behind it. See the Postgres Internals book, Chapter 7 ("Freezing"), for a description. Using this calculation alone to determine relative ages between transaction ids will fail for very old rows (older than 2^31 transactions since backfill start), which will appear to be in the future.
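One way to write such a comparison in SQL (a sketch, not necessarily the PR's exact definition; casting an `xid` through `text` to `bigint` exposes its raw 32-bit value):

```sql
-- True if xid b lies in the forward half of the 32-bit transaction id
-- circle relative to xid a, i.e. "b follows a", wraparound-safe.
CREATE OR REPLACE FUNCTION b_follows_a(a xid, b xid) RETURNS boolean AS $$
  SELECT (((b::text::bigint - a::text::bigint) % 4294967296) + 4294967296) % 4294967296
         BETWEEN 1 AND 2147483647;
$$ LANGUAGE sql;
```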
The `frozen_xid` function is defined:
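Presumably along these lines (a sketch based on the description that follows):

```sql
-- Returns the table's relfrozenxid: transaction ids preceding this value
-- are known to be frozen in the table.
CREATE OR REPLACE FUNCTION frozen_xid(rel regclass) RETURNS xid AS $$
  SELECT relfrozenxid FROM pg_catalog.pg_class WHERE oid = rel;
$$ LANGUAGE sql;
```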
The test checks whether the transaction id that created the tuple comes before the oldest unfrozen tuple in the table (`pg_class.relfrozenxid`). If so, the tuple is frozen and should be included in the batch even if the visibility check would regard it as being in the future.

Problem
What happens if a tuple was frozen many billions of transactions ago (i.e. several `xid` wraparounds ago)? The 32-bit `pg_class.relfrozenxid` won't be able to tell us accurately whether the `xid` of this extremely old tuple should be considered frozen or not; `relfrozenxid` doesn't contain `epoch` information about which wraparound cycle it refers to.

The ultimate truth of whether a tuple is frozen or not is contained in the tuple header: frozen tuples have their `HEAP_XMIN_FROZEN` bits set in `t_infomask`. But the only way to access the tuple header is via the `pageinspect` extension. Prior to Postgres `9.4`, frozen tuples had their `xmin` replaced with a special value to indicate that the row was frozen, which made easy identification of frozen tuples from SQL possible; this is no longer the case, and the tuple `infomask` is used instead.

Summary
Without a reliable way to check from SQL whether a tuple is frozen, I don't think this approach is robust. Reliably checking whether a tuple is frozen appears to require access to the tuple header, which is not possible from plain SQL, only via extensions.
Using `pageinspect` to determine if a tuple is frozen may be possible, but would introduce a dependency on that extension; currently `pgroll` does not require any extensions.

Without robust checks for frozen tuples, the backfill process could exclude tuples that should be backfilled, potentially resulting in data loss.
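For illustration only, a check over a single page might look like this with `pageinspect` (the table name `items` is an assumption; `0x0300` is the `HEAP_XMIN_FROZEN` mask, i.e. `HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID`):

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Inspect page 0 of 'items': a tuple's xmin is frozen when both xmin
-- hint bits (0x0100 | 0x0200 = 0x0300) are set in t_infomask.
SELECT lp,
       t_xmin,
       (t_infomask & x'0300'::int) = x'0300'::int AS xmin_frozen
FROM heap_page_items(get_raw_page('items', 0))
WHERE t_xmin IS NOT NULL;
```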