8000 Make backfill batch selection exclude rows inserted or updated after backfill start by andrew-farries · Pull Request #652 · xataio/pgroll · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Make backfill batch selection exclude rows inserted or updated after backfill start #652

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 18, 2025

Conversation

andrew-farries
Copy link
Collaborator
@andrew-farries andrew-farries commented Feb 4, 2025

Backfill only rows present at backfill start. This is third approach to solving #583. The previous two are:

This is the most direct approach to solving the problem. At the same time as the up/down triggers are created to perform a backfill, a _pgroll_needs_backfill column is also created on the table to be backfilled. The column has a DEFAULT of true; the constant default ensures that this extra column can be added quickly without a lengthy ACCESS_EXCLUSIVE lock. The column is removed when the the operation is rolled back or completed.

The up/down triggers are modified to set _pgroll_needs_backfill to false whenever they update a row.

The backfill itself is updated to select only rows having _pgroll_needs_backfill set to true - this ensures that only rows created before the triggers were installed are updated by the backfill. The backfill process still needs to read every row in the table, including those inserted/updated after backfill start, but only those rows created before backfill start will be updated.

The main disadvantage of this approach is that backfill now requires an extra column to be created on the target table.

Base automatically changed from remove-trigger-else-expr to main February 4, 2025 11:35
@andrew-farries andrew-farries force-pushed the backfill-old-tuples-only3 branch from 511a84f to de9cc8b Compare February 4, 2025 11:36
@andrew-farries andrew-farries force-pushed the backfill-old-tuples-only3 branch 2 times, most recently from 67be54a to b426087 Compare February 17, 2025 07:16
@andrew-farries andrew-farries force-pushed the backfill-old-tuples-only3 branch from b426087 to 3c8ae30 Compare February 17, 2025 07:49
For all rows backfilled by the up or down triggers, set the
`_pgroll_needs_backfill` column to `false` for the updated row.
Make the `TableMustBeCleanedUp` assertion check that the
'_pgroll_needs_backfill` column is not present in the table.
Include only those rows in each batch having `_pgroll_needs_backfill`
set to `true`.
Rows are returned in a different order.
Extend operations to remove the `_pgroll_needs_backfill` column on
completion and rollback.
Ensure that the new column is included in the example output and is
mentioned in the explanations.
Copy link
Contributor
@kvch kvch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code itself looks good. The only note I have is rather a modelling question. Could you please explain why you put removing the backfill column to the operations? To me it feels like it should be the responsibility of the backfill process to clean up after itself. So this functionality should be moved to the backfill package and should be run from Roll.Complete and/or Roll.Rollback.

8000
@andrew-farries
Copy link
Collaborator Author

Removing the backfill column is in the individual operations because they have the knowledge of which table they are operating on. Roll.Complete and Roll.Rollback don't have that same information.

We could change this in future and centralize the responsiblity of cleaning up tables (removing functions, triggers and 'needs backfill' columns) into Roll.Complete and Roll.Rollback, but I think this would be a larger change. It could be worth doing thoug as we have lots of repetition between individual Complete and Rollback methods in different operations now.

@andrew-farries andrew-farries requested a review from kvch February 18, 2025 13:04
Copy link
Contributor
@kvch kvch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this issue and for the detailed explanation.

@andrew-farries andrew-farries merged commit 3e21e4d into main Feb 18, 2025
28 checks passed
@andrew-farries andrew-farries deleted the backfill-old-tuples-only3 branch February 18, 2025 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0