-
Notifications
You must be signed in to change notification settings - Fork 92
Make backfill batch selection exclude rows inserted or updated after backfill start #652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
511a84f
to
de9cc8b
Compare
67be54a
to
b426087
Compare
b426087
to
3c8ae30
Compare
For all rows backfilled by the up or down triggers, set the `_pgroll_needs_backfill` column to `false` for the updated row.
Make the `TableMustBeCleanedUp` assertion check that the '_pgroll_needs_backfill` column is not present in the table.
Include only those rows in each batch having `_pgroll_needs_backfill` set to `true`.
Rows are returned in a different order.
Extend operations to remove the `_pgroll_needs_backfill` column on completion and rollback.
Ensure that the new column is included in the example output and is mentioned in the explanations.
3c8ae30
to
cb46baa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code itself looks good. The only note I have is rather a modelling question. Could you please explain why you put removing the backfill column to the operations? To me it feels like it should be the responsibility of the backfill process to clean up after itself. So this functionality should be moved to the backfill
package and should be run from Roll.Complete
and/or Roll.Rollback
.
Removing the backfill column is in the individual operations because they have the knowledge of which table they are operating on. We could change this in future and centralize the responsiblity of cleaning up tables (removing functions, triggers and 'needs backfill' columns) into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing this issue and for the detailed explanation.
Backfill only rows present at backfill start. This is third approach to solving #583. The previous two are:
This is the most direct approach to solving the problem. At the same time as the up/down triggers are created to perform a backfill, a
_pgroll_needs_backfill
column is also created on the table to be backfilled. The column has aDEFAULT
oftrue
; the constant default ensures that this extra column can be added quickly without a lengthyACCESS_EXCLUSIVE
lock. The column is removed when the the operation is rolled back or completed.The up/down triggers are modified to set
_pgroll_needs_backfill
to false whenever they update a row.The backfill itself is updated to select only rows having
_pgroll_needs_backfill
set totrue
- this ensures that only rows created before the triggers were installed are updated by the backfill. The backfill process still needs to read every row in the table, including those inserted/updated after backfill start, but only those rows created before backfill start will be updated.The main disadvantage of this approach is that backfill now requires an extra column to be created on the target table.