8000 box: don't delete inprogress files during normal GC by locker · Pull Request #6555 · tarantool/tarantool · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

box: don't delete inprogress files during normal GC #6555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 27, 2021

Conversation

locker
Copy link
Member
@locker locker commented Oct 26, 2021

*.inprogress files shouldn't be deleted during normal GC, because they
may be in use. E.g. GC may happen to run while WAL is rotating xlog;
if GC removes the transient xlog.inprogress file created by WAL (see
xdir_create_xlog), WAL will fail to rename it and abort the current
transaction.

Initially, inprogress files were removed only after recovery. For this
reason, xdir_collect_inprogress, which is used for deleting inprogress
files, accesses FS directly, without COIO. This was changed by commit
5aa243d ("recovery: build secondary
index in hot standby mode") which moved xdir_collect_inprogress from
memtx_engine_end_recovery to memtx_engine_collect_garbage so that a hot
standby instance doesn't delete inprogress files of the master instance
by mistake.

To fix this issue, let's move xdir_collect_inprogress back where it
belongs, to engine_end_recovery, and introduce a new callback for memtx
to build its secondary keys before entering the hot standby mode -
engine_begin_hot_standby.

Let's also remove engine_collect_garbage from gc_init, which was added
there by the aforementioned commit, probably to make tests pass.

The bug was reported by the vinyl/deferred_delete test (#5089).

Closes #6554

@locker locker requested a review from sergepetrenko October 26, 2021 09:50
@locker locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch 2 times, most recently from 9951e43 to 9b89d26 Compare October 26, 2021 10:18
@coveralls
Copy link
coveralls commented Oct 26, 2021

Coverage Status

Coverage increased (+0.01%) to 84.434% when pulling 16b7b65 on locker:gh-6554-gc-removes-inprogress-xlogs into c3e231d on tarantool:master.

@@ -372,6 +372,7 @@ memtx_engine_end_recovery(struct engine *engine)
if (space_foreach(memtx_build_secondary_keys, memtx) != 0)
return -1;
}
xdir_collect_inprogress(&memtx->snap_dir);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU this was moved from memtx_engine_end_recovery in the commit you mentioned, because memtx_engine_end_recovery happens too early. With your patch applied the inprogress files left by the main node won't be removed by the hot_standby node.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean xdir_collect_inprogress should be called somewhere around these lines: https://github.com/tarantool/tarantool/blob/master/src/box/box.cc#L3441-L3449

So I propose to move it to recovery_finalize or wal_enable. Both functions seem suitable to me.

Or maybe extract xdir_collect_inprogress and call it explicitly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the commit you mention (5aa243d) also added engine_collect_garbage to gc_init: 5aa243d#diff-0bcbae7cd0018b8d3a9816c5ed98dc346f51946ea7be03730fff798c603c2da2R116

This seems useless, since gc_init is called even before engine_init.

I propose you remove that extraneous engine_collect_garbage call as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit you mention also added engine_collect_garbage to gc_init. I propose you remove that extraneous engine_collect_garbage call as well.

Good catch. Looks like it was added there to make GC tests pass. This is not needed now. Removed. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your patch applied the inprogress files left by the main node won't be removed by the hot_standby node.

Right, we shouldn't run xdir_collect_garbage until the end of the hot standby mode, because we risk deleting files that are still in use by the master instance. Thanks for catching this!

I moved engine_end_recovery back to its old place - after hot standby completes - and introduced a new engine callback for memtx to build its secondary keys before hot standby - engine_begin_hot_standby. PTAL.

@locker locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch from 9b89d26 to 05cda47 Compare October 26, 2021 12:40
@locker locker changed the title box: don't delete inprogress files in GC after recovery box: don't delete inprogress files during normal GC Oct 26, 2021
@locker locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch from 05cda47 to 2f5da36 Compare October 26, 2021 12:46
@locker locker requested a review from sergepetrenko October 26, 2021 12:53
Copy link
Collaborator
@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.

*.inprogress files shouldn't be deleted during normal GC, because they
may be in use. E.g. GC may happen to run while WAL is rotating xlog;
if GC removes the transient xlog.inprogress file created by WAL (see
xdir_create_xlog), WAL will fail to rename it and abort the current
transaction.

Initially, inprogress files were removed only after recovery. For this
reason, xdir_collect_inprogress, which is used for deleting inprogress
files, accesses FS directly, without COIO. This was changed by commit
5aa243d ("recovery: build secondary
index in hot standby mode") which moved xdir_collect_inprogress from
memtx_engine_end_recovery to memtx_engine_collect_garbage so that a hot
standby instance doesn't delete inprogress files of the master instance
by mistake.

To fix this issue, let's move xdir_collect_inprogress back where it
belongs, to engine_end_recovery, and introduce a new callback for memtx
to build its secondary keys before entering the hot standby mode -
engine_begin_hot_standby.

Let's also remove engine_collect_garbage from gc_init, which was added
there by the aforementioned commit, probably to make tests pass.

The bug was reported by the vinyl/deferred_delete test (tarantool#5089).

Closes tarantool#6554
@locker locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch from 2f5da36 to 16b7b65 Compare October 26, 2021 14:46
@kostja
Copy link
Contributor
kostja commented Oct 27, 2021

This got to be part of 1.10

@sergepetrenko
9E86 Copy link
Collaborator

This got to be part of 1.10

1.10 isn't affected, because it doesn't have the patch with hot standby instance building secondary indices.

@locker locker merged commit f9bbfff into tarantool:master Oct 27, 2021
@locker locker deleted the gh-6554-gc-removes-inprogress-xlogs branch October 27, 2021 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GC deletes xlog.inprogress files that are still in use
4 participants
0