box: don't delete inprogress files during normal GC #6555

locker · 2021-10-26T09:50:13Z

*.inprogress files shouldn't be deleted during normal GC, because they
may be in use. E.g. GC may happen to run while WAL is rotating xlog;
if GC removes the transient xlog.inprogress file created by WAL (see
xdir_create_xlog), WAL will fail to rename it and abort the current
transaction.

Initially, inprogress files were removed only after recovery. For this
reason, xdir_collect_inprogress, which is used for deleting inprogress
files, accesses FS directly, without COIO. This was changed by commit
5aa243d ("recovery: build secondary
index in hot standby mode") which moved xdir_collect_inprogress from
memtx_engine_end_recovery to memtx_engine_collect_garbage so that a hot
standby instance doesn't delete inprogress files of the master instance
by mistake.

To fix this issue, let's move xdir_collect_inprogress back where it
belongs, to engine_end_recovery, and introduce a new callback for memtx
to build its secondary keys before entering the hot standby mode -
engine_begin_hot_standby.

Let's also remove engine_collect_garbage from gc_init, which was added
there by the aforementioned commit, probably to make tests pass.

The bug was reported by the vinyl/deferred_delete test (#5089).

Closes #6554

coveralls · 2021-10-26T11:04:22Z

Coverage increased (+0.01%) to 84.434% when pulling 16b7b65 on locker:gh-6554-gc-removes-inprogress-xlogs into c3e231d on tarantool:master.

sergepetrenko · 2021-10-26T11:04:49Z

src/box/memtx_engine.cc

@@ -372,6 +372,7 @@ memtx_engine_end_recovery(struct engine *engine)
 		if (space_foreach(memtx_build_secondary_keys, memtx) != 0)
 			return -1;
 	}
+	xdir_collect_inprogress(&memtx->snap_dir);


AFAIU this was moved from memtx_engine_end_recovery in the commit you mentioned, because memtx_engine_end_recovery happens too early. With your patch applied the inprogress files left by the main node won't be removed by the hot_standby node.

I mean xdir_collect_inprogress should be called somewhere around these lines: https://github.com/tarantool/tarantool/blob/master/src/box/box.cc#L3441-L3449

So I propose to move it to recovery_finalize or wal_enable. Both functions seem suitable to me.

Or maybe extract xdir_collect_inprogress and call it explicitly.

BTW, the commit you mention (5aa243d) also added engine_collect_garbage to gc_init: 5aa243d#diff-0bcbae7cd0018b8d3a9816c5ed98dc346f51946ea7be03730fff798c603c2da2R116

This seems useless, since gc_init is called even before engine_init.

I propose you remove that extraneous engine_collect_garbage call as well.

The commit you mention also added engine_collect_garbage to gc_init. I propose you remove that extraneous engine_collect_garbage call as well.

Good catch. Looks like it was added there to make GC tests pass. This is not needed now. Removed. Thanks!

With your patch applied the inprogress files left by the main node won't be removed by the hot_standby node.

Right, we shouldn't run xdir_collect_garbage until the end of the hot standby mode, because we risk deleting files that are still in use by the master instance. Thanks for catching this!

I moved engine_end_recovery back to its old place - after hot standby completes - and introduced a new engine callback for memtx to build its secondary keys before hot standby - engine_begin_hot_standby. PTAL.

sergepetrenko

Thanks! LGTM.

*.inprogress files shouldn't be deleted during normal GC, because they may be in use. E.g. GC may happen to run while WAL is rotating xlog; if GC removes the transient xlog.inprogress file created by WAL (see xdir_create_xlog), WAL will fail to rename it and abort the current transaction. Initially, inprogress files were removed only after recovery. For this reason, xdir_collect_inprogress, which is used for deleting inprogress files, accesses FS directly, without COIO. This was changed by commit 5aa243d ("recovery: build secondary index in hot standby mode") which moved xdir_collect_inprogress from memtx_engine_end_recovery to memtx_engine_collect_garbage so that a hot standby instance doesn't delete inprogress files of the master instance by mistake. To fix this issue, let's move xdir_collect_inprogress back where it belongs, to engine_end_recovery, and introduce a new callback for memtx to build its secondary keys before entering the hot standby mode - engine_begin_hot_standby. Let's also remove engine_collect_garbage from gc_init, which was added there by the aforementioned commit, probably to make tests pass. The bug was reported by the vinyl/deferred_delete test (tarantool#5089). Closes tarantool#6554

kostja · 2021-10-27T05:30:31Z

This got to be part of 1.10

sergepetrenko · 2021-10-27T06:04:59Z

This got to be part of 1.10

1.10 isn't affected, because it doesn't have the patch with hot standby instance building secondary indices.

locker requested a review from sergepetrenko October 26, 2021 09:50

locker assigned locker and sergepetrenko Oct 26, 2021

locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch 2 times, most recently from 9951e43 to 9b89d26 Compare October 26, 2021 10:18

sergepetrenko requested changes Oct 26, 2021

View reviewed changes

locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch from 9b89d26 to 05cda47 Compare October 26, 2021 12:40

locker changed the title ~~box: don't delete inprogress files in GC after recovery~~ box: don't delete inprogress files during normal GC Oct 26, 2021

locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch from 05cda47 to 2f5da36 Compare October 26, 2021 12:46

locker requested a review from sergepetrenko October 26, 2021 12:53

sergepetrenko approved these changes Oct 26, 2021

View reviewed changes

locker force-pushed the gh-6554-gc-removes-inprogress-xlogs branch from 2f5da36 to 16b7b65 Compare October 26, 2021 14:46

locker merged commit f9bbfff into tarantool:master Oct 27, 2021

locker deleted the gh-6554-gc-removes-inprogress-xlogs branch October 27, 2021 07:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

box: don't delete inprogress files during normal GC #6555

box: don't delete inprogress files during normal GC #6555

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

box: don't delete inprogress files during normal GC #6555

box: don't delete inprogress files during normal GC #6555

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!