[BUG] not removing orphaned chunks · Issue #581 · moosefs/moosefs · GitHub

[BUG] not removing orphaned chunks #581


Open
onlyjob opened this issue Aug 10, 2024 · 3 comments
Labels
data safety Tag issues and questions regarding potential data safety issues. Improve existing documentation.

Comments

@onlyjob
Copy link
Contributor
onlyjob commented Aug 10, 2024

I had one chunkserver offline for weeks, so it was removed from the cluster entirely. It has a unique label that is not used in any storage class. All storage classes are STRICT.

When I started that particular chunkserver (its host server was booted after being offline for a few months), I expected all its chunks to be removed. However, only some chunks were deleted, and about a million orphaned chunks (~7.5 TiB) were left on that chunkserver, not removed at all.

Apparently MooseFS does nothing to remove orphaned chunks that were forgotten by the master.

Experienced on MooseFS 3.0.117 (Debian).

@chogata
Copy link
Member
chogata commented Aug 10, 2024

MooseFS removes orphaned chunks - after 1 week. This is on purpose, in case those chunks are the result of a crash or something similar. This gives admins time to save them manually if they turn out to be needed after all.

This is not a bug :)

Edit: to be perfectly clear: after 7 days of continuous work. If you shut down the chunkserver after a day, the counter will reset.
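The retention rule described above can be sketched as a toy model (illustrative only; all names are hypothetical, this is not MooseFS source code): orphaned chunks are deleted only once the chunkserver has accumulated 7 days of *continuous* uptime, and any restart resets the clock.

```python
# Illustrative model of the orphan retention rule described above.
# Names are hypothetical; this is not the actual MooseFS implementation.
ORPHAN_RETENTION_SECONDS = 7 * 24 * 3600  # "after 7 days of continuous work"

class ChunkserverModel:
    def __init__(self):
        self.uptime = 0      # seconds since this process started
        self.orphans = set() # chunk ids the master rejected

    def restart(self):
        self.uptime = 0      # the counter resets on every restart

    def tick(self, seconds):
        self.uptime += seconds
        if self.uptime >= ORPHAN_RETENTION_SECONDS:
            self.orphans.clear()  # orphaned chunks finally removed

cs = ChunkserverModel()
cs.orphans.add("chunk_0000000000000001_00000001")
cs.tick(6 * 24 * 3600)  # 6 days of uptime: orphan kept
cs.restart()            # shut down before day 7: counter resets
cs.tick(6 * 24 * 3600)  # another 6 days: still kept
cs.tick(1 * 24 * 3600)  # day 7 of continuous uptime: removed
print(len(cs.orphans))  # -> 0
```

This models why the reporter's chunks may have survived months of calendar time: intermittent boots within each 7-day window keep resetting the counter.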

@chogata chogata added the data safety Tag issues and questions regarding potential data safety issues. Improve existing documentation. label Aug 10, 2024
@onlyjob
Copy link
Contributor Author
onlyjob commented Aug 10, 2024

Thanks for the quick reply. I also hope that it is not a bug, but I'm not convinced yet.

As I've said, a few months passed, but it seems that orphaned chunks were not deleted even after CS_DAYS_TO_REMOVE_UNUSED. (I'm not actually absolutely sure about that, as that server could have been temporarily booted within the CS_DAYS_TO_REMOVE_UNUSED period, thereby resetting it.)

I've also explicitly removed chunkserver in question from "Servers" tab in web interface.

What if I cannot guarantee continuous availability of a chunkserver for 7 days? How do I ensure that orphaned chunks are deleted?
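For a chunkserver that is being permanently decommissioned, one possible workaround (a sketch, not an official MooseFS procedure) is to locate the leftover chunk files on disk yourself: chunkservers store chunks as `chunk_<id>_<version>.mfs` files under the paths listed in mfshdd.cfg. The path below is an example, and this assumes the server will never rejoin the cluster.

```shell
#!/bin/sh
# Sketch only: LIST (do not delete) leftover chunk files on a chunkserver
# that has been permanently removed from the cluster.
# /mnt/mfschunks1 is an example path; check mfshdd.cfg for the real ones.
CHUNK_DIR="${CHUNK_DIR:-/mnt/mfschunks1}"
find "$CHUNK_DIR" -type f -name 'chunk_*.mfs' -print 2>/dev/null || true
```

Review the listing carefully before deleting anything; on a server that might reconnect, manual deletion defeats the safety window described above.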

@chogata
Copy link
Member
chogata commented Aug 13, 2024

CS_DAYS_TO_REMOVE_UNUSED tells the master how many days it should keep an inactive chunkserver on the server list before removing it permanently. It has nothing to do with how long a chunkserver will keep orphaned chunks. You can also remove a disconnected chunkserver from the list via the CGI sooner than the value of CS_DAYS_TO_REMOVE_UNUSED, and again, that has nothing to do with chunks.
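For reference, CS_DAYS_TO_REMOVE_UNUSED lives in mfsmaster.cfg on the master server; a sketch of the entry (the value shown is an example, not a recommendation - check your own config and the mfsmaster.cfg man page):

```ini
# mfsmaster.cfg (on the master server)
# Days the master keeps a disconnected chunkserver on its server list
# before dropping it permanently. Per the comment above, this does NOT
# affect how long a chunkserver keeps orphaned chunks on disk.
CS_DAYS_TO_REMOVE_UNUSED = 7
```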

When a chunk server - no matter whether it is on the list or not - connects to a master and presents a correct metaid (meaning the chunk server was connected to this master in the past and its chunks actually belong to it), the master will start accepting chunk ids sent by the chunk server (this is called the "chunk registration process"). If, during this process, the chunk server sends a chunk id that the master thinks should not exist, the master will reply that it is not interested in this chunk, and the chunk server will flag it as orphaned - and remove it after 7 days.
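The registration handshake described above can be sketched as a toy model (hypothetical names; the real protocol lives in the MooseFS sources, not here):

```python
# Toy model of the chunk registration process described above.
# Names and structure are hypothetical, not the MooseFS protocol code.

class Master:
    def __init__(self, metaid, known_chunks):
        self.metaid = metaid
        self.known_chunks = set(known_chunks)

    def accepts(self, cs_metaid):
        # registration proceeds only if the chunkserver's metaid matches,
        # i.e. it was connected to this master in the past
        return cs_metaid == self.metaid

    def wants(self, chunk_id):
        # the master keeps chunk ids it knows about, rejects the rest
        return chunk_id in self.known_chunks

def register(master, cs_metaid, cs_chunks):
    """Return the chunk ids the chunkserver must flag as orphaned."""
    if not master.accepts(cs_metaid):
        raise RuntimeError("metaid mismatch: chunks belong to another cluster")
    return {c for c in cs_chunks if not master.wants(c)}

m = Master(metaid=0xABCD, known_chunks={1, 2, 3})
orphaned = register(m, 0xABCD, {1, 2, 3, 99})
print(sorted(orphaned))  # -> [99]; chunk 99 is flagged, removed after 7 days
```

In this model the orphan set is exactly the difference between what the chunkserver holds and what the master still tracks, which matches the behaviour the reporter observed after the long offline period.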

We never thought to add any way of shortening this period because, frankly, this should not happen - there should not be any orphaned chunks unless there is an emergency situation. And in an emergency we want to keep them long enough that if an admin realises some important data is missing, there is still time to react and try to recover the data manually. Hence the quite arbitrary 7 days.

We never expected any use cases where users would disconnect entire servers, take them offline, and then bring them back online after prolonged periods of time, with old chunk data still on them.

But we see a few people doing just that, and while MooseFS will behave safely in such cases, it may sometimes not behave as a user would expect :)

Anyway, we can add a feature request to our list: a config setting that will allow adjusting the retention time for orphaned chunks, and also reporting their existence, similar to what we have for duplicates.
