
restic repo initialised twice when starting up docker with both backup/prune #48


Closed
sumnerboy12 opened this issue Jun 18, 2020 · 31 comments

@sumnerboy12

I noticed that when I started my docker swarm stack, which included both backup and prune, it attempted to (and succeeded in) initialising the same repo twice.

This put the repo in a bad state, and any attempt to run a backup/prune resulted in:

Fatal: config cannot be loaded: ciphertext verification failed

Could we only attempt to initialise during backup? And in prune, could we check whether the repo is initialised and, if not, exit early (since there is clearly nothing to do)?
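A minimal sketch of what that early exit could look like in the prune container's startup, assuming restic's repository and credential environment variables are already set (this is not the actual resticker entrypoint):

```bash
# Bail out early if the repository has not been initialised yet; `restic cat
# config` fails when no readable repo config exists at the destination.
if ! restic cat config >/dev/null 2>&1; then
    echo "Repository not initialised yet - nothing to prune." >&2
    exit 0
fi

restic prune
```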

@ThomDietrich
Contributor
ThomDietrich commented Jun 18, 2020

How is this even possible?

Initialization happens outside of the backup or prune code, inside the entrypoint - could this error be related to another aspect of your setup?

Also, whatever led to the issue might be resolved after PR #47 is merged. Would you be able to test the new logic with your setup?

(Btw. I recognize your profile pic from some years ago, not sure from where 😄)

@sumnerboy12
Author

I have a docker stack with two services, backup and prune (i.e. one with BACKUP_CRON and one with PRUNE_CRON). When I deployed the stack they both started up simultaneously on the same node against an uninitialised repo (on Backblaze).

They both managed to initialise the repo at the same time and put it into an invalid state. This has been reported here.
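For illustration, the race essentially boils down to two containers initialising the same, still empty backend at the same time, roughly like this (repository URL and password are placeholders):

```bash
# Two containers starting at the same time each end up initialising the same,
# still empty repository (placeholder values).
export RESTIC_REPOSITORY="b2:example-bucket:example-host"
export RESTIC_PASSWORD="example"

restic init &   # backup container starting up
restic init &   # prune container starting up
wait

# Afterwards the repository can contain conflicting config/key data, and any
# later access fails with the ciphertext verification error quoted above.
restic snapshots
```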

(BTW - I think I recognise your handle also - openhab perhaps?!)

@ThomDietrich
Contributor
ThomDietrich commented Jun 20, 2020

Hey!

Which part of the issue behind your link talks about initialization? The main thing I get from the issue is that "this kind of issue" is normally linked to the storage rather than to restic. Can you verify this happens with the new logic and with simultaneous startup?
Regarding simultaneous: Why did you set it up like this? Wouldn't it then be better to add --prune to the forget command?

Let's first clarify that this is indeed a real issue; if so, we need to discuss whether initialization should only happen with BACKUP_CRON.

Right. openHAB, of course! :)

@sumnerboy12
Author
sumnerboy12 commented Jun 21, 2020

Just the comment that talked about that error being caused by the repo being initialised twice (somehow). When I got this error I was searching for relevant posts and came across that one. When checking the docker logs for my djmaze/resticker services, I noticed that both the backup and prune services had started up simultaneously and that both had log output saying they had initialised the repo.

The next time either of them attempted to do anything, or I tried to access the repo manually (via restic snapshots), I got the ciphertext verification error.

I was following the example in https://github.com/djmaze/resticker/blob/master/docker-swarm.example.yml.

@zoispag
zoispag commented Jun 21, 2020

Hey!

Which part of the issue behind your link talks about initialization? The main thing I get from the issue is that "this kind of issue" is normally linked to the storage rather than to restic. Can you verify this happens with the new logic and with simultaneous startup?

Regarding simultaneous: Why did you set it up like this? Wouldn't it then be better to add --prune to the forget command?

Let's first clarify that this is indeed a real issue; if so, we need to discuss whether initialization should only happen with BACKUP_CRON.

Right. openHAB, of course! :)

@ThomDietrich the --prune option is not realistic for big repos with multiple hosts running backups, since prune is a blocking action. That’s why @djmaze created the second container for pruning in #36, to run it on a different interval.

Two different containers (backup & prune) starting at the same time would indeed cause the repo to be initialized twice. When I tried it, all my repos were already initialized.
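For reference, the single-command alternative discussed above combines both steps into one blocking operation that takes an exclusive lock on the repository (the retention flags here are just an example):

```bash
# Forget old snapshots and prune unreferenced data in one go; while prune
# runs, the repository is locked exclusively and other hosts' backups wait.
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```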

@ThomDietrich
Contributor
ThomDietrich commented Jun 21, 2020

@zoispag that makes sense. Thanks.
Got it. I guess we can safely assume that a repo will already be initialized in the prune use case of the image. Shall we separate the check in #47 then? Check whether the repository is reachable and initialized, but only initialize in the backup case? @djmaze

Also: in the example yml files the containers start with a 30 min difference. For the purpose of an example I would suggest making it 12 hours.
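For example, the two schedules could simply be set half a day apart (illustrative values for the existing BACKUP_CRON / PRUNE_CRON variables):

```bash
# Illustrative cron schedules for the two services, 12 hours apart.
BACKUP_CRON="0 2 * * *"    # run backups daily at 02:00
PRUNE_CRON="0 14 * * *"    # run forget/prune daily at 14:00
```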

@zoispag
zoispag commented Jun 21, 2020

@zoispag that makes sense. Thanks.

Got it. I guess we can safely assume that a repo will already be initialized in the prune use case of the image. Shall we separate the check in #47 then? Check whether the repository is reachable and initialized, but only initialize in the backup case? @djmaze

Also: in the example yml files the containers start with a 30 min difference. For the purpose of an example I would suggest making it 12 hours.

Pruning will start with a 30 min difference. The containers will start at the same time if they are in the same docker-compose file.

@ThomDietrich
Contributor

Of course. Obviously I was talking about the cron job definition :)

@djmaze
Owner
djmaze commented Jun 21, 2020

Check whether the repository is reachable and initialized, but only initialize in the backup case? @djmaze

That sounds reasonable. Let's do it.

@djmaze
Owner
djmaze commented Jun 21, 2020

I just realized that, at least in a swarm deployment, multiple initializations can also occur when the backup service is deployed to multiple machines. So just skipping the initialization in the prune service does not completely alleviate this problem.

In my Docker swarm deployments, I sometimes make use of separate "one-shot" initialization services. They are just started once and then exit. On swarm, this is possible using a restart policy with condition: on-failure. When using Docker compose, it should be possible to use restart: on-failure instead.

So we could have a separate init service which is just used for initializing the repository on first use. Maybe this would be the way to go. What do you think?
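A rough sketch of such a one-shot init service on swarm, using the official restic image as a stand-in (the service name, secret and repository URL are illustrative assumptions):

```bash
# One-shot init service: with --restart-condition on-failure the task is
# retried until it exits successfully and is then never restarted again.
# The shell wrapper keeps init idempotent if the repository already exists.
docker service create \
  --name backup_init \
  --restart-condition on-failure \
  --env RESTIC_REPOSITORY="b2:example-bucket:example-host" \
  --env RESTIC_PASSWORD_FILE=/run/secrets/restic_password \
  --secret restic_password \
  --entrypoint sh \
  restic/restic -c 'restic cat config || restic init'
```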

@djmaze
Owner
djmaze commented Jun 21, 2020

Of course, when using Docker compose, it would be enough to add an init command for the entrypoint so you would just need to call something like docker-compose run backup init once.
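With such an init command, first-time setup would become a single explicit step (service name backup as in the example compose file):

```bash
# One-off initialisation of the repository; the temporary container is
# removed again afterwards.
docker-compose run --rm backup init
```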

@ThomDietrich
Contributor

Before we discuss this any further: Is there any reason why restic should even allow this? Instead of building a workaround here maybe we can help come up with an improvement in restic itself!?

@ThomDietrich
Contributor

Also: in the example yml files the containers start with a 30 min difference. For the purpose of an example I would suggest making it 12 hours.

Moved to #49

@djmaze
Owner
djmaze commented Jun 21, 2020

Before we discuss this any further: Is there any reason why restic should even allow this? Instead of building a workaround here maybe we can help come up with an improvement in restic itself!?

Would be okay for me. Then we should at least give a hint about this in the documentation (as you thankfully already began in #49).

@sumnerboy12
Author

What about removing the auto-initialisation and requiring the user to do this step manually (with suitable documentation/instructions)?

@zoispag
zoispag commented Jun 22, 2020

What about removing the auto-initialisation and requiring the user to do this step manually (with suitable documentation/instructions)?

I don’t think that’s a good idea. I do rely on resticker initializing the repos for me.

On the other hand, we could add a short delay to the prune container on startup, so it waits a minute before attempting initialization.

@djmaze
Owner
djmaze commented Jun 22, 2020

I don't really like either idea. Initializing the repo manually yourself means you also have to set up all the SSH keys, credentials etc. locally, which is prone to errors. The 1-minute wait, on the other hand, would be a rather ugly workaround and does not protect against deploying multiple backup containers simultaneously.

Ideally, this should be solved on the restic side. Or, as a more short-term solution, we could have a separate init command as proposed before.

@ThomDietrich
Contributor
ThomDietrich commented Jun 22, 2020

I like the carved-out init command (which is, process-wise, what @sumnerboy12 said) and am not a fan of the wait solution because of the unclear new issues it could introduce.

Let's talk init. In my opinion it is better anyway to leave this one-time command to an intentional action by the user - even though I can't currently think of a scenario in which auto-initialization might lead to unexpected behavior or a security risk.

To summarize the solution:

  1. In backup and prune "mode" the entrypoint script would check the connection to the repository and exit with an error if it is not accessible or not initialized.
  2. Initialization has to be executed manually with e.g. docker-compose run backup init
  3. An optional environment variable (default off) allows the user to re-enable current auto-initialize behavior

In other words: The solution is a PR that is 10% code and 90% docu changes.

What do you think?
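A minimal sketch of that entrypoint logic (the AUTO_INIT variable name and the messages are illustrative assumptions, and the backup/prune distinction is left out for brevity):

```bash
# Check that the repository is reachable and initialised; only fall back to
# automatic initialisation if the user explicitly opted in via AUTO_INIT.
if restic cat config >/dev/null 2>&1; then
    echo "Repository is reachable and initialised."
elif [ "${AUTO_INIT:-false}" = "true" ]; then
    echo "Repository not initialised and AUTO_INIT is set - initialising."
    restic init
else
    echo "Repository is not accessible or not initialised." >&2
    echo "Initialise it manually, e.g.: docker-compose run backup init" >&2
    exit 1
fi
```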

@zoispag
zoispag commented Jun 22, 2020

If init is removed from both the backup and prune containers, I would definitely need an ENV variable to turn auto-init on. I am using this as part of my automation.

On the other hand, does it make sense to init a repo to do prune? It should already exist. We can remove init from prune altogether. (This still does not solve the issue of multiple backup containers on swarm)

@stale
stale bot commented Aug 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 21, 2020
@ThomDietrich
Contributor

Not stale, I just didn't have the time to work on it yet.

@stale stale bot removed the wontfix label Aug 21, 2020
@stale
stale bot commented Oct 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 21, 2020
@stale stale bot closed this as completed Oct 28, 2020
@camo-f
camo-f commented Nov 3, 2020

Hello,

init is still an issue; I got the previously reported error with the current docker-compose example.

I think what @ThomDietrich summed up here is a good approach.

To summarize the solution:

  1. In backup and prune "mode" the entrypoint script would check the connection to the repository and exit with an error if it is not accessible or not initialized.

  2. Initialization has to be executed manually with e.g. docker-compose run backup init

  3. An optional environment variable (default off) allows the user to re-enable current auto-initialize behavior

I also agree with @zoispag that init shouldn't be done when pruning, as the repo should already be initialised. So the optional env variable would only have an effect in backup containers.

For Docker Swarm, @djmaze mentioned an initialisation service, which seems to me the cleanest way to handle multiple backup services. It could also be used with Docker Compose to keep everything in a single file, which can be better for a quickstart.

@ThomDietrich Have you already worked on this issue? It doesn't seem too complex to me, so I can help if needed.

@ThomDietrich
Contributor
ThomDietrich commented Nov 3, 2020

Triggered by a linked project, I will most probably take some time to work on this issue this week. I will respond to your thoughts and suggestions later. Thanks for the push!

@djmaze could you please reopen the issue? Thanks

@varac
varac commented Feb 14, 2021

Yes, please reopen this bug - I hit it every time I want to init a repo for a new host where both backup and prune containers are initializing at the same time.

@djmaze
Owner
djmaze commented Feb 14, 2021

Oh, well, seems I overlooked the previous comment. Sorry

@djmaze djmaze reopened this Feb 14, 2021
@stale stale bot removed the wontfix label Feb 14, 2021
@varac
varac commented Apr 7, 2021

Any news on this? Right now, every new host added to our backup needs manual intervention; it would be great to have this fixed.

@djmaze
Owner
djmaze commented Apr 18, 2021

Sorry for the late answer. If no one else is on this, I will try to find some time to implement it in the upcoming days.

@stale
stale bot commented Jun 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 18, 2021
@stale stale bot closed this as completed Jun 26, 2021
@MRezaNasirloo

I faced this issue while trying to set up restic on a new host with an existing repo; now my repo is broken and not usable anymore:

Fatal: config or key 6a88fbbb5b0fa776b220d4cff42dd5716a3f691b550ffbec5a5fb15f15b2153c is damaged: ciphertext verification failed. Try again

@pquerner
pquerner commented Jun 3, 2023

I just had the same (running 1.7.0) and temporarily added a profiles entry to the prune and check services, as described here.

After starting (docker compose up ...), only the backup service ran; it found no repository and therefore issued the init command on restic.
Then I stopped the container, removed the profiles config from the .yml file and started it again - this time it did not create multiple keys (etc.) and it ran flawlessly.
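A roughly equivalent sequence without editing the compose file, assuming the service names from the example file: start only the backup service first so a single container performs the initial repository setup, then bring up the rest of the stack.

```bash
# Start only the backup service (plus its dependencies) so exactly one
# container initialises the repository, then start everything else.
docker compose up -d backup
docker compose up -d
```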

Maybe this needs an init service that all other services wait for?
