overlay: migrate existing systems to OCI by jbtrystram · Pull Request #3458 · coreos/fedora-coreos-config

overlay: migrate existing systems to OCI #3458


Open · jbtrystram wants to merge 1 commit into base: testing-devel

jbtrystram
Member

In [1] we moved newly provisioned systems to be
deployed via container and thus retrieving updates via zincati from the OCI registry (quay.io/fedora/fedora-coreos).

This starts the migration script shipped in [2]
before Zincati to set up the migration at the next update.

A way to override this is to add another drop-in
with an empty ExecStartPre= and a file name that sorts later
under the systemd drop-in path, e.g.
/etc/systemd/system/zincati.service.d/90-no-oci.conf.
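
A minimal sketch of such an override, assuming the migration is hooked in through an ExecStartPre= line in an earlier-sorted drop-in. In systemd, an empty ExecStartPre= assignment clears the commands accumulated so far, and drop-ins are applied in lexicographic order of file name. The helper name and the scratch directory are illustrative only; on a real host the directory would be /etc/systemd/system/zincati.service.d and writing there requires root. Note that this clears every ExecStartPre= set earlier for the unit, not only the migration one.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: write a drop-in that resets ExecStartPre= for
# zincati.service. $1 is the drop-in directory.
write_no_oci_dropin() {
    local dropin_dir="$1"
    mkdir -p "${dropin_dir}"
    # An empty assignment clears all ExecStartPre= commands defined by the
    # unit file and by drop-ins that sort before this one.
    cat > "${dropin_dir}/90-no-oci.conf" <<'EOF'
[Service]
ExecStartPre=
EOF
}

# Demo against a scratch directory so nothing on the host is modified.
scratch="$(mktemp -d)"
write_no_oci_dropin "${scratch}/zincati.service.d"
cat "${scratch}/zincati.service.d/90-no-oci.conf"
```

After writing the file to the real drop-in path, systemd needs a `systemctl daemon-reload` before the change is picked up.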

[1] coreos/fedora-coreos-tracker#1823
[2] #3355

Closes coreos/fedora-coreos-tracker#1890

@jbtrystram jbtrystram requested a review from dustymabe April 15, 2025 15:08
@dustymabe
Member

ok. a few things here.

  • we're going to need to roll this out to next first so maybe we just add this file in a conditional include (we can nest them now and it doesn't need to be a second file, see this for an example) based on the stream name.
  • Maybe we should make it even easier for people to opt out by just touching a file. That means instead of telling them to drop down an override they touch a file instead. This means we'd modify the migration script to check and no-op based on the existence of a file.
  • Should we also consider adding a MOTD helper to let people know if their system hasn't been migrated (for whatever reason)? I.e. maybe the migration script failed OR they decided to opt out. If they decided to opt out, I still think it would be useful to warn them in an MOTD, and they can silence that too if they'd like. Perhaps the MOTD thing can be a separate thing that we roll out later?
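
The touch-a-file opt-out from the second bullet could look roughly like the sketch below. The function name and the stamp path are hypothetical; the real path is whatever the migration script ends up defining.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical check for the top of the migration script: return non-zero
# (meaning "do nothing") when the opt-out stamp file exists.
should_migrate() {
    local opt_out_stamp="$1"
    if [[ -e "${opt_out_stamp}" ]]; then
        echo "opt-out stamp ${opt_out_stamp} found; skipping OCI migration" >&2
        return 1
    fi
    return 0
}

# Demo with a scratch stamp path (assumption: any path works the same way).
scratch="$(mktemp -d)"
stamp="${scratch}/opt-out"

if should_migrate "${stamp}"; then before="migrate"; else before="skip"; fi
touch "${stamp}"   # this is all a user would have to do to opt out
if should_migrate "${stamp}"; then after="migrate"; else after="skip"; fi

echo "before touch: ${before}; after touch: ${after}"
```

Deleting the stamp file re-enables the migration on the next run, so no systemd override has to be written or removed.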

@jbtrystram jbtrystram marked this pull request as draft April 16, 2025 13:32
Member
@dustymabe dustymabe left a comment

Can we move this motd and the migration script into its own overlay (i.e. not 15fcos) that only gets included for F42?

I just went through removing some old cruft from our packages lists and pulling it out of a generic overlay is much harder than just deleting a directory.

@dustymabe
Member

IMO things here become way simpler if we do #3458 (comment) and just roll out the motd in the next release after we roll out the automated migration (i.e. separate PR)

@jbtrystram
Member Author

> IMO things here become way simpler if we do #3458 (comment) and just roll out the motd in the next release after we roll out the automated migration (i.e. separate PR)

The issue I see with going that route is: if a node gets the ExecStartPre but Zincati can't pull the OCI image at the next update for some reason, it won't be able to get the next update, so it won't ever get the motd to display.
So unless the user goes and looks at the rpm-ostree status output, it will silently get stuck.

I would prefer to ship the motd in the same rollout. We can make it simpler if you think the complexity is not worth it, and we could move the .path systemd files to a simpler timer to show the motd without a reboot

@dustymabe
Member

> The issue I see with going that route is: if a node gets the ExecStartPre but Zincati can't pull the OCI image at the next update for some reason, it won't be able to get the next update, so it won't ever get the motd to display. So unless the user goes and looks at the rpm-ostree status output, it will silently get stuck.

TBH I've been thinking for some time that we need a motd that warns when the most recent deployment is stale (i.e. older than 30 days or something). I think if we had that, it wouldn't give you exactly what you want here, but it would be some sort of backstop that would let the user know something was going wrong with the update mechanism.
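
As an illustration of that backstop idea, a staleness check could be as small as the sketch below. The helper name is hypothetical and the 30-day threshold is taken from the comment above. How the deployment timestamp is obtained is left out on purpose; the sketch just takes seconds since the epoch.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: succeed when a deployment timestamp (seconds since
# the epoch) is more than max_age_days old. "now" can be injected for tests.
deployment_is_stale() {
    local deployed_at="$1" max_age_days="$2" now="${3:-$(date +%s)}"
    local age_days=$(( (now - deployed_at) / 86400 ))
    (( age_days > max_age_days ))
}

# Demo with synthetic timestamps relative to the current time.
now="$(date +%s)"
old=$(( now - 45 * 86400 ))
recent=$(( now - 5 * 86400 ))

if deployment_is_stale "${old}" 30 "${now}"; then old_result="stale"; else old_result="fresh"; fi
if deployment_is_stale "${recent}" 30 "${now}"; then recent_result="stale"; else recent_result="fresh"; fi

echo "45 days old: ${old_result}; 5 days old: ${recent_result}"
```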

> I would prefer to ship the motd in the same rollout. We can make it simpler if you think the complexity is not worth it, and we could move the .path systemd files to a simpler timer to show the motd without a reboot

Interested in thoughts from @jlebon @travier here.

@jlebon
Member
jlebon commented May 8, 2025

I agree with @jbtrystram that the likelihood of failure is highest during migration and we want to cover the case where the system can't upgrade. But I also agree with @dustymabe that this seems quite complex.

Hmm, I'm confused why we need the motd stuff to live completely separately. Couldn't it be another ExecStartPre= that just checks if the migration was successful (e.g. check for /run stamp file written by the migration script) and writes out the appropriate motd? No additional path/timer/systemd units and presets.

Ahh, is it to handle the "disabled Zincati" case? People doing rpm-ostree upgrade directly are already diverging by not following the graph. But yeah, we could have a separate MOTD at some point before we stop OSTree releases that checks for this and emits a message.
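
A rough sketch of the "another ExecStartPre=" idea, under stated assumptions: the migration script writes a success stamp somewhere under /run, and the MOTD snippet lives in a motd.d directory. Both paths, the helper name, and the message text are placeholders, not what the PR ships.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical second ExecStartPre= step, run after the migration script.
# $1: stamp the migration script is assumed to write on success.
# $2: MOTD snippet file to create or remove.
update_migration_motd() {
    local success_stamp="$1" motd_file="$2"
    if [[ -e "${success_stamp}" ]]; then
        # Migration is set up: drop any warning left from an earlier run.
        rm -f "${motd_file}"
    else
        mkdir -p "$(dirname "${motd_file}")"
        # Placeholder wording, not the message from the PR.
        echo "The migration to OCI images for updates has not completed." > "${motd_file}"
    fi
}

# Demo in a scratch directory instead of /run.
scratch="$(mktemp -d)"
stamp="${scratch}/run/migration-done"
motd="${scratch}/motd.d/oci-migration.motd"

update_migration_motd "${stamp}" "${motd}"   # no stamp yet: warning is written
first_run="$(cat "${motd}")"
echo "${first_run}"

mkdir -p "$(dirname "${stamp}")"
touch "${stamp}"
update_migration_motd "${stamp}" "${motd}"   # stamp present: warning is removed
```

This keeps everything inside the Zincati unit, which is also its limit: if Zincati is disabled, neither step runs and no message is written.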

@jbtrystram
Member Author

> Couldn't it be another ExecStartPre= that just checks if the migration was successful (e.g. check for /run stamp file written by the migration script) and writes out the appropriate motd? No additional path/timer/systemd units and presets.

Then you'd only get the MOTD at the next login, I think,
as currently you can't have Zincati run before the thing writing the motd: #3458 (comment)

That's why I ended up with the path unit, but I think it's a byproduct of cosa run getting a login very early; in real life you'd get logged in after Zincati started, I think.

@jlebon
Member
jlebon commented May 9, 2025

> That's why I ended up with the path unit, but I think it's a byproduct of cosa run getting a login very early; in real life you'd get logged in after Zincati started, I think.

Yeah, agree with that. Noticing it at the next login seems OK too.

@jbtrystram
Member Author

OK, I just tested this some more and, after all, the whole .path thing is really unneeded, as the motd is refreshed at each login anyway.

> Couldn't it be another ExecStartPre= that just checks if the migration was successful (e.g. check for /run stamp file written by the migration script) and writes out the appropriate motd? No additional path/timer/systemd units and presets.

Then you can't silence it without having to add another Zincati drop-in.

@jbtrystram jbtrystram force-pushed the migrate-to-oci branch 2 times, most recently from dc74044 to 9f8da5e Compare May 13, 2025 10:06
@@ -0,0 +1,3 @@
[Service]
StateDirectory=coreos-oci-migration
Member

I worry just a little bit about this somehow affecting zincati daemon when it runs.

This will set some env vars:

 Directory               │ Below path for system units │ Below path for user units │ Environment variable set
 StateDirectory=         │ /var/lib/                   │ $XDG_STATE_HOME           │ $STATE_DIRECTORY

Are we sure this won't change any behavior of Zincati when it runs?

If we are worried about it we could just move to a model where we don't write the files under a directory but just directly in /var/lib/.

Member Author

I haven't had any issues in my testing; Zincati updated fine.

@@ -33,6 +33,10 @@ conditional-include:
# All Fedora CoreOS streams share the same pool for locked files.
lockfile-repos:
- fedora-coreos-pool
- if: ${stream} == "next"
Member

Suggested change
-  - if: ${stream} == "next"
+  - if: stream == "next"

I don't think the ${} is needed, but please confirm

Comment on lines 44 to 47
The system can still be updated using 'rpm-ostree update' but this
will no longer work after the legacy OSTree repository is decommissioned.
Currently this is planned for the release
of Fedora 43.
Member

We changed the strategy slightly such that rpm-ostree upgrade will continue to work even after the OCI migration. So we can drop this part.

See new wording suggestion in other comment.

Member Author

I think you are misunderstanding this: this message is shown when Zincati is disabled, so there is no OCI migration happening here.
It says that if you want to disable Zincati you can still update manually, as always, but since the OSTree repo won't be populated after F43, there will be no more updates after that point.

@dustymabe
Member

Some wording suggestions for the motd messages:

opt_out_message="
This system has been opted out of the migration to OCI images for
updates as the opt-out stamp file exists.

This system will keep updating using the legacy OSTree repository,
but later this year new Fedora CoreOS updates will cease to be pushed
to the OSTree repository.

When ready, the migration can be resumed by deleting 
${OCI_MIGRATION_OPT_OUT}, then restarting zincati.service"

failed_message="
The migration to OCI images for updates failed. Check the logs of
zincati.service for more details.

This system will keep updating using the legacy OSTree repository,
but later this year new Fedora CoreOS updates will cease to be pushed
to the OSTree repository.

When ready, the migration can be retried by deleting 
${OCI_MIGRATION_FAILED}, then restarting zincati.service"

zincati_disabled_message="
The zincati service is disabled on this system.

If not done so already please migrate the system to the OCI backend
for updates by running:

stream=stable # or testing or next
target=ostree-remote-registry:fedora:quay.io/fedora/fedora-coreos:\$stream
sudo rpm-ostree rebase \$target

Where \$stream matches the Fedora CoreOS stream the system is following."
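
For reference, the rebase target in the last message can be assembled as in the sketch below. The helper name is hypothetical, and restricting the input to stable, testing, and next is an assumption based on the streams named in the message. The script only prints the command; it does not run rpm-ostree.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: build the rebase target for a given stream, using
# the image reference from the message above.
oci_rebase_target() {
    local stream="$1"
    case "${stream}" in
        stable|testing|next) ;;
        *) echo "unknown stream: ${stream}" >&2; return 1 ;;
    esac
    echo "ostree-remote-registry:fedora:quay.io/fedora/fedora-coreos:${stream}"
}

target="$(oci_rebase_target stable)"
# Print the command rather than running it, so the sketch is safe anywhere.
echo "sudo rpm-ostree rebase ${target}"
```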

@dustymabe
Member

I can try to test this later this week if you like.

@jbtrystram
Member Author

@dustymabe I applied the suggestions. Thanks!

@jbtrystram jbtrystram force-pushed the migrate-to-oci branch 2 times, most recently from dc46722 to 8fb8909 Compare May 14, 2025 12:55
Member
@dustymabe dustymabe left a comment

I think the code review LGTM. I'll try to run some tests.

@jbtrystram jbtrystram marked this pull request as ready for review May 14, 2025 15:15
@jbtrystram
Member Author
jbtrystram commented May 14, 2025

I made a small ign config and instructions to make testing this easier: https://gist.github.com/jbtrystram/5f0f94e2ea47f46e16f0e7cafc4cc788

It's just testing the MOTD though. I tested the Zincati migration with the status override well enough in the other PR, I think.

Updated this to also test the migration

In [1] we moved newly provisioned systems to be
deployed via container and thus retrieving updates
via zincati from the OCI registry (quay.io/fedora/fedora-coreos).

This starts the migration script shipped in [2]
before Zincati to set up the migration at the next
update.

Slightly tweak the migration script to look for
a stamp to opt-out of the migration and create one
to signal failure.

Also add a MOTD that looks for those stamps and
displays an appropriate message.

[1] coreos/fedora-coreos-tracker#1823
[2] coreos#3355

Closes coreos/fedora-coreos-tracker#1890