-
Notifications
You must be signed in to change notification settings - Fork 880
Conversation
stage1/init/init.go
Outdated
@@ -468,6 +469,10 @@ func getArgsEnv(p *stage1commontypes.Pod, flavor string, canMachinedRegister boo | |||
env = append(env, "SYSTEMD_LOG_LEVEL=err") // silence log_warning too | |||
} | |||
|
|||
if hostIPC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer this to be called parentIPC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will update this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Looks quite in line with my initial investigation from #3291.
I don't want to put too much focus on legacy systems, but should we detect <205 and skip registration on that? I seem to understand the misbehavior is on systemd-nspawn registration. This should allow to sync to latest.
|
|
The journal issue is fixed, I added a commit about it ("stage1: add systemd-journal-flush.service") with an explanation in the commit message. It works for me both on the "coreos" and the "host" flavor. Btw, giving
I'll file a follow-up ticket and continue with the other TODO items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
@@ -1,5 +1,7 @@ | |||
lib64/tmpfiles.d/journal-nocow.conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? And if we do, can we add it for the host flavor and ARM too?
The docs in the file say it only matters for btrfs filesystems and it improves performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that's a left-over from an experiment. I'll remove it from this PR.
0, | ||
}, | ||
{ | ||
"--ipc=supercalifragilisticexpialidocious", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol
default: | ||
log.Fatalf("unknown value for --ipc parameter: %v", p.IPCMode) | ||
} | ||
if parentIPC && flavor == "kvm" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this for the "fly" flavor? That is, if we pass --ipc=private
Right now we just display a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we won't get here because stage0 will prevent us from passing --ipc=
before, maybe it's good to keep it for extra safety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iaguis "fly" is implemented in the stage1_fly
directory and it does not link with stage1/init/init.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what I mean is if we should do something similar in stage1_fly
(that is, explicitly refuse ipc=private
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh... could it be left for the follow-up issue #3789 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
I tested the coreos flavor with these three patches on arm64:
I found I needed theses changes to the `systemd.manifest': https://gist.github.com/glevand/2e9e92568e3b5fa29f431c475b984b70 Also, it would be good to keep the manifest files sorted. I made a small script I planed to submit to do that here: https://github.com/glevand/coreos--rkt/blob/for-merge-sort/scripts/sort-stage1-manifests.sh. |
Symptoms: kernel/built-in.o: In function `update_wall_time': (.text+0x5acc6): undefined reference to `____ilog2_NaN' This commit includes the workaround patch from the upstream Linux kernel. Ideally, we would switch to a newer Linux version for the kvm flavor. But this is quicker this way. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=474c90156c8dcc2fa815e6716cc9394d7930cb9c
systemd-v234 is already released. However I am only upgrading from v231 to v233 to avoid a regression bug when the 4 year old v204 is used on the host. v204 is still used on SemaphoreCI. The symptoms were: > Press ^] three times within 1s to kill container. > Failed to allocate scope: No such method 'StartTransientUnit' This is due to a change in systemd-nspawn-v234: see: https://github.com/systemd/systemd/pull/6261/files#diff-a8646f412b928a4679f2ec702b55c12bR338 It now calls 'StartTransientUnit' synchronously. Before, it was called asynchronously without checking for errors. This 'StartTransientUnit' method was added in systemd-v205 (via systemd/systemd@c2756a684011) so it is not yet available in SemaphoreCI.
systemd-v233 introduced a change in journald: the logs will not be written in /var/log/journal until it is asked to. This is done by the new unit systemd-journal-flush.service but rkt didn't include that unit. rkt uses the systemd-nspawn option '--link-journal=try-guest', which means the logs from the pod's /var/log/journal directory will be bind-mounted to the host. So it is important to use the correct directory to have "journalctl -M" working properly. See: systemd/systemd@f78273c This patch adds this new unit systemd-journal-flush.service in stage1, both on the host flavor (taking the unit from the host if rkt detects systemd >= v233) and on the default flavor (taking the unit from the coreos image).
systemd implicitely set MountAPIVFS to true whenever we have ProtectKernelTunables=true. It was causing a regression on test TestVolumeSysfs with systemd >= v233 Symptoms: > rkt_tests.go:410: didn't receive expected output "/sys/class: mode: Dcrw-rw-rw-": image: using image from local store for image name coreos.com/rkt/stage1-coreos:1.28.1+gitc779c926 > > [ 4385.848746] rkt-inspect[5]: /sys/class: mode: drwxr-xr-x
rkt normally creates a new IPC namespace for the pod. In order to stay in the host IPC namespace, a new option is added: --ipc=[auto|private|parent]
@glevand thanks for testing! I updated the PR with your changes for arm64 and used your script to sort the manifest files. |
When testing, TestAttachSmoke failed a couple of times but it is unrelated to this PR since it also failed on master. I filed a new bug to track this: #3793 I am triggering the CI again... |
@glevand Thanks a lot for testing! LGTM |
rkt normally creates a new IPC namespace for the pod. In order to stay in the host IPC namespace, a new option is added:
This can be tested with:
We can see that the ipc namespace id is the same as the host when using
--ipc=parent
but different otherwise.In the kvm flavor, using the host IPC namespace is not allowed and rkt will error out.
The implementation relies on the systemd-nspawn new environment variable
SYSTEMD_NSPAWN_SHARE_NS_IPC=true
added in systemd v232, see systemd/systemd#4023.Since the default "coreos" stage1 currently uses systemd v231, I needed to upgrade to a more recent systemd version, following the "Update coreos flavor stage1" documentation.
Although systemd-v234 is already released, I am only upgrading from v231 to v233 to avoid a regression bug when the 4 year old v204 is used on the host. v204 is still used on SemaphoreCI. The symptoms were:
This is due to a change in systemd-nspawn-v234: see: https://github.com/systemd/systemd/pull/6261/files#diff-a8646f412b928a4679f2ec702b55c12bR338. With this changes, it now calls 'StartTransientUnit' synchronously. Before, it was called asynchronously without checking for errors. This 'StartTransientUnit' method was added in systemd-v205 (via systemd/systemd@c2756a684011) so it is not yet available in SemaphoreCI.
I contacted SemaphoreCI and they hopefully will have a new testing platform with Ubuntu 16 in the next few weeks. Once that platform is available, we could upgrade stage1 again without interruption in the tests. I don't think we care about support for systemd v204 if not for the tests.
This PR also includes a regression fix that was caused by upgrading systemd in stage1. We ensure that the new systemd option
MountAPIVFS
stays disabled by disablingProtectKernelTunables
.Finally, it includes a compilation fix in the kvm flavor with Linux < 4.11 and with gcc >= 7.
Fixes #3291
/cc @bcg62 @iaguis @lucab
TODO