-
Notifications
You must be signed in to change notification settings - Fork 880
stage0/status: fix failure when systemd never runs in stage1 #3713
Conversation
Can one of the admins verify this patch? |
ok to test |
The CI failures seems unrelated, some flakiness on a TTY test. |
@fabiokung I'll review this after today release, but you could please format your commit titles so that they include an area/component prefix? |
613471f
to
f7b4ad8
Compare
@lucab done. |
|
||
stdout.Printf("pid=%d\nexited=%t", pid, (state == pkgPod.Exited || state == pkgPod.ExitedGarbage)) | ||
if pid, err := p.Pid(); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change may re-introduce the race documented in the comment below, which was previously addressed by the goroutine+timeout. I would suggest to special-case the running case, block on it until pid exists or we hit the timeout, and then proceed with your logic.
I understand what you are experiencing but I think the current PR is a bit too aggressive and may bring back the previous racing issue. |
On the other hand, this completely skips printing pid in the error and racing case, so it may actually be ok and push the retrying logic to the consumer side. I'm not sure what is the best behavior, but this approach may be more coherent. |
Exactly my thoughts. There is already The sleep code is also racy, nothing guarantees that the pid file will be written in 1s. I really dislike that approach. |
@fabiokung I think I can agree with that. @s-urbaniak @squeed any second opinion on this discussion? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree with this change. Given we already have two separate --wait-...
instructions this implicit wait does not fit into the semantics.
It is not guaranteed that a rkt run
invocation "happened-before" a subsequent rkt status
invocation which the current code tries to overcome. The reality is that the user should retry rkt status
after invoking rkt run
himself.
The only nit I'd have is that we should make the above fact more clear in the documentation of rkt status
.
f7b4ad8
to
874227b
Compare
@s-urbaniak @lucab I added some bits to the doc (on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@fabiokung it looks like the CI was flaking a bit at the time this PR was last pushed. Do you mind rebasing once more on top of current master? It should be ready to go then. |
Will do. |
A pid file never gets written if systemd never gets to run in stage1. This can happen if the image had a bad command, i.e.: not in $PATH. In that case, rkt status will constantly error with: status: unable to print status: unable to get PID for pod ... Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
874227b
to
35dae38
Compare
@lucab all green! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
A pid file never gets written if systemd never gets to run in stage1. This can happen if the image had a bad command, i.e.: not in $PATH.
Steps to reproduce:
Signed-off-by: Fabio Kung fabio.kung@gmail.com