8000 main: fork into worker and supervisor; perform clean up. by eugeneia · Pull Request #930 · snabbco/snabb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

main: fork into worker and supervisor; perform clean up. #930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 13, 2016

Conversation

eugeneia
Copy link
Member
@eugeneia eugeneia commented Jun 3, 2016

This forks in main, leaving a “worker“ and a “supervisor” process. The worker sets a flag that causes it to be killed when the supervisor quits, and resumes as the regular snabb process. The supervisor sets up a signalfd(2) to be notified when the worker quits, and cleans up afterwards. See #509.

local signals, err = S.util.signalfd_read(signalfd)
assert(signals, tostring(err))
if not signals[1].chld then S.kill(worker_pid, "kill") end
shutdown(S.getpid())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a mystery to me why this works. E.g. I had assumed I needed to pass worker_pid to shutdown since the worker goes on to create the /var/run directory based on S.getpid(), but apparently when the worker calls S.getpid() it returns the parent pid? confused

@kbara kbara self-assigned this Jun 13, 2016
@lukego
Copy link
Member
lukego commented Jun 14, 2016

I am excited about managing processes directly with Snabb. There are a lot of nice convenience / safety / error-recovery / administrative features we could implement. I also really want to be able to store more information in /var/run/snabb directories (e.g. 64MB timeline log files) and start archiving these in a disciplined way (e.g. via a hook or config option to store them in a well-defined location with compression).

Thoughts:

  • Major change: the /var/run/snabb directories will not exist after the process terminates, right? This is a user-visible change and it seems worth explaining why this makes sense in the PR text.
  • Major change: The snabb gc command is gone. This also seems worth drawing attention to in the PR text and being clear about why it is obsolete.
  • Concern: How can we be confident that /var/run/snabb directories won't be left around without cleanup? For example, is this design robust for unexpected-but-inevitable actions like pressing ^C in the terminal or running kill -9 on the pid that is using the most CPU?

Generally I would find this change easier to digest with more explanation. Maybe some commentary in the source code explaining the overall intention and how that is achieved with these particular system calls?

@eugeneia
Copy link
Member Author
eugeneia commented Jun 14, 2016

Good points!

Major change: the /var/run/snabb directories will not exist after the process terminates, right? This is a user-visible change and it seems worth explaining why this makes sense in the PR text.

Yes, this PR makes snabb remove its own runtime directory. This makes sense because as of now the lab servers are littered with stale runtime directories. Ideally there would be a way to pass a --debug or --backup option to keep the files if that is desired. This would require additional command line option parsing before forking (NYI).

Major change: The snabb gc command is gone. This also seems worth drawing attention to in the PR text and being clear about why it is obsolete.

snabb gc is obsoleted by processes cleaning up after themselves. Also: snabb gc never really worked, it failed to remove runtime directories when other processes where reusing PIDs.

Concern: How can we be confident that /var/run/snabb directories won't be left around without cleanup? For example, is this design robust for unexpected-but-inevitable actions like pressing ^C in the terminal or running kill -9 on the pid that is using the most CPU?

If you kill -9 (SIGKILL) the supervisor then you are out of luck (no way to react to that signal): the child will be orphaned. In all other cases this is robust, e.g. sending SIGKILL to the child (the process that will actually use CPU cycles) will lead to clean shutdown of the supervisor, as well as sending SIGTERM or SIGINT to the supervisor or child.

What is currently not covered is handling of SIGCONT to the child, e.g. it is not supported to stop (SIGSTOP) and continue the child. I felt this was not something we want to do, but it might be sensible to have a good default behavior for this case.

@eugeneia
Copy link
Member Author

If you kill -9 (SIGKILL) the supervisor then you are out of luck (no way to react to that signal): the child will be orphaned.

Actually not true, the child will be killed by SIGHUP (S.prctl("set_pdeathsig", "hup")), but the runtime directory will not be deleted.

8000

@eugeneia
Copy link
Member Author
eugeneia commented Jun 14, 2016

What is currently not covered is handling of SIGCONT to the child, e.g. it is not supported to stop (SIGSTOP) and continue the child. I felt this was not something we want to do, but it might be sensible to have a good default behavior for this case.

Also wrong, SIGSTOP/SIGCONT were handled just fine... Don't really know how though?

@teknico
Copy link
Contributor
teknico commented Jun 14, 2016

Ideally there would be a way to pass a --debug or --backup option to keep the files if that is desired.
This would require additional command line option parsing before forking (NYI).

An option to keep the data directory around upon termination would be valuable.

@eugeneia
Copy link
Member Author

Added support for the worker to be stopped/continued, so that caveat is gone.

An option to keep the data directory around upon termination would be valuable.

I have made _G.developer_debug configurable via environment (SNABB_DEBUG) and main no longer deletes the run-time directory when _G.developer_debug is true.

@teknico
Copy link
Contributor
teknico commented Jun 15, 2016

I have made _G.developer_debug configurable via environment (SNABB_DEBUG)
and main no longer deletes the run-time directory when _G.developer_debug is true.

Great, thank you. Are env vars documented anywhere?

@eugeneia
Copy link
Member Author

@teknico Good point, currently no. I think this should be documented along with the sub-program mechanism in place of this outdated section.

@lukego
Copy link
Member
lukego commented Jun 16, 2016

Is it safe to say that we would benefit from having well-defined configuration options and convenient methods of supplying them e.g. config files and command line?

and further that such config items would best be described with a YANG schema for the Snabb engine and parameters like busy-wait, developer-debug, etc?

@kbara
Copy link
Contributor
kbara commented Jun 22, 2016

I'd like to retain snabb gc. This cleanup model isn't robust in the face of situations like losing power or the supervisor process getting hit with a kill -9, OOM killer, etc.

@eugeneia
Copy link
Member Author

@kbara snabb gc doesn't work (never did) because it fails when a (new, possibly non-snabb) process with a matching pid exists. It could maybe find out the name of that process and disregard it if it is not “snabb” but for now rm -r /var/run/snabb is more reliable.

@kbara
Copy link
Contributor
kbara commented Jun 27, 2016

Snabb gc wasn't ideal for those reasons, true. After talking to my team, I'm ok with removing it. Thanks.

kbara pushed a commit to kbara/snabb that referenced this pull request Jun 27, 2016
@kbara kbara added the merged label Jun 27, 2016
@kbara kbara mentioned this pull request Jun 28, 2016
@eugeneia eugeneia merged commit 127d21f into snabbco:master Jul 13, 2016
@wingo wingo mentioned this pull request Oct 31, 2016
7 tasks
dpino added a commit to dpino/snabb that referenced this pull request Aug 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0