Rethinking race-free process signaling
Posted Apr 5, 2019 20:24 UTC (Fri) by jkowalski (guest, #131304)
In reply to: Rethinking race-free process signaling by NYKevin
Parent article: Rethinking race-free process signaling
You don't really need to do anything like this. A process cloned using CLONE_PIDFD can be set to deliver no signal to the parent on termination (by setting the signal to null), which means everyone holding a copy of the same file descriptor can poll on it to know when the process dies. That also means the process cannot be waited upon, which is what you want when you use this API from inside libraries.
It happens as a side effect of using descriptors. The parent gets a readable instance (it is still not clear to me that the patch author will support this, but here's hoping), so it can pass a copy of its descriptor to others and then perhaps close its own. The other process can then poll on its copy, know when the process is gone, and get back the exit status.
The same could be done for references to external processes using pidfd_open: you pass a flag that requests the kernel to give you a readable instance, and if you're the real_parent or parent (as in ptrace terms), you get one. It also has the nice property that pidfds are not acquired through the mount namespace, but are limited to the scope of your PID namespace (which I think is a very important point that has been overlooked thus far).
Scoping the opening and adding a system call with extendable flags would allow you to lift the checks in pidfd_send_signal to signal across namespaces, exactly because, without userspace doing it on its own, a process can only open a pidfd to something it can address inside its PID namespace. It is otherwise a layering violation (literally) that I can use the mount namespace to circumvent this, if it is opened up in the future. I also object to being able to peek into process state through such descriptors; that capability should be orthogonal, not bound to the pidfd, even if I have the authority to read through. *Therefore, using /proc dir fds comes with a big downside to all of this.*
The nice delegation model allows you to extend pidfd_open with, say, PRIV_KILL that allows you to bind CAP_KILL privs to a pidfd, assuming you have CAP_KILL in the owning userns, which would allow you to pass this pidfd and let the receiver signal across namespace boundaries without restrictions (it has to be opt-in as this is not what you want by default).
You could add a similar flag to bind ptrace privs of the opener, though that is a lot more involved and I have not mentioned it anywhere thus far.
Thus, you can think of the pidfd as a stable reference to the process. Flags that depend on the authority of the opener (if parent, readable; if CAP_KILL, killable; if CAP_SYS_PTRACE, ptraceable; etc.) allow you to open up methods to operate on it, and since they are bound to the descriptor, their scope is limited to that process only. Such intent cannot be expressed when using /proc. /proc also does not play well with hidepid=2 (invisible dirs mean you cannot take a reference) or hidepid=1 (dirs you cannot enter mean you cannot reference threads you can see).
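To make the delegation idea concrete, here is a small userspace model of it. Nothing here is a real kernel interface: PidfdRight, grant_rights, and the keyword arguments are hypothetical names invented for the sketch, loosely mirroring the opt-in flags (such as PRIV_KILL) suggested above. The check runs once, at open time, against the opener's authority, and the resulting rights travel with the descriptor.

```python
from enum import Flag, auto


class PidfdRight(Flag):
    """Hypothetical rights that could be bound to a pidfd at open time."""
    NONE = 0
    READABLE = auto()    # may read the exit status
    KILLABLE = auto()    # may signal across namespace boundaries
    PTRACEABLE = auto()  # may ptrace the target


def grant_rights(requested, *, is_parent, has_cap_kill, has_cap_sys_ptrace):
    """Return the rights to bind, or raise if the opener lacks authority.

    Rights are opt-in: only what was requested is ever granted.
    """
    allowed = PidfdRight.NONE
    if is_parent:
        allowed |= PidfdRight.READABLE
    if has_cap_kill:
        allowed |= PidfdRight.KILLABLE
    if has_cap_sys_ptrace:
        allowed |= PidfdRight.PTRACEABLE

    denied = requested & ~allowed
    if denied:
        raise PermissionError(f"opener lacks authority for {denied}")
    return requested
```

In this model a receiver of the descriptor never needs the capability itself; it only needs the descriptor, which is the point of binding the authority at open time.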
The whole reparent on fd-passing however is broken. There can be multiple processes keeping it open at a time.
Posted Apr 7, 2019 10:42 UTC (Sun)
by meuh (guest, #22042)
Posted Apr 7, 2019 16:12 UTC (Sun)
by luto (subscriber, #39314)
Posted Apr 7, 2019 19:31 UTC (Sun)
by jkowalski (guest, #131304)
You could also make it available to things with NNP set: request PRIV_KILL when cloning children, then pass the descriptor around and send signals. All of these checks happen when the flag is used during pidfd_open or clonefd or whatever.
Do you see other cases where it could be a problem?
Rethinking race-free process signaling
The whole reparent on fd-passing however is broken. There can be multiple processes keeping it open at a time.
I agree: through fork(), exec(), and SCM_RIGHTS, a file descriptor can be duplicated into many processes.
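A short sketch of the SCM_RIGHTS case, assuming Python 3.9+ for socket.send_fds() and socket.recv_fds(). For brevity both ends of the socket pair live in one process and a pipe stands in for the pidfd, but the receiving end could just as well be a different process. The function name pass_fd_copy is made up for this example.

```python
import os
import socket


def pass_fd_copy():
    """Send a pipe's write end over a Unix socket and use the received copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        # SCM_RIGHTS needs at least one byte of ordinary data alongside it.
        socket.send_fds(left, [b"x"], [w])
        _, fds, _, _ = socket.recv_fds(right, 1, 1)
        w_copy = fds[0]
        is_new_number = w_copy != w

        # Closing the sender's descriptor does not affect the copy:
        # both refer to the same open file description.
        os.close(w)
        os.write(w_copy, b"still open")
        data = os.read(r, 64)
        os.close(w_copy)
        return is_new_number, data
    finally:
        left.close()
        right.close()
        os.close(r)
```

Since every holder has an equally valid reference, there is no single "new owner" to reparent to when a descriptor is passed.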