
Rethinking race-free process signaling

Posted Apr 5, 2019 1:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)
In reply to: Rethinking race-free process signaling by Fowl
Parent article: Rethinking race-free process signaling

Might be a DDoS vector. Even with the low 1024 file descriptor limit, just 32 processes can eat up the whole default PID namespace.
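
A quick back-of-the-envelope sketch of that arithmetic, assuming the usual defaults of a 1024-descriptor soft limit (RLIMIT_NOFILE) and the kernel's default pid_max of 32768:

    # Sketch: how few processes it takes to pin the whole default PID space,
    # assuming each held pidfd keeps one PID from being reused.
    NOFILE_SOFT_DEFAULT = 1024   # default per-process file descriptor soft limit
    PID_MAX_DEFAULT = 32768      # default /proc/sys/kernel/pid_max

    procs_needed = PID_MAX_DEFAULT // NOFILE_SOFT_DEFAULT
    print(procs_needed)          # 32 processes, each holding 1024 pidfds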



Rethinking race-free process signaling

Posted Apr 5, 2019 16:40 UTC (Fri) by smurf (subscriber, #17840) [Link] (8 responses)

How is that different from just accumulating 32700 zombie child processes, which you can do right now anyway?
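
For reference, a minimal sketch of what that looks like: children that exit but are never reaped remain as zombies and keep their PIDs until the parent waits for them (or exits). The count of 10 is purely for illustration, not the ~32700 the comment mentions:

    # Children that exit without being reaped stay as zombies, each holding a PID.
    import os

    zombie_pids = []
    for _ in range(10):          # illustration only; the comment's point is ~32700
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child exits immediately...
        zombie_pids.append(pid)  # ...but the parent never calls os.wait(), so it lingers

    print("zombie PIDs:", zombie_pids)   # shown as <defunct> in ps until the parent exits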

Rethinking race-free process signaling

Posted Apr 7, 2019 1:09 UTC (Sun) by stephen.pollei (subscriber, #125364) [Link] (7 responses)

I wonder why distributions don't set limits that work for at least 99% of people.
70% of people would be ok with a maximum of 3 logins
95% of people would be ok with a maximum of 8 logins
99% of people would be ok with a maximum of 13 logins
70% of people should be ok with a maximum of 89 processes per login
95% of people should be ok with a maximum of 144 processes per login
99% of people should be ok with a maximum of 377 processes per login

4901 processes ought to be enough for most people.
If you raise pid_max to 99999 on a small system and set these limits, does that strongly reduce the issues?

Larger systems might need to increase pid_max further, and have bigger rlimits.

In either case, it seems like it can mostly be solved with saner configuration.
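
A minimal sketch of that kind of configuration, using the numbers from the comment above (a 377-process cap and pid_max raised to 99999). A distribution would normally apply the per-user cap through pam_limits rather than code, and RLIMIT_NPROC counts processes per real UID, which is only an approximation of "per login":

    # Sketch: apply the suggested limits programmatically.
    import resource

    def cap_user_processes(max_procs=377):
        soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if hard != resource.RLIM_INFINITY and max_procs > hard:
            max_procs = hard                     # soft limit cannot exceed the hard limit
        resource.setrlimit(resource.RLIMIT_NPROC, (max_procs, hard))

    def read_pid_max():
        # Raising this (e.g. sysctl kernel.pid_max=99999) requires root.
        with open("/proc/sys/kernel/pid_max") as f:
            return int(f.read())

    cap_user_processes()
    print("pid_max:", read_pid_max())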

Rethinking race-free process signaling

Posted Apr 8, 2019 1:21 UTC (Mon) by dvdeug (guest, #10998) [Link] (6 responses)

Those are among the most annoying limits. If 99% of people will never hit them, then 1% will. Because they are distribution-set, it will be hard to figure out what's wrong unless the error messages are crystal clear. In the case of a process limit, I bet very few programs handle it properly, and you're forcing a bunch of users to try to figure out why their programs are crashing, especially because 4901 is a "random" number; if I hit 32,000 PIDs before a crash, I'd realize the problem much faster than at 4901.

Arbitrary limits are a pain in the ass, and increasing both their number and the odds that you'll hit one is not user-friendly.
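
As for what "handling it properly" might look like: when the per-user process limit is hit, fork() fails with EAGAIN, and a well-behaved program would at least report that clearly instead of crashing with an opaque error. A minimal sketch:

    # Sketch: report a hit process limit clearly instead of failing obscurely.
    import errno
    import os
    import sys
    import time

    def fork_with_report(retries=3, delay=0.5):
        for _ in range(retries):
            try:
                return os.fork()
            except OSError as e:
                if e.errno != errno.EAGAIN:
                    raise
                print("fork failed: process limit (RLIMIT_NPROC) or PID space exhausted,"
                      " retrying...", file=sys.stderr)
                time.sleep(delay)
        sys.exit("giving up: cannot create a new process (resource limit reached)")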

Rethinking race-free process signaling

Posted Apr 8, 2019 1:59 UTC (Mon) by ebiederm (subscriber, #35028) [Link] (5 responses)

My rule of thumb is that limits like that should only be low enough to catch buggy programs,
not properly running programs that consume a few more resources than normal.

There is the other issue with more pids: if they get too large, they become ungainly and difficult
to use, which argues against making 4 million the default. But otherwise something like 4 million
would probably be a fine default for a limit like that.

Rethinking race-free process signaling

Posted Apr 8, 2019 5:54 UTC (Mon) by eru (subscriber, #2753) [Link] (4 responses)

If the pid limit is 4 million, problems due to wraparound are rare, but they may occasionally happen, causing hard-to-trace bugs. The same goes for MAXINT. But if the pid were a 64-bit number, with the limit at the maximum of that, wraparound would never happen, so software could safely assume that pids are always unique.

Rethinking race-free process signaling

Posted Apr 8, 2019 7:27 UTC (Mon) by rbanffy (guest, #103898) [Link] (3 responses)

> But if pid were a 64-bit number, and the limit the maximum of that, wraparound would never happen

Cue a meeting room with a dozen people dressed like characters from Things to Come trying to figure out why The Google stopped answering their questions.

Fine. It'll be a looooong time.

Rethinking race-free process signaling

Posted Apr 10, 2019 5:15 UTC (Wed) by eru (subscriber, #2753) [Link] (2 responses)

Before posting, I calculated that if the kernel creates one process every microsecond, it takes about 290 000 years for the 64-bit signed maxint to be reached. I don't think any system will have that kind of uptime.
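
The figure checks out; a quick worked version of the same calculation (assuming a 365.25-day year):

    # One new process per microsecond until a signed 64-bit counter is exhausted.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    max_pid = 2**63 - 1
    years = max_pid / 1_000_000 / SECONDS_PER_YEAR
    print(f"{years:,.0f} years")   # roughly 292,271 years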

Rethinking race-free process signaling

Posted Apr 10, 2019 18:07 UTC (Wed) by rbanffy (guest, #103898) [Link] (1 responses)

You have to let your imagination fly higher, eru. 290,000 years is a blink of an eye in cosmic terms, and it's entirely possible that a vast, distributed, multiply redundant computer doing a Very Important Job for its users would live that long (and be that reliable), so it would reach that limit well after everyone who designed it (and who thought fashion had gone too far this time) is dead or has, at least, moved on to more interesting pursuits. Also, if it's a thousand times faster, it'll get there much faster too.

Rethinking race-free process signaling

Posted Apr 12, 2019 6:19 UTC (Fri) by massimiliano (subscriber, #3048) [Link]

> You have to let your imagination fly higher, eru. 290,000 years is a blink of an eye in cosmic terms...

If I let my imagination fly just a bit higher, in such a system this issue will be solved just like the current 2038 problem.

At some point the system will do a live migration to a 128-bit architecture, with conversion of the persistent state to appropriately sized values, and the "actor IDs" in the distributed system will get a bit of fresh air, with a wraparound time of 2^64*290k years, whatever that means...

