Preventing stack guard-page hopping

By Jonathan Corbet
June 19, 2017
Normally, the -rc6 kernel testing release is not the place where one would expect to find a 900-line memory-management change. As it happens, though, such a change was quietly merged immediately prior to the 4.12-rc6 release; indeed, it may have been the real reason behind 4.12-rc6 coming out some hours later than would have been expected. This change is important, though, in that it addresses a newly publicized security threat that, it seems, is being actively exploited.

A correction: Ben Hutchings pointed out that the Qualys analysis is based on the "main thread" stack, not any other thread stacks which, with glibc at least, are not allowed to grow. Apologies for the confusion.
The stack area in a running process is, on most architectures, placed at a relatively high virtual address; it grows downward as the process's stack needs increase. A virtual-memory region that automatically grows as a result of page faults brings some inherent risks; in particular, it must be prevented from growing into another memory region placed below it. In a single-threaded process, the address space reserved for the stack can be large and difficult to overflow. Multi-threaded processes contain multiple stacks, though; those stacks are smaller and are likely to be placed between other virtual-memory areas of interest. An accidental overflow could corrupt the area located below a stack; a deliberate overflow, if it can be arranged, could be used to compromise the system.

The kernel has long placed a guard page — a page that is inaccessible to the owning process — below each stack area. (Actually, it hasn't been all that long; the guard page was added in 2010). A process that wanders off the bottom of a stack into the guard page will be rewarded with a segmentation-fault signal, which is likely to bring about the process's untimely end. The world has generally assumed that the guard page is sufficient to protect against stack overflows but, it seems, the world was mistaken.

On June 19, Qualys disclosed a set of vulnerabilities that make it clear that a single guard page is not sufficient to protect against stack overflow attacks. These vulnerabilities have been dubbed "Stack Clash"; the associated domain name, logo, and line of designer underwear would appear to not have been put in place yet. This problem has clearly been discussed in private channels for a while, since a number of distributors were immediately ready with kernel updates to mitigate the issue.

The fundamental problem with the guard page is that it is too small. There are a number of ways in which the stack can be expanded by more than one page at a time. These include places in the GNU C Library that make large alloca() calls and programs with large variable-length arrays or other large on-stack data structures. It turns out to be relatively easy for an attacker to cause a program to generate stack addresses that hop over the guard page, stomping on whatever memory is placed below the stack. The proof-of-concept attacks posted by Qualys are all local code-execution exploits, but it seems foolhardy to assume that there is no vector by which the problem could be exploited remotely.
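The arithmetic behind a guard-page hop can be sketched in a few lines of C. This is only a model under stated assumptions, not code from the Qualys advisory or the kernel: the names `classify_landing`, `stack_low`, and the `landing` enum are hypothetical, and the model assumes that the first memory access after a large allocation happens at the new stack-pointer value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum landing { IN_STACK, IN_GUARD, BELOW_GUARD };

/* Classify where the stack pointer lands after a single downward move of
 * `alloc` bytes (for example, one large alloca() call). `stack_low` is the
 * lowest mapped stack address; the guard covers [stack_low - guard, stack_low). */
enum landing classify_landing(uintptr_t sp, uintptr_t stack_low,
                              size_t guard, size_t alloc)
{
    assert(sp >= stack_low && stack_low >= guard && sp >= alloc);

    uintptr_t new_sp = sp - alloc;
    if (new_sp >= stack_low)
        return IN_STACK;      /* still inside the mapped stack */
    if (new_sp >= stack_low - guard)
        return IN_GUARD;      /* an access here faults, as intended */
    return BELOW_GUARD;       /* the guard was hopped over entirely */
}
```

With 1024 bytes of stack left, an 8192-byte allocation lands below a 4KB guard page, while the same allocation against a 1MB guard region lands inside the guard and would fault.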

The fix merged for 4.12 came from Hugh Dickins, with credit to Oleg Nesterov and Michal Hocko. It takes a simple, arguably brute-force approach to the problem: the 4KB guard page is turned into a 1MB guard region on any automatically growing virtual memory area. As the patch changelog notes: "It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot." The size of the guard area is not configurable at run time (that can wait until somebody demonstrates a need for it), but it can be changed at boot time with the stack_guard_gap command-line parameter.
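For administrators who want a different gap, the conversion from a desired size to a parameter value is simple arithmetic. The sketch below assumes, as the kernel's parameter documentation appears to state, that `stack_guard_gap` is counted in pages rather than bytes; that is worth verifying against the documentation for the kernel in use. The helper name `guard_gap_pages` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to cover gap_bytes, rounding up to a whole page. */
size_t guard_gap_pages(size_t gap_bytes, size_t page_size)
{
    assert(page_size > 0);
    return (gap_bytes + page_size - 1) / page_size;
}
```

Under that assumption, the 1MB default with 4KB pages corresponds to 256 pages, and a 2MB gap would be requested with `stack_guard_gap=512`.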

The 1MB guard region should indeed be difficult to jump over. It is (or should be) a rare program that attempts to allocate that much memory on the stack, and other limits (such as the limit on command-line length) should make it difficult to trick a program into making such an allocation. On most 64-bit systems, it should be possible to make the guard region quite a bit larger if the administrator worries that 1MB is not enough. Doubtless there are attackers who are feverishly working on ways to hop over those regions but, for a while at least, they may well conclude that there are easier ways to attack any given system.

The real problem, of course, is that a stack pointer can be abused to access memory that is not the stack. Someday, perhaps, we'll all have memory-type bits in pointers that will enable the hardware to detect and block such attacks. For now, though, we all need to be updating our systems to raise the bar for a successful compromise. Distributors have updates now, and the fix is in the queue for the next round of stable kernel updates due on June 21.

Index entries for this article
Kernel: Security/Vulnerabilities
Security: Linux kernel



Preventing stack guard-page hopping

Posted Jun 19, 2017 19:55 UTC (Mon) by tux3 (subscriber, #101245) [Link] (34 responses)

Wouldn't it be fair to say that increasing the guard page's size is only a mitigation to a fundamental issue?
If the only full fix is currently recompiling the world with something expensive like -fstack-check or the various sanitizers, that is awfully worrying.

I wouldn't be surprised to learn that there is a whole lot of software out there at various levels of openness that will happily allocate a handful of MBs on demand, and that will probably never be recompiled with those options.

Preventing stack guard-page hopping

Posted Jun 19, 2017 20:13 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (17 responses)

A better change is to modify alloca() in libc to touch at least one byte on each allocated page. But it'll take some time to percolate through the userspace software.
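A minimal sketch of the probing idea follows. It is not glibc or compiler code: the function name `probe_region` is hypothetical, the probe is a read rather than a write (an assumption; either kind of access would fault on an inaccessible guard page), and the page size is passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Touch one byte per page-sized step of [low, low + len), starting at the
 * high end and moving down, which is the direction a downward-growing stack
 * expands. No two consecutive probes are more than page_size apart, so a
 * guard of at least one page cannot be stepped over. Returns the number of
 * probes performed. */
size_t probe_region(const volatile unsigned char *low, size_t len,
                    size_t page_size)
{
    assert(page_size > 0);

    size_t probes = 0;
    size_t off = len;
    while (off > 0) {
        size_t step = off < page_size ? off : page_size;
        off -= step;
        unsigned char sink = low[off];  /* volatile read; faults on a guard page */
        (void)sink;
        probes++;
    }
    return probes;
}
```

A 12KB region with 4KB pages costs three reads, so the overhead scales with the number of pages allocated rather than with the number of bytes.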

Preventing stack guard-page hopping

Posted Jun 19, 2017 20:26 UTC (Mon) by cpitrat (subscriber, #116459) [Link] (15 responses)

This would protect against remote attacks but wouldn't prevent an attacker from writing his own stack allocation for local privilege escalation.

I'm surprised a 900-line patch is only about increasing the size of the guard page. Isn't there more in it?

Preventing stack guard-page hopping

Posted Jun 19, 2017 21:09 UTC (Mon) by roc (subscriber, #30627) [Link] (2 responses)

> This would protect against remote attacks but wouldn't prevent an attacker from writing his own stack allocation for local privilege escalation.

The local privilege escalation threat assumes that the high-privilege C code is trusted, and then exploits it.

If the attacker can write high-privilege C code, you've already lost.

Preventing stack guard-page hopping

Posted Jun 20, 2017 9:43 UTC (Tue) by moltonel (guest, #45207) [Link] (1 responses)

The libc isn't high-privilege/trusted, and any local attacker can use his own vulnerable libc-equivalent routines instead. So a protection at libc-level would only protect against remote attacks, where the attacker has to contend with the local libc or use a different vulnerability to bring his own libc-equivalent.

Preventing stack guard-page hopping

Posted Jun 20, 2017 10:13 UTC (Tue) by matthias (subscriber, #94967) [Link]

There are certainly some suid binaries linking against libc. Thus the libc is high-privilege code. The local attacker can only use the code/libraries linked into suid binaries.

If the attacker has the ability to run his own code with privileges, everything is already lost. No need for an exploit.

Preventing stack guard-page hopping

Posted Jun 20, 2017 6:54 UTC (Tue) by vbabka (subscriber, #91706) [Link]

> I'm surprised a 900-line patch is only about increasing the size of the guard page. Isn't there more in it?

Well, it's 900 lines of .patch file text, but the diffstat is around 300 added+deleted, so not that much.

It's large because, as explained in the commit log, the old 1 stack guard page code simply extended to N pages made many accounting issues visible, because the guard page(s) were part of the VMA's [start, end] addresses. The patch deletes that approach and replaces it so that the gap is always between VMA boundaries. That means adjusting the code to check allowed VMA placement/enlargement so that it maintains the gap if the next/prev VMA is a stack one.
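The placement rule described above can be modeled as a single comparison. This is a hedged sketch, not the kernel's implementation: `leaves_stack_gap` and its parameters are hypothetical names, and the model only covers a mapping that sits below a downward-growing stack VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a mapping ending at new_end (exclusive) leaves at least `gap`
 * bytes of unmapped space below a stack VMA that starts at stack_start. */
bool leaves_stack_gap(uintptr_t new_end, uintptr_t stack_start, uintptr_t gap)
{
    assert(new_end <= stack_start);
    return stack_start - new_end >= gap;
}
```

In this model the gap is never part of the stack VMA itself, which is the accounting change the commit log describes; the same comparison would apply both when placing a new mapping and when the stack tries to grow toward an existing one.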

Preventing stack guard-page hopping

Posted Jun 20, 2017 9:55 UTC (Tue) by moltonel (guest, #45207) [Link] (9 responses)

> A better change is to modify alloca() in libc to touch at least one byte on each allocated page.

That's going to mess with the performance profile (allocating pages earlier than expected) and decrease total performance in case the app wasn't going to touch those pages at all.

> This would protect against remote attacks but wouldn't prevent an attacker from writing his own stack allocation for local privilege escalation.

Assuming we accept the performance hit, can we use the same technique in the kernel ? Disable overcommit ? Or is the kernel not aware of what the app is considering its stack space ?

Preventing stack guard-page hopping

Posted Jun 20, 2017 10:39 UTC (Tue) by nix (subscriber, #2304) [Link] (7 responses)

> That's going to mess with the performance profile (allocating pages earlier than expected) and decrease total performance in case the app wasn't going to touch those pages at all.

It's... not common for applications to allocate page-size structures on the heap that are not optimized out and then not use them for anything. I suppose functions that have big local variables and then do early exit based only on the parameters, but in that case the compiler can adapt to adjust the stack only after the early exits, if this is really significant (which I very much doubt).

Preventing stack guard-page hopping

Posted Jun 20, 2017 15:15 UTC (Tue) by zblaxell (subscriber, #26385) [Link] (6 responses)

> It's... not common for applications to allocate page-size structures on the [stack] that are not optimized out and then not use them for anything.

In one project I found an innocuous-looking state structure that turned out to have ~5MB of unused bytes in the middle, buried under a pyramid of macro expansion, arrays, nested members, and unreadable coding style. The code did use all the other members in the struct, on both sides of the hole.

Also it's fairly common in userland to do IO to a buffer on the stack, where the buffer is huge and the IO is tiny.

Preventing stack guard-page hopping

Posted Jun 20, 2017 16:30 UTC (Tue) by gutschke (subscriber, #27910) [Link] (5 responses)

Most programs that I see have relatively manageable buffers on the stack. Often, it is just a handful of bytes for state machines that try to read things line by line. And at other times, it might be as much as maybe a handful of kilobytes. That usually faults in no more than an extra page or two. And that's well within the normal variation of stack depth. So, performance impact should be zero.

Do you really commonly see programs allocate many hundreds of kilobytes if not many megabytes on the stack? That's not a pattern that I have encountered frequently. Buffers this large are more commonly allocated on the heap.

I am not saying it doesn't happen. Anything stupid that you can think of, somebody else probably thought of before. But common? Hopefully not.

Preventing stack guard-page hopping

Posted Jun 21, 2017 11:14 UTC (Wed) by PaXTeam (guest, #24616) [Link]

besides C and buffers/alloca there's also C++ and classes instantiated as local variables, so code can inadvertently require larger than usual stack frames that way.

Preventing stack guard-page hopping

Posted Jun 21, 2017 11:24 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

Besides, I never said it didn't happen -- just that most functions don't do it and those that do take so long that a bit of extra page touching is irrelevant (doing I/O into big buffers was what I was thinking of: any function that does I/O is going to have the I/O dominate its performance profile.)

Preventing stack guard-page hopping

Posted Jun 21, 2017 14:57 UTC (Wed) by zblaxell (subscriber, #26385) [Link] (1 responses)

If the program uses the huge thing it allocated on the stack then it's going to fault in all the pages anyway, and that's a pretty big hit the first time around, much larger than the cost of the probe.

On the other hand, if a function is being called in a loop then the probes keep happening over and over even though the page faults don't, so the probing gets expensive.

For programs that handle toxic data there might not be a quick and easy solution--they might just have to suck up the cost of doing probes all the time, or use other techniques (e.g. constant-stack algorithm proofs, coding standards forbidding alloca() and sparse structures, etc.) to make sure stack overflows don't happen.

Since changes to alloca require recompiling the program, it's up to individual applications to make the performance/security tradeoff anyway. Isn't there already a compiler option to do this?

Preventing stack guard-page hopping

Posted Jun 22, 2017 22:37 UTC (Thu) by mikemol (guest, #83507) [Link]

They could also tune these performance/security trade-offs on a routine-by-routine basis, by stuffing sensitive routines in their own compilation unit.

LTO will need to be careful to let these considerations bubble up to the final binary, however.

Preventing stack guard-page hopping

Posted Oct 3, 2019 13:18 UTC (Thu) by ychevali (guest, #134753) [Link]

When you allocate a large array on the stack, initialization (from index 0) starts on the far end and thus jumps over all the pages in between. Case in point: for some programs, it makes a lot of sense to start by computing an array of "small" prime numbers (say up to 100,000 or 1,000,000) with the sieve of Eratosthenes.

Preventing stack guard-page hopping

Posted Jun 26, 2017 9:25 UTC (Mon) by anton (subscriber, #25547) [Link]

> A better change is to modify alloca() in libc to touch at least one byte on each allocated page.

> That's going to mess with the performance profile (allocating pages earlier than expected) and decrease total performance in case the app wasn't going to touch those pages at all.

I don't think that that's a significant issue, but anyway: You just need to read the byte (the guard page is not readable, is it?). So all the not-yet-used stack pages can be the same page containing zeroes (which also means that the same cache line will be used for all these reads in a physically-tagged (i.e., normal these days) cache). A physical page is allocated only when the page is used for real.

Preventing stack guard-page hopping

Posted Jun 26, 2017 9:09 UTC (Mon) by anton (subscriber, #25547) [Link]

> This would protect against remote attacks but wouldn't prevent an attacker from writing his own stack allocation for local privilege escalation.

I don't think that preventing this attack scenario prevents any halfway-competent attack. If the attacker can write his own stack allocation, he can write it to jump over guard regions of any size; actually, he can put the memory writes to the area below the stack in his otherwise-regular stack-allocation code directly. In other words: If you allow the attacker to execute his code in a setting that can escalate privileges, you are already owned, guard page or not.

Preventing stack guard-page hopping

Posted Jun 20, 2017 14:50 UTC (Tue) by BenHutchings (subscriber, #37955) [Link]

alloca() can't be implemented as a real function, so it's only "in" glibc in the sense that the definition is in a glibc header. Further, that definition just defers to the compiler's pseudo-function __builtin_alloca(). So even rebuilding against an updated glibc isn't enough to fix this. glibc has been updated to make its own use of alloca() safer, though.
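One common way to make a use of alloca() safer is to cap the size that may go on the stack and fall back to the heap above that cap. The sketch below illustrates that pattern only; it is not what glibc does internally, and the names `SCRATCH_STACK_LIMIT` and `is_palindrome` and the 1024-byte cap are assumptions chosen for the example. It relies on `<alloca.h>`, which glibc provides.

```c
#include <alloca.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_STACK_LIMIT 1024  /* hypothetical cap, well under one page */

/* Returns 1 if s reads the same in both directions, 0 if not, and -1 if the
 * heap fallback could not allocate memory. The scratch copy lives on the
 * stack only when it is small; larger inputs use malloc() instead. */
int is_palindrome(const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s);
    int on_heap = n > SCRATCH_STACK_LIMIT;
    char *scratch;

    if (on_heap) {
        scratch = malloc(n + 1);
        if (scratch == NULL)
            return -1;
    } else {
        scratch = alloca(n + 1);  /* bounded, so it cannot hop a guard page */
    }

    for (size_t i = 0; i < n; i++)
        scratch[i] = s[n - 1 - i];
    int same = memcmp(scratch, s, n) == 0;

    if (on_heap)
        free(scratch);
    return same;
}
```

Because the cap is applied in the caller's own source, this kind of change takes effect only when the program itself is recompiled, which is the point made above about alloca() not being a real library function.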

Preventing stack guard-page hopping

Posted Jun 19, 2017 21:01 UTC (Mon) by roc (subscriber, #30627) [Link] (5 responses)

-fstack-check can do a variety of things. It's often configured to insert a stack-overflow-check prologue at the start of each function, which is expensive, but all that's needed here is to insert a one-byte write per page for functions whose stack frames require >= 4096 bytes. Does anyone have a reference to a measurement of the overhead of that approach? I couldn't find one with a quick search.

Preventing stack guard-page hopping

Posted Jun 19, 2017 21:02 UTC (Mon) by roc (subscriber, #30627) [Link]

Ah, https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01343.html explains why current -fstack-check doesn't suffice.

Preventing stack guard-page hopping

Posted Jun 19, 2017 21:45 UTC (Mon) by roc (subscriber, #30627) [Link] (3 responses)

Anyway, the important thing is that the overhead of inserting stack page probes should be pretty low compared to what's been measured for -fstack-check in the past.

Preventing stack guard-page hopping

Posted Jun 19, 2017 23:31 UTC (Mon) by nix (subscriber, #2304) [Link]

It's not even one byte per page. It's one byte per page for all pages but the last. (So nearly all functions need no probe at all, and the ones that do will probably be fairly slow monsters anyway.)

Preventing stack guard-page hopping

Posted Jun 20, 2017 18:54 UTC (Tue) by dd9jn (✭ supporter ✭, #4459) [Link] (1 responses)

If my memory serves me right, the OS/2 compilers inserted stack probes by default --- more than 25 years ago. I am baffled to learn that gcc doesn't default to this simple robustness feature. (Fortunately I am in the habit of avoiding alloca or possible large stack reservations.)

Preventing stack guard-page hopping

Posted Jun 22, 2017 22:02 UTC (Thu) by cesarb (subscriber, #6266) [Link]

From what I understand of what I've read on Raymond Chen's blog, Windows compilers must insert stack probes, since the stack will only grow if the guard page is hit (https://blogs.msdn.microsoft.com/oldnewthing/20060927-07/...); Linux seems to be able to grow the stack for a hit anywhere in the stack VMA. OS/2 is probably similar to Windows, so its compilers must also implement stack probing.

Preventing stack guard-page hopping

Posted Jun 19, 2017 22:31 UTC (Mon) by zblaxell (subscriber, #26385) [Link] (9 responses)

One obvious implication of having threads in a common address space and (naive) alloca() at the same time is that you can guide one thread's stack into another thread's address space no matter how far apart they are in memory. I learned this the hard way in 1998 as I was debugging a Linux program that was doing this accidentally across almost 2MB-wide stack gaps.

In userland, if alloca() wants more than a page, it can run a heavier stack-smashing check, like probing each page of the allocated area in stack-growth order, or checking some data in the heap about the current thread's stack limits. Not doing that in the kernel is perhaps understandable due to the cost, but the capability should be there for those who need it.

I've occasionally wondered what would happen if stacks were not accessible to other threads in the same process (assuming the VM context thrashing involved was magically zero cost, which probably pushes this paragraph into the realm of wishful thinking). Obviously it would break some existing programs, but it smells like bad practice in general (I see student programmers pass pointers to ephemeral variables from the caller's stack to threads all the time, with immediately disastrous results). There might be some simple heuristic (e.g. if thread A creates or joins thread B, let thread B access thread A's stack in case thread B has been given a pointer to a result A needs to store there) that's good enough for current defensible program behavior.

Preventing stack guard-page hopping

Posted Jun 19, 2017 23:26 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Obviously it would break some existing programs, but it smells like bad practice in general
A fairly common practice is to allocate some data, launch several worker threads to compute its parts and then join all the threads to get the final result. It's not uncommon for that data, or parts of it, to be allocated on the stack.

Preventing stack guard-page hopping

Posted Jun 20, 2017 1:40 UTC (Tue) by zblaxell (subscriber, #26385) [Link]

> launch several worker threads to compute its parts and then join all the threads to get the final result

That's pretty much how C++11 async functions work, and should be covered by the heuristic exception for "thread A creates thread B".

It wouldn't work if there was a persistent worker thread pool (i.e. the functions are executed by previously existing threads that continue to exist after the result is computed, so there is no creator/created or join relationship). It might be possible to infer data dependencies from mutex locks or higher-level objects (promise/future pairs) but maybe there's too many false positives. Or one could mark worker pool threads differently (e.g. some new pthread_attr) wrt access to other threads' stacks.

Preventing stack guard-page hopping

Posted Jun 19, 2017 23:32 UTC (Mon) by excors (subscriber, #95769) [Link]

> I've occasionally wondered what would happen if stacks were not accessible to other threads in the same process

I think that would break reasonable code like:

std::atomic_int n;
run_in_worker_threads_and_wait_for_them_all(iters, [&n] { n++; });

which passes a pointer to n (on the current thread's stack) to a bunch of worker threads (that probably weren't created by this thread).

Preventing stack guard-page hopping

Posted Jun 19, 2017 23:36 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

> One obvious implication of having threads in a common address space and (naive) alloca() at the same time is that you can guide one thread's stack into another thread's address space no matter how far apart they are in memory. I learned this the hard way in 1998 as I was debugging a Linux program that was doing this accidentally across almost 2MB-wide stack gaps.

Indeed. The "Cheney on the MTA" paper describes a remarkable way of using this sort of alloca() abuse to implement a copying garbage collector using only the C stack: you write your C program in continuation-passing style, with GCed data in functions that never return but only call on to others that do the same, and then when you want to do a GC your collector copies the relevant data into a new "stack" on the heap and alloca()s to it (finding the right alloca() value via trivial pointer arithmetic from a variable on the local stack frame), then free()s the old stack.

I alternate between thinking this scheme is wonderful and should be widely emulated, and thinking it is insane and its authors should be punished by being forced to debug programs written this way (but then, they already have been).

Preventing stack guard-page hopping

Posted Jun 20, 2017 15:08 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (2 responses)

> you write your C program in continuation-passing style, with GCed data in functions that never return but only call on to others that do the same, and then when you want to do a GC your collector copies the relevant data into a new "stack" on the heap and alloca()s to it

That is... diabolical. Genius, but diabolical. A similar concept employed by Chicken Scheme is to start out the same way, using CPS and allocating on the C stack, but then after copying the live data to the heap just perform a longjmp() to unwind back to a trampoline function at the top of the original stack. That seems slightly saner than abusing alloca() to set the stack pointer.

Preventing stack guard-page hopping

Posted Jun 21, 2017 11:26 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Chicken Scheme actually uses the same scheme (derived directly from the paper). :) I guess they shifted from alloca() to longjmp() at some point, probably some compatibility problem which would make my head melt to think about.

Preventing stack guard-page hopping

Posted Jun 21, 2017 14:41 UTC (Wed) by zblaxell (subscriber, #26385) [Link]

> they shifted from alloca() to longjmp() at some point, probably some compatibility problem...

...like some eager tools maintainer implementing alloca() parameter sanity checks, perhaps? ;)

Preventing stack guard-page hopping

Posted Jun 20, 2017 13:10 UTC (Tue) by niner (subscriber, #26151) [Link] (1 responses)

Garbage collectors in multi threaded programs would be a use case for a thread accessing another thread's stack to look for pointers.

Preventing stack guard-page hopping

Posted Jun 20, 2017 16:20 UTC (Tue) by zblaxell (subscriber, #26385) [Link]

I keep trying to forget that C garbage collectors exist (or at least the ambiguous ones). If there's a single thread doing GC it could use an "accesses all thread stacks" pthread attribute.

It seems to me there's more fundamental problems to be solved before this one. How does a garbage collecting thread handle ordinary race conditions when accessing data on other thread stacks? Invasive locking? Indirect references through forwarding objects?

I'm not sure I like the idea of solving that case, largely because the difference between "frees approximately the right memory" and "frees exactly the right memory" can be pretty huge when there are adversaries throwing pointy things into your stack and heap.

Preventing stack guard-page hopping

Posted Jun 19, 2017 20:17 UTC (Mon) by PaXTeam (guest, #24616) [Link]

here's a bit more about the history of this problem: https://grsecurity.net/an_ancient_kernel_hole_is_not_clos...

Preventing stack guard-page hopping in GCC

Posted Jun 19, 2017 20:47 UTC (Mon) by mjw (subscriber, #16740) [Link] (1 responses)

Here is a proposal by Jeff Law for stack/heap collision mitigation in GCC (-fstack-check improvements): https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01343.html

Preventing stack guard-page hopping in GCC

Posted Jun 20, 2017 6:04 UTC (Tue) by cpitrat (subscriber, #116459) [Link]

Thanks for the link, very interesting reading.

Preventing stack guard-page hopping

Posted Jun 19, 2017 23:40 UTC (Mon) by jengelh (subscriber, #33263) [Link] (3 responses)

>Someday, perhaps, we'll all have memory-type bits in pointers that will enable the hardware to detect and block such attacks.

Maybe the i286's segmented memory model wasn't all that useless! Set %cs, %ds and %ss to non-overlapping regions of memory, and if %sp overflows, it will just wrap back onto the same stack you already had, not touching other regions or threads. The proposed memory-type bits are implicit and sort of given by way of the selectors.

So… let's extend that to 64 bits? The segment registers appear to already be 64 bit in LM (they were not renamed like ax->eax->rax was). One extra thing is needed, an MSR, or TSS field/CR reg, to configure a modulus for %rsp, so that it wraps at a set boundary (e.g. 21 bit) on ADD/SUB/PUSH/POP instructions.

Preventing stack guard-page hopping

Posted Jun 20, 2017 5:18 UTC (Tue) by eru (subscriber, #2753) [Link]

> Maybe the i286's segmented memory model wasn't all that useless! Set %cs, %ds and %ss to non-overlapping regions of memory, and if %sp overflows, it will just wrap back onto the same stack you already had, not touching other regions or threads.

Actually the stack overflow probably traps, because it goes out of the size allocated for the stack segment. At least if you use it on 386/486/Pentium, where wrapping completely around is less likely. This has been used in a proprietary OS I have worked with. Almost everything there is in separately allocated, 386-supported segments (probably one of the very few OSes to use the segmentation features as Intel designers intended!), but all the other pain caused by segmented memory probably makes this not worthwhile.

Preventing stack guard-page hopping

Posted Jun 20, 2017 8:47 UTC (Tue) by jikos (subscriber, #43140) [Link]

> Set %cs, %ds and %ss to non-overlapping regions of memory, and if %sp overflows, it will just wrap back onto the same stack you already had, not touching other regions or threads

That would not really make the situation any better, as that'd effectively allow the attacker to overflow the stack using the same attack vector and manipulate contents of the stack, turning this into a rather boring and easy to exploit stack overflow.

Fortunately that's not how x86 behaves with respect to segment limits; as long as the address goes over the limit, it faults.

Preventing stack guard-page hopping

Posted Jul 6, 2017 3:37 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

My understanding of the Mill architectures is that they handle the stack themselves. There are no stack pointers at all either (I suspect "large" arrays move out to elsewhere). It's not something really feasible for the existing arches, but it seems much better than going back to segmentation to me.

Maximum number of threads

Posted Jun 20, 2017 3:50 UTC (Tue) by ikm (guest, #493) [Link] (5 responses)

Wouldn't this limit the maximum number of threads which can be created in a 32-bit address space to something smaller than 3072? Is it unreasonable to ask for more threads than that, given their stacks are all small?

Maximum number of threads

Posted Jun 20, 2017 5:34 UTC (Tue) by flussence (guest, #85566) [Link] (1 responses)

32-bit glibc gives each thread a whopping 2MB stack by default (per `man pthread_create`), so you'll run out of address space long before then.

Maximum number of threads

Posted Jun 20, 2017 6:39 UTC (Tue) by thestinger (guest, #91827) [Link]

It only uses that 2M value if the stack rlimit is set to unlimited. Secondary stack size is usually 8M because the rlimit for the main thread stack is usually 8M, not unlimited.

Maximum number of threads

Posted Jun 20, 2017 15:01 UTC (Tue) by BenHutchings (subscriber, #37955) [Link] (2 responses)

No, this only affects the main thread stack. glibc does not use the kernel's MAP_GROWSDOWN feature for new thread stacks. The stack guard size for new threads is controlled using pthread_attr_setguardsize().
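For thread stacks, the guard size is set through the POSIX thread attributes before the thread is created. The sketch below shows one way to do that; `run_with_guard` and `worker` are hypothetical names, the thread body is deliberately empty, and the program may need to be built with `-pthread` depending on the C library version.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static void *worker(void *arg)
{
    (void)arg;  /* a real thread would do its work here */
    return NULL;
}

/* Create and join one thread whose stack guard is guard_bytes in size.
 * The guard size recorded in the attributes is stored in *actual.
 * Returns 0 on success or an errno-style code from the pthread calls. */
int run_with_guard(size_t guard_bytes, size_t *actual)
{
    assert(actual != NULL);

    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setguardsize(&attr, guard_bytes);
    if (rc == 0)
        rc = pthread_attr_getguardsize(&attr, actual);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling `run_with_guard((size_t)1 << 20, &got)` asks for a 1MB guard on the new thread's stack, independent of the kernel's gap for the main thread stack.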

Maximum number of threads

Posted Jun 20, 2017 15:37 UTC (Tue) by ikm (guest, #493) [Link] (1 responses)

Hmm, the parent article specifically mentions that the main thread stack is generally not a problem, but the ones of the rest of the threads are:

> In a single-threaded process, the address space reserved for the stack can be large and difficult to overflow. Multi-threaded processes contain multiple stacks, though; those stacks are smaller and are likely to be placed between other virtual-memory areas of interest. An accidental overflow could corrupt the area located below a stack; a deliberate overflow, if it can be arranged, could be used to compromise the system.

So, if I understood things right, the change was about growing the guard size of all of the program's threads.

Maximum number of threads

Posted Jun 20, 2017 15:46 UTC (Tue) by BenHutchings (subscriber, #37955) [Link]

Right, the article is incorrect on this point.

Preventing stack guard-page hopping

Posted Jun 20, 2017 14:05 UTC (Tue) by NightMonkey (subscriber, #23051) [Link] (2 responses)

Howdy. I'm a Gentoo user. Yes, still, and love it. No I'm not compiling all day and reading the output. Just part of the day. Okay, I'm reading it. But, I put the shades up so the sun can creep in. Sometimes. (Shout-out to ChromeOS, ChromiumOS, CoreOS for building whole OSes on Gentoo and whispering about it.) And I'm REALLY not cooking marshmallows over my 8 core laptop while it compiles.

I'd love to protect my systems from this problem while we wait for a stable kernel release in 4.12 (though I usually wait for 4.*.2). Is there a nice three-step combo I can perform to mitigate this in the interim? Yes, I'm crazy enough to add a gcc flag and rebuild all my binaries. Yes, I'm crazy enough to disable or enable experimental kernel features. Of course, I read that -fstack-* gcc flags apparently don't work. Thanks in advance.

P.S. Hire me if you need a nice Gentoo guy on your side. ;)

Preventing stack guard-page hopping

Posted Jun 21, 2017 11:20 UTC (Wed) by nix (subscriber, #2304) [Link]

It's in the stable queue for 4.11: <https://git.kernel.org/pub/scm/linux/kernel/git/stable/st...>

You can apply it directly from there if you want, or wait a few hours.

Preventing stack guard-page hopping

Posted Jun 22, 2017 23:49 UTC (Thu) by flussence (guest, #85566) [Link]

github.com/zen-kernel seems to be keeping up with this incident so far - they've accumulated four patches for it on top of 4.11.6 at the time of writing. (hmm, there was only one there yesterday... guess I get to reboot again.)

For added fun...

Posted Jun 20, 2017 15:27 UTC (Tue) by corbet (editor, #1) [Link] (4 responses)

It would appear that the fix merged in 4.12-rc (and queued for stable) has a couple of problems. Dave Jones found an oopsable bug; the problem seems to be understood and a fix is in the works. The change in accounting for the guard region also broke checkpoint/restore in user space (CRIU). In this case, it's not yet clear how things can be fixed.

For added fun...

Posted Jun 20, 2017 15:46 UTC (Tue) by Sesse (subscriber, #53779) [Link] (1 responses)

So users of upstream kernels essentially don't have a working patch yet? I see at least Debian's 4.9 kernels have an extra patch for fixing THP (it can seemingly optimize away the guard page).

/* Steinar */

For added fun...

Posted Jun 20, 2017 16:10 UTC (Tue) by BenHutchings (subscriber, #37955) [Link]

We applied an earlier patch set that accounts all the guard pages, which has its own compatibility problems. We'll probably replace this with Hugh's version once its regressions have been dealt with.

For added fun...

Posted Jun 20, 2017 16:11 UTC (Tue) by BenHutchings (subscriber, #37955) [Link] (1 responses)

Any change to stack guards was pretty much bound to break CRIU and rr, unfortunately.

For added fun...

Posted Jun 20, 2017 20:32 UTC (Tue) by roc (subscriber, #30627) [Link]

Actually rr should be fine because during recording we disable MAP_GROWSDOWN and emulate it ourselves. This lets us observe and record every stack-growth event. For a while we tried to detect the kernel's stack-growth activity after the fact but that was a bit of a nightmare.

Preventing stack guard-page hopping

Posted Jun 21, 2017 13:03 UTC (Wed) by arekm (subscriber, #4846) [Link]

Is there a 4.1.x backport of this patch/fix anywhere?


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds