User-space shadow stacks (maybe) for 6.4

By Jonathan Corbet
March 24, 2023

Support for shadow stacks on the x86 architecture has been long in coming; LWN first covered this work in 2018. After five years and numerous versions, though, it would appear that user-space shadow stacks on x86 might just be supported in the 6.4 kernel release. Getting there has required a few changes since we last caught up with this work in early 2022.

Shadow stacks are a defense against return-oriented programming (ROP) attacks, as well as others that target a process's call stack. The shadow stack itself is a hardware-maintained copy of the return addresses pushed onto the call stack with each function call. Any attack that corrupts the call stack will be unable to change the shadow stack to match; as a result, the corruption will be detected at function-return time and the process terminated before the attacker can take control. The above-linked 2022 article has more details on how x86 shadow stacks, in particular, work.
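To make the mechanism concrete, here is a hypothetical software model of the check a shadow stack performs. It is only an illustration. On x86 the CPU maintains the shadow stack itself on every CALL and RET, and a mismatch raises a control-protection fault rather than returning an error code. The names model_call() and model_ret() are invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_DEPTH 64

/* Hypothetical model: the ordinary call stack is writable by the program
 * (and by an attacker who can overflow a buffer); the shadow copy is not. */
struct model_stacks {
    uintptr_t call_stack[MODEL_DEPTH];
    uintptr_t shadow_stack[MODEL_DEPTH];
    size_t depth;
};

/* Model of CALL: push the return address onto both stacks.
 * Returns 0 if the model's fixed depth is exhausted. */
static int model_call(struct model_stacks *s, uintptr_t return_addr)
{
    if (s->depth == MODEL_DEPTH)
        return 0;
    s->call_stack[s->depth] = return_addr;
    s->shadow_stack[s->depth] = return_addr;
    s->depth++;
    return 1;
}

/* Model of RET: pop both copies and compare them.  On a match, store the
 * return target and return 1; on a mismatch (or an empty stack) return 0,
 * which is where real hardware would raise a control-protection fault. */
static int model_ret(struct model_stacks *s, uintptr_t *target)
{
    if (s->depth == 0)
        return 0;
    s->depth--;
    if (s->call_stack[s->depth] != s->shadow_stack[s->depth])
        return 0;
    *target = s->call_stack[s->depth];
    return 1;
}
```

Overwriting an entry in call_stack between a model_call() and the matching model_ret() stands in for a stack-smashing write; model_ret() then reports the mismatch instead of handing back the attacker's address.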

The current version of the patch set is the eighth revision posted by Rick Edgecombe (who took it over after some 30 revisions posted by Yu-cheng Yu).

API changes

The user-space API for working with shadow stacks has not changed much in the last year. Most operations are done with arch_prctl() calls, specifically:

ARCH_SHSTK_ENABLE turns on the shadow stack for the current thread; shadow stacks are not enabled by the kernel when a process starts.
ARCH_SHSTK_DISABLE disables the use of the shadow stack for the current thread.
ARCH_SHSTK_LOCK prevents any further changes to a thread's shadow-stack status. Among other things, this operation can keep an attacker from somehow disabling the shadow stack before corrupting the call stack.
ARCH_SHSTK_UNLOCK undoes the effect of ARCH_SHSTK_LOCK. This option was added to version 4 of the patch set in December; it exists to support functionality like Checkpoint/Restore in User Space that needs to be able to change the shadow-stack status after a process has launched. This option is only available when invoked via ptrace(); a process cannot use it on itself directly.
ARCH_SHSTK_STATUS returns the current shadow-stack status.
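A minimal sketch of querying the status from C follows. It assumes x86-64 and the constant values used in the kernel's arch/x86/include/uapi/asm/prctl.h as eventually merged (ARCH_SHSTK_STATUS as 0x5005, with bit 0, ARCH_SHSTK_SHSTK, indicating an enabled shadow stack). The wrapper name shstk_status() is hypothetical, and the call goes through syscall() rather than assuming a C-library arch_prctl() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values assumed from the kernel's arch/x86/include/uapi/asm/prctl.h. */
#ifndef ARCH_SHSTK_STATUS
#define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
#define ARCH_SHSTK_SHSTK (1ULL << 0)   /* feature bit: shadow stack enabled */
#endif

/* Hypothetical helper: store the calling thread's enabled shadow-stack
 * features in *features.  Returns 0 on success, or -1 with errno set
 * (for instance EINVAL from a kernel that does not know the operation,
 * or ENOSYS where arch_prctl() does not exist). */
static int shstk_status(unsigned long long *features)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    return (int)syscall(SYS_arch_prctl, ARCH_SHSTK_STATUS, features);
#else
    (void)features;
    errno = ENOSYS;
    return -1;
#endif
}
```

The sketch deliberately stops at reading the status. Enabling the shadow stack from inside an ordinary C function is hazardous: that function's own return address was never pushed onto the new, empty shadow stack, so returning from it would most likely fault. After a successful call, a caller would test features & ARCH_SHSTK_SHSTK.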

Normally, the kernel handles the allocation and placement of shadow stacks, but there are occasions where an application will need to manage its shadow stacks directly. The map_shadow_stack() system call exists for this purpose; its prototype has changed a bit over the course of the last year:

    void *map_shadow_stack(unsigned long address, unsigned long size,
    			   unsigned int flags);

Same old SHSTK
At one point, Andrew Morton complained about the "shstk" abbreviation, saying that it "sounds like me trying to swear in Russian while drunk". As a result, that term was pulled out of much of the generic code, but remains in the x86 portion.

This call will attempt to set up a shadow stack at the given address of the requested size, returning the actual mapped address on success. The one possible value for flags is now called SHADOW_STACK_SET_TOKEN; it requests that the necessary "restore token" (which, among other things, prevents multiple threads from sharing the same shadow stack) be stored into the newly created stack.
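Since the C library offers no wrapper for map_shadow_stack(), a program would invoke it through syscall(). The sketch below assumes x86-64, the system-call number 453, and a SHADOW_STACK_SET_TOKEN value of bit 0 (the values used when the call was eventually merged; check your own kernel headers). The wrapper name map_shadow_stack_sketch() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values; verify against the kernel headers in use. */
#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 453
#endif
#ifndef SHADOW_STACK_SET_TOKEN
#define SHADOW_STACK_SET_TOKEN (1ULL << 0)  /* store a restore token */
#endif

/* Hypothetical wrapper.  addr == 0 lets the kernel choose the location.
 * Returns the mapped address, or MAP_FAILED with errno set (ENOSYS on
 * kernels without the call, or another error without hardware support). */
static void *map_shadow_stack_sketch(unsigned long addr, unsigned long size,
                                     int set_token)
{
#if defined(__x86_64__)
    unsigned long flags = set_token ? (unsigned long)SHADOW_STACK_SET_TOKEN : 0;
    long ret = syscall(__NR_map_shadow_stack, addr, size, flags);
    return ret == -1 ? MAP_FAILED : (void *)ret;
#else
    (void)addr; (void)size; (void)set_token;
    errno = ENOSYS;
    return MAP_FAILED;
#endif
}
```

On a kernel or CPU without user shadow-stack support the call simply fails, so a caller must be prepared for MAP_FAILED; a successful mapping can later be released with munmap().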

There is one other subtle change to map_shadow_stack() that affects how shadow stacks are handled in general. The shadow-stack feature has incompatibilities with 32-bit code, especially when signals are involved. The kernel will refuse to enable a shadow stack for a thread that is running in the 32-bit mode and, in version 4 of the patch set, code was added to simply disable any signal handlers if a process switched to 32-bit mode after the shadow stack was enabled.

Beyond seeming like a bit of a hack, this approach did not fully solve the problem. As it turns out, a 64-bit thread can switch to the 32-bit mode without the kernel's knowledge or permission — meaning that the disabling of signal handlers can be circumvented. After some deliberation on how to avoid subtle problems when this happens, the decision was made (for version 5) to just always map the shadow stack at a virtual address above 4GB, making it inaccessible to 32-bit code. As a result, any attempt to switch to the 32-bit mode when a shadow stack is enabled will cause an immediate crash.

This change resulted in a new mmap() flag, MAP_ABOVE4G, which forces the mapping to be created above the 4GB virtual-address boundary. The address passed to map_shadow_stack() (if not zero, indicating no preference) must also be above 4GB or the call will fail. Someday, somebody with sufficient motivation could perhaps find a way to make 32-bit code work with shadow stacks, but given how little interest there is in 32-bit code in general, that seems unlikely to happen.
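The arithmetic behind the 4GB rule is simple: 32-bit code has 32-bit pointers, so it cannot name any byte at or above 0x100000000. The two helpers below are hypothetical and only restate the rules described above; they do not reproduce the kernel's own checks or how MAP_ABOVE4G is implemented. Treating an address exactly at the boundary as acceptable is an assumption of the sketch.

```c
#include <stdint.h>

#define SK_4GB (UINT64_C(1) << 32)

/* Can 32-bit code address every byte of [addr, addr + len)?
 * Its pointers are 32 bits wide, so nothing at or above 4GB is reachable. */
static int reachable_from_32bit(uint64_t addr, uint64_t len)
{
    return addr < SK_4GB && len <= SK_4GB - addr;
}

/* The map_shadow_stack() address rule as described in the text:
 * zero means "no preference"; an explicit address must be above 4GB. */
static int shstk_addr_acceptable(uint64_t addr)
{
    return addr == 0 || addr >= SK_4GB;
}
```

Any shadow stack that satisfies shstk_addr_acceptable() with a nonzero address is, by construction, one for which reachable_from_32bit() is false, which is why a switch to 32-bit mode crashes immediately instead of silently corrupting state.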

The glibc problem

While it might be nice to run all programs with shadow stacks enabled, there are applications that would break in that environment. Anything that manipulates its own call stack — just-in-time compilers, for example — will find itself out of sync with the shadow stack and brought to an untimely end. So the enabling of the shadow stack must be limited to code that can handle it.

The scheme that was developed, some time ago, was to place a special note in the .note.gnu.property ELF section of the program's executable image. If that note exists (as the result of compiler options provided when the program was built), that indicates that it is safe to run the program with the shadow stack enabled. That note is not sufficient for the kernel to make the decision, though, so the enabling of the shadow stack is left to user space, and to the C library's program loader in particular.

Enthusiastic developers in the GNU C Library (glibc) community quickly wired up support for turning on the shadow stack when it seemed appropriate; current versions of glibc are poised to turn on the shadow stack as soon as the kernel supports the feature. There is only one little problem: the glibc support was written with an early version of the user-space API in mind. That API has since changed; trying to use it would result in crashing programs and a failure to boot. That would indeed secure the system against ROP attacks, but users can be picky about just how that kind of security was achieved and may complain.

That problem was resolved early on by changing the API enough that glibc simply doesn't find it anymore and thinks that the shadow-stack functionality is not present. The glibc developers have said, though, that they intend to implement the new shadow-stack API once it is merged; thereafter, when an updated glibc shows up on a system, any program that indicates a readiness for a shadow stack will get one.

That leads to a new problem, as noted in the version-3 cover letter: not all applications that are marked as being ready really are.

But many application binaries with the bit marked exist today, and critically, it was applied widely and automatically by some popular distro builds without verification that the packages actually support shadow stack. So when glibc is updated, shadow stack will suddenly turn on very widely with some missing verification.

Applications that will break in this environment evidently include node.js and PyPy, so this seems like a real problem. A quick check on a Fedora 37 system shows that PyPy is indeed marked as being shadow-stack ready:

    $ readelf -n /usr/bin/pypy
    Displaying notes found in: .note.gnu.property
      Owner                Data size        Description
      GNU                  0x00000040       NT_GNU_PROPERTY_TYPE_0
          Properties: x86 feature: IBT, SHSTK
    [...]
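The "x86 feature: IBT, SHSTK" line is decoded from the note's descriptor, which is a sequence of properties, each consisting of a 4-byte type, a 4-byte data size, and data padded to 8 bytes in 64-bit objects. The sketch below decodes only that descriptor; locating the note in the file, as readelf does, is left out. The constants match glibc's <elf.h> (GNU_PROPERTY_X86_FEATURE_1_AND is 0xc0000002, with IBT in bit 0 and SHSTK in bit 1) but are redefined under hypothetical SK_ names so the block stands alone, and host byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as GNU_PROPERTY_X86_FEATURE_1_* in <elf.h>. */
#define SK_GNU_PROPERTY_X86_FEATURE_1_AND   0xc0000002u
#define SK_GNU_PROPERTY_X86_FEATURE_1_IBT   (1u << 0)
#define SK_GNU_PROPERTY_X86_FEATURE_1_SHSTK (1u << 1)

/* Walk the descriptor of an NT_GNU_PROPERTY_TYPE_0 note taken from a
 * 64-bit ELF file and return the x86 feature bitmask, or 0 if the
 * property is absent or the data is malformed. */
static uint32_t x86_feature_1_and(const unsigned char *desc, size_t len)
{
    size_t off = 0;

    while (len - off >= 8) {
        uint32_t type, datasz;
        memcpy(&type, desc + off, 4);
        memcpy(&datasz, desc + off + 4, 4);
        off += 8;
        if (datasz > len - off)
            return 0;                      /* truncated property */
        if (type == SK_GNU_PROPERTY_X86_FEATURE_1_AND && datasz == 4) {
            uint32_t bits;
            memcpy(&bits, desc + off, 4);
            return bits;
        }
        size_t padded = ((size_t)datasz + 7) & ~(size_t)7;
        if (padded > len - off)
            return 0;                      /* padding runs past the end */
        off += padded;                     /* skip data plus padding */
    }
    return 0;
}
```

A result with SK_GNU_PROPERTY_X86_FEATURE_1_SHSTK set is exactly the marking that glibc would treat as permission to turn the shadow stack on.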

Even if the root cause lies in user space, it can be provoked by upgrading to a new kernel, and thus looks like a kernel regression. Kernel developers generally prefer to avoid breaking systems, even if that breakage can be said to be somebody else's fault.

The ideal solution, according to Edgecombe, would be to simply move to a new ELF bit to identify real shadow-stack readiness and have glibc use that. Distributors could then be encouraged to be more careful about marking applications as being shadow-stack ready. But, he said, "it doesn’t seem like the glibc developers are interested in working on a solution", so something else is needed. In version 3, that something else was a patch disabling the shadow-stack API when the ELF bit is detected. The idea was that distributors would eventually disable that check once they had confirmed that all of the packages they ship included correctly marked binaries.

The patch was described as "a bit dirty" and included for the sake of discussion — which indeed resulted. H.J. Lu suggested that the right approach was just to avoid upgrading glibc until the system was ready for it. Florian Weimer added that most of the incompatible code is to be found in libraries that are loaded after a process starts; the kernel test would not detect those, and it may be too late to disable the shadow stack in any case.

After a while, Edgecombe asked Linus Torvalds what he thought should be done about this problem. Torvalds answered that he did not want to preemptively disable shadow-stack support without a reason:

Once [shadow-stack functionality] is enabled in the kernel, and it turns out that people complain that it breaks existing binaries, at that point I guess it gets disabled again. Possibly at that point using something like your suggested patch. But I'm not doing it until actual problems appear, and until we actually have this code in the kernel.

The patch disabling the shadow-stack API was duly taken out of the series. Weimer described a couple of plans for ensuring that shadow stacks could be safely enabled in distributions, claiming that adopting a new ELF bit would delay that process considerably. Shadow-stack support, he said, is not much different from supporting a new system call; that, too, can break existing applications, mostly as the result of seccomp() filters that do not understand the new call.

On to 6.4

The result of the discussion is that the kernel will take no special steps to avoid breaking binaries that were incorrectly marked as being ready for shadow stacks — at least, not before a problem is demonstrated. Most of the other outstanding issues appear to be resolved, to the point that Edgecombe prefixed the current version with a remark that "we have a pretty good initial shadow stack implementation here". There are a number of desired enhancements, but those might be done better, he said, after there has been some real-world use of the code that exists now.

So, after all this work, the 40 shadow-stack patches have been added to the tip tree, which feeds them into linux-next. If no show-stopping problems turn up over the course of the next month or so, user-space shadow-stack support for x86 systems will, most likely, move upstream during the 6.4 merge window. Finally, after a long development period, the shadow (stack) will truly know what evil lies in the heart of ROP attackers.

Index entries for this article
Kernel	Releases/6.6
Kernel	Security/Control-flow integrity
Security	Linux kernel

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 14:57 UTC (Fri) by syrjala (subscriber, #47399)

I can understand the need for this shadow stack approach for a platform such as x86 that has legacy compatibility needs. But the thing I don't really understand is why something fairly new such as RISC-V didn't just go for a totally separate return stack from the start...

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 15:50 UTC (Fri) by pm215 (subscriber, #98099)

The more weird things you do in your CPU architecture and ABI, the more barriers you have to adoption. Exceed your weirdness budget and you fail to get widespread adoption. So everything you pick that isn't "same way everybody else does it" has to be pretty carefully considered and should be something you think really matters for the use cases you're targeting. Should RISC-V have used some of their weirdness budget on having a separate return stack? Maybe, but it shouldn't be a big surprise that they went with the default answer of "no, don't be weird"...

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 17:08 UTC (Fri) by atnot (subscriber, #124910)

I think this argument is a bit less strong for something like RISCV which has a design that has generally ignored most advancements in RISC ISAs since the 1980s and has spent most of its novelty points on questionable things like having do-everything instructions like jalr instead.

User-space shadow stacks (maybe) for 6.4

Posted Apr 1, 2023 21:49 UTC (Sat) by anton (subscriber, #25547)

JALR is a venerable MIPS instruction (already present in the R2000 (the first MIPS CPU)), no novelty points there. And it does not do everything, it's an instruction for indirect calls (used, e.g., for calling a method in an object-oriented language).

RISC-V follows in the footsteps of the MIPS heritage (Alpha and DLX). RISC-V does support 64-bit variants, not available in the 1980s, but yes, it does not have many novelties; that's not its purpose. What advancements of RISC since the 1980s are you thinking about?

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 21:18 UTC (Fri) by ballombe (subscriber, #9523)

The RISC-V ABI is so outdated that it already looks weird...
What a new ABI needs to provide is separate kernel and userspace address spaces, as in SPARC and s390.

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 22:47 UTC (Fri) by jrtc27 (subscriber, #107748)

If you keep userspace positive and kernel space negative then that doesn't buy you much other than perhaps some hardening against speculative execution attacks, since you don't get any more address space bits to use (and x86 SMAP / AArch64 PAN / RISC-V SUM stop accidental dereferencing of userspace addresses). If you allow the address spaces to overlap then it's a lot more dangerous, since you go from being able to discriminate between user and kernel addresses to not being able to, and being at risk of dereferencing a (valid) userspace address as a (valid) kernel address.

User-space shadow stacks (maybe) for 6.4

Posted Mar 25, 2023 13:39 UTC (Sat) by ballombe (subscriber, #9523)

Fortunately that is not how it is done on sparc and s390.

User-space shadow stacks (maybe) for 6.4

Posted Mar 26, 2023 15:40 UTC (Sun) by farnz (subscriber, #17727)

So how is it done on SPARC and S390?

The claim is that there are only two ways to do it, neither of which brings huge advantages:

One bit of virtual address space is used to indicate whether this is a kernel address or a userspace address; given a virtual address, I can thus immediately see if it's meant to be a kernel space address or a user space address.
Two separate address spaces, where the pointer values can overlap. Given a virtual address, I can't tell whether it's meant to be kernel or user space, and I need to look at out-of-band information to determine which it is.

What's the third mechanism?

User-space shadow stacks (maybe) for 6.4

Posted Mar 26, 2023 21:37 UTC (Sun) by ballombe (subscriber, #9523)

The extra bits are only in the MMU, that is why this requires hardware support, see
<https://lwn.net/Articles/742245/>.

User-space shadow stacks (maybe) for 6.4

Posted Mar 27, 2023 8:46 UTC (Mon) by farnz (subscriber, #17727)

But you just said that SPARC and S/390 don't do it that way - this is claim version 2, where external data in the MMU determines whether a virtual address is interpreted as a kernel address or a user address.

User-space shadow stacks (maybe) for 6.4

Posted Mar 29, 2023 20:54 UTC (Wed) by ballombe (subscriber, #9523)

I was replying to "you go from being able to discriminate between user and kernel addresses to not being able to, and being at risk of dereferencing a (valid) userspace address as a (valid) kernel address."

This is not the case, the kernel can use the MMU to discriminate between user and kernel addresses.
The point is that userspace cannot create kernel pointers.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 8:59 UTC (Thu) by farnz (subscriber, #17727)

OK, so how exactly, using the MMUs, do I determine if 0x1ffff is a kernel or a user address on SPARC? Take as read that I have valid page table mappings for 0x10000 to 0x80000 in both user and kernel ASIDs.

Your claim continues to be that with the SPARC setup, while in kernel mode, I can determine if the address 0x1ffff is meant to be a kernel or a user address, and I just don't see how you can do that without knowing which address space I'm meant to use.

The original claim is that using in-band signalling (top bit of address) for kernel versus user is valuable, since then I can do a trivial check on all addresses coming from userspace to confirm that they are not kernel addresses. This then means that I can fail very quickly if I attempt to use a user address as a kernel address, since I'm using different ASIDs for accesses, and when I use the kernel ASID to access a user address, it'll fault.
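The "trivial check" described here can be sketched for x86-64 with 4-level paging, where canonical user addresses have bits 47 through 63 clear and kernel addresses have them all set. The helpers classify_va48() and user_pointer_ok() are hypothetical illustrations, not the kernel's own access checks; 5-level paging would move the boundary up to bit 56.

```c
#include <stdint.h>

enum va_kind { VA_USER, VA_KERNEL, VA_NONCANONICAL };

/* Classify a virtual address by its top 17 bits (bit 47 and above). */
static enum va_kind classify_va48(uint64_t va)
{
    uint64_t top = va >> 47;
    if (top == 0)
        return VA_USER;           /* 0x0000000000000000 - 0x00007fffffffffff */
    if (top == 0x1ffff)
        return VA_KERNEL;         /* 0xffff800000000000 - 0xffffffffffffffff */
    return VA_NONCANONICAL;       /* any access would fault */
}

/* Hypothetical check on a (pointer, length) pair received from user space:
 * the whole range must lie in the user half of the address space. */
static int user_pointer_ok(uint64_t va, uint64_t len)
{
    return classify_va48(va) == VA_USER &&
           len <= (UINT64_C(1) << 47) - va;
}
```

The point of the in-band scheme is visible in the code: no page tables or ASIDs are consulted, only the address bits themselves.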

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 10:15 UTC (Thu) by paulj (subscriber, #341)

You distinguish it based on the "Address Space Identifier", a separate 8-bit tag, of which the first 0x80 are restricted to privileged code. The CPU can do a trivial check on the ASI before passing an address to the MMU. The MMU has to be supplied with the ASI and the address, so the address can be translated according to the correct context.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 10:22 UTC (Thu) by farnz (subscriber, #17727)

Yes, but the original claim that ballombe made was that with just the VA, no ASID, you can distinguish SPARC kernel and user mode addresses.

He was responding to a comment that said that there were two ways of handling the kernel user split:

Use one bit from the VA space to distinguish kernel from user, and make sure that accesses with the wrong ASID fail if that bit is set to the wrong value. This means you can distinguish kernel and user addresses by the VA alone, and you can use the ASID hardware to make sure that accessing a kernel address with a user access ASID (either userspace or kernel accessing what it thinks is user address space) will fail.
Permit the two spaces to overlap, and require the ASID alone to distinguish the two classes of address space. This has the disadvantage that you can no longer look at a VA and determine whether it's "meant" to be a kernel address or a user address, and thus when you have your "bad access" fault, you cannot look at the address and go "that's clearly bad - the VA is a kernel VA, but the ASID in use is a user ASID" or vice-versa.

The assertion made was that S/390 and SPARC don't work in either of those two ways, and that you neither use bits in the VA space to distinguish the two addresses, nor do you have a possible overlap between the two address spaces. I'm asking how SPARC and S/390 make that work.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 11:04 UTC (Thu) by paulj (subscriber, #341)

Oh, sorry, I missed that aspect.

Having just skimmed the SPARCv9 Architecture Manual to look up the ASID stuff, the ASID appears to be intrinsically required to translate addresses correctly with the right context. You can't tell from the VA, you need the ASID - that's the point. The ASID is always there as part of the translation, given implicitly or explicitly.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 11:08 UTC (Thu) by farnz (subscriber, #17727)

Yeah, that's what I'm familiar with from SPARCv8, where the CPU automatically uses different ASIDs for instruction and data fetches crossed with user or kernel, for 4 default ASIDs, plus has an override option for data fetches to use any ASID - but I was really hoping to hear about some clever trick in later SPARC definitions that gets me the benefits of both worlds, and the cost of neither.

Otherwise, with large enough VAs, what you end up wanting is to use in-band signalling (top bit like in x86-64, for example) to indicate kernel or user address, with the separate ASIDs ensuring that if I'm fetching with a kernel-mode instruction fetch, I can't fetch from user addresses at all, nor can I fetch kernel data, while if I'm fetching with a user-mode overridden data fetch, I can't fetch from kernel addresses at all.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 11:38 UTC (Thu) by paulj (subscriber, #341)

The ASID space is expanded in v9 basically, it seems like - judging from what you've written and my (very cursory) skim of the v9 manual.

I like the explicit tag of the ASID in SPARC. CPU can easily apply basic checks. If you're going to reserve bits to identify address space contexts, you might as well make it explicit. SPARC VAs can use the full address space, cause the ASID tag can be set in a separate register and left implicit for a stream of instructions (IIUC).

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 12:38 UTC (Thu) by farnz (subscriber, #17727)

That sounds similar to SPARCv8 - 256 ASIDs, 4 of which are predefined and used by default for all instructions that don't override ASID. The MMU is between CPU and L1 cache, and maps ASIDs to either context IDs for paging, or another memory map if there's a predefined mapping (e.g. if you follow SPARC recommendations, some ASIDs are used for 36 bits of direct map, others are used to access MMU register space). Caches track the context ID and virtual address, so that you don't have to flush caches when you change ASID to context ID mapping.

For 32 bit systems, where VA space is at a premium, I get not reserving one bit for kernel/user distinction. But in 64 bit systems, where you have a very large VA space, I don't see that reserving one bit for kernel/user is a huge price to pay for the debuggability and security check simplification it gives you (you know up-front that any top-bit-set address is a kernel address, and top-bit-clear is a user address, even without knowing the context ID that you're going to use with that address). And you can program SPARC hardware with contexts that fault if user accesses are used for kernel addresses or vice-versa.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 15:36 UTC (Thu) by geert (subscriber, #98403)

I think there's a big misunderstanding: the kernel does not mix kernel and user pointers. The latter are always tagged with __user, and always used with special accessors (e.g. copy_{from,to}_user()). Hence there is no need to find out if a random VA points to kernel or user space.

I assume the SPARC and s390 feature is similar, or an extension to what m68k provides: separate function codes for user and kernel (and for program and data, but that doesn't matter here). M68k also has two sets of page tables: one for the kernel (supervisor), one for userspace.
Hence userspace accesses are always translated by the user page tables.
Kernel space accesses are translated by the kernel page tables, except when using the special MOVES instruction, which will access based on a preset function code.
This mechanism supports having the full 4 GiB address space available to both kernel and user space ("4G/4G split"), without the need to split the address space in separate parts for kernel and user memory (e.g. "1G/3G" split) to let the kernel access user memory and kernel memory.

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 15:41 UTC (Thu) by farnz (subscriber, #17727)

The kernel does not deliberately mix kernel and user pointers. However, it's not hard to find past bugs where the kernel has been tricked into reading from a pointer supplied by user space; being able to validate early on that this pointer is supposed to be an __user pointer, but has the VA format of a kernel pointer (or that this pointer has the VA format of a user pointer, but is not tagged as an __user pointer) is useful for actually finding such bugs.

It's a non-issue as long as the kernel is completely free of bugs, which is the case you've described. But that's, unfortunately, not the world I live in.

User-space shadow stacks (maybe) for 6.4

Posted Mar 28, 2023 11:22 UTC (Tue) by renox (guest, #23785)

> But the thing I don't really understand is why something fairly new such as RISC-V didn't just go for a totally separate return stack from the start...

1) RISC-V is an evolution of MIPS, so it isn't "really new".

2) RISC-V creators targeted micro-controllers at the beginning, so if you expect any security improvement in RISC-V, you're going to be quite disappointed.
They even removed the "trap on overflow" integer arithmetic operations that the MIPS had :-(

User-space shadow stacks (maybe) for 6.4

Posted Mar 30, 2023 14:59 UTC (Thu) by ejr (subscriber, #51652)

Minor nit: Krste and crew *published* against micro-controllers because those were the most reasonable comparisons. There was plenty of high-performance work, but no private funding interest at *that* time.

User-space shadow stacks (maybe) for 6.4

Posted Apr 5, 2023 12:00 UTC (Wed) by andy_shev (subscriber, #75870)

Maybe because it is human nature to be lazy and not repeat ourselves? The same trend is visible in pharmacy and medicine, where MDs try to reuse old drugs against new diseases.

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 16:23 UTC (Fri) by dezgeg (subscriber, #92243)

The quote-box ("Same old SHSTK") here looks quite bad under default dark mode colors.

User-space shadow stacks (maybe) for 6.4

Posted Mar 25, 2023 20:59 UTC (Sat) by gerdesj (subscriber, #5446)

Try these colours instead, it looks fine:

Page background color: #ffffff
Left column color: #ffcc99
Middle column background: #ffffff
Headline background: #ffcc99
Form/byline background: #eeeeee
Sidebar background: #ffcc99
Text color: black
Link color: DarkBlue
Visited link color: #444
Quoted text (in email) color: #990099
Old (seen) comment background color: #cccccc
Logo color: green

Dark mode colors

Posted Mar 29, 2023 22:45 UTC (Wed) by corbet (editor, #1)

So the color messup was the result of a dumb typo in the definitions of those colors; I have fixed it now. You will probably have to go into the customization area and re-select the dark-mode colors to get the fix, unfortunately; apologies for that.

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 16:33 UTC (Fri) by stop50 (subscriber, #154894)

Why was pypy compiled already with the shadowstack elf info?

User-space shadow stacks (maybe) for 6.4

Posted Mar 25, 2023 12:55 UTC (Sat) by pbonzini (subscriber, #60935)

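The compiler adds it by default to all of the object files it produces. That is okay if there is even a single assembly source file (the linker only marks the final binary if every object file carries the note, and hand-written assembly usually does not), but not if the runtime compilation or stack switching is done entirely with C code.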
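The reasoning turns on how the linker combines the notes. As a hedged model of GNU ld's handling of the x86 FEATURE_1_AND property, a feature bit survives into the output only if every input object sets it, and an object without the note (typically hand-written assembly) counts as having no bits set. The function name and SK_ constants below are invented for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_FEATURE_1_IBT   (1u << 0)
#define SK_FEATURE_1_SHSTK (1u << 1)

/* Model of combining FEATURE_1_AND properties at link time: the output
 * keeps a bit only if every input object has it.  An object with no note
 * at all is passed as 0. */
static uint32_t link_feature_1_and(const uint32_t *objs, size_t n)
{
    uint32_t out = n ? UINT32_MAX : 0;
    for (size_t i = 0; i < n; i++)
        out &= objs[i];
    return out;
}
```

This is why a project written entirely in C, including its JIT or stack-switching code, ends up marked as shadow-stack ready, while one that mixes in a single unmarked assembly file does not.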

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 16:59 UTC (Fri) by old (subscriber, #154324)

Shadow stack is also a real pain for dynamic instrumentation in userspace when changing the return address on the stack to instrument a function's entry/exit, as kretprobe does.

Great to see the ARCH_SHSTK_UNLOCK thanks to CRIU. That should be enough to ease the pain.

User-space shadow stacks (maybe) for 6.4

Posted Apr 6, 2023 8:36 UTC (Thu) by andrey.turkin (guest, #89915)

Also every fiber implementation out there has to be retrofitted with shadow stack support.
On the other hand, it is very handy to have a separate stack filled only with the actual call flow; it could allow for a very quick and reliable way to get a stack trace (without the stack frames, but many tools don't actually need those).

User-space shadow stacks (maybe) for 6.4

Posted Mar 24, 2023 19:28 UTC (Fri) by fredex (subscriber, #11727)

SHSTK. add a VCH at the end and you (almost) have Shostakovich. Speaking of Russian...

User-space shadow stacks (maybe) for 6.4

Posted Mar 25, 2023 9:27 UTC (Sat) by dottedmag (subscriber, #18590)

Shadow stack virtual channel, hmmm

PTRACE_ARCH_PRCTL

Posted Mar 26, 2023 11:03 UTC (Sun) by geofft (subscriber, #59789)

[PATCH v4 38/39] x86/shstk: Add ARCH_SHSTK_UNLOCK mentions "the ptrace arch_prctl interface," which I hadn't heard of before and appears to be undocumented - the arch_prctl(2) and ptrace(2) manpages don't mention each other.

It looks like this is an x86_64-specific extension. From arch/x86/kernel/ptrace.c:

#ifdef CONFIG_X86_64
                /* normal 64bit interface to access TLS data.
                   Works just like arch_prctl, except that the arguments
                   are reversed. */
        case PTRACE_ARCH_PRCTL:
                ret = do_arch_prctl_64(child, data, addr);
                break;
#endif

That is, to call arch_prctl(code, addr) on a process you're tracing, run ptrace(PTRACE_ARCH_PRCTL, pid, addr, code). For the specific operation in this patch, it would be ptrace(PTRACE_ARCH_PRCTL, pid, features, ARCH_SHSTK_UNLOCK), I think.
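Following that description, a tracer might wrap the unlock operation as below. This is a sketch under assumptions: x86-64, a tracee that is already attached and stopped, ARCH_SHSTK_UNLOCK taken as 0x5004 from the kernel's asm/prctl.h, and a hypothetical helper name, shstk_unlock_tracee().

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef ARCH_SHSTK_UNLOCK
#define ARCH_SHSTK_UNLOCK 0x5004   /* assumed from asm/prctl.h */
#endif

/* Unlock the given shadow-stack feature bits in a stopped tracee.
 * PTRACE_ARCH_PRCTL takes arch_prctl()'s arguments reversed: the feature
 * mask goes in ptrace()'s addr slot and the operation in its data slot. */
static long shstk_unlock_tracee(pid_t pid, unsigned long features)
{
#if defined(__x86_64__)
    return ptrace(PTRACE_ARCH_PRCTL, pid, (void *)features,
                  (void *)(unsigned long)ARCH_SHSTK_UNLOCK);
#else
    (void)pid; (void)features;
    errno = ENOSYS;
    return -1;
#endif
}
```

Called on a process that is not being traced (for example, the caller itself), ptrace() is expected to fail and return -1, which also illustrates why a process cannot use this path to unlock itself.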

User-space shadow stacks (maybe) for 6.4

Posted Apr 20, 2023 13:34 UTC (Thu) by immibis (guest, #105511)

A 64-bit thread may switch to 32-bit mode without the kernel's knowledge? How does that happen?

User-space shadow stacks (maybe) for 6.4

Posted Apr 29, 2023 20:36 UTC (Sat) by foom (subscriber, #14868)

Whether you're in the 32bit or 64bit submode of long mode is configured via attributes on the current code segment.

On Linux, code segments with both attributes are available for all processes, so you can flip back and forth with just:
ljmp $0x33, label ; jump to label in 64bit mode
ljmp $0x23, label ; jump to label in 32bit mode

The numbers correspond to __USER_CS and __USER32_CS from https://github.com/torvalds/linux/blob/master/arch/x86/in...

I don't know if there's any real non-exploit code which actually does this, though...
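Those two numbers follow from how x86 segment selectors are built: the descriptor index shifted left by three, a table-indicator bit (0 for the GDT), and the requested privilege level (3 for user mode). Assuming the x86-64 kernel's GDT indices of 6 for the 64-bit user code segment and 4 for the 32-bit one, the arithmetic gives 0x33 and 0x23. The helper make_selector() is hypothetical.

```c
#include <stdint.h>

/* Build an x86 segment selector from a descriptor-table index,
 * a table indicator (0 = GDT, 1 = LDT) and a requested privilege level. */
static uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}
```

For example, make_selector(6, 0, 3) is 6 * 8 + 3 = 51 = 0x33, and make_selector(4, 0, 3) is 4 * 8 + 3 = 35 = 0x23.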

User-space shadow stacks (maybe) for 6.4

Posted Apr 29, 2023 21:49 UTC (Sat) by dtlin (subscriber, #36537)

That must be how Wine's WoW64 works (https://www.winehq.org/announce/8.0). 32-bit code running inside a 64-bit process.