Kernel developers at Cauldron

By Jonathan Corbet
September 18, 2024

A Linux system is made up of a large number of interdependent components, all of which must support each other well. It can thus be surprising that, it seems, the developers working on those components do not often speak with each other. In the hope of improving that situation, efforts have been made in recent years to attract toolchain developers to the kernel-heavy Linux Plumbers Conference. This year, though, the opposite happened as well: the 2024 GNU Tools Cauldron hosted a discussion where kernel developers were invited to discuss their needs.

David Malcolm started the discussion by asking whether there is interest in performing more static analysis on the kernel. Steve Rostedt pointed out some of the tools that are used for that purpose now, noting that sparse is useful for checking pointer annotations in the kernel. It can, for example, find code that does not treat user-space pointers with appropriate caution or follow the read-copy-update locking rules. David Faust said that there had been a proposal to incorporate the sparse annotations into BPF type format (BTF) tags, which might be possible with the help of C2x attributes.
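
As a rough illustration of the pointer annotations that sparse checks, here is a minimal sketch modeled on the kernel's `__user` marker. The macro definitions follow the pattern used in the kernel's headers, but `fake_copy_from_user()` and `fetch_value()` are hypothetical stand-ins invented so that the example builds in user space. With an ordinary compiler the annotations expand to nothing.

```c
#include <stddef.h>
#include <string.h>

/*
 * sparse defines __CHECKER__ when it runs; the annotation then places the
 * pointer in a separate address space that must not be dereferenced
 * directly. A regular compiler sees empty macros.
 */
#ifdef __CHECKER__
# define __user  __attribute__((noderef, address_space(__user)))
# define __force __attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical user-space stand-in for the kernel's copy_from_user(). */
static unsigned long fake_copy_from_user(void *dst, const void __user *src,
                                         unsigned long n)
{
    /* The __force cast tells the checker that this access is deliberate. */
    memcpy(dst, (const void __force *)src, n);
    return 0; /* number of bytes that could NOT be copied */
}

/* Hypothetical handler whose argument originates in user space. */
static int fetch_value(const int __user *uptr, int *out)
{
    /* A direct "*out = *uptr;" here is the pattern sparse complains about. */
    if (fake_copy_from_user(out, uptr, sizeof(*out)) != 0)
        return -1;
    return 0;
}
```

The point of the design is that the annotation costs nothing at run time; it only gives the checker enough type information to flag code that treats a user-space pointer like a kernel pointer.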

Rostedt suggested that this kind of annotation could have helped to find a recent BPF bug. The BPF verifier was unaware of the fact that some tracepoints could fire with a null pointer value passed in and, as a result, did not require BPF programs attached to those tracepoints to check for null. That meant that some BPF programs were able to crash the kernel, which is not supposed to be possible. A "could be null" annotation could help the verifier in such situations.

Malcolm pointed out that GCC has supported a nonnull attribute for a long time, but that is the opposite of the needed "might be null and must be checked" meaning. It is an optimization-related feature, and not really suited for this purpose. As the discussion moved on, Malcolm said that he had created a tracker bug on using GCC to run static analysis on the kernel.
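
The difference between the two meanings can be sketched as follows. The `nonnull` attribute is the existing GCC feature; `MAYBE_NULL` is a purely hypothetical marker, defined here as an empty macro, that only shows where a "might be null and must be checked" annotation could sit. No such attribute is claimed to exist, and the function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Existing GCC attribute: a promise that argument 1 is never NULL.
 * The compiler may warn about obvious violations at call sites and may
 * optimize the body on the assumption that the promise holds.
 */
__attribute__((nonnull(1)))
static size_t name_length(const char *name)
{
    return strlen(name);
}

/*
 * HYPOTHETICAL marker with the opposite meaning: the argument may be
 * NULL, so the function must check before dereferencing. It expands to
 * nothing; a checker or verifier would be needed to enforce it.
 */
#define MAYBE_NULL

static size_t name_length_checked(const char *name MAYBE_NULL)
{
    if (name == NULL)   /* the check such an annotation would demand */
        return 0;
    return strlen(name);
}
```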

José Marchesi brought up the topic of the struct layout randomization GCC plugin that can be used to build the kernel. There is a problem, in that the randomization happens after the creation of debug data; if a structure's layout is reordered, the debug information will be incorrect. Clang, instead, orders the work correctly and does not have this problem. Segher Boessenkool asserted that the plugin is simply broken, and Sam James said that the plugin situation in general is "a mess". There is evidently a fix in circulation that depends on a new plugin hook for GCC. But I pointed out that the kernel project would like to move away from GCC plugins entirely in favor of having the necessary features supported directly by the compiler. Fixing the plugin would be welcome, but replacing it with a proper implementation would be better.
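
The effect of the ordering problem can be simulated in plain C. This is only a sketch under stated assumptions: the two structure definitions stand in for the "as written" and "after randomization" layouts, the stored offset stands in for debug data emitted before the shuffle, and all of the names are hypothetical. It does not show how the plugin itself works.

```c
#include <stddef.h>
#include <string.h>

/* Layout as written in the source. */
struct cred_src {
    int  uid;
    int  gid;
    long flags;
};

/* One possible layout after randomization: same members, shuffled. */
struct cred_rand {
    long flags;
    int  gid;
    int  uid;
};

/* Stand-in for debug data that was generated BEFORE the shuffle. */
static const size_t stale_uid_offset = offsetof(struct cred_src, uid);

/* What a tool trusting the stale debug data would read as "uid". */
static int read_uid_via_debug_info(const struct cred_rand *c)
{
    int value;

    memcpy(&value, (const char *)c + stale_uid_offset, sizeof(value));
    return value;
}
```

Code compiled against the real layout reads `uid` correctly through the member name; only the consumer of the stale offset is misled, which is why the problem shows up in debuggers and similar tools rather than in the kernel itself.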

There was some discussion about the value of struct layout randomization in general, with some calling it "security through obscurity". Rostedt, though, defended the technique as a useful way to limit the effectiveness of exploits. Bradley Kuhn said that there are some "serious licensing issues" around some of the GCC plugins used by the kernel, with some users getting "aggressive emails" from the original author of some of that work. It would be far better, he said, to rewrite that functionality from scratch, built into the compiler.

It was also mentioned that layout reordering could be used to optimize structures by eliminating internal holes. Boessenkool pointed out that this would violate the standard, which requires the address of a structure to be equal to the address of its first element. That seems like a price that some users would be willing to pay for a useful (and optional) feature, though. This part of the discussion concluded that struct layout reordering would be useful to have in GCC, but nobody said that they would work to actually implement it.
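
Both halves of that trade-off can be seen in a small sketch. The structure names are hypothetical, and the exact sizes depend on the ABI; on a typical LP64 target the first structure occupies 24 bytes and the reordered one 16.

```c
#include <stddef.h>

/* Declared order leaves padding after each char on typical 64-bit ABIs. */
struct holey {
    char tag;
    long value;
    char flag;
};

/* Same members, largest first: the two chars share a single padded tail. */
struct compact {
    long value;
    char tag;
    char flag;
};

/*
 * The guarantee that reordering would break: a pointer to the structure
 * and a pointer to its first declared member compare equal.
 */
static int first_member_at_start(const struct holey *h)
{
    return (const void *)h == (const void *)&h->tag;
}
```

A compiler that turned `struct holey` into the compact layout on its own would save space, but `tag` would no longer sit at offset zero, so code relying on the guarantee would silently break.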

Next on the list was the unwinding of user-space stacks in the kernel, possibly by making use of SFrame data. There is also interest in moving the kernel itself over to SFrame from the ORC format that it uses now, Rostedt said. User-space unwinding would be useful for the generation of profiles that include both the user and kernel sides of the equation. It would be implemented by setting a task flag in the kernel saying that a user-space stack trace is needed; that trace would be generated just prior to returning to user space from the kernel.
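
The deferred-trace idea can be sketched as a tiny simulation. Everything here is hypothetical: `struct task`, the flag name, and both functions are invented for illustration and do not correspond to actual kernel code. The sketch shows only the control flow, in which the request is cheap and the expensive unwind happens once, on the way back to user space.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified per-task state. */
struct task {
    bool need_user_unwind;  /* set when a profiler wants a user-space trace */
    int  traces_taken;      /* counts unwinds, for illustration only */
};

/* Called from "kernel" context, for example a profiling interrupt. */
static void request_user_unwind(struct task *t)
{
    t->need_user_unwind = true;   /* cheap: only record the request */
}

/* Called once, just before the task returns to user space. */
static void return_to_user(struct task *t)
{
    if (t->need_user_unwind) {
        t->need_user_unwind = false;
        t->traces_taken++;        /* the real work would walk the stack here */
    }
}
```

Several requests made during one trip through the kernel collapse into a single unwind, which is one plausible reason to prefer a flag over unwinding at the time of each request.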

Creating that trace would require access to user-space SFrame data from within the kernel, though. That, in turn, could require the addition of a system call for user space to provide that data. To complicate things, the SFrame situation could change rapidly over time, especially in processes running a just-in-time compiler. So perhaps the SFrame data would be stored in a memory region that is shared between the kernel and user space, eliminating the need to make a system call every time something changes.

The final topic that was discussed was a desire to obtain hints from the compiler about functions that do not return or which contain jump tables. As Josh Poimboeuf explained, this information is needed to make the kernel's objtool utility work properly on 64-bit Arm systems; that, in turn, is needed to support live patching. Indu Bhagat said that this information could be useful for some control-flow-integrity applications as well. The discussion wandered inconclusively for a while after that, with no clear solution identified.
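
For readers unfamiliar with the two constructs, here is a minimal sketch of each. The `noreturn` attribute is an existing GCC feature; the function names are hypothetical. Whether the `switch` is actually lowered to a jump table is up to the compiler and its options, which is exactly the information a tool inspecting the object code cannot easily recover without a hint.

```c
#include <stdlib.h>

/* Tells the compiler that control never comes back from this call. */
__attribute__((noreturn))
static void fatal(void)
{
    abort();
}

/*
 * A dense switch over small integers: a compiler MAY implement it as an
 * indirect jump through a table of code addresses.
 */
static int apply_op(unsigned int op, int x)
{
    switch (op) {
    case 0: return x + 1;
    case 1: return x * 2;
    case 2: return x - 3;
    case 3: return x ^ 0xff;
    case 4: return -x;
    default:
        fatal();   /* no return statement needed: fatal() does not return */
    }
}
```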

The session did not manage to address half of the potential subjects that had been listed at the beginning. It did show, though, that there is value in getting groups of developers to talk with each other about their needs and wishes. The discussion will continue at the Linux Plumbers Conference in Vienna and, presumably, at future events as well.

[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event. ]


struct address = first element address is relied on for pseudo-OO

Posted Sep 18, 2024 15:00 UTC (Wed) by smcv (subscriber, #53363) [Link] (12 responses)

> Boessenkool pointed out that [reordering structs] would violate the standard, which requires the address of a structure to be equal to the address of its first element

Keeping the first element at this position is necessary for the pseudo-object-orientation convention where a subclass struct starts with its superclass (as used in GLib and CPython among others), which is one of the few places where C allows aliasing between distinct types (aliasing is allowed if one type starts with the other):

/* superclass */
struct Vehicle { ... };

/* subclass: the superclass must be the first member */
struct Bicycle {
    struct Vehicle parent;
    struct Wheel *wheels[2];
};

struct Vehicle *vehicle = ...;

if (my_object_system_is_a(vehicle, TYPE_BICYCLE)) {
    /* valid only because parent sits at offset zero */
    struct Bicycle *bike = (struct Bicycle *) vehicle;
    if (!bike->wheels[0]) {
        warning("your front wheel has been stolen");
    }
}
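
A self-contained variant of the same convention might look like the sketch below. The type tag, the `is_a()` helper, and `front_wheel_missing()` are hypothetical stand-ins for whatever a real object system such as GLib provides; they are not taken from any library.

```c
#include <stddef.h>

/* Hypothetical run-time type tag. */
enum vehicle_type { TYPE_VEHICLE, TYPE_BICYCLE };

struct Wheel { int diameter; };

/* superclass */
struct Vehicle { enum vehicle_type type; };

/* subclass: the superclass must be the first member */
struct Bicycle {
    struct Vehicle parent;
    struct Wheel *wheels[2];
};

/* Hypothetical replacement for my_object_system_is_a(). */
static int is_a(const struct Vehicle *v, enum vehicle_type t)
{
    return v->type == t;
}

/* Returns 1 if the vehicle is a bicycle whose front wheel is missing. */
static int front_wheel_missing(struct Vehicle *vehicle)
{
    if (is_a(vehicle, TYPE_BICYCLE)) {
        /* Valid only because parent is at offset zero in struct Bicycle. */
        struct Bicycle *bike = (struct Bicycle *)vehicle;
        return bike->wheels[0] == NULL;
    }
    return 0;
}
```

Callers pass `&bike.parent` wherever a `struct Vehicle *` is expected; the downcast recovers the full object only because the two addresses are guaranteed to be the same.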

struct address = first element address is relied on for pseudo-OO

Posted Sep 18, 2024 22:10 UTC (Wed) by arsen (subscriber, #161285) [Link]

that doesn't matter in the case that we were discussing at cauldron since the struct randomization linux wants requires opting in on each struct (except for "vtable-y" structs consisting of purely function pointers).

plus, this is kernel specific, so any arguments based on the C standard are moot IMO

struct address = first element address is relied on for pseudo-OO

Posted Sep 19, 2024 11:54 UTC (Thu) by segher (subscriber, #109337) [Link]

It is required for that yes, and for many other things. It is hard to make a usefully defined struct layout without such guarantees, and this isn’t even considering the multiple full percents of performance loss reordering like this would cause!

struct address = first element address is relied on for pseudo-OO

Posted Sep 19, 2024 12:12 UTC (Thu) by segher (subscriber, #109337) [Link] (9 responses)

Yup, exactly, and a lot of code depends on this, sometimes implicitly or unexpectedly, and much more than how much code depends on the “compiled member order to be the same as the source member order” in general!

The kernel does a lot of similar trickery as well…

Layout randomization

Posted Oct 7, 2024 22:56 UTC (Mon) by ssokolow (guest, #94568) [Link] (8 responses)

> Yup, exactly, and a lot of code depends on this, sometimes implicitly or unexpectedly, and much more than how much code depends on the “compiled member order to be the same as the source member order” in general!

That's actually the motivation for the Rust compiler's nightly/experimental -Z randomize-layout flag... "Accessing those through avenues the compiler can't fix-up risks walking toward forcing C++-style de facto ABI stabilization on the default Rust ABI, so we're pushing back against Hyrum's Law".

If I remember correctly, the reason it's not on by default is because, while it doesn't apply to #[repr(C)] types, it's still no less at odds with reproducible builds.

Layout randomization

Posted Oct 7, 2024 23:07 UTC (Mon) by segher (subscriber, #109337) [Link] (7 responses)

No, this is something very different.

There is a lot of code that depends on the actual semantics of C. Not imagined semantics, or
observed behaviour from some single run on a single implementation: actual,
guaranteed semantics. Very useful semantics as well, in many ways.

Layout randomization

Posted Oct 9, 2024 1:50 UTC (Wed) by ssokolow (guest, #94568) [Link] (6 responses)

Of course. There's a reason one of the purposes of using #[repr(C)] in Rust is to turn off automatic struct packing.

My point is that, with Rust's native ABI, "code [that] depends on this, sometimes implicitly or unexpectedly" is defined as broken and exempt from the stability promise, so they implemented a tool to help your test suite figure out if your code does.

Layout randomization

Posted Oct 9, 2024 14:05 UTC (Wed) by segher (subscriber, #109337) [Link] (5 responses)

At least as long as you keep saying "sometimes" in there, anyone sane will consider this an utter fallacy. Sorry.

It is a language ***guarantee***. Users can depend on such things!

(And making a language with even fewer guarantees is not the way forward imo, but that is a different discussion).

Layout randomization

Posted Oct 10, 2024 8:51 UTC (Thu) by taladar (subscriber, #68407) [Link] (3 responses)

Figuring out when your code assumes something is guaranteed that is in fact not guaranteed but just happens to be the same in your test runs is important though.

And if you guarantee absolutely everything your code will become very brittle in the face of changes the world forces on others (e.g. if you guarantee something that won't hold on a new CPU architecture or with some new network protocol or crypto cipher) and it will be hard to perform any kind of optimizations on it.

Layout randomization

Posted Oct 10, 2024 12:06 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

> And if you guarantee absolutely everything your code will become very brittle in the face of changes the world forces on others

As always, I think the wrong language is being used ...

What developers used to do - maybe still should - is stick a bunch of "assert"s just after your function is called (to document your pre-requisites), and a similar bunch of asserts just before a return to document your post-requisites (call those guarantees if you like).

Effectively you're saying "this is what I need to function correctly, this is what I guarantee if everything works as designed". To what extent that's massively comprehensive is down to you, but if somebody then comes and says "your function left this data structure in a mess", you can then go back and say "where's the assert that checks what you want?". If it's not in your code, "not your problem". If it is in your code, where the **** did it get corrupted?

Cheers,
Wol
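
The assert style described above might look like the following minimal sketch. The function and its contract are hypothetical, chosen only to show the shape: pre-requisites checked on entry, a post-requisite checked just before the return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n non-negative values. */
static long sum_nonnegative(const int *values, size_t n)
{
    /* pre-requisites: what the function needs in order to work */
    assert(values != NULL);
    assert(n > 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(values[i] >= 0);
        total += values[i];
    }

    /* post-requisite: what the function guarantees if all went well */
    assert(total >= 0);
    return total;
}
```

Building with `NDEBUG` defined removes the checks, so they document the contract without costing anything in a production build.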

Layout randomization

Posted Oct 10, 2024 18:55 UTC (Thu) by segher (subscriber, #109337) [Link] (1 responses)

Yup. But here we specifically are talking about a language guarantee, which any developer will trust to be implemented correctly by the compiler (or the rest of the system) (except in the very exceptional cases where there is a bug of course, and finding it then can be interesting!)

This is *the* core thing a compiler does. Not trusting compiler developers to do their job is very offensive.

Layout randomization

Posted Oct 11, 2024 7:56 UTC (Fri) by taladar (subscriber, #68407) [Link]

Not trusting them with things where they might make a mistake would be fine but you can't exactly check everything with asserts every time it comes up. Do you always check booleans to see if they are either True or False? Do you check every value for correct alignment? Do you check the order of fields in every struct to see if it is the expected order? That just doesn't make sense.

Layout randomization

Posted Oct 11, 2024 9:08 UTC (Fri) by ssokolow (guest, #94568) [Link]

The "Sometimes" is purely a matter of four things:

You used unsafe and raw pointer access to circumvent their attempts to reliably give you an error message.
You ignored the docs for unsafe that told you that's not a supported mode of operation. (In the same way that you can write a kernel extension by writing bytes to /dev/kmem but it's not supported.)
You didn't test the resulting code under Miri (Rust's Undefined Behaviour sanitizer built on their execution engine for running const fn in initializers).
They can't randomize the memory layout thoroughly enough to ensure your invalid code breaks outside of Miri without either making reproducible builds impossible or making performance un-competitive with C and C++.

You're well into "bringing a lawsuit against the park security guard for preventing you from climbing the fence to walk in front of the roller-coaster" territory.

Definition of struct

Posted Sep 18, 2024 18:06 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (4 responses)

A structure type is defined as

[...] a sequentially allocated nonempty set of member objects

ie, it's not just the first defined element which has to be allocated first; the second also has to be allocated second, and so on. Some people would doubtlessly argue that "sequentially" doesn't mean the sequence used by compiled code has to correspond to the sequence used in the source code in any way. OTOH, this would render the word "sequentially" meaningless, as different members have to be allocated one after another anyway, and hence I don't think that's a valid argument.

Real world code, eg, the Linux networking stack, also frequently uses C structs to represent protocol-defined structures like IP or TCP headers.

Definition of struct

Posted Sep 18, 2024 20:04 UTC (Wed) by garyguo (subscriber, #173367) [Link] (3 responses)

There is an even clearer explanation of this later, at 6.7.2.1.13

> Within a structure object, the non-bit-field members and the units in which bit-fields
> reside have addresses that increase in the order in which they are declared. A pointer to a
> structure object, suitably converted, points to its initial member (or if that member is a
> bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
> padding within a structure object, but not at its beginning.
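
The quoted rules can be written down as compile-time checks. This is a sketch assuming a C11 compiler (for `_Static_assert`); the `struct header` type and the helper function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical structure with members of differing sizes. */
struct header {
    unsigned char  version;
    unsigned short length;
    unsigned int   id;
};

/* No padding at the beginning: the initial member is at offset zero. */
_Static_assert(offsetof(struct header, version) == 0,
               "initial member must be at offset 0");

/* Addresses increase in the order in which members are declared. */
_Static_assert(offsetof(struct header, version) < offsetof(struct header, length),
               "version must precede length");
_Static_assert(offsetof(struct header, length) < offsetof(struct header, id),
               "length must precede id");

/* The same ordering, checked at run time on an actual object. */
static int members_in_declared_order(const struct header *h)
{
    return (const char *)&h->version < (const char *)&h->length &&
           (const char *)&h->length  < (const char *)&h->id;
}
```

A conforming compiler can never fail these assertions; a compiler with layout reordering enabled could, which is the point of contention in this thread.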

Definition of struct

Posted Sep 19, 2024 6:20 UTC (Thu) by riking (subscriber, #95706) [Link] (2 responses)

The desire for the bug-crunching of struct layout randomization being in tension with the guaranteed layout algorithm of the major implementations is why Rust gained -Zrandomize-layout relatively quickly.

Responding to some of the article text: Using struct layout randomization for security is wishful thinking. An attacker with an arbitrary read primitive can probe until they get enough information to figure out whether this kernel is the Fedora randomization or the Debian randomization or [...] and specialize the attack from there.
The primary value of struct layout randomization is bug hunting, figuring out where encapsulation boundaries are being punched to do code crimes without permission, and also catching out-of-bounds writes that just happen to land inside the struct.

Definition of struct

Posted Sep 19, 2024 9:35 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (1 responses)

> Responding to some of the article text: Using struct layout randomization for security is wishful thinking.

That's especially true because this is about compile-time reshuffling of struct members: For every individual binary, the struct layout will always remain fixed. It's just no longer possible to deduce it from reading the source. This has a strong smell of the old "It's insecure, because everyone can read the source!" argument to it, ie, that the problem is not that the code has exploitable bugs but that people who aren't very good at understanding machine code can perhaps find them.

Given enough time, we'll doubtlessly see array indexing randomization as well and someone will claim to be convinced that this simply must be good for something. After all, it's going to be much more complicated than just mapping the array index 1 to the second element of the array!

Definition of struct

Posted Sep 27, 2024 15:21 UTC (Fri) by nix (subscriber, #2304) [Link]

It's still valuable for distros that have users build the kernel themselves, perhaps from a distro-provided .config or an augmented version thereof. (Yes, maybe there aren't many of us, but they include every user of several whole distros, like Gentoo.)

I wonder who would that be

Posted Sep 18, 2024 22:04 UTC (Wed) by intelfx (subscriber, #130118) [Link]

> Bradley Kuhn said that there are some "serious licensing issues" around GCC plugins, with some users getting "aggressive emails" from the original author of some of that work

Let me guess: the grsecurity guy?

Length of a feather

Posted Sep 18, 2024 22:35 UTC (Wed) by sam_c (subscriber, #139836) [Link] (3 responses)

I wish we'd had more time to discuss the other (important!) topics, but I felt the session was productive. I think for future years, we should see about finding a way to allow longer BoFs on request. The session was only an hour and felt like it could've easily been twice that.

On the upside, I think SFrames and the infra around them got a lot of thrashing out both in the corridors and Indu's talk and some progress is going forward there.

Length of a feather

Posted Sep 19, 2024 12:04 UTC (Thu) by segher (subscriber, #109337) [Link] (2 responses)

There are about as many BoFs as presentations now, so if we want longer BoFs, we’ll need more rooms. Or maybe one or more of the BoFs can fit in the third room (the small room). Or maybe we’ll need to make the Cauldron take four days!

I guess this is a long way to say “it is not as easy as all that / we do our best” :-)

Length of a feather

Posted Sep 27, 2024 15:23 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)

I'd be surprised if there weren't more empty rooms around in the university :) perhaps some of them also don't have a tradeoff between being impossibly hot or having what sounds like a regiment of tanks driving by drowning out the speaker several times a minute. (I suspect I'm deafer than most attendees, though: maybe most people were fine with it.)

Length of a feather

Posted Oct 1, 2024 13:10 UTC (Tue) by segher (subscriber, #109337) [Link]

That's not the point. I already said that btw.

Even with just *two* sessions at a time there already are conflicts where people want to attend two at the same time, or even *have to*. The organisers have been happy to reschedule things where needed in previous years, but if we had four rooms, such problems would become unavoidable.

It was not "impossibly hot", rather cool even. I did not notice any noise nuisance myself, nor did other people talk about it.

I'm sure things could work better, but it was done quite well this year.