BPF-based error injection for the kernel

By Jonathan Corbet
November 29, 2017

Diligent developers do their best to anticipate things that can go wrong and write appropriate error-handling code. Unfortunately, error-handling code is especially hard to test and, as a result, often goes untested; the code meant to deal with errors, in other words, is likely to contain errors itself. One way of finding those bugs is to inject errors into a running system and watching how it responds; the kernel may soon have a new mechanism for doing this sort of injection.

As an example of error handling in the kernel, consider memory allocations. There are few tasks that can be performed in kernel space without allocating memory to work with. Memory allocation operations can fail (in theory, at least), so any code that contains a call to a function like kmalloc() must check the returned pointer and do the right thing if the requested memory was not actually allocated. But kmalloc() almost never fails in a running kernel, so testing the failure-handling paths is hard. It is probably fair to say that a large percentage of allocation-failure paths in the kernel have never been executed; some of those are certainly wrong.

The kernel gained a fault-injection framework back in 2006; it can be used to test error-handling paths by causing memory allocation requests to fail. Just making kmalloc() fail universally is unlikely to be helpful, though; execution will almost certainly never make it to the code that the developer actually wants to test. The fault-injection framework has some parameters to control which allocation attempts should fail, but the mechanism is somewhat awkward to use and is not as flexible as one might like. So the number of developers actually using this framework is small.

Fully generalizing fault injection would be a lot of work. A developer may want to see what happens when a specific kmalloc() call fails, but perhaps only when it is invoked from a specific call path or when some other condition is true. It has not been possible in the past to describe these conditions to the framework but, in recent years, a new technology has come along that can provide the required flexibility: the BPF virtual machine.

It is already possible to attach a BPF program to an arbitrary function using the kprobe mechanism. Such programs are useful for information gathering, but they cannot be used to affect the execution of the function they are attached to. Thus, they are not usable for error injection. That situation changes, though, with this patch set from Josef Bacik, which is intended to turn BPF into a generalized mechanism for the injection of errors into a running kernel.

The core of the new mechanism is a BPF-callable function called bpf_override_return(). If a BPF program attached to a kprobe calls this function, the execution of the function the program is attached to will be shorted out and its return value will be replaced with a value supplied by that BPF program. The patch set contains an example in the form of a test program:

    SEC("kprobe/open_ctree")
    int bpf_prog1(struct pt_regs *ctx)
    {
	unsigned long rc = -12;

	bpf_override_return(ctx, rc);
	return 0;
    }

This function can be compiled to BPF using the LLVM compiler. The SEC() directive at the top specifies that this function should be attached to a kprobe placed at the beginning of open_ctree(), a function in the Btrfs filesystem implementation. After the placement of this probe and the attachment of the BPF function, a call to open_ctree() will be overridden and the value -12 (-ENOMEM) will be returned. This is a relatively simplistic example, of course; it is expected that many uses will require more sophisticated BPF programs to narrow down the set of situations where the injection will occur.

This patch set had been through several revisions and appeared ready for inclusion into the mainline; it had even been applied to the networking tree for the 4.15 merge window. Things came to a halt, though, when Ingo Molnar blocked the progress of this patch set out of worries that it violated one of the basic promises behind the BPF virtual machine and could destabilize the kernel:

One of the major advantages of having an in-kernel BPF sandbox is to never crash the kernel - and allowing BPF programs to just randomly modify the return value of kernel functions sounds immensely broken to me.

After some discussion, a solution was agreed to: BPF programs would retain the ability to override kernel functions, but only for functions that have been specifically marked to allow this to happen. A new macro called BPF_ALLOW_ERROR_INJECTION() was introduced; it can be used to add the required annotation to a function. See, for example, this patch adding the marking for open_ctree(). Molnar suggested some additional conditions — only functions whose return value cannot crash the kernel should be annotated, and the override function should only change integer error values — but nothing enforces those rules in the current patch set.

Bacik's patch set only marks that one function; it is not clear whether those markings will be added in any quantity to the mainline kernel, or whether they will, instead, be maintained as private patches by the developers who use them. One can imagine that there could be some resistance to marking up the mainline in this way. But, on the other hand, there would be value in marking functions like kmalloc() to enable the development of generic tools that can be used to test specific allocation-error handling paths.

That question is only likely to be resolved once the mechanism is in place and patches marking functions for error injection start to appear. Meanwhile, the objections to the core mechanism have been addressed, and its path into the mainline appears to be clear. It has missed the 4.15 merge window, though, so it will almost certainly have to wait until 4.16.

Index entries for this article
Kernel	Development tools/Kernel debugging
Kernel	Fault injection

kernel-corrupting BFF programs?

Posted Nov 29, 2017 20:12 UTC (Wed) by darwish (guest, #102479) [Link] (9 responses)

I honestly agree with Ingo here. At the theoretical level, this is the anti-thesis of BPF! Having a sound design is criical for operating systems progress in the long term..

kernel-corrupting BFF programs?

Posted Nov 29, 2017 21:05 UTC (Wed) by SEJeff (guest, #51588) [Link] (1 responses)

Somewhere out of the public eye, Brad Spengler (aka spender) is coming up with ways that this can turn into root exploits and is writing code to ensure grsecurity prevents it.

kernel-corrupting BFF programs?

Posted Dec 1, 2017 1:02 UTC (Fri) by JdGordy (subscriber, #70103) [Link]

dumb question but don't you already need root to attach these programs?

kernel-corrupting BFF programs?

Posted Nov 29, 2017 22:00 UTC (Wed) by NAR (subscriber, #1313) [Link] (4 responses)

I believe most code cannot be unit tested without mock objects/functions/BPF programs. So I see the value of this code, but I find it strange if it's not between #ifdef DEBUG and #endif directives...

kernel-corrupting BFF programs?

Posted Nov 30, 2017 19:55 UTC (Thu) by k3ninho (subscriber, #50375) [Link] (3 responses)

> I believe most code cannot be unit tested without mock objects/functions/BPF programs.

I applaud the spirit of your sentence, but the specific meaning of the words you used is wrong. A unit test is an isolatable unit which should never touch outside and can be run in embarrassingly-parallel concurrent fashion; integration tests are the counterpart which validate that the functional units integrate correctly with the rest of the program and system. This matches up nicely with the design paradigm of 'separation of interfaces and implementation' which has benefits in refactoring and deprecating old implementation details while presenting the same interface to consumers (plus optionally allowing different versions of your interfaces to be in use, maintained and stable). Having the right words for these situations helps clear thinking when designing and implementing systems.

A unit test shouldn't need a lot of state supplied by the test harness; shouldn't need to reach outside its function, procedure or method to show that it's doing what you believe it should be doing. There's a 'bad code smell' that comes with complex units. You know those times when they're flaky because they're vulnerable to side effects? I typically advise people with complex unit tests to refactor and simplify the tests as well as simplifying the underlying code. How can you not like simpler code plus you get embarrassingly-parallel concurrency for running these tests when they're little atoms of logic?

> I find it strange if it's not between #ifdef DEBUG and #endif directives...
You might even wrap the BPF_ALLOW_ERROR_INJECTION() annotation macro in a config wrapper so that no prod system has this injector but someone looking to replicate an observed failure can work on it. It's a given that you'd have to be careful that you're running through the exact same logic as the production binary that failed.

K3n.

kernel-corrupting BFF programs?

Posted Dec 1, 2017 13:58 UTC (Fri) by hkario (subscriber, #94864) [Link] (2 responses)

> A unit test shouldn't need a lot of state supplied by the test harness;
> shouldn't need to reach outside its function, procedure or method to show
> that it's doing what you believe it should be doing. There's a 'bad code smell'
> that comes with complex units. You know those times when they're flaky
> because they're vulnerable to side effects? I typically advise people with
> complex unit tests to refactor and simplify the tests as well as simplifying the
> underlying code. How can you not like simpler code plus you get
> embarrassingly-parallel concurrency for running these tests when they're
> little atoms of logic?

that's the theory, the practice is that they're needed to test interfaces and sometimes even code designed and written two decades ago

theoretical purism also doesn't help in the initial refactoring of the code - how can you refactor code if you can't know if you're not changing its behaviour because you don't have test cases for it?

kernel-corrupting BFF programs?

Posted Dec 5, 2017 16:37 UTC (Tue) by k3ninho (subscriber, #50375) [Link] (1 responses)

> that's the theory, the practice is that they're needed to test interfaces ...

Oh, so you're talking about integration testing, which is something I'm keen that you know to be distinct from unit testing? Conflating these two things is bad for your code!

> how can you refactor code if you can't know if you're not changing its behaviour because you don't have test cases for it?

One of my favourite books is Michael C. Feathers' "Working Effectively with Legacy Code" which calmly and politely says, again and again:
0. write tests round the code as it is, as best you understand it*
1. change the code, i.e. refactor or add functionality
2. ensure the tests keep working

*: ...and write more tests as you come to understand it better

You might sample existing data on the live system and replay it, but that's unlikely to protect against embarrassing edge cases. You might instrument your existing code and log its data flows while live, then reproduce them in your test framework, but it loses validity over time. The shortest route to a reliable system when you don't understand it is, from experience, the workflow which loops through "wrap with tests as best you understand it / change the code / ensure the tests continue to pass."

K3n.

kernel-corrupting BFF programs?

Posted Dec 5, 2017 17:22 UTC (Tue) by hkario (subscriber, #94864) [Link]

> Oh, so you're talking about integration testing, which is something I'm keen that
> you know to be distinct from unit testing?

if you don't have units that can be tested in isolation (most of kernel code) then by definition you don't have unit tests :)

> One of my favourite books is Michael C. Feathers' "Working Effectively with
> Legacy Code"

that's what I was had in mind

kernel-corrupting BFF programs?

Posted Nov 30, 2017 0:32 UTC (Thu) by iabervon (subscriber, #722) [Link]

I think the BPF part of this is perfectly reasonable. After all, BPF programs have always been about specifying something different that happens instead of what was requested. On the other hand, I'd expect it to get attached to something like "fault/open_ctree", which would only be a possible target if you'd compiled in fault injection. That is, a BPF program can never crash the kernel, but the kernel could consult a BPF program to decide whether to crash itself, assuming the kernel supports conditionally crashing itself.

kernel-corrupting BFF programs?

Posted Nov 30, 2017 7:28 UTC (Thu) by mjthayer (guest, #39183) [Link]

Perhaps a way for the person loading the BPF programme to say that they really wanted to do that, rather than having to say at kernel build time what can be overridden? I would expect this to be interesting to a very different audience to typical BPF users, and one that can live with the occasional kernel crash.

disable this in production kernels, please

Posted Nov 30, 2017 9:00 UTC (Thu) by sasha (guest, #16070) [Link] (1 responses)

In my experience, out-of-memory linux machine always needs reboot, especially if it uses NFS homes. Allowing a BPF program to bring the computer completely to unusable state does not differ from allowing it to crash kernel.

I understand that "out-of-memory linux machine always needs reboot" is the point they are trying to fix, but the way chosen looks strange for me. I think there should be a way to disable this feature for production kernels.

disable this in production kernels, please

Posted Dec 1, 2017 3:37 UTC (Fri) by josefbacik (subscriber, #90083) [Link]

It's a config option, I suspect most distros will have it off. And I don't just want it for memory, in fact I originally did it for testing NBD's handling of getting EINTR back from kernel_sendmsg since that path is hard to test in a targeted way. I've used it for memory stuff as well, cases where we do GFP_ATOMIC allocations and expect that it may fail and should be handling the failure gracefully but weren't, so it was handy for that as well. There are a myriad of reasons to use this beyond just randomly OOM'ing in certain places.

BPF-based error injection for the kernel

Posted Nov 30, 2017 9:28 UTC (Thu) by error27 (subscriber, #8346) [Link] (2 responses)

"Just making kmalloc() fail universally is unlikely to be helpful".

I actually wrote a patch once where you would boot, then write to a sysfs file and after that every kmalloc() would fail the first time you call it. You had to start applications five times before they had triggered the allocation failures and were able to run but it all worked surprisingly well.

BPF-based error injection for the kernel

Posted Nov 30, 2017 19:36 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (1 responses)

How did you identify "it"? Call stack?

BPF-based error injection for the kernel

Posted Dec 1, 2017 20:20 UTC (Fri) by error27 (subscriber, #8346) [Link]

I renamed kmalloc() to kmalloc_real() and then made a macro named kmalloc() to call kmalloc_real(). The macro returned NULL the first time it was called when the sysfs flag was set.

Something like this (written on phone).

#define kmalloc(size, gfp) ({
Void *ret;
Static int tested;
If (sysfs && !tested) {
Ret = NULL;
Tested = 1;
} Else {
Ret = kmalloc_real(size, gfp);
}
Ret;
})