BPF-based error injection for the kernel
As an example of error handling in the kernel, consider memory allocations. There are few tasks that can be performed in kernel space without allocating memory to work with. Memory allocation operations can fail (in theory, at least), so any code that contains a call to a function like kmalloc() must check the returned pointer and do the right thing if the requested memory was not actually allocated. But kmalloc() almost never fails in a running kernel, so testing the failure-handling paths is hard. It is probably fair to say that a large percentage of allocation-failure paths in the kernel have never been executed; some of those are certainly wrong.
The kernel gained a fault-injection framework back in 2006; it can be used to test error-handling paths by causing memory allocation requests to fail. Just making kmalloc() fail universally is unlikely to be helpful, though; execution will almost certainly never make it to the code that the developer actually wants to test. The fault-injection framework has some parameters to control which allocation attempts should fail, but the mechanism is somewhat awkward to use and is not as flexible as one might like. So the number of developers actually using this framework is small.
Fully generalizing fault injection would be a lot of work. A developer may want to see what happens when a specific kmalloc() call fails, but perhaps only when it is invoked from a specific call path or when some other condition is true. It has not been possible in the past to describe these conditions to the framework but, in recent years, a new technology has come along that can provide the required flexibility: the BPF virtual machine.
It is already possible to attach a BPF program to an arbitrary function using the kprobe mechanism. Such programs are useful for information gathering, but they cannot be used to affect the execution of the function they are attached to. Thus, they are not usable for error injection. That situation changes, though, with this patch set from Josef Bacik, which is intended to turn BPF into a generalized mechanism for the injection of errors into a running kernel.
The core of the new mechanism is a BPF-callable function called bpf_override_return(). If a BPF program attached to a kprobe calls this function, the execution of the function the program is attached to will be shorted out and its return value will be replaced with a value supplied by that BPF program. The patch set contains an example in the form of a test program:
    SEC("kprobe/open_ctree")
    int bpf_prog1(struct pt_regs *ctx)
    {
            unsigned long rc = -12;

            bpf_override_return(ctx, rc);
            return 0;
    }
This function can be compiled to BPF using the LLVM compiler. The SEC() directive at the top specifies that this function should be attached to a kprobe placed at the beginning of open_ctree(), a function in the Btrfs filesystem implementation. After the placement of this probe and the attachment of the BPF function, a call to open_ctree() will be overridden and the value -12 (-ENOMEM) will be returned. This is a relatively simplistic example, of course; it is expected that many uses will require more sophisticated BPF programs to narrow down the set of situations where the injection will occur.
This patch set had been through several revisions and appeared ready for inclusion into the mainline; it had even been applied to the networking tree for the 4.15 merge window. Things came to a halt, though, when Ingo Molnar blocked the patch set's progress out of worry that it violated one of the basic promises behind the BPF virtual machine and could destabilize the kernel.
After some discussion, a solution was agreed to: BPF programs would retain the ability to override kernel functions, but only for functions that have been specifically marked to allow this to happen. A new macro called BPF_ALLOW_ERROR_INJECTION() was introduced; it can be used to add the required annotation to a function. See, for example, this patch adding the marking for open_ctree(). Molnar suggested some additional conditions — only functions whose return value cannot crash the kernel should be annotated, and the override function should only change integer error values — but nothing enforces those rules in the current patch set.
Bacik's patch set only marks that one function; it is not clear whether those markings will be added in any quantity to the mainline kernel, or whether they will, instead, be maintained as private patches by the developers who use them. One can imagine that there could be some resistance to marking up the mainline in this way. But, on the other hand, there would be value in marking functions like kmalloc() to enable the development of generic tools that can be used to test specific allocation-error handling paths.
That question is only likely to be resolved once the mechanism is in place and patches marking functions for error injection start to appear. Meanwhile, the objections to the core mechanism have been addressed, and its path into the mainline appears to be clear. It has missed the 4.15 merge window, though, so it will almost certainly have to wait until 4.16.
Index entries for this article
Kernel | Development tools/Kernel debugging
Kernel | Fault injection
kernel-corrupting BPF programs?
Posted Nov 29, 2017 20:12 UTC (Wed) by darwish (guest, #102479)
Posted Nov 29, 2017 21:05 UTC (Wed) by SEJeff (guest, #51588)
Posted Dec 1, 2017 1:02 UTC (Fri) by JdGordy (subscriber, #70103)
Posted Nov 29, 2017 22:00 UTC (Wed) by NAR (subscriber, #1313)
Posted Nov 30, 2017 19:55 UTC (Thu) by k3ninho (subscriber, #50375)
> I find it strange if it's not between #ifdef DEBUG and #endif directives...

I applaud the spirit of your sentence, but the specific meaning of the words you used is wrong. A unit test exercises an isolatable unit, one that never touches anything outside itself and can be run in an embarrassingly parallel fashion; integration tests are the counterpart, validating that the functional units integrate correctly with the rest of the program and system. This matches up nicely with the design paradigm of 'separation of interface and implementation', which has benefits when refactoring or deprecating old implementation details while presenting the same interface to consumers (plus, optionally, allowing different versions of your interfaces to be in use, maintained and stable). Having the right words for these situations helps clear thinking when designing and implementing systems.

A unit test shouldn't need a lot of state supplied by the test harness, and shouldn't need to reach outside its function, procedure or method to show that it's doing what you believe it should be doing. There's a 'bad code smell' that comes with complex units: you know those times when tests are flaky because they're vulnerable to side effects? I typically advise people with complex unit tests to refactor and simplify the tests as well as the underlying code. How can you not like simpler code, plus embarrassingly parallel test runs, when the tests are little atoms of logic?

K3n.
Posted Dec 1, 2017 13:58 UTC (Fri) by hkario (subscriber, #94864)
That's the theory; in practice, they're needed to test interfaces, and sometimes even code designed and written two decades ago.

Theoretical purism also doesn't help with the initial refactoring of the code: how can you refactor code when you can't tell whether you're changing its behaviour, because you have no test cases for it?
Posted Dec 5, 2017 16:37 UTC (Tue) by k3ninho (subscriber, #50375)
Oh, so you're talking about integration testing, which is something I'm keen that you know to be distinct from unit testing? Conflating these two things is bad for your code!

> how can you refactor code if you can't know if you're not changing its behaviour because you don't have test cases for it?

One of my favourite books is Michael C. Feathers' "Working Effectively with Legacy Code", which calmly and politely says, again and again:

0. write tests around the code as it is, as best you understand it*
1. change the code, i.e. refactor or add functionality
2. ensure the tests keep working

*: ...and write more tests as you come to understand it better

You might sample existing data on the live system and replay it, but that's unlikely to protect against embarrassing edge cases. You might instrument your existing code and log its data flows while live, then reproduce them in your test framework, but that loses validity over time. The shortest route to a reliable system you don't understand is, in my experience, the workflow that loops through "wrap with tests as best you understand it / change the code / ensure the tests continue to pass."

K3n.
Posted Dec 5, 2017 17:22 UTC (Tue) by hkario (subscriber, #94864)
if you don't have units that can be tested in isolation (most of kernel code) then by definition you don't have unit tests :)
> One of my favourite books is Michael C. Feathers' "Working Effectively with Legacy Code"

That's what I had in mind.
Posted Nov 30, 2017 0:32 UTC (Thu) by iabervon (subscriber, #722)
Posted Nov 30, 2017 7:28 UTC (Thu) by mjthayer (guest, #39183)
disable this in production kernels, please
Posted Nov 30, 2017 9:00 UTC (Thu) by sasha (guest, #16070)
I understand that "an out-of-memory Linux machine always needs a reboot" is the problem they are trying to fix, but the chosen approach looks strange to me. I think there should be a way to disable this feature in production kernels.
Posted Dec 1, 2017 3:37 UTC (Fri) by josefbacik (subscriber, #90083)
Posted Nov 30, 2017 9:28 UTC (Thu) by error27 (subscriber, #8346)
I actually wrote a patch once where you would boot, then write to a sysfs file, and after that every kmalloc() would fail the first time it was called. You had to start applications five times before they had triggered all the allocation failures and were able to run, but it all worked surprisingly well.
Posted Nov 30, 2017 19:36 UTC (Thu) by quotemstr (subscriber, #45331)
Posted Dec 1, 2017 20:20 UTC (Fri) by error27 (subscriber, #8346)
Something like this (written on phone):

    #define kmalloc(size, gfp) ({			\
            void *ret;				\
            static int tested;			\
						\
            if (sysfs && !tested) {			\
                    ret = NULL;			\
                    tested = 1;			\
            } else {				\
                    ret = kmalloc_real(size, gfp);	\
            }					\
            ret;					\
    })
You might even wrap the BPF_ALLOW_ERROR_INJECTION() annotation macro in a config wrapper so that no prod system has this injector but someone looking to replicate an observed failure can work on it. It's a given that you'd have to be careful that you're running through the exact same logic as the production binary that failed.