The state of realtime Linux

By Jonathan Corbet
June 15, 2010

Since 2005, the realtime preemption project has worked to provide deterministic response times in stock Linux kernels. Over that time, though, it has come to appear that there is no guaranteed latency with regard to when all of this code will actually be merged. At LinuxTag 2010, realtime hacker Thomas Gleixner talked about the state of this patch set, what's coming, and, yes, when it might actually be merged in its entirety. Don't hold your breath.

In truth, the realtime preemption code has been going into the mainline, piece by piece, for years. Some recently-merged pieces include threaded interrupt handlers and the sleeping spinlock precursor patches. The threaded handlers make a number of driver tasks simpler (regardless of any realtime needs) by eliminating much of the need for tasklets and workqueues. They have also proved to be useful in providing support for some strange i2c-attached interrupt controller hardware. The spinlock changes do not affect the generated code (in mainline kernels), but they are useful for annotating the type of each lock.
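
The threaded-handler split can be hard to picture. In the kernel, a driver calls request_threaded_irq() to register a short primary handler, which can return IRQ_WAKE_THREAD to hand the rest of the work to a dedicated kernel thread. What follows is only a userspace analogy built on POSIX threads, not kernel code. All names (fake_device, hard_handler(), irq_thread(), simulate_irqs()) are hypothetical. A pthread mutex stands in for the kernel's internal handoff, which a real hard-irq handler could not use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogy only: hypothetical names that merely echo the kernel's
 * irqreturn_t values (IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD). */
enum sim_irqreturn { SIM_IRQ_NONE, SIM_IRQ_HANDLED, SIM_IRQ_WAKE_THREAD };

struct fake_device {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    int pending;          /* interrupts acknowledged but not yet processed */
    int processed;        /* events completed by the handler thread */
    bool shutting_down;
};

/* Primary ("hard") handler: do the minimum, then ask for the thread. */
static enum sim_irqreturn hard_handler(struct fake_device *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->pending++;                      /* e.g. acknowledge, latch status */
    pthread_cond_signal(&dev->wake);
    pthread_mutex_unlock(&dev->lock);
    return SIM_IRQ_WAKE_THREAD;
}

/* Threaded handler: an ordinary schedulable thread that is allowed to block. */
static void *irq_thread(void *arg)
{
    struct fake_device *dev = arg;

    pthread_mutex_lock(&dev->lock);
    for (;;) {
        while (dev->pending == 0 && !dev->shutting_down)
            pthread_cond_wait(&dev->wake, &dev->lock);
        if (dev->pending == 0)           /* only reachable when shutting down */
            break;
        dev->pending--;
        /* Slow work (an i2c transfer, say) would go here; a real handler
         * would drop the lock around it. */
        dev->processed++;
    }
    pthread_mutex_unlock(&dev->lock);
    return NULL;
}

/* Raise n simulated interrupts and return how many the thread handled. */
static int simulate_irqs(int n)
{
    struct fake_device dev = { .pending = 0 };
    pthread_t tid;
    int rc;

    pthread_mutex_init(&dev.lock, NULL);
    pthread_cond_init(&dev.wake, NULL);
    rc = pthread_create(&tid, NULL, irq_thread, &dev);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < n; i++)
        hard_handler(&dev);              /* each "interrupt" defers its work */

    pthread_mutex_lock(&dev.lock);
    dev.shutting_down = true;            /* let the thread drain and exit */
    pthread_cond_signal(&dev.wake);
    pthread_mutex_unlock(&dev.lock);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&dev.wake);
    pthread_mutex_destroy(&dev.lock);
    return dev.processed;
}
```

The pending counter, rather than the condition signal itself, carries the state. Because of that, no event is lost if several "interrupts" arrive before the thread runs.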

Recent movements of code into the mainline notwithstanding, the realtime patchset isn't getting any smaller. It seems that the realtime developers have an interesting problem: the realtime kernel is a really good place to try out a wide variety of new features. So, despite the fact that code occasionally moves to the mainline, new stuff keeps getting added to the realtime tree.

This tree's attractiveness for the testing of new code comes from the fact that it tends to reveal scalability problems much more quickly than mainline kernels do. The extra preemptibility offered by this kernel comes at a cost: the price for lock contention is much higher. So the realtime tree shows scalability issues at lower levels of contention than non-realtime kernels. The important point is that the scalability bottlenecks encountered by realtime kernels are not unique to realtime; they just come sooner than the same bottlenecks will show up with the mainline. So realtime kernels can be used to look forward to the problems that the mainline kernel will be experiencing next year.

Thus, for example, realtime kernels exhibit scalability problems in the virtual filesystem layer that are otherwise only seen in big-iron torture-test labs. That makes them useful for testing features, and especially useful for testing scalability improvements. That is why code like the VFS scalability patch set currently makes its home in that tree. Eventually, most of these pieces will get merged into the mainline. Thomas says that it will all be in by the end of the year - but which year is not something he is willing to commit to.

The next patch set to move to the mainline might be Peter Zijlstra's memory management preemptibility series, which solves some long latencies in the memory management code; the current plan is to push these patches for 2.6.36. Another bit of code which might make the move is an option to force all drivers to use threaded interrupt handlers regardless of whether they explicitly request them. This option would almost certainly not be turned on for most production kernels, but it makes the testing of drivers with involuntarily threaded handlers easier.

The realtime tree also suffers from a few unsolved problems. One of them is latencies in the slab allocator, which runs with preemption disabled for long periods of time. The SLQB allocator had raised hopes for a while, but it appears that it will not be pushed for merging anytime soon. So the realtime hackers have to find a way to fix one of the existing allocators, or give up and write a slab allocator of their own. Thomas noted that there are still a few letters left in the SL?B namespace, so there might just be an SLRB in the future. That is all quite vague at this point, though; Thomas admitted that he has no idea how this problem will be resolved.

Another ongoing problem is the increasing use of per-CPU data. In throughput-oriented environments, per-CPU data increases scalability by eliminating contention between processors. But use of per-CPU data necessarily requires that preemption be disabled while the data is being manipulated; to do otherwise is to risk that the process working with that data will be preempted or moved to another processor, making a mess of things. Disabling preemption is anathema in an environment where everything is always supposed to be preemptable, though. So the realtime patch set currently puts a lock around per-CPU data accesses, eliminating the preemption problem but wrecking scalability. Here, too, a real solution has not yet been found.
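
In mainline kernels, code that touches per-CPU data typically brackets the access with something like get_cpu_var()/put_cpu_var(), which disable and re-enable preemption. The sketch below is a userspace model of the realtime alternative described above: each simulated CPU's slot carries its own sleeping lock, and a reader sums all the slots. NR_SIM_CPUS, the 64-byte alignment (an assumed cacheline size), and the function names are assumptions made for illustration. The sketch shows the locking structure, not its scalability cost.

```c
#include <assert.h>
#include <pthread.h>

#define NR_SIM_CPUS 4   /* hypothetical CPU count */

/* One slot per simulated CPU.  Aligning the first member to 64 bytes makes
 * each slot start on its own (assumed) cacheline. */
struct counter_slot {
    _Alignas(64) pthread_mutex_t lock;   /* realtime-style per-slot lock */
    long count;
};

static struct counter_slot slots[NR_SIM_CPUS];

static void counters_init(void)
{
    for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
        pthread_mutex_init(&slots[cpu].lock, NULL);
        slots[cpu].count = 0;
    }
}

/* Update path: touch only the caller's slot.  A mainline kernel would
 * protect this by disabling preemption instead of taking a lock. */
static void counter_add(int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_SIM_CPUS);
    pthread_mutex_lock(&slots[cpu].lock);
    slots[cpu].count += delta;
    pthread_mutex_unlock(&slots[cpu].lock);
}

/* Read path: fold all per-CPU slots into one total. */
static long counter_sum(void)
{
    long total = 0;
    for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
        pthread_mutex_lock(&slots[cpu].lock);
        total += slots[cpu].count;
        pthread_mutex_unlock(&slots[cpu].lock);
    }
    return total;
}
```

Updates on different CPUs never touch the same slot, which is where the scalability of per-CPU data comes from. The lock only costs something when two tasks end up targeting the same slot, but every access now pays for taking it.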

Thomas finished with a bit of talk about testing of the realtime tree. Quite a bit of "enterprise-class" testing is done in the well-furnished labs at companies like IBM and Red Hat. At the embedded level, the Open Source Automation Development Lab has a modest testing lab of its own. But there's another interesting source of testing: the Linux audio community has been enthusiastic in its use of the realtime kernel and has helped find a number of issues. There's also a growing set of tools maintained in the rt-tests collection.

All told, the picture painted by Thomas was one of a healthy project, even if we still don't know when it will all get into the mainline. Even in the realtime world, there are things we simply have to wait for.

Index entries for this article
Kernel	LinuxTag/2010
Kernel	Realtime

Posted Jun 17, 2010 15:54 UTC (Thu) by shane (subscriber, #3335)

> So the realtime patch set currently puts a lock around per-CPU data accesses, eliminating the preemption problem but wrecking scalability. Here, too, a real solution has not yet been found.

It sounds like what is needed is a way to tell the kernel that a task can be pre-empted but not manipulated in a way that is unsafe for per-CPU state.

Posted Jun 17, 2010 16:11 UTC (Thu) by nevets (subscriber, #11875)

It is not the state of the task that we are worried about, but the state of the per-cpu data it is modifying.

If you have some per-cpu data that is never touched in interrupt context, all you need to do to protect it is to disable preemption. This is the same as a lock, since it serializes modification of the data. No one else can modify it, because you must be on the CPU to modify it and no one can preempt the current user that is modifying the data.

Now if the task is preempted, another task can get on the CPU and modify the data breaking the serialization of the previous task.

In PREEMPT_RT, we instead add a special per_cpu_locked() variable. When you grab the per-cpu variable, you also grab that variable's per-cpu lock. Now you can be preempted, and even migrated, but you will always be touching the data on the original CPU. If someone else tries to touch that variable, it must first grab the per-cpu lock, which will cause it to block until the first task is finished with it.

This solves the serialization, but hurts scalability, since a box of 2048 CPUs can have a bit of cacheline bouncing if tasks are constantly migrating after grabbing a per-cpu variable.
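
The locked per-CPU scheme described in this comment can be modeled in userspace under stated assumptions. current_cpu stands in for the CPU the task is currently running on, and migrate_to() stands in for the scheduler moving the task. get_locked_cpu_var(), put_locked_cpu_var(), struct locked_cpu_ref, and slot_is_locked() are hypothetical names, not the -rt tree's API. The key point is that the handle records which CPU's lock was taken, so the task keeps using, and finally unlocks, the original CPU's slot even after it migrates.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_SIM_CPUS 4            /* hypothetical CPU count */

struct locked_slot {
    pthread_mutex_t lock;        /* per-CPU lock guarding this CPU's copy */
    int value;
};

static struct locked_slot percpu_slot[NR_SIM_CPUS];
static int current_cpu;          /* stands in for "the CPU we run on now" */

/* Hypothetical handle: remembers which CPU's lock was taken. */
struct locked_cpu_ref {
    int cpu;
    int *val;
};

static void percpu_init(void)
{
    for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
        pthread_mutex_init(&percpu_slot[cpu].lock, NULL);
        percpu_slot[cpu].value = 0;
    }
}

/* Simulated migration: the scheduler moves us; any held lock stays held. */
static void migrate_to(int cpu)
{
    assert(cpu >= 0 && cpu < NR_SIM_CPUS);
    current_cpu = cpu;
}

/* Lock the slot of the CPU we are on right now and remember which it was. */
static struct locked_cpu_ref get_locked_cpu_var(void)
{
    struct locked_cpu_ref ref = { .cpu = current_cpu };
    pthread_mutex_lock(&percpu_slot[ref.cpu].lock);  /* others block here */
    ref.val = &percpu_slot[ref.cpu].value;
    return ref;
}

/* Unlock the slot we originally locked, wherever we are running now. */
static void put_locked_cpu_var(struct locked_cpu_ref ref)
{
    pthread_mutex_unlock(&percpu_slot[ref.cpu].lock);
}

/* Demo helper: report whether a CPU's slot is currently locked. */
static bool slot_is_locked(int cpu)
{
    int rc = pthread_mutex_trylock(&percpu_slot[cpu].lock);
    if (rc == 0)
        pthread_mutex_unlock(&percpu_slot[cpu].lock);
    return rc == EBUSY;
}
```

For example, suppose a task takes the reference on CPU 1 and is then migrated to CPU 2. It still updates CPU 1's copy, and any other task wanting CPU 1's copy blocks in pthread_mutex_lock() until put_locked_cpu_var() runs. With many CPUs, that is where the cacheline bouncing mentioned above comes from.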

Posted Jun 17, 2010 18:19 UTC (Thu) by dlang (guest, #313)

It sounds like you may want to be a little less eager to migrate processes (or even say that you don't migrate a process that is holding a per-cpu lock).

Posted Jun 18, 2010 7:52 UTC (Fri) by georg.wassen (guest, #63868)

> Even in the realtime world, there are things we simply have to wait for.
That's definitely a candidate for the next 'Quotes of the week' (but, sadly, that would be quoting yourself...)

Posted Jun 25, 2010 15:41 UTC (Fri) by Duncan (guest, #6647)

That one I sort of expected... after this one, which snuck up and slapped this unwary reader alongside the head with a rather large dead fish![1] =:^)

"""Over that time, though, it has come to appear that there is no guaranteed latency with regard to when all of this code will actually be merged."""

Duncan

[1] http://en.wikipedia.org/wiki/The_Fish-Slapping_Dance