Realtime preemption and read-copy-update
When LWN last looked at the realtime preemption patch, one of the remaining rough spots was its interaction with the read-copy-update (RCU) mechanism. RCU, remember, encapsulates a conceptually simple (though a bit more gnarly in the implementation) technique. A resource of interest (a routing table entry, say) is referenced by a pointer. When that resource must be changed, a copy is made and the changes are done there; the pointer is then directed at the new copy. At some future, safe time, the old version can be freed. Linux RCU works by requiring that all accesses to RCU-protected data structures be atomic; with that constraint, a "safe time" can be defined as "after every processor on the system has scheduled." Since scheduling while holding a reference to an RCU-protected structure is against the rules, any such structure which was made inaccessible before all processors schedule cannot be referenced by any processor afterward.
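As a reminder of how that pattern looks in practice, here is a sketch of an RCU-protected routing entry being read and updated; the names (route_entry, current_route, and so on) are invented for illustration, and writer-side serialization is omitted:

    #include <linux/kernel.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct route_entry {
        struct rcu_head rcu;            /* for deferred freeing via call_rcu() */
        unsigned long dest;
        unsigned long gateway;
    };

    static struct route_entry *current_route;   /* the RCU-protected pointer */

    /* Reader: takes no lock; the whole access happens inside one RCU section. */
    unsigned long route_gateway(void)
    {
        struct route_entry *rt;
        unsigned long gw = 0;

        rcu_read_lock();
        rt = rcu_dereference(current_route);
        if (rt)
            gw = rt->gateway;
        rcu_read_unlock();
        return gw;
    }

    static void route_free_rcu(struct rcu_head *head)
    {
        kfree(container_of(head, struct route_entry, rcu));
    }

    /* Writer: copy the entry, change the copy, redirect the pointer, and let
     * the old version be freed once the "safe time" described above arrives.
     * A real writer would also hold a lock to serialize against other writers. */
    void route_set_gateway(unsigned long gateway)
    {
        struct route_entry *old = current_route;
        struct route_entry *new = kmalloc(sizeof(*new), GFP_KERNEL);

        if (!new || !old) {
            kfree(new);                 /* kfree(NULL) is a no-op */
            return;
        }
        *new = *old;
        new->gateway = gateway;
        rcu_assign_pointer(current_route, new);
        call_rcu(&old->rcu, route_free_rcu);
    }

Readers never block and take no locks; the only read-side cost in the mainline implementation is the preemption disable done by rcu_read_lock(), which is exactly what the realtime work needs to eliminate.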
Since accesses to RCU-protected structures must be atomic, the RCU locking function (rcu_read_lock()) disables preemption. But disabling preemption is exactly what the realtime preemption patch is trying to get away from, so something had to give. Ingo had solved this problem by requiring that all RCU users identify an explicit lock which protects the structures in question, and by modifying the RCU locking functions to take that lock as a parameter. This approach was never optimal. It required the creation of a whole family of new RCU functions to cope with every type of lock that might be used and, simultaneously, decreased the flexibility of the RCU read-locking mechanism. And, to a great extent, it simply replaced RCU with more traditional locking which, while it works, does not have the scalability advantages that were the motivation for RCU in the first place.
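In that scheme, a read-side critical section had to name the protecting lock explicitly, along these lines (an illustration of the shape described above, with invented names; the actual function variants in the old patches differed):

    rcu_read_lock(&table_lock);          /* table_lock protects the structure of interest */
    entry = rcu_dereference(table_head);
    /* ... use entry ... */
    rcu_read_unlock(&table_lock);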
The RCU issue was clearly on Ingo's mind.
So Ingo was pleased when RCU creator Paul McKenney proposed some approaches for making RCU and realtime preemption work together. Paul's message goes through a series of increasingly complex solutions, and is worth reading in its own right. The core idea, however, is that, in a fully preemptible world, RCU cannot depend on atomic access to data structures, and thus cannot use the "all processors have scheduled" heuristic to know that the time has come to execute a given set of RCU cleanup functions. So the tracking of code executing within RCU critical sections must be made more explicit. Paul's solutions used a reader/writer lock for that purpose, but the approach taken in Ingo's latest realtime preemption patch is a little different.
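For reference, the simplest form that a reader/writer-lock-based scheme can take looks something like the following; this is a sketch of the general idea only, not Paul's actual proposal:

    /* Readers take the read side of a sleeping reader/writer lock; waiting
     * for a grace period is then just a matter of briefly acquiring the
     * write side, which cannot happen until all current readers are gone.
     * This simplest form gives up much of RCU's read-side scalability
     * (every reader touches a shared lock), which is one reason more
     * elaborate variants are worth exploring. */
    static DECLARE_RWSEM(rcu_reader_sem);

    void rcu_read_lock(void)
    {
        down_read(&rcu_reader_sem);
    }

    void rcu_read_unlock(void)
    {
        up_read(&rcu_reader_sem);
    }

    void rcu_wait_for_grace_period(void)
    {
        down_write(&rcu_reader_sem);
        up_write(&rcu_reader_sem);
    }

Ingo's patch avoids taking any lock in the read path; instead, it counts readers explicitly, as the code below shows.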
The code executed to go into an RCU-protected section now looks like this (when configured for realtime preemption):
    void rcu_read_lock(void)
    {
        if (current->rcu_read_lock_nesting++ == 0) {
            current->rcu_data = &get_cpu_var(rcu_data);
            atomic_inc(&current->rcu_data->active_readers);
            smp_mb__after_atomic_inc();
            put_cpu_var(rcu_data);
        }
    }
The idea is simple: a per-CPU count of processes in RCU critical sections is kept. When a process goes into a critical section, a pointer to the current CPU's counter is stored with the task information, so that the right counter will be decremented later on. There is also a per-process variable which keeps track of RCU section nesting. No further work needs to be done before the process can access the protected structure; in particular, no locks are acquired.
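The data structures implied by this snippet are a per-CPU structure holding the counter, plus a couple of new fields in the task structure; roughly the following (a sketch reconstructed from the code above, not the literal patch):

    /* Per-CPU RCU state, reached via get_cpu_var(rcu_data) above. */
    struct rcu_data {
        atomic_t active_readers;        /* tasks currently inside RCU read sections
                                           which entered them on this CPU */
        /* ... plus the lists of callbacks queued by call_rcu() ... */
    };

    /* Additions to the task structure: */
    struct task_struct {
        /* ... existing fields ... */
        int rcu_read_lock_nesting;      /* depth of nested rcu_read_lock() calls */
        struct rcu_data *rcu_data;      /* whose active_readers to decrement at unlock time */
    };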
When the process exits the critical section, the procedure is reversed: the nesting count is decremented. When that count goes to zero, the per-CPU count is decremented as well. If the per-CPU count drops to zero, then that processor is deemed to have "quiesced," with no processes running within RCU critical sections. Once all CPUs have quiesced in this way (as tracked by a bitmask of processors in the system), all RCU cleanup functions queued before their respective processors quiesced can be called.
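The exit path presumably looks something along these lines; this is a sketch reconstructed from the description above rather than the code from the patch itself:

    void rcu_read_unlock(void)
    {
        if (--current->rcu_read_lock_nesting == 0) {
            smp_mb__before_atomic_dec();
            /*
             * Decrement the counter on whichever CPU the matching
             * rcu_read_lock() ran on; the task may well have been
             * preempted and migrated to another processor since then.
             */
            atomic_dec(&current->rcu_data->active_readers);
        }
    }

Storing the rcu_data pointer in the task at lock time is what makes this work when a preempted reader resumes on a different CPU: the counter that was incremented is always the one that gets decremented.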
This scheme restores the core RCU functionality, allowing lock-free access
to fast-path data structures. It also retains the current RCU API, with
the result that the realtime preemption patch becomes significantly less
intrusive. It is not a perfect implementation, however. It requires that
each CPU regularly find itself with no processes executing within RCU
critical sections. Since these sections are now preemptible, the "quiet"
times could be quite far apart on heavily-loaded systems. While the system
is waiting for a processor to quiesce, the RCU callback structures for the
cleanup functions will continue to accumulate, to the point that quite a
bit of memory could be used before the cleanup actually happens. For the
realtime case, this tradeoff is acceptable: latency, not memory use, is the
most important factor. Since the existing RCU algorithm is used when
realtime preemption is not configured in, everybody should be happy. In
practice, further work may be required; in particular, it may be necessary
to find a way to force RCU cleanup when the system gets low on memory.
Meanwhile, however, the realtime
preemption patch appears to have gotten past one more major hurdle on its
way toward possible inclusion into the mainline.