Eliminating tasklets

Tasklets are a deferred-execution method used within the kernel; they were added in the 2.3 development series as a way for interrupt handlers to schedule work to be done in the very near future. Essentially, a tasklet is a function to be called (with a data pointer) in a software interrupt as soon as the kernel is able to do so. In practice, a tasklet which is scheduled will (probably) be executed when the kernel either (1) finishes running an interrupt handler, or (2) returns to user space. Since tasklets run in software interrupt mode, they must be atomic - no sleeping, references to user space, etc. So the work that can be done in tasklets is limited, but they are still heavily used within the kernel.
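
In code, the classic interface looks something like this; a minimal sketch, with a made-up "mydev" device standing in for real driver state:

    #include <linux/interrupt.h>

    static struct mydev {
        int dummy;  /* device state, elided */
    } mydev;

    /* The deferred function: runs in softirq context, so it must not sleep. */
    static void mydev_do_tasklet(unsigned long data)
    {
        struct mydev *dev = (struct mydev *) data;
        /* ... process whatever the interrupt handler set aside in dev ... */
    }

    /* Bind the function and its data pointer into a tasklet. */
    static DECLARE_TASKLET(mydev_tasklet, mydev_do_tasklet, (unsigned long) &mydev);

    static irqreturn_t mydev_interrupt(int irq, void *dev_id)
    {
        /* Acknowledge the hardware quickly, then defer the real work. */
        tasklet_schedule(&mydev_tasklet);
        return IRQ_HANDLED;
    }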

There is another problem with tasklets: since they run as software interrupts, they have a higher priority than any process on the system. Tasklets can, thus, create unbounded latencies - something which the low-latency developers have long been working to eliminate. Some efforts have been made to mitigate this problem; if the kernel has a hard time keeping up with software interrupts it will eventually dump them into the ksoftirqd process and let them fight it out in the scheduler. Specific tasklets which have been shown to create latency problems - the RCU callback handler, for example - have been made to behave better. And the realtime tree pushes all software interrupt handling into separate processes which can be scheduled (and preempted) like anything else.

Recently, Steven Rostedt came up with a different approach: why not just get rid of tasklets altogether? Since the development of tasklets, the kernel has acquired other, more flexible ways of deferring work; in particular, workqueues function much like tasklets, but without many of the disadvantages of tasklets. Since workqueues use dedicated worker processes, they can be preempted and do not present the same latency problems as tasklets; as a bonus, they provide a process context which allows work functions to sleep if need be. Workqueues, argues Steven, are sufficiently capable that there is no need for tasklets anymore.
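
The workqueue version of the same pattern is nearly identical from the driver's point of view; a sketch, again with hypothetical names, using the shared kernel worker threads:

    #include <linux/interrupt.h>
    #include <linux/workqueue.h>

    /* The deferred function: runs in a worker thread, so it may sleep. */
    static void mydev_do_work(struct work_struct *work)
    {
        /* ... process whatever the interrupt handler set aside ... */
    }

    static DECLARE_WORK(mydev_work, mydev_do_work);

    static irqreturn_t mydev_interrupt(int irq, void *dev_id)
    {
        /* Hand the heavy lifting off to the shared worker threads. */
        schedule_work(&mydev_work);
        return IRQ_HANDLED;
    }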

So Steven's patch cleans up the interface in a few ways, and turns the RCU tasklet into a separate software interrupt outside of the tasklet mechanism. Then the tasklet code is torn out and replaced with a wrapper interface which conceals a workqueue underneath. The end result is a tasklet-free kernel without the need to rewrite all of the code which uses tasklets.
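
One can picture the wrapper as something like the following; this is a hypothetical illustration of the approach (not Steven's actual code), showing how the familiar tasklet calls can be re-expressed on top of a work item:

    #include <linux/workqueue.h>

    /* A "tasklet" reimplemented as a thin wrapper around a work item. */
    struct tasklet_struct {
        struct work_struct work;
        void (*func)(unsigned long);
        unsigned long data;
    };

    static void tasklet_work_fn(struct work_struct *work)
    {
        struct tasklet_struct *t = container_of(work, struct tasklet_struct, work);

        /* Call the old tasklet function, now in process context. */
        t->func(t->data);
    }

    static inline void tasklet_init(struct tasklet_struct *t,
                                    void (*func)(unsigned long), unsigned long data)
    {
        INIT_WORK(&t->work, tasklet_work_fn);
        t->func = func;
        t->data = data;
    }

    static inline void tasklet_schedule(struct tasklet_struct *t)
    {
        /* Existing callers are unchanged; the function just runs in a
           worker thread now instead of a software interrupt. */
        schedule_work(&t->work);
    }

The real patch has to be more careful than this - it must, for example, preserve the guarantee that a given tasklet never runs on two CPUs at once - but the basic shape is the same.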

There is little opposition to the idea of eliminating tasklets, though it is clear that quite a bit of performance testing will be required before such a change could go into the mainline kernel. But almost nobody likes the wrapper interface; it is just the sort of compatibility glue that the "no stable internal API" policy tries to avoid. So there is a lot of pressure to dump the wrapper and simply convert all tasklet users directly to workqueues. Needless to say, this is a rather larger job; it's not surprising that somebody might be tempted to try to avoid it. In any case, the current patch is good for testing; if the replacement of tasklets will cause trouble, this patch should turn it up before anybody has gone to the trouble of converting all the tasklet users.

Another question needs to be answered here, though: does the conversion of tasklets to workqueues lead to a better interrupt handling path, or should wider changes be considered? Rather than doing a context switch into a workqueue process, the system might get better performance by simply running the interrupt handler as a thread as well. As it happens, the realtime tree has long done exactly that: all (OK, almost all) interrupt handlers run in their own threads. The realtime developers have plans to merge this work within the next few kernel cycles.

Under the current plans, threaded interrupt handlers would probably be a configuration-time option. But if developers knew that interrupt handlers would run in process context, they could simply do the necessary processing in the handler and do away with deferred work mechanisms altogether. This approach might not work in every driver - for some devices, it might risk adding unacceptable interrupt response latency - but, in many cases, it has the potential to simplify and streamline the situation considerably. The code would not just be simpler - it might just perform better as well.
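
As a concrete picture of that simplification, here is a minimal sketch (hypothetical device, and assuming the handler is known to run in a thread, as the realtime tree already arranges) of a driver doing everything in its handler:

    #include <linux/interrupt.h>

    static irqreturn_t mydev_interrupt(int irq, void *dev_id)
    {
        /* Under the threaded model this handler runs in process context,
           so the driver can do all of its processing right here - even
           sleep if it needs to - with no tasklet or workqueue in sight. */
        /* ... acknowledge the device and process its data ... */
        return IRQ_HANDLED;
    }

    static int mydev_setup_irq(unsigned int irq, void *dev)
    {
        /* In the realtime tree the handler registered here is run in a
           per-IRQ kernel thread without any driver changes. */
        return request_irq(irq, mydev_interrupt, IRQF_SHARED, "mydev", dev);
    }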

Either way, the removal of tasklets would appear to be in the works. As a step in that direction, Ingo Molnar is looking for potential performance problems:

So how about the following, different approach: anyone who has a tasklet in any performance-sensitive codepath, please yell now. We'll also do a proactive search for such places. We can convert those places to softirqs, or move them back into hardirq context. Once this is done - and i doubt it will go beyond 1-2 places - we can just mass-convert the other 110 places to the lame but compatible solution of doing them in a global thread context.

This is a fairly clear call to action for anybody who is concerned about the possible performance impact of this change on any particular part of the kernel. If you think some code needs faster deferred work response than a workqueue-based mechanism can provide, now is not the time to defer the work of responding to this request.

Index entries for this article
Kernel: Tasklets
Kernel: Workqueues



Eliminating tasklets

Posted Jun 28, 2007 10:22 UTC (Thu) by rwmj (subscriber, #5474) [Link] (6 responses)

I've always been confused by tasklets (and bh's, which preceded them IIRC).

Can someone explain to me (a dabbler in the kernel at best) why tasklets are needed, and why you can't just execute the work inside the interrupt handler? Or, alternatively, give an example of work which cannot be done either inside the handler or in the context of the process, but needs to go in a tasklet instead?

Rich.

Eliminating tasklets

Posted Jun 28, 2007 12:58 UTC (Thu) by nevets (subscriber, #11875) [Link] (2 responses)

My post about the tasklet-to-workqueue conversion contained a reference to a nice paper http://www.wil.cx/matthew/lca2003/paper.pdf.

Softirqs and tasklets replaced bottom halves, because bottom halves were a large bottleneck on SMP systems. If a bottom half was running on one CPU, no other bottom halves could run on any other CPU. It's obvious how these wouldn't scale.

The difference between softirqs and tasklets is that a softirq is guaranteed to run on the CPU it was scheduled on, whereas tasklets don't have that guarantee. Also, the same tasklet cannot run on two separate CPUs at the same time, whereas a softirq can. Don't confuse the tasklet restriction with that of the bottom halves: two different tasklets can run on two different CPUs, just not the same one.

Now to answer your question. I can't argue why we have tasklets (I'm trying to get rid of them ;-) but I'll give the best example of why we have softirqs. That's the networking code. Say you get a network packet, but processing that packet takes a lot of work. If you do that in the interrupt handler, no other interrupts can happen on that IRQ line. That would cause a large latency for incoming interrupts, and perhaps you'll overflow the buffers and drop packets. So the interrupt handler only moves the data off to a network receive queue and returns. But this packet still needs to be processed right away, before anything else, so it goes off to a softirq for processing. Now you still allow for interrupts to come in. Perhaps the network interrupt comes in again on another CPU. The other CPU can start processing that packet with a softirq on that CPU, even before the first packet is done processing.
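
The driver side of that split is tiny, by the way; roughly something like this (a sketch - mynic_fetch_packet() stands in for the device-specific work of pulling the packet off the hardware):

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static irqreturn_t mynic_interrupt(int irq, void *dev_id)
    {
        struct sk_buff *skb;

        /* Get the packet off the hardware as quickly as possible. */
        skb = mynic_fetch_packet(dev_id);
        if (!skb)
            return IRQ_NONE;

        /* Queue it for the network receive softirq and get out; all of
           the protocol processing happens later, with this IRQ line
           enabled again. */
        netif_rx(skb);
        return IRQ_HANDLED;
    }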

See how this can scale well? But the same tasklet can't run on two different CPUs, so it doesn't have this advantage. In fact, if a tasklet is scheduled to run on another CPU but is waiting for other tasklets to finish, and you try to schedule the tasklet on a CPU that's not currently processing tasklets, it will notice that the tasklet is already scheduled to run and not do anything. So tasklets are not so reliable when it comes to latencies. Hence why I'm working on getting rid of them, since I don't believe they accomplish what people think they do.

What about USB networking devices?

Posted Jul 2, 2007 12:52 UTC (Mon) by rankincj (guest, #4865) [Link] (1 responses)

At least one device I know receives network data via "bulk" URBs, and I believe that URB callback functions are run in the hard IRQ context of the USB hub device. Is there a better place than a tasklet to offload the work into in this case?

What about USB networking devices?

Posted Jul 5, 2007 5:43 UTC (Thu) by HalfMoon (guest, #3211) [Link]

All networking drivers, USB or otherwise, hand packets off to be processed in a network tasklet. So no matter what that particular device's driver does, most of the work is already done in a tasklet.

If that USB networking device uses the "usbnet" framework, it won't do much at all in hardirq context. That driver just queues its RX packets to its own tasklet, then immediately resubmits the URB with a new skbuff. (And then the bulk-IN callback can be called immediately with the next packet. For high speed devices, it's quite realistic to get multiple back-to-back packets like that.) So: only "usb stuff" is done in hardirq context, and all the network stuff is done in a tasklet.

There are other USB network drivers which work differently, mostly older drivers for older chips ... thing is, to get the best throughput on a USB network device you need to maintain a queue of packets in the hardware, and only the usbnet framework does that.

Eliminating tasklets

Posted Jun 28, 2007 14:50 UTC (Thu) by arjan (subscriber, #36785) [Link] (2 responses)

If you do all the work in the IRQ handler, latency will suck... remember that IRQ handlers often run with IRQs disabled (and, at a minimum, the handler's own IRQ will not happen even if others might).

Offloading the "hard work" out of the hard IRQ handler means that you can service the hardware short and sweet, with the lowest latency possible, and that the longer-running work gets batched and processed effectively...

Eliminating tasklets

Posted Jun 29, 2007 21:15 UTC (Fri) by giraffedata (guest, #1954) [Link] (1 responses)

But note that the latency that gets improved is the latency of processing interrupts, not the latency of anything a process does. When you consider that a tasklet can't sleep and runs before the CPU returns to regular process stuff, and limit your view to single-CPU systems, it isn't as clear that rescheduling interrupt handling for a different time helps any latency. A program that gets interrupted is still not going to get control back until all that interrupt processing is done.

Here's the latency that gets improved: Consider 10 interrupts of the same class that happen one after another. The first 9 take 1ms to service and nobody's urgently waiting for the result. #10 only takes a microsecond, and if you don't respond within 1ms, expensive hardware will go idle. Without tasklets, those interrupts get serviced in order of arrival, so expensive hardware will be idle for 8 ms. With tasklets, you make the code for 1-9 reschedule their work to tasklets (only takes a microsecond to reschedule) and #10 completes in 10 microseconds, soon enough to keep the expensive hardware busy.

Eliminating tasklets

Posted Jun 30, 2007 6:47 UTC (Sat) by dlang (guest, #313) [Link]

With workqueues, it's not the case that all the interrupt-related processing must be completed before userspace gets a chance to run again; with tasklets, that is the case. So the switch means that a userspace program that's waiting for some data doesn't need to keep getting delayed while the CPU is handling other incoming data.


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds