Triggers: less busy busy-waiting

By Jonathan Corbet
August 18, 2008

Kernel code must often wait for something to happen elsewhere in the system. The preferred way to wait is to use any of a number of interfaces to wait queues, allowing the processor to perform other tasks in the mean time. If the kernel code in question is running in an atomic mode, though, it cannot block, so the use of wait queues is not an option. Traditionally, in such situations, the programmer simply must code a busy wait which sits in a tight loop until the required event takes place.

Busy waits are always undesirable, but, in some situations, they become even more so. If the wait is going to be relatively long, it would be better to put the processor into a lower power state. After all, nobody cares if it executes its empty loop at full speed, or, even, whether the loop executes at all. If the wait is running within a virtualized guest, the situation can be even worse: by looping in the processor, a busy wait can actively prevent the running of the code which will eventually provide the event which is being waited for. In a virtualized environment, it is far better to simply suspend the virtual system altogether than to let it busy wait.

Jeremy Fitzhardinge has proposed a solution to this problem in the form of the trigger API. A trigger can be thought of as a special type of continuation intended for use in a specific environment: situations where preemption is disabled and sleeping is not possible, but where it is necessary to wait for an external event.

A trigger is set up in either of the two usual patterns:

    #include <linux/trigger.h>

    DEFINE_TRIGGER(my_trigger);   
    /* ... or ... */
    trigger_t my_trigger;
    trigger_init(&my_trigger);

There is a sequence of calls which must be made by code intending to wait for a trigger:

    trigger_reset(&my_trigger);
    while(!condition)
    	trigger_wait(&my_trigger);
    trigger_finish(&my_trigger);

Triggers are designed to be safe against race conditions, in that if a trigger is fired after the trigger_reset() call, the subsequent trigger_wait() call will return immediately. As with any such primitive, false "wakeups" are possible, so it is necessary to check for the condition being waited for and wait again if need be.

Code which wishes to signal completion to a thread waiting on a trigger need only make a call to:

    void trigger_kick(trigger_t *trigger);

This code should, of course, ensure that the waiting thread will see that the resource it was waiting for is available before calling trigger_kick().

A reader of the generic implementation of triggers may be forgiven for wondering what the point is; most of the functions are empty, and trigger_wait() turns into a call to cpu_relax(). In other words, it's still a busy wait, just like before except that now it's hidden behind a set of trigger functions. The idea, of course, is that better versions of these functions can be defined in architecture-specific code. If the target architecture is actually a virtual machine environment, for example, a trigger can simply suspend the execution of the machine altogether. To that end, there is a new set of paravirt_ops allowing hypervisors to implement the trigger operations.

Jeremy has also created an implementation for the x86 architecture which uses the relatively new monitor and mwait instructions. In this implementation, a trigger is a simple integer variable. A call to trigger_reset() turns into a monitor instruction, informing the processor that it should watch out for changes to that integer variable. The mwait instruction built into trigger_wait() halts the processor until the monitored variable is written to. No more busy waiting is required.

There is a certain elegance to the monitor/mwait implementation, but Arjan van de Ven worries that it may prove to be too slow. So changes to the x86 implementation are possible. There have not been a lot of comments about the API itself, though, so the trigger functions may well make it into the mainline in something close to their current form.

Index entries for this article
Kernel	Triggers