Tracing infrastructures

[Posted September 19, 2006 by corbet]

Sometimes, things just do not go according to plan. Mathieu Desnoyers is the current maintainer of the Linux Trace Toolkit, a kernel event tracing package which has, despite a significant user base, remained outside of the mainline for many years. He recently posted a new LTT release with the following introduction:

Following an advice Christoph gave me this summer, submitting a smaller, easier to review patch should make everybody happier.

What resulted was a thread of hundreds of messages, many of which could be considered to be impolite even by linux-kernel standards. Clearly, LTT has hit a nerve - especially surprising given that the points of real disagreement are minimal.

At times, people have questioned whether the kernel needs any sort of tracing facility at all. That particular question appears to have been resolved (affirmatively); the disagreement now is over whether that tracing should be static or dynamic. Static tracing works by putting explicit tracepoints into the source code (they look like function calls); the tracing framework can then enable or disable those tracepoints at run time as desired. In a dynamic system, instead, tracepoints are injected into a running system, usually in the form of a breakpoint instruction.
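To make the static model concrete, here is a minimal user-space sketch of a tracepoint that looks like a function call and is switched on and off at run time. Every name in it (TRACE_SCHED_SWITCH, trace_sched_switch_enabled, context_switch and so on) is invented for illustration; this is not the LTT interface, only the general shape of the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not the LTT API. */

/* One flag per tracepoint; the tracing framework flips it at run time. */
static bool trace_sched_switch_enabled = false;

/* Stands in for a real trace buffer: counts recorded events. */
static unsigned long trace_events_recorded = 0;

/* Slow path, reached only while the tracepoint is enabled. */
static void trace_record_sched_switch(int prev_pid, int next_pid)
{
    trace_events_recorded++;
    printf("sched_switch: %d -> %d\n", prev_pid, next_pid);
}

/* The static tracepoint itself. When disabled it still costs a flag test. */
#define TRACE_SCHED_SWITCH(prev, next)                      \
    do {                                                    \
        if (trace_sched_switch_enabled)                     \
            trace_record_sched_switch((prev), (next));      \
    } while (0)

/* Stand-in for an instrumented kernel function. */
static void context_switch(int prev_pid, int next_pid)
{
    TRACE_SCHED_SWITCH(prev_pid, next_pid);
    /* ... the real work of switching tasks would go here ... */
}

/* Exercises the tracepoint; returns the number of events recorded. */
unsigned long tracepoint_demo(void)
{
    trace_events_recorded = 0;

    trace_sched_switch_enabled = false;
    context_switch(1, 2);               /* disabled: nothing recorded */
    assert(trace_events_recorded == 0);

    trace_sched_switch_enabled = true;  /* the "tracer" turns the point on */
    context_switch(2, 3);               /* enabled: one event recorded */

    trace_sched_switch_enabled = false;
    context_switch(3, 1);               /* disabled again */

    return trace_events_recorded;
}
```

The flag test in the macro is the residual cost of a disabled static tracepoint; a dynamic tracepoint leaves nothing at all in the code until it is inserted.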

The kernel already has dynamic tracing in the form of KProbes; LTT, instead, uses (primarily) a static model. So the biggest question, at least on the surface, has been over whether Linux needs a static tracing package in addition to the dynamic mechanism it has now. This debate revolves around a few points:

Overhead, part 1: when tracing is not being used (the normal situation on most systems), dynamic tracepoints clearly have lower overhead: they do not exist at all. For all the work that is done to make static tracepoints fast when they are not in use, they still exist, and will still have a (small) runtime cost.
Overhead, part 2: when tracing is being used, static tracepoints will tend to be faster. The breakpoint mechanism used by KProbes can (in the current implementation) take about ten times as many CPU cycles as a static tracepoint. There are projects in the works (djprobes, in particular) which can reduce this overhead considerably; Ingo Molnar also, as part of the discussion, posted a series of patches which cut the KProbes overhead roughly in half.
One might wonder why overhead concerns people in this case. Tracing is often used to track frequent events, so a higher tracepoint overhead can slow things down in a noticeable manner. More to the point, though, heavyweight tracepoints can change the timing of events, leading to the dreaded "heisenbugs" which vanish when the developer actively looks for them.
Maintenance overhead: some developers are concerned that the addition of static tracepoints to the kernel code will complicate the maintenance of that code. Tracepoints clutter the code itself, and they must continue to work into the indefinite future. In a sense, each one can be thought of as a little system call which, once placed, cannot be changed. Developers also worry that there will be pressure to add increasing numbers of these tracepoints over time.
On the other hand, dynamic tracepoints impose a different sort of overhead: everybody who is interested in a set of tracepoints must take on the maintenance of those tracepoints. As the kernel changes, the tracepoints will need to move around to follow those changes. Keeping a set of dynamic tracepoints current can, in fact, be a nontrivial and tiresome job. Tools like SystemTap help in this regard, but they are far from a complete solution at this time. Static tracepoints placed into the kernel code, instead, will continue to work as that code changes.
Flexibility: dynamic tracepoints can be placed anywhere at any time, but static tracepoints require, at a minimum, a source code edit, rebuild, and reboot. Dynamic tracepoints can more easily support runtime filtering of events as well. On the other hand, static tracepoints currently are better at accessing local variables.
Architecture support: KProbes are not currently implemented on all architectures, so they are not available to all Linux users. Static tracepoints tend to require less architecture-specific trickiness, and are thus easier to support universally. On the other hand, it has been argued, the addition of static tracepoints would take away much of the incentive architecture maintainers might have to make KProbes work.

Reading through the discussion, one could be forgiven for going into a state of complete despair. The interesting thing, though, is that the level of disagreement is lower than one might think. There is a near consensus among the participants that there is a place for both static and dynamic tracepoints. Static tracing of events of interest will help a lot of people - user-space developers and system administrators, not just kernel developers - understand what is going on in the system. Making all of these people figure out where to place, for example, a tracepoint to report scheduler changes in a specific kernel makes things a lot harder.

The key point, however, is that the value of the static point is not really its static placement, but the fact that it is a clear indicator of where the tracepoint needs to be. So it has been suggested that an answer which might please everybody is to insert "markers" rather than tracepoints. These markers, which could live in a different section of the kernel image, are simply signs pointing out where a dynamic tracepoint should be inserted, should the need exist. To this end, Mathieu has posted a simple marker patch; it was promptly fired upon for implementation issues, but there are few people who are opposed to the idea.
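The marker idea can be sketched in a few lines of user-space C. What follows is an assumption-laden illustration, not Mathieu's patch: the names (struct marker, MARK, marker_attach) are invented, and a plain array stands in for the separate section of the kernel image. Each marker names a well-known event and does nothing until a probe is connected to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; this is not the posted marker patch. */

typedef void (*probe_fn)(const char *name, long value);

struct marker {
    const char *name;  /* well-known event name */
    probe_fn probe;    /* NULL until a tracer connects a probe */
};

enum { MARK_SCHED_SWITCH, MARK_SYSCALL_ENTRY, MARK_COUNT };

/* Marker table; a real kernel might collect these in their own section. */
static struct marker markers[MARK_COUNT] = {
    [MARK_SCHED_SWITCH]  = { "sched_switch",  NULL },
    [MARK_SYSCALL_ENTRY] = { "syscall_entry", NULL },
};

/* A marker site: calls the connected probe, if there is one. */
#define MARK(id, value)                                   \
    do {                                                  \
        probe_fn probe_ = markers[(id)].probe;            \
        if (probe_)                                       \
            probe_(markers[(id)].name, (long)(value));    \
    } while (0)

/* Connect (or, with NULL, disconnect) a probe by event name.
 * Returns 0 on success, -1 if no marker has that name. */
int marker_attach(const char *name, probe_fn probe)
{
    for (int i = 0; i < MARK_COUNT; i++) {
        if (strcmp(markers[i].name, name) == 0) {
            markers[i].probe = probe;
            return 0;
        }
    }
    return -1;
}

static int probe_calls;

static void counting_probe(const char *name, long value)
{
    (void)name;
    (void)value;
    probe_calls++;
}

/* Returns how many times the probe fired, or -1 on setup failure. */
int marker_demo(void)
{
    probe_calls = 0;

    MARK(MARK_SCHED_SWITCH, 42);        /* no probe yet: nothing happens */
    assert(probe_calls == 0);

    if (marker_attach("sched_switch", counting_probe) != 0)
        return -1;
    MARK(MARK_SCHED_SWITCH, 42);        /* probe fires once */
    MARK(MARK_SYSCALL_ENTRY, 7);        /* different marker, still unprobed */

    marker_attach("sched_switch", NULL);
    MARK(MARK_SCHED_SWITCH, 43);        /* detached: nothing happens */

    return probe_calls;
}
```

The point of the design is that the tracer looks events up by name, so nobody outside the tree has to track where in the source a given event currently lives.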

So markers may well be the way this work goes forward. If the LTT code could be reworked around the marker concept, then the way might be clear for a discussion of what else needs to happen before that code could be merged (there are a number of issues to talk about there which have been, thus far, overshadowed by the current debate). After suitable consideration, a carefully-selected set of markers/tracepoints could be added to the mainline kernel, enabling anybody to easily hook into and monitor well-known events. Once the smoke clears, there might just be a viable solution which will please almost everybody.

Index entries for this article
Kernel	Development tools/Kernel tracing
Kernel	KProbes

Tracing infrastructures

Posted Sep 21, 2006 5:49 UTC (Thu) by dlang (guest, #313) [Link] (2 responses)

as I read the flamefest I thought the argument was that dynamic tracepoints were better at seeing and getting hold of variables.

the dynamic tracepoint has access to the entire system environment at the time the trace event takes place, but the static tracepoint has to specify (in the compiled source) exactly what variables to pay attention to. if it specifies too many it creates additional work for GCC to store and pass variables that don't mean anything.

Tracing infrastructures

Posted Sep 21, 2006 14:07 UTC (Thu) by ajax (guest, #7251) [Link] (1 responses)

Global variables, yes. Local variables tend to move back and forth between registers and memory so it is hard, at any point in the code, to determine where a local variable resides at that point.

Tracing infrastructures

Posted Sep 22, 2006 10:12 UTC (Fri) by nix (subscriber, #2304) [Link]

Describing that is DWARF2's job, of course. GCC's generation of DWARF2 info is not perfect, but it's good enough most of the time.

You can't require an unchanging interface to changeable internals.

Posted Sep 21, 2006 7:34 UTC (Thu) by xoddam (subscriber, #2322) [Link] (4 responses)

> Tracepoints clutter the code itself, and they must continue to
> work into the indefinite future. In a sense, each one can be
> thought of as a little system call which, once placed, cannot
> be changed.

That argument looks absurd to me. A tracing interface is intimately
tied to kernel internals. There is no way it should be considered
part of the "Don't break userspace" contract.

The 'marker' idea looks very sound to me. A config option at
compile time could choose static tracepoints, dynamic tracepoints
or no tracepoints at all. Maintaining a set of tracepoints 'out
of tree', when the tracepoints are plainly inside the kernel,
makes far *less* sense than eg. an out-of-tree device driver.
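
A compile-time switch of that sort might look roughly like the following sketch. CONFIG_TRACE_MODE and the TRACE_MODE_* values are made-up names, not real kernel options, and the dynamic case is only hinted at: there the site compiles to nothing and a probe would be inserted at run time.

```c
#include <assert.h>

#define TRACE_MODE_NONE    0
#define TRACE_MODE_STATIC  1
#define TRACE_MODE_DYNAMIC 2

/* Hypothetical option; a real kernel would set this from its config. */
#ifndef CONFIG_TRACE_MODE
#define CONFIG_TRACE_MODE TRACE_MODE_STATIC
#endif

static unsigned long trace_hits;

/* Stand-in for writing an event to a trace buffer. */
void trace_emit(const char *event, long value)
{
    (void)event;
    (void)value;
    trace_hits++;
}

#if CONFIG_TRACE_MODE == TRACE_MODE_STATIC
/* Static: the call is compiled into every tracepoint site. */
#define TRACE_POINT(event, value) trace_emit((event), (long)(value))
#elif CONFIG_TRACE_MODE == TRACE_MODE_DYNAMIC
/* Dynamic: nothing is compiled in; a probe would be added at run time. */
#define TRACE_POINT(event, value) do { (void)(event); (void)(value); } while (0)
#elif CONFIG_TRACE_MODE == TRACE_MODE_NONE
/* None: tracepoints vanish entirely. */
#define TRACE_POINT(event, value) do { (void)(event); (void)(value); } while (0)
#else
#error "CONFIG_TRACE_MODE must be one of the TRACE_MODE_* values"
#endif

/* Returns the number of events emitted under the selected mode. */
unsigned long trace_config_demo(void)
{
    trace_hits = 0;
    TRACE_POINT("sched_switch", 1);
    TRACE_POINT("sched_switch", 2);
#if CONFIG_TRACE_MODE == TRACE_MODE_STATIC
    assert(trace_hits == 2);
#else
    assert(trace_hits == 0);
#endif
    return trace_hits;
}
```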

You can't require an unchanging interface to changeable internals.

Posted Sep 21, 2006 19:56 UTC (Thu) by AJWM (guest, #15888) [Link] (2 responses)

> A config option at
> compile time could choose static tracepoints, dynamic tracepoints
> or no tracepoints at all.

Yes, my thoughts exactly on reading this article. It seems to be something that is crying out for a config option or two.

There's still the argument that it's that much more code to maintain, but any arguments about runtime effects would be decided by whoever does the compile.

You can't require an unchanging interface to changeable internals.

Posted Sep 22, 2006 0:57 UTC (Fri) by dlang (guest, #313) [Link] (1 responses)

the problem is that if it has a significant runtime effect it won't be turned on when you need it (production systems running distro kernels)

this needs to be something that redhat (and others) can leave on all the time so that when there are problems the tools can be used.

if you have to recompile the kernel and reboot your production system into a lower-performing kernel for days to weeks until you can duplicate the problem you just are not going to do so.

however if you can run for days or weeks with normal performance, and then when the problem kicks in load up the tracer to capture what's going on for a bit before you reboot to get things back up again, you have a tool that can be used.

You can't require an unchanging interface to changeable internals.

Posted Sep 28, 2006 21:57 UTC (Thu) by efexis (guest, #26355) [Link]

Can't we learn anything from SMP Alternatives? I.e., stick a couple of NOPs in the code big enough to replace it with a CALL instruction when you want to add a tracepoint there. On 64-bit systems I guess the CALL instruction is going to be longer (including the 64-bit address), so maybe a JMP $+9 and a few NOPs would be quicker than just the NOPs (I haven't looked into processor instruction timings for a -long time-).
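
A rough sketch of the patching step, operating on an ordinary byte buffer rather than on live code: it overwrites five one-byte NOPs (0x90) with an x86 near CALL (opcode 0xE8 plus a 32-bit relative displacement). The function name and parameters are invented for illustration. One detail worth noting: in 64-bit mode the near CALL is still five bytes, since the displacement stays 32 bits and is sign-extended, so the pad only needs to grow if the target lies more than 2GB away. Patching code that other CPUs may be executing is a separate problem that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP        0x90u
#define X86_CALL_REL32 0xE8u
#define PATCH_LEN      5u   /* opcode byte + 32-bit displacement */

/*
 * Hypothetical helper: replace a 5-byte NOP pad at code[offset] with
 * "CALL target". code_addr is the address the buffer would be loaded at.
 * Returns 0 on success, -1 if the pad is missing or the target is too far.
 */
int patch_call(uint8_t *code, size_t len, uint64_t code_addr,
               size_t offset, uint64_t target)
{
    if (offset > len || len - offset < PATCH_LEN)
        return -1;

    uint8_t *site = code + offset;
    for (size_t i = 0; i < PATCH_LEN; i++)
        if (site[i] != X86_NOP)
            return -1;              /* not a pad we are allowed to patch */

    /* The displacement is relative to the instruction after the CALL. */
    int64_t disp = (int64_t)(target - (code_addr + offset + PATCH_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;                  /* out of reach for a rel32 CALL */

    uint32_t rel = (uint32_t)(int32_t)disp;
    site[0] = X86_CALL_REL32;
    site[1] = (uint8_t)(rel & 0xff);          /* little-endian rel32 */
    site[2] = (uint8_t)((rel >> 8) & 0xff);
    site[3] = (uint8_t)((rel >> 16) & 0xff);
    site[4] = (uint8_t)((rel >> 24) & 0xff);
    return 0;
}

/* Returns 0 if patching behaves as described, -1 otherwise. */
int patch_demo(void)
{
    /* push %rbp; five NOPs; ret */
    uint8_t code[] = { 0x55, 0x90, 0x90, 0x90, 0x90, 0x90, 0xC3 };
    /* CALL lands at 0x1001, next instruction at 0x1006, 0x2000 - 0x1006 = 0xFFA */
    const uint8_t expected[] = { 0x55, 0xE8, 0xFA, 0x0F, 0x00, 0x00, 0xC3 };

    if (patch_call(code, sizeof(code), 0x1000, 1, 0x2000) != 0)
        return -1;
    assert(memcmp(code, expected, sizeof(code)) == 0);

    /* A second attempt must fail: the site no longer holds NOPs. */
    return patch_call(code, sizeof(code), 0x1000, 1, 0x2000) == -1 ? 0 : -1;
}
```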

I don't see the issue.

Posted Sep 28, 2006 21:38 UTC (Thu) by jd (guest, #26381) [Link]

We already have vast numbers of wrappered calls to the LSM, so clearly people don't object THAT strenuously to wrappered calls in the code. (In fact, if the static tracing were implemented as a LSM module, you could even use the SAME wrappered calls and not need to add a damn thing that isn't already being maintained anyway.)

For that matter, we've a bazillion wrappered calls to BUG() and Torvalds-knows-what else. I see a far stronger cause for objecting to multiple independently-maintained wrappers to trivial, highly specific operations. It would seem to make much more sense to have a single meta-macro that allowed ANY of the assorted tools (lsm, static probes, kernel debugging info, jelly babies, etc) to use ANY of the points to make decisions, based on their specific requirements and the configuration at the time.

This would put all of the complexity into a single meta-macro (so eliminating almost all maintenance issues) and would provide a far wider range of sampleable points for future updates.
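
As a very rough sketch of what such a shared hook point could look like (all names invented, nothing here corresponds to the real LSM or LTT interfaces): one macro marks the site, and any number of consumers register against it. A tracer just records the event, while a policy consumer can veto it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONSUMERS 4

/* A consumer returns 0 to allow the operation, nonzero to veto it. */
typedef int (*hook_fn)(const char *point, long arg);

struct hook_point {
    const char *name;
    hook_fn consumers[MAX_CONSUMERS];
    size_t count;
};

/* Returns 0 on success, -1 if the hook point is full. */
int hook_register(struct hook_point *hp, hook_fn fn)
{
    if (hp->count >= MAX_CONSUMERS)
        return -1;
    hp->consumers[hp->count++] = fn;
    return 0;
}

/* Runs every consumer (so tracers see vetoed events too) and
 * returns the first nonzero verdict, or 0 if nobody objected. */
int hook_run(const struct hook_point *hp, long arg)
{
    int verdict = 0;
    for (size_t i = 0; i < hp->count; i++) {
        int rc = hp->consumers[i](hp->name, arg);
        if (rc != 0 && verdict == 0)
            verdict = rc;
    }
    return verdict;
}

/* The single "meta-macro" placed at each point of interest. */
#define HOOK(hp, arg) hook_run(&(hp), (long)(arg))

static int traced_events;

static int tracer(const char *point, long arg)
{
    (void)point;
    (void)arg;
    traced_events++;
    return 0;                   /* tracing never vetoes anything */
}

static int policy(const char *point, long arg)
{
    (void)point;
    return arg < 0 ? -1 : 0;    /* toy rule: reject negative arguments */
}

/* Returns the number of traced events, or -1 if a verdict was wrong. */
int hook_demo(void)
{
    struct hook_point file_open = { "file_open", { NULL }, 0 };
    traced_events = 0;

    hook_register(&file_open, tracer);
    hook_register(&file_open, policy);

    int ok = HOOK(file_open, 5);        /* allowed */
    int denied = HOOK(file_open, -1);   /* vetoed by the policy consumer */
    assert(traced_events == 2);

    return (ok == 0 && denied == -1) ? traced_events : -1;
}
```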

In general, added complexity is a Bad Thing. However, if by adding something, you provide a general, unified solution to N existing problems that previously needed N independent solutions, you have actually reduced complexity, which is a Good Thing.

I would not want LTT to be a mainstream kernel component if it actually added to the complexity of the system, but since I see no reason for complexity to be added and ways in which complexity can be removed, I believe LTT in the mainstream kernel is not only achievable but has the potential to cut a lot of crud out. To me, that would be wonderful.

Tracing infrastructures

Posted Sep 21, 2006 12:09 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Thanks for a coherent summary. Again, the subscription pays for itself.
