Back to the drawing board for utrace?

By Jake Edge
January 27, 2010

The utrace tracing framework has had a tortuous path towards the mainline, but it always seemed like it was headed that direction. Over the past week or so, things have gotten rather murkier for the mainline inclusion of utrace. Linus Torvalds made a pronouncement that would seem to leave SystemTap without a future in the mainline—something that many had suspected for a while—but also put the future of utrace in doubt. Further discussion may have provided a way forward, but, at least in its current form, mainline utrace seems very unlikely.

The discussion resulted from a request by Frank Ch. Eigler to include utrace in linux-next. That led to a discussion about whether it was ready for linux-next, which is meant for code that is likely to be merged in the next release cycle, or whether it should spend some time in another tree first. Since an earlier version of utrace had been in Andrew Morton's -mm tree, that was one potential path. Morton said that utrace "didn't break anything", but:

I still don't think I've seen a really compelling reason for merging it. At least, I wouldn't be able to explain why we did it. But presumably there _are_ such reasons, because it was a lot of development work.

Someone please sell this to us.

Morton also dredged up a response he had gotten from Oleg Nesterov the last time he asked, which listed various potential uses for utrace. In-kernel users of utrace are important, since new features are rarely merged without at least one, and an earlier attempt to merge utrace ran into opposition because it lacked any. This time around, Nesterov and Roland McGrath included a rewrite of the ptrace() system call using utrace as part of the patch submission. The hope was that rewriting the notoriously ugly ptrace() code on top of the cleaner utrace API would clear the last hurdle for inclusion into the mainline.

But replacing the guts of ptrace(), even if it cleans things up, is controversial. ptrace() is part of the kernel ABI, which must be maintained, ugly or not, and reworking its implementation is not without risk, as Morton points out:

ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the past few years. This leads one to expect that a rip-out-n-rewrite is a high-risk prospect. So, quite reasonably, one looks for a good reason for taking such risk.

The risk is small, though, according to Eigler, because "this code has been deployed in fedora and rhel for several *years*, with millions of users. It's not some rickety experiment." Eigler also added to Nesterov's list of utrace uses, noting that SystemTap's user-space probing is based on utrace. But SystemTap and one of the other potential uses on that list, namely reworking seccomp to use utrace, are what set Torvalds off:

So if things like system tap and "security models that go behind the kernel by tying into utrace" are the reasons for utrace, color me utterly uninterested. In fact, color me actively hostile. I think that's the worst possible situation that we'd ever be in as kernel people (namely exactly the "do things in kernel space by hiding behind utrace without having kernel people involved")

Torvalds's complaint stems from the fact that utrace provides no user-space interface at all. It is purely an internal kernel API, intended for use by kernel code like the ptrace() rewrite, but also by kernel modules, which is part of what worries Torvalds. It provides lots of hooks that can be used by "random crazy out-of-tree crap", but doesn't provide any benefit to user space at all, he said:

If somebody were to argue that "this is a simple series of patches to clean up ptrace and make it possible to strace a debugged process", then that would have been different. That's not what you or others have been doing. You've been pushing exactly the _reverse_ of that, namely how great it is for some random totally new features that I'm convinced aren't even used by a lot of people.

One of the biggest problems with ptrace() is its signal-oriented interface. A program using ptrace() acts as the parent process of the tracee and must use wait() to detect the tracee's state changes. For that reason, a process can have only one ptrace() tracer at a time, so an attempt to strace a program that is already being debugged with gdb will fail. The utrace-based ptrace() implementation would make it possible to change that, but not directly, since a separate kernel piece would still be needed to attach another utrace engine.
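To make the one-tracer rule concrete, the user-space C sketch below (assuming Linux and glibc; `ptrace_demo()` is a name invented for this example) forks a child that makes its parent its tracer with PTRACE_TRACEME and then asks for a second tracer; the kernel refuses the second request with EPERM because the child is already traced. The parent learns about the child's stop only by calling waitpid(), as described above. Where a sandbox blocks ptrace(), the function reports -1 instead.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo. Returns 1 if a second PTRACE_TRACEME was refused
 * with EPERM and the parent saw the child's SIGSTOP via waitpid(),
 * 0 if the kernel behaved differently, and -1 if ptrace() is unavailable.
 */
int ptrace_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make the parent our one and only tracer. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        /* A second tracer is not allowed: expect -1 with errno == EPERM. */
        int refused = ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1 && errno == EPERM;
        /* Stop ourselves; the tracer is told about it through wait(). */
        raise(SIGSTOP);
        _exit(refused ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;                      /* ptrace() blocked, child never stopped */
    int saw_sigstop = WSTOPSIG(status) == SIGSTOP;

    /* Resume the child; a data argument of 0 suppresses the SIGSTOP. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) {
        kill(pid, SIGKILL);             /* don't leave a stopped child behind */
        waitpid(pid, &status, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return saw_sigstop && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A debugger trying PTRACE_ATTACH on an already-traced process gets the same EPERM; PTRACE_TRACEME is used here only because it typically works without extra privileges.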

An in-kernel gdb "stub" using utrace, floated as an RFC back in November, could provide that kernel piece, but it met with a fair amount of resistance when it was proposed. The limitation that ptrace() imposes is seen as something that could, and perhaps should, be lifted, but adding a relatively large, kernel-only API to do so is seen as excessive. As Torvalds puts it:

Maybe somebody would be interested in trying to take the utrace improvements, and scaling down what they promise, and ignoring all input except for "I want to strace and gdb at the same time".

So stop the crazy "new kernel interfaces" crap. Stop the crazy "maybe we can use it for ftrace and generic user event tracing too". Stop the crazy.

The elephant in the room, of course, is SystemTap. It generates, builds, and loads kernel modules to do its tracing, and uses utrace for user-space tracing. That model is not popular with most kernel developers, especially for an out-of-tree tool, because the internal APIs it relies on are far too volatile. SystemTap must be updated whenever those interfaces change, while retaining support for their older versions so that it can still be used with older kernels. Because of that, SystemTap can lag behind development kernels, which makes it of little use to kernel hackers.

The utrace proponents are pushing it as something useful in its own right, completely separate from its use in SystemTap, but one gets the sense that many of the kernel developers aren't quite buying that. Ted Ts'o tries to explain his concerns to Eigler:

[...] utrace doesn't export a syscall (which is an ABI that we are willing to promise will be stable), but rather a set of kernel API's (which we never promise to be stable), and the fact that there will be out-of-tree programs that are going to be trying to depend on that interface (much like Systemtap does today when it creates kernel modules) [...]

He goes on to compare the situation to that of the NVIDIA graphics drivers, which leads Kyle Moffett to propose a variation on Godwin's law: "As an LKML discussion grows longer, the probability of an unfavorable comparison involving nVidia or Microsoft approaches 1." More to the point, though, Moffett said he was uninterested in SystemTap:

I'm interested in things like the ability to stack gdb with strace, the RFC gdb-stub posted a week ago, etc. None of those abilities would be out-of-tree modules at all [...]

Ts'o sees those features as potentially useful, but points out that they should be submitted with utrace for review. It may be that utrace in its present form does not survive that review:

So what should be reviewed is utrace *plus* these other userland interfaces, which may get critiqued and improved, and utrace patches can be reviewed in light of these new features. But be warned.... if it turns out that only 30% of utrace is only needed to support gdb stacking with strace, etc., the other 70% will likely get ejected and the utrace patches streamlined to support these in-tree users.

Without an in-tree "killer feature" that only utrace can provide, there is going to be resistance to merging such an easily abused API. Several suggestions were made, notably by Torvalds and Ingo Molnar, to enhance ptrace() itself to support some new features (such as multiple active tracers, or the ability to read or write more than one word at a time between the tracer and the tracee), but that would mean scrapping much or all of the utrace work. Nesterov and McGrath, who are the ptrace() maintainers, have been largely silent throughout the discussion, but they have previously made it clear that they would much rather work with the utrace-based ptrace() implementation. So it is unclear when, or if, enhancements to the current code might happen.
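The "one word at a time" limitation can be seen in a similar sketch, again assuming Linux and glibc, with `peek_bytes()`, `peek_demo()`, and `demo_buf` invented for illustration (this is not code from utrace or from any proposed ptrace() enhancement). The parent reads a buffer out of its stopped child with PTRACE_PEEKDATA, which returns a single `long` per call, so copying 32 bytes takes four system calls on a 64-bit machine.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo data; the child rewrites its own copy before stopping. */
static char demo_buf[32] = "original contents";

/*
 * Copy len bytes at addr in a stopped tracee into out, one word per
 * PTRACE_PEEKDATA call. Returns the number of ptrace() calls made, or -1.
 */
static long peek_bytes(pid_t pid, char *addr, char *out, size_t len)
{
    long calls = 0;
    for (size_t off = 0; off < len; off += sizeof(long)) {
        errno = 0;  /* a word may legitimately be -1, so errno signals errors */
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + off), NULL);
        if (word == -1 && errno != 0)
            return -1;
        calls++;
        size_t n = len - off < sizeof(long) ? len - off : sizeof(long);
        memcpy(out + off, &word, n);
    }
    return calls;
}

/*
 * Fork a tracee that rewrites demo_buf and stops, then read the child's
 * copy word by word. Returns the number of PEEKDATA calls, or -1 if
 * ptrace() is unavailable.
 */
long peek_demo(char *out, size_t len)
{
    if (len > sizeof demo_buf)
        len = sizeof demo_buf;          /* never read past the buffer */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        strcpy(demo_buf, "written by child");  /* changes only the child's copy */
        raise(SIGSTOP);                        /* stay put while the parent reads */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    long calls = peek_bytes(pid, demo_buf, out, len);
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
        kill(pid, SIGKILL);             /* don't leave a stopped child behind */
    waitpid(pid, &status, 0);
    return calls;
}
```

Because fork() gives the child an identical address layout, the parent can pass its own `demo_buf` address; it still sees the child's rewritten contents, since each process has its own copy of the memory.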

Without utrace, SystemTap will have to find other ways to hook into user space, but that doesn't really faze the kernel developers (particularly after Torvalds's unequivocal rejection of that approach), as there are other tracing solutions in the pipeline. Ftrace and perf events are slowly building up capabilities, and are doing so in-tree. They are likely to grow the features needed to support kernel and user-space tracing à la SystemTap (and DTrace). Molnar specifically invites the SystemTap developers to collaborate:

Also, if any systemtap person is interested in helping us create a more generic filter engine out of the current ftrace filter engine (which is really a precursor of a safe, sandboxed in-kernel script engine), that would be excellent as well. Right now we support simple C-syntax expressions like:
    perf record -R -f -e irq:irq_handler_entry --filter 'irq==18 || irq==19'
More could be done - a simple C-like set of function perhaps - some minimal per probe local variable state, etc. (perhaps even looping as well, with a limit on number of [predicate] executions per filter invocation.)
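The kind of predicate evaluation Molnar describes can be sketched in plain C. This is a hypothetical toy, not the ftrace filter engine: it represents `irq==18 || irq==19` as a small expression tree, short-circuits the way C does, and caps the number of predicate executions per invocation, in the spirit of the limit he suggests. The `struct event`, `struct node`, and `filter_match()` names are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical event record, loosely modeled on irq:irq_handler_entry. */
struct event {
    int irq;
};

enum op { OP_EQ, OP_OR, OP_AND };

/* One node of a filter expression tree. */
struct node {
    enum op op;
    int value;                        /* OP_EQ: match when event->irq == value */
    const struct node *left, *right;  /* OP_OR / OP_AND: operands */
};

/*
 * Evaluate n against ev, spending one unit of *budget per predicate.
 * Returns 1 on match, 0 on no match, -1 if the budget ran out.
 */
static int eval_node(const struct node *n, const struct event *ev, int *budget)
{
    if (n->op == OP_EQ) {
        if (*budget <= 0)
            return -1;
        (*budget)--;
        return ev->irq == n->value;
    }
    int l = eval_node(n->left, ev, budget);
    if (l < 0)
        return -1;
    if (n->op == OP_OR && l)
        return 1;                     /* short-circuit, as in C */
    if (n->op == OP_AND && !l)
        return 0;
    return eval_node(n->right, ev, budget);
}

/* Run a filter with a cap on how many predicates may execute. */
int filter_match(const struct node *root, const struct event *ev, int max_preds)
{
    int budget = max_preds;
    return eval_node(root, ev, &budget);
}
```

With `irq==18 || irq==19` built as an OP_OR node over two OP_EQ leaves, an event with irq 19 matches after two predicate executions, while the same event evaluated with a budget of one is rejected with -1 instead of being evaluated further. Bounding the work per invocation like this is one way a sandboxed in-kernel engine could keep a filter from running away.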

It is unfortunate, in many ways, that SystemTap has gotten to this point. While it is possible that Torvalds could change his mind, he and other kernel developers find the new tracing features to be "a million times superior" to SystemTap. That could leave Red Hat holding the SystemTap bag for quite some time to come, as it will need to support it for existing, and likely future, RHEL versions. It is interesting to note that this alternate solution, based on Ftrace, etc., is also largely coming out of Red Hat.

It seems possible that utrace will be a casualty here as well. By incorporating features that were needed for SystemTap, and not providing a user-space interface, it tried to do both too much and too little. There are some potential ways forward, but it's unclear whether they will be pursued. Torvalds points to the realtime tree as an example of how to get "crazy" things merged:

Yeah, it's taken them years, and they still have out-of-tree stuff. And yeah, they had to change some things to make them more palatable to the mainline kernel - the whole fundamental raw spinlock change is just the most recent example of that.

But on the whole, I think it's actually worked out pretty well for them. I think the mainline kernel has improved in the process, but I also suspect that _their_ RT patches have also improved thanks to having to make the work more palatable to people like me who don't care all that deeply about their particular flavor of crazy.

There are definitely lessons here, but the standard ones don't seem to apply. SystemTap and utrace were developed in the open, as free software from the outset, and were fairly often discussed on linux-kernel. SystemTap in particular was regularly criticized, to seemingly no avail. The biggest lesson—and the hardest to learn, especially after a feature has shipped—may be that ignoring the advice and complaints of the kernel developers is likely to come back and bite in the end. It is not terribly surprising, really, but that seems to be what is happening here.

Back to the drawing board for utrace?

Posted Jan 28, 2010 3:35 UTC (Thu) by fuhchee (guest, #40059)

Many of the points made in the emails quoted here have been
retorted to some extent. It would do an injustice to the
topic to accept these excerpts at face value.

Back to the drawing board for utrace?

Posted Jan 29, 2010 16:00 UTC (Fri) by fuhchee (guest, #40059)

For that matter, unquoted comments such as ...

"By incorporating features that were needed for SystemTap, and not providing a user-space interface, it tried to both do too much and too little."

... reflect several misunderstandings.

utrace did not incorporate any features particularly for systemtap. It was designed and built independently, and *prior* to this part of systemtap getting started. As a well-designed framework, it turned out to be useful for more than just debugger control, so uprobes used it. (These dependencies are not etched in stone.)

And as for "too little", this line of thinking presumes that any new internal kernel functionality must by necessity be exposed via brand-new userspace protocol. It ignores suitable pre-existing interfaces (ptrace, gdb remote protocol) that simply work *better* with the new internals.

When an article relays so much opinion and heated debate, such editorial comments can lead astray.

Back to the drawing board for utrace?

Posted Jan 28, 2010 11:00 UTC (Thu) by nix (subscriber, #2304)

At least once, utrace *was* posted with sample implementations (I think
including gdbstub, certainly including a reimplementation of ptrace using
utrace). The response: come back without all that extra stuff on the side.

It seems the utrace developers are damned if they do and damned if they
don't.

Back to the drawing board for utrace?

Posted Feb 4, 2010 17:32 UTC (Thu) by ariveira (guest, #57833)

Exactly my appreciation of the situation... that may be the reason they did
not participate in the discussion. I too would be somewhat fed up ...

Many times the right hand of the kernel community does not know what the left
hand is doing, so to speak.

Back to the drawing board for utrace?

Posted Jan 28, 2010 13:42 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

So it's official: kernel developers hate userspace developers.

Back to the drawing board for utrace?

Posted Jan 28, 2010 19:09 UTC (Thu) by daney (guest, #24551)

> So it's official: kernel developers hate userspace developers.

Userspace? I vaguely remember hearing about such a thing...

But seriously, I don't think it is hatred. Perhaps benign neglect.

el trace narcotico

Posted Jan 28, 2010 19:18 UTC (Thu) by bkoz (guest, #4027)

Thanks for providing some coverage of the languid soap opera featuring a charming but troubled kernel and the ardent suitors from the userspace development community who care about debug, trace, and analysis.

Pointing at a still-unmerged tree for realtime and saying "that's the way to do it, see?" seems disingenuous at best. Saying the maintainers of ptrace have no comment is also questionable, when in a related article Oleg indicates long-term solutions outside of ptrace are mandatory.

From outside the kernel community, tuning in at regular intervals to observe the serialized show, it looks to me that these two groups are just passing in the same development space without any real progress.

Featuring:

THE KERNEL. We watched him grow up, gain power and prestige while trying to remain sane in a world gone mad. There's a constantly changing field of opportunity and adversary that he attempts to navigate deftly. Detractors, even gentle ones, whisper of a touch of dementia, a rather narrow and introspective view of the world. Me! Me! It's not what I say or do or even what I say I plan to do (API), no it's what I've done (ABI).

SUITOR U. Has loved and maintained key parts of THE KERNEL's business for years. Fix this. Fix that. Has proposed a grand utrace building for the main kernel grounds, and when the blueprints were shown, and told to adjust and update the main house kitchen in addition (ptrace), did so. While doing this, the groundskeepers stopped by and asked the suitor about using the utrace building basement for some new scheme. Last week's episode was when SUITOR U took the latest in a long series of plans back to THE KERNEL, and due to the general inclement weather and dour moods, was told something else entirely. Oh dear!

SUITOR L. Remember me? We danced way back when. I had a real pretty dress, a special bias cut, and caught your attention for a few months. You still let me come to your balls but now I only dance with an advisor or two, and console myself by saying I have the pick of the entourage. Call me. I follow you on twitter.

SUITOR F. It's my cotillion!!! OMG. He danced with me. Was told he's a rake but in this light, doesn't care. The night is young, whooopeeee!

el trace narcotico

Posted Jan 28, 2010 19:43 UTC (Thu) by dlang (guest, #313)

the realtime tree is a success because huge portions of it _have_ been merged over the last couple of years. In addition, both sides now consider this process to have improved not only the kernel, but also the realtime support compared to what would have happened if the realtime tree had been accepted as-is when it was started.

this is even allowing for bugfixes over time. They (the realtime people) have explicitly stated that breaking out portions from their tree and submitting them as individual features (and dealing with the demands to justify and clean things up before they are accepted) has significantly improved the realtime tree itself.

Back to the drawing board for utrace?

Posted Jan 31, 2010 7:59 UTC (Sun) by sfink (guest, #6405)

Speaking as just a dumb userspace developer:

Man, am I confused! I have a relatively straightforward problem: I am trying to track down
latency and jitter problems in a realtime media stream generating application. How do I
decide what tool to use? Oprofile? Ftrace? Perf events? SystemTap? LTTng?

I've tried oprofile briefly, and it seemed mostly irrelevant for my problems. I can't find any
clear descriptions of what ftrace and perf events actually are, so that I could figure out
whether I should bother with them. I've been fairly happy with systemtap so far -- it is at least
straightforward to dive into and start generating customized traces that at least start to
address what I'm interested in. Each further step seems to involve more and more groveling
about in the kernel sources, but at least I can understand the path ahead and predict what's
going to be possible. (I have a fairly generic problem -- I want to measure the jitter between
context switches of my realtime threads and then diagnose the reasons for that jitter by
reporting what the CPUs were up to when I wasn't running.)

But now I see that SystemTap is viewed as the bastard stepchild, and I wonder if my
investment into learning to use SystemTap was an expensive mistake. Any guidance on
choosing an appropriate tool for unfortunates like me?

Back to the drawing board for utrace?

Posted Feb 1, 2010 11:59 UTC (Mon) by fuhchee (guest, #40059)

"But now I see that SystemTap is viewed as the bastard stepchild"

Heh. But the passive voice gives the opinion false authority.

"and I wonder if my investment into learning to use SystemTap was an expensive mistake."

You should explore the alternatives and use whatever works for you.
Listening to simple smears may well be self-defeating.

Back to the drawing board for utrace?

Posted Feb 1, 2010 13:36 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

WHICH ONE???

I'm in the same WTF situation. There are like 6 "tracing" solutions for Linux. Most of them with only a few lines of documentation (SystemTap at least is the most documented of them).

That's a complete SNAFU. Why can't we have one nice user-oriented solution?

SystemTap documentation

Posted Feb 1, 2010 14:29 UTC (Mon) by mjw (subscriber, #16740)

Yes, choice is such a drag. But none of them will go away soon. Especially not SystemTap which has always had a pretty loyal following and dedicated developers to make it work with whatever the kernel provides. Ultimately they will just share more and more features and they will all become more or less frontends to the same backend kernel features. Something like utrace and uprobes will ultimately get supported in the kernel and then the other tools will also get better user space tracking.

The SystemTap documentation can be found at:
http://sourceware.org/systemtap/SystemTap_Beginners_Guide/
http://sourceware.org/systemtap/langref/
http://sourceware.org/systemtap/examples/

Back to the drawing board for utrace?

Posted Feb 5, 2010 2:09 UTC (Fri) by mfedyk (guest, #55303)

Thank you.

Generating code dynamically that gets compiled and linked into the kernel just seems scary and error prone.

IMO, come up with an API that can express the different things that you want to find out. And oops, in a roundabout way you now are getting a small limited scripting language in the kernel from tracing events and what do you know? Dtrace does that as well.

No, I'm not saying do it how foo does it, but I can't see how dynamically generating C code to make a kernel module is seen as superior to a small audited scripting language built into the kernel.

Though if anyone besides Ingo had tried getting that scripting language into the kernel, I would laugh. _No Possible Way_. Seriously.

It would be as absurd as putting X drivers in the kernel seemed a few short years ago.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds