Topics from the LLVM microconference
Posted Aug 31, 2015 10:44 UTC (Mon) by nix (subscriber, #2304)
Parent article: Topics from the LLVM microconference
"BPF for tracing is currently a hot area, Starovoitov said. It is a better alternative to SystemTap and runs two to three times faster than Oracle's DTrace. Part of that speed comes from LLVM's optimizations plus the kernel's internal just-in-time compiler for BPF bytecode."

This claim seems exceptionally unlikely to me. Interpreting DOF is really not an expensive operation: it's just a switch, plus some very simple prologue/epilogue code for shuffling the arguments and return value into place, plus the code needed to actually do what the DOF has asked. Most DTrace uses I've seen (even Brendan's! :) ) have no probes with anything longer than a few hundred opcodes attached to them. Lacking loops, and with only non-nested analogues of conditionals, D is not a language in which one would write something long or complicated enough to need optimization.

All of DOF interpretation plus all the buffer management is going to be hugely dominated by the cost of taking a trap (for sdt/usdt) or a ring transition into kernel space (for systrace). So the claim could only really apply to fbt, and if he's tested fbt on Linux I'd be quite astonished, since it only exists on one person's computer so far.
But it may be true! It's possible that LLVM's native code for argument marshalling is better than the handwritten stuff DTrace uses, and it's just barely possible that in some synthetic workloads this dominates. If there's some actual data showing it, particularly if it's relevant outside pure benchmarks, I'd be fascinated to see it.
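To make the "just a switch" point concrete, here is a minimal sketch in C of a switch-dispatched bytecode interpreter for a straight-line program. Everything in it is hypothetical and invented for illustration: the opcodes, the instruction layout, and the names interp() and demo() are assumptions, not DTrace's actual instruction set or implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; not DTrace's real ones. */
enum op { OP_LOADARG, OP_SETI, OP_ADD, OP_MUL, OP_RET };

struct insn {
    uint8_t op;   /* one of enum op */
    uint8_t rd;   /* destination register */
    uint8_t r1;   /* first source register, or argument index */
    uint8_t r2;   /* second source register */
    int32_t imm;  /* immediate operand */
};

#define NREGS 8

/* Run a straight-line program: no loops, one pass over the code. */
static uint64_t interp(const struct insn *code, size_t n,
                       const uint64_t *args, size_t nargs)
{
    uint64_t regs[NREGS] = {0};

    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *i = &code[pc];
        uint64_t *rd = &regs[i->rd % NREGS];

        switch (i->op) {
        case OP_LOADARG: /* "prologue": move a probe argument into place */
            *rd = i->r1 < nargs ? args[i->r1] : 0;
            break;
        case OP_SETI:
            *rd = (uint64_t)(int64_t)i->imm;
            break;
        case OP_ADD:
            *rd = regs[i->r1 % NREGS] + regs[i->r2 % NREGS];
            break;
        case OP_MUL:
            *rd = regs[i->r1 % NREGS] * regs[i->r2 % NREGS];
            break;
        case OP_RET:     /* "epilogue": hand the result back */
            return *rd;
        default:         /* unknown opcode: give up */
            return 0;
        }
    }
    return 0;
}

/* Computes (arg0 + 2) * arg1 with arg0 = 5 and arg1 = 6. */
static uint64_t demo(void)
{
    static const struct insn prog[] = {
        { OP_LOADARG, 0, 0, 0, 0 },  /* r0 = arg0    */
        { OP_LOADARG, 1, 1, 0, 0 },  /* r1 = arg1    */
        { OP_SETI,    2, 0, 0, 2 },  /* r2 = 2       */
        { OP_ADD,     0, 0, 2, 0 },  /* r0 = r0 + r2 */
        { OP_MUL,     0, 0, 1, 0 },  /* r0 = r0 * r1 */
        { OP_RET,     0, 0, 0, 0 },  /* return r0    */
    };
    const uint64_t args[] = { 5, 6 };

    return interp(prog, sizeof(prog) / sizeof(prog[0]), args, 2);
}
```

Each instruction costs one indirect branch through the switch plus a handful of register moves. For a program of a few hundred opcodes at most, that is a small amount of work, which is the basis for the argument that trap or ring-transition cost would dominate.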
Posted Aug 31, 2015 16:46 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Posted Sep 1, 2015 8:00 UTC (Tue) by nix (subscriber, #2304)
Really, without knowing the benchmark I'm left grasping in the dark.
(As for fixing it... branches could definitely be reduced, or predicted, I suppose, at least in the hot spots. We haven't really done much performance optimization of this bit of the system -- the assumption has been that getting into dtrace_probe() would almost always be the expensive part. So there is surely room for improvement here.)