Merging kdb and kgdb

By Jake Edge
February 17, 2010

It was something of a surprise when Linus Torvalds merged kgdb—a stub to talk to the gdb debugger—back in the 2.6.26 merge window, because of his well-known disdain for kernel debuggers. But there is another kernel debugging solution that has long been out of the mainline: kdb. Jason Wessel has proposed merging the two solutions by reworking kgdb to use the "kdb shell" underneath, which would lead to both solutions being available for kernel hackers.

The two debuggers serve different purposes, with kdb having much less functionality, but they both have uses. Kgdb allows source-level debugging using gdb over a serial line, but that requires a separate system. For systems where it is painful or impractical to set up a serial connection, kdb may provide enough capability to debug a problem. In addition, things like kernel modesetting (KMS) allow for additional features that kdb has lacked. Wessel described one possibility:

A 2010 example of where kdb can be useful over kgdb is where you have a small netbook, no serial ports etc... and you are running X and your file system driver crashes the kernel. With kdb plus kms you can get an opportunity to see the crash which would have otherwise been lost from /var/log/messages because the crash was in the file system driver.

While kgdb allows access to all of the standard debugging commands that gdb provides, kdb has a much more limited command set. One can examine and change memory locations or registers, set breakpoints, and get a backtrace of the stack, but those commands typically require using addresses, rather than symbolic names. Currently, the best reference for kdb commands comes from a developerWorks article, though Wessel plans to change that. There is some documentation that comes with the patches, but a command reference will depend on exactly which pieces, if any, actually land in the mainline.

It should be noted that one of the capabilities that was removed from kdb as part of the merger is the disassembler. It was x86 specific, and the new code is "99% platform independent", according to the FAQ about the merged code. Because kgdb is implemented for many architectures, rewriting it atop kdb led to support for many more architectures for kdb. Instead of just the x86 family, kdb now supports arm, blackfin, mips, sh, powerpc, and sparc.

In addition, kgdb and kdb can work together. From a running kgdb session, one can use the gdb monitor command to access kdb commands. There are several that might be helpful like ps for a process list or dmesg to see log output.

The FAQ lists a number of other advantages that would come from the merge, beyond just getting kdb into the mainline so that its users no longer have to patch their kernels. The basic idea behind the advantages listed is to unite the users and developers of kgdb and kdb so that they are all pulling in the same direction, because "both kdb and kgdb have similar needs in terms of how they integrate into the kernel". There have been arguments in the past about which of the two solutions is best, but, since they serve different use cases, having both available would have another benefit: "No longer will people have to debate which is better, kdb or kgdb, why do we have only one... Just go use the best tool for the job."

Wessel notes that Ubuntu has enabled kgdb in recent kernels, which is something he would like to see done by other distributions. If kdb is available, that too could be enabled, which would make it easier for users to access the functionality:

My other hope is that the new kdb is much easier to use in the sense that the barrier of entry is much lower. For example, someone with a laptop running a kernel with a kdb enabled kernel can use it as easily as:

    echo kms,kbd > /sys/module/kgdboc/parameters/kgdboc
    echo g > /proc/sysrq-trigger
    dmesg
    bt
    go

And voila you just ran the kernel debugger.

In the example above, Wessel shows how to enable kdb (for keyboard (kbd) and KMS operation), then trap into it using sysrq-g (once enabled, kdb will also be invoked if there is a panic or oops). The following three commands are kdb commands for looking at log output, getting a stack backtrace, and continuing execution.
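
For those who want to script the first step, here is a minimal sketch of a wrapper around the enabling command from Wessel's example. The function name enable_kdb and its optional path argument are hypothetical, added only so the helper can be exercised against an ordinary file; the sysfs path and the kms,kbd value are taken from the example above. It assumes a kernel built with the proposed patches and, on a real system, root privileges.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patches): enable kdb through the
# kgdboc module parameter, but only if the parameter file is writable.
# The default path and the "kms,kbd" value come from Wessel's example.
enable_kdb() {
    param="${1:-/sys/module/kgdboc/parameters/kgdboc}"

    if [ ! -w "$param" ]; then
        # Missing or read-only: kgdboc is probably not built in, or we
        # are not running as root.
        echo "cannot write $param; is kgdboc available?" >&2
        return 1
    fi

    echo kms,kbd > "$param"
}

# Typical use on a kdb-enabled kernel, as root:
#   enable_kdb && echo g > /proc/sysrq-trigger
```

The helper deliberately stops short of writing to /proc/sysrq-trigger itself, since that halts the system and drops into the debugger; that step is left to the user.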

The patches themselves are broken up into three separate patchsets: the first and largest adds the kdb infrastructure into kernel/debug/ and moves kgdb.c into that directory, the second adds KMS support for kdb along with an experimental patch to do atomic modesetting for the i915 graphics driver, and the third allows kernel debugging via kdb or kgdb early in the boot process, starting from the point where earlyprintk() is available. Wessel is targeting 2.6.34 and, at least so far, the patches have been well received. The most recent posting is version 3 of the patchset, with a long list of changes made in response to earlier comments. Furthermore, an RFC about the idea last May gained a fair number of comments that clearly indicated there was interest in kdb and merging it with the kgdb code.

Sharp-eyed readers will note some similarities between this proposal and the recent utrace push. In both cases, an existing debugging facility was rewritten using a new core, but there are differences as well. Unlike utrace, the kdb/kgdb patches directly provide some lacking user-space functionality. Whether that is enough to overcome Torvalds's semi-hostile attitude towards kernel debuggers—though the inclusion of kgdb would seem to indicate some amount of softening—remains to be seen.

Merging kdb and kgdb

Posted Feb 18, 2010 6:08 UTC (Thu) by madscientist (subscriber, #16861) [Link] (2 responses)

Off-topic, but what is the status of kgdboe? Anyone know? Last I checked it was not in the mainline kernel. Is anyone maintaining this? Is anyone trying to get it merged?

kgdboe

Posted Feb 18, 2010 15:37 UTC (Thu) by jwessel (guest, #63702) [Link] (1 responses)

There is no active work presently to get kgdboe merged to the mainline. Kgdboe is viewed as an unstable connection type in its current design.

For example, with IRQ preemption there is no safe way to share the ethernet hardware. There are few if any ethernet drivers that have a completely robust NET POLL API implementation. The remaining problem is that the amount of code which cannot be debugged with kgdboe is much larger than when using the dbgp or serial based I/O driver.

There are several proposals in existence about how to change the design and at the point that someone picks up one of those to carry forward you would also have the capability to run kdb over the kgdboe I/O driver.

Jason.

kgdboe

Posted Mar 9, 2010 0:03 UTC (Tue) by johnh500 (guest, #49452) [Link]

Hi Jason, which of these proposals do you consider reasonable, and could you please point us to any of them? I am maintaining a large driver that really needs the high speed of kgdboe (plus the ability to debug laptops that have no serial port), so I've been thinking about doing this for some time now.

Merging kdb and kgdb

Posted Feb 18, 2010 9:45 UTC (Thu) by marcH (subscriber, #57642) [Link] (3 responses)

Debuggers are invaluable to understand bugs in poorly designed or poorly implemented software.

Since Linux is perfectly designed, implemented and documented there is no need for a Linux debugger. Linus is right!

How code is

Posted Feb 18, 2010 12:54 UTC (Thu) by alex (subscriber, #1355) [Link] (2 responses)

I think Linus' argument is if the code is so hard to follow you can't see
how it ended up in its final state then that's a problem with the code. The
worry is using debuggers can short circuit understanding the code path and
result in fixes that address the symptom rather than the cause.

I can see the logic although I think he discounts the usefulness of being
able to diagnose system state at failure points. Certainly I'd hate to debug
my user-space code with just a faulting address and register dump at the
fault point.

How code is

Posted Feb 18, 2010 15:37 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

> although I think he discounts the usefulness of being able to diagnose system state at failure points.

Yes, this is the part that I do not understand. "Debuggers are terrible as a design tool, so... better not use any for investigation?!" That sounds a bit extreme.

Linus prefers not to see any kid playing with sharp knives in the kitchen, since they are only supposed to bake cakes. He is concerned about any blood accidentally polluting the kernel: fair enough. But that does not explain why he, an adult, is also not using any.

Wait: maybe he is secretly using a kernel debugger. Just like any other parent: "do what I say" (not what I do).

maybe he has changed his mind

Posted Feb 18, 2010 15:54 UTC (Thu) by alex (subscriber, #1355) [Link]

Linus isn't totally dogmatic, he has been known to change his mind from time
to time. Rare, but it does happen.