An update on GDB

By Jake Edge
April 16, 2014

In an amusingly titled ("Not Just Software Botox: Rejuvenating GDB") talk at the 2014 Collaboration Summit, Stan Shebs of Mentor Graphics presented the history of the debugger along with efforts and plans to update the tool. At 28 years old, GDB is one of the oldest free-software projects that is still in use. Partly due to its age, it needs renovation of various sorts, he said.

History and background

The oldest surviving version of GDB's source code is 2.0 from 1987, though it has a copyright of 1986 in the code. It ran natively on Motorola 68K (Sun 2 and 3) and DEC VAX systems and consisted of 25K lines of C. The current version is 7.7, with 35 target architectures (running both native and cross-architecture). It has 700K lines of C code, plus 23K lines of tests in the test suite. The basic command set is largely the same between those two versions, though essentially all of the code has changed in that time.

GDB is the standard debugger for Linux, and a commonly available one for the rest of the Unix family. It is also the most widely available debugger for embedded processors. There is a good chance, Shebs said, that most of the devices in the room have a GDB stub (the code that handles the remote GDB protocol) stored in their flash or ROM somewhere, though it may be disabled. GDB is also the standard debug engine for the Eclipse integrated development environment (IDE). Because of its long history, GDB contains the details of changes to many system internals (e.g. chip instruction sets, operating system and compiler behavior) over the years.

There are certain requirements that the project needs to continue to fulfill as it makes changes, Shebs said. GDB needs to be able to control and examine programs built from low-level languages. It needs to debug programs at the source level and to debug optimized code. It also needs to do both native and cross-debugging. The project also has to follow Free Software Foundation (FSF) policies, "even if we don't like it". For example, the project was told that it must switch to the GPLv3.

There is a set of "non-requirements" as well, he said, some of which would be "nice to haves", but aren't hard and fast requirements. There is no push to support proprietary compilers; GDB does, but it doesn't have to. Nor does it need to target 16-bit architectures. There is no requirement to structure it as a library or set of components, nor to match the competition on features. It does not need to have a GUI. Perhaps surprisingly, it is not required to work with IDEs, though there would be "much howling from Eclipse" if the project stopped supporting that particular IDE.

Rework

There have been several rework efforts on GDB in the past, Shebs said. The first was in 1990 and is remembered by almost no one who still works on GDB; it added the BFD library to read executables and object files, rather than hardcoding that logic in the debugger itself. In 1991, the target vector was added; it separated out the handling of the operating-system-specific interfaces needed to target a particular type of system (e.g. file targets, remote stub targets, etc.).
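
The target-vector idea can be illustrated with a small sketch. What follows is a toy model in Python, not GDB's actual C code: the names TargetOps, make_file_target(), make_remote_target(), and examine() are all invented for this illustration. It only shows the general pattern of core code calling through a table of operations without knowing which kind of target sits behind it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TargetOps:
    """Hypothetical table of operations that one kind of target provides."""
    name: str
    read_memory: Callable[[int, int], bytes]
    resume: Callable[[], str]


def make_file_target(image: bytes) -> TargetOps:
    """A target backed by the bytes of an executable file (cannot run)."""
    def read_memory(addr: int, length: int) -> bytes:
        return image[addr:addr + length]

    def resume() -> str:
        return "error: an executable file cannot be resumed"

    return TargetOps("exec-file", read_memory, resume)


def make_remote_target(memory: Dict[int, int]) -> TargetOps:
    """A target that stands in for a remote stub; memory is a sparse dict."""
    def read_memory(addr: int, length: int) -> bytes:
        # Unmapped addresses read as zero in this toy model.
        return bytes(memory.get(addr + i, 0) for i in range(length))

    def resume() -> str:
        return "continue request sent to stub"

    return TargetOps("remote", read_memory, resume)


def examine(target: TargetOps, addr: int, length: int) -> str:
    """Core debugger code: talks only to the vector, never to a specific target."""
    return target.read_memory(addr, length).hex()


file_target = make_file_target(bytes([0x55, 0x48, 0x89, 0xE5]))
remote_target = make_remote_target({0: 0x90, 1: 0xC3})
```

In this sketch, examine() stays the same whether it is handed file_target or remote_target; supporting a new kind of system would mean supplying another table rather than editing the core code.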

In 1999, asynchronous event handling was added to GDB, so that local and remote events could be handled simultaneously, without one blocking the other. That year also saw the beginning of the "architectural object" effort. It took four to five years to get each supported architecture into that new object-oriented model. The project moved from a snapshot-based development model to a public CVS tree in 1999 as well. In 2003, a move to object-oriented frame objects was made. This replaced the simple stack-frame-pointer tracking that was done earlier with more detailed tracking.

Both of the object-oriented changes foreshadow one of the bigger rework projects that is planned: moving from C to C++. GDB is too large for C, Shebs said. The project has introduced features that simulate C++ (e.g. target vector, architecture and frame objects) over the years. C++ is not so chaotic and non-portable any more, and its overhead is not really a problem now either, so it is a reasonable choice. The idea has been discussed since 2008, with a "rough consensus" in favor of it forming recently. There have been some changes made to smooth the path toward C++, but no one-way changes have been made yet.

Another effort is to document the internals of GDB. There is an internals manual that was started by John Gilmore and worked on by Shebs along the way, but it is well out of date at this point. The existing manual will be abandoned and the information will be moved to the GDB wiki. The plan is to identify widely used code as an "internal API" and then to use Doxygen to build a web-based manual of that API.

There was a complaint from the audience about Doxygen-generated documentation being rather "sterile". But Roland McGrath said that problem is not a tool issue exactly. Doxygen doesn't solve all of the problems with documentation, he said, but it makes it easier for people to do the right thing. Shebs noted that it allows developers to mark certain sections of the comments to become part of the documentation; it will also cross-reference symbols for the entries.

Finding the commonalities between GDB and gdbserver (the remote debugging server) and factoring them out into a separate library is another planned rework. Both GDB and gdbserver call ptrace() to control the process being debugged, but they currently do it with separate code. In addition, handling the remote protocol is done by both, separately. Moving that all to a common library will not only reduce code duplication, it will also expose any assumptions that the native debugger side makes, Shebs said.

The original assumption of there being just a single process to debug has long gone by the wayside, but there is still work to do to support multiple processes (and threads) on multiple cores in modern systems. The original single process to be debugged was known as the "inferior" process (it was being debugged, thus had bugs, thus it was inferior, he said with a chuckle). Since then, inferior objects have been added to track processes. In addition, "address space" and "program space" objects have been added. They are not particularly interesting for Unix processes, as there is just one of each, but there are uses for multiple programs in the same address space.

Multiple processes on multiple cores lead to a number of race conditions that need to be dealt with. For example, the "breakpoint dance" that occurs when the user steps over a breakpoint is race-prone. The breakpoint must be disabled, then reenabled after the step, which leaves a window where other processes may not break correctly. Inferior objects and threads must be added to the user interface as well, so that one could, for example, break on a particular function only in certain threads.
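
To make the window concrete, here is a toy simulation of the breakpoint dance. It is a sketch under heavy assumptions rather than a description of how GDB implements stepping: the Inferior class, the one-byte "instructions", and the step_over() helper are all invented for illustration, and the 0xCC trap byte is simply the x86 int3 opcode used as a recognizable marker.

```python
BREAK = 0xCC  # x86 int3 opcode, used here only as a recognizable trap byte


class Inferior:
    """Toy model: code memory shared by all threads, one program counter each."""

    def __init__(self, code, thread_pcs):
        self.memory = list(code)
        self.pcs = dict(thread_pcs)
        self.saved = {}  # breakpoint address -> original byte

    def insert_breakpoint(self, addr):
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = BREAK

    def remove_breakpoint(self, addr):
        self.memory[addr] = self.saved.pop(addr)

    def step(self, tid):
        """Run one one-byte instruction; report a trap if it is a breakpoint."""
        pc = self.pcs[tid]
        if self.memory[pc] == BREAK:
            return "trap"
        self.pcs[tid] = pc + 1
        return "stepped"


def step_over(inferior, tid, addr, other_tid=None):
    """Step thread tid past the breakpoint at addr.

    If other_tid is given, that thread is allowed to run one instruction
    while the breakpoint is removed, which models the race window.
    """
    events = []
    inferior.remove_breakpoint(addr)            # window opens
    if other_tid is not None:
        events.append((other_tid, inferior.step(other_tid)))
    events.append((tid, inferior.step(tid)))
    inferior.insert_breakpoint(addr)            # window closes
    return events


# Two threads are both about to execute address 1, where a breakpoint sits.
inf = Inferior([0x90] * 4, {1: 1, 2: 1})
inf.insert_breakpoint(1)
before = inf.step(2)                            # thread 2 traps normally
during = step_over(inf, 1, 1, other_tid=2)      # thread 2 slips through
```

Here thread 2 traps when the breakpoint is in place, but runs straight through the same address when it happens to execute during thread 1's step-over.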

Python scripting was added to GDB in 2008, while Guile scripting was just added this year. Scripting is useful for application-specific higher-level commands and it reduces the call for C code to be added into GDB, he said. But scripts are "not really a solved problem" as they are complicated by the control algorithms in the debugger.
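
As a rough illustration of what such a higher-level command might look like, the sketch below defines a command that prints a condensed backtrace, collapsing runs of recursive calls. The command name bt-summary, the class name, and the summarize() helper are made up for this example; gdb.Command, gdb.newest_frame(), Frame.name(), Frame.older(), and gdb.write() are part of GDB's Python API. The gdb module is only importable inside a GDB session, so the import is guarded to let the helper be exercised elsewhere.

```python
try:
    import gdb  # only importable when running inside a GDB session
except ImportError:
    gdb = None


def summarize(names):
    """Collapse consecutive repeats, e.g. deep recursion, into 'name (xN)'."""
    runs = []
    for name in names:
        if runs and runs[-1][0] == name:
            runs[-1][1] += 1
        else:
            runs.append([name, 1])
    return " <- ".join(n if c == 1 else f"{n} (x{c})" for n, c in runs)


if gdb is not None:
    class BacktraceSummary(gdb.Command):
        """Print a one-line backtrace with recursive calls collapsed."""

        def __init__(self):
            # "bt-summary" is a hypothetical command name for this sketch.
            super().__init__("bt-summary", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            names = []
            frame = gdb.newest_frame()  # raises gdb.error if nothing is running
            while frame is not None:
                names.append(frame.name() or "??")
                frame = frame.older()
            gdb.write(summarize(names) + "\n")

    BacktraceSummary()  # instantiating the class registers the command
```

Assuming the file is saved as bt_summary.py (a name chosen here), it could be loaded with "source bt_summary.py" at the GDB prompt, after which "bt-summary" would be available once the program is stopped.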

Git, maintainers, and politics

GDB has moved to Git. The original CVS repository contained GDB, binutils, Cygwin, and other projects from what was originally the Cygnus source tree. When considering the move to Git, there was the question of what to do with the other projects. GDB decided to convert the binutils and GDB parts of the tree and leave the others to fend for themselves. Red Hat contributed a sanitized repository that had lots of history before the CVS tree became public. Tom Tromey did a bunch of scripting to convert the CVS history into Git. Some pre-public revisions were lost, but there is now a single Git repository for GDB and binutils. That work was completed toward the end of 2013.

Originally, there was a single maintainer for GDB; Richard Stallman was the first, and there were others, including Shebs, along the way. By around 2005, though, there were global and area maintainers, but no single person was in charge of the whole project. Decisions were more consensus-driven, but that meant the project was somewhat less decisive as a whole. A GDB steering committee was formed in 2000, but it never made any decisions. Stallman disbanded it in 2012 and "nobody mourned its passing", Shebs said.

On the issue of politics, Shebs said that he had no interest in dishing dirt on individuals, but that there were some issues that had come up over the years that were worth mentioning. GDB originally started out in the "cathedral" model of development, with no public repository. Moving to the public CVS tree was meant to combat that and move to a more "bazaar" style of development. That has been mostly successful, he said, pointing to the few forks that there have been over the years as evidence.

There is ongoing tension between experimentation and stability. It is sometimes hard to make changes because they could break GDB for some users on some random language and operating system combination. That tends to make developers conservative, but GDB has moved toward a more experimental model. The project is willing to risk breaking 10% of its users to make progress.

But backward compatibility is one area where it needs to step lightly. If GDB breaks the remote protocol, lots of people get unhappy quickly. That means adding new packets rather than changing the old ones. In addition, the API that Eclipse uses is not tightly specified and the IDE depends on some undocumented behavior, which means that those are areas where changing the code must be done carefully.
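
A small sketch shows why adding packets is the safer route. In the remote protocol, a packet is framed as "$", the payload, "#", and a two-hex-digit checksum (the sum of the payload bytes modulo 256), and a stub answers any request it does not recognize with an empty packet. The code below models only that much: acknowledgments, escaping, and run-length encoding are left out, and the ToyStub class and the "qHypotheticalFeature" packet are invented for illustration.

```python
def frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<two-hex-digit checksum>."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def unframe(packet: str) -> str:
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload, received = packet[1:-3], packet[-2:]
    if int(received, 16) != sum(payload.encode("ascii")) % 256:
        raise ValueError("bad checksum")
    return payload


class ToyStub:
    """Hypothetical stub that understands only a fixed set of requests."""

    def __init__(self, handlers):
        self.handlers = handlers  # payload -> function returning a reply

    def handle(self, packet: str) -> str:
        handler = self.handlers.get(unframe(packet))
        # An unrecognized request gets an empty reply, so the debugger can
        # tell that the feature is unsupported and fall back.
        return frame(handler() if handler else "")


# An "old" stub that knows only the '?' (why did the target stop) request.
old_stub = ToyStub({"?": lambda: "S05"})
known = old_stub.handle(frame("?"))
unknown = old_stub.handle(frame("qHypotheticalFeature"))
```

The old stub keeps answering "?" exactly as before and replies with an empty packet to the request it has never heard of. Had the meaning of an existing packet been changed instead, this stub would have had no way to notice.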

Retaining old code, even for systems that no longer work at all, has been a source of some tension in the project. There is anxiety about removing the old, dead code and configurations, he said, but the project now does so routinely. There used to be some issues between hobbyist/volunteer developers and those who are paid to work on GDB, but that is mostly in the past. These days, most who are working on GDB are paid to do so.

Shebs's last point was about responding to the competition. He noted that the LLVM debugger (LLDB) is far behind where GDB is, so it does not really provide much competition. Other debuggers, such as TotalView, are focused on niches (e.g. high-performance computing) rather than being general-purpose, so again those are not spurring much feature development.

It was good to get a nice overview of the state of GDB today, as it has been some time since that kind of report has come to our attention. It is a project that many use, often daily, but that somehow doesn't generate the attention that tools like GCC seem to garner. The project seems healthy and headed in a reasonable direction so that it will likely be the Linux debugger of choice for many years to come.

[ Thanks to the Linux Foundation for supporting my travel to the Collaboration Summit. ]

An update on GDB

Posted Apr 17, 2014 7:19 UTC (Thu) by mjw (subscriber, #16740)

There was also a nice overview talk on GDB at Fosdem last month by Pedro Alves that focused on the technical aspects of merging the local/remote gdb/gdbserver and multi-process/target features of GDB with lots of details in the slides: https://fosdem.org/2014/schedule/event/gdb_target_run_val...

There was also a talk by Philippe Waroquiers on how Valgrind acts like a GDB server: https://fosdem.org/2014/schedule/event/valgrind_gdb/

An update on GDB

Posted Apr 17, 2014 11:54 UTC (Thu) by rstu (guest, #88902)

I'd love to hear their thoughts on rr.

An update on GDB

Posted Apr 17, 2014 21:45 UTC (Thu) by madscientist (subscriber, #16861)

I was very interested, until I got to the part where it only supports 32bit apps... but I'll keep an eye on it as I think it could be really useful.

An update on GDB

Posted Apr 20, 2014 19:56 UTC (Sun) by robbe (guest, #16131)

An area where I'd like free software to improve is tools for reverse engineering. I think many developments hint at this art being a necessary self-defense skill (against surveillance, malware, devices that lock out their owner).

There are of course tools available, but their being proprietary hampers, in my opinion, the advancement of the reverse engineering trade -- at least for the "good guys".

Guile support

Posted Apr 21, 2014 17:24 UTC (Mon) by jmaline (guest, #59517)

My ears perked up at the mention of GDB getting Guile scripting recently. I think I recall that the origin of Guile was a proposal to add Tcl scripting to GDB.

RMS objected and the result was Guile. Some history here, but it doesn't mention the GDB connection I think I remember.
http://www.gnu.org/software/guile/docs/master/guile-tut.h...