Enhanced printk() merged

By Jake Edge
July 9, 2008

A change very late in the development cycle for 2.6.26 provides a framework for extending printk() to handle new kinds of arguments. Linus Torvalds merged the change after -rc9, presumably in part because he knew he could trust the author, but also because it should have no effect on the kernel. It will provide for better debugging output once code is changed to take advantage of it.

The core idea is to extend printk() so that kernel data structures can be formatted in kernel-specific ways. To retain some compile-time checking, the %p format specifier has been overloaded. For example, %pI might be used to indicate that the associated pointer is to be formatted as a struct inode, printing the most interesting fields of that structure. GCC can still check that a pointer argument is present but, because it does not understand the I part, it cannot enforce that the pointer is of the right type.
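A small userspace experiment shows why the compiler stays quiet about the extension. The sketch below is ordinary C rather than kernel code, and format_like_userland() is a hypothetical helper invented for this illustration. A standard C library finishes the conversion at the p, so the I that follows is copied to the output as literal text, which is also how GCC's format checking reads the string.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format `ptr` with the string "%pI" using the ordinary C library.
 * A standard printf() ends the conversion at 'p', so the 'I' is treated
 * as literal text and appended after the pointer value.
 * Returns the number of characters written, or -1 if `buf` is too small.
 */
int format_like_userland(char *buf, size_t len, void *ptr)
{
    int n = snprintf(buf, len, "%pI", ptr);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The exact text produced for the pointer is implementation-defined, but the last character of the result is always the literal I. The kernel's vsprintf() is free to interpret that character differently because it is a separate implementation.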

Extending printk() in this manner allowed Torvalds, who authored the patch, to add two new types to printk(): %pS for symbolic pointers and %pF for symbolic function pointers. In both cases, the code uses kallsyms to turn the pointer value into a symbol name. Instead of kernel developers having to read long address strings and then look them up in the system map, the kernel will do that work for them.

The %pF specifier is for architectures like ppc and ia64 that use function descriptors rather than pointers. For those architectures, a function pointer points to a structure that contains the actual function address. By using the %pF specifier, the proper dereferencing is done.
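The difference between the two specifiers can be sketched in a few lines of userspace C. The struct func_desc layout and both function names below are assumptions made up for this illustration; each real ABI defines its own descriptor format, and this is not the kernel's code.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified function descriptor: the entry address comes
 * first, followed by a per-module context value. Real ABIs differ.
 */
struct func_desc {
    uintptr_t entry;    /* address of the function's first instruction */
    uintptr_t context;  /* e.g. a table-of-contents or global pointer */
};

/* %pS-style handling: the pointer itself is the address to look up. */
uintptr_t symbol_address(const void *ptr)
{
    return (uintptr_t)ptr;
}

/* %pF-style handling: the pointer refers to a descriptor, so the
 * address to look up is the entry field stored inside it. */
uintptr_t function_address(const void *ptr)
{
    const struct func_desc *desc = ptr;

    return desc->entry;
}
```

On architectures where a function pointer is the code address, the extra dereference is unnecessary and the two cases would presumably yield the same value.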

As an example of how the augmented printk() could be used, Torvalds converted printk_address(). The CONFIG_KALLSYMS dependency and the kallsyms_lookup() were removed, essentially leaving a one-line function:

    printk(" [<%016lx>] %s%pS\n", address, reliable ? "": "? ", (void *) address);

If kallsyms is not present, the new printk() simply falls back to printing the address in hexadecimal, so the special-case handling lives in printk() rather than in each caller.
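The lookup-with-fallback behavior can be illustrated with a toy symbol table in userspace C. Everything here is an assumption for the sake of the example: the table contents, the struct toy_symbol type, the format_symbol() name, and the have_symbols flag standing in for a kernel built with or without kallsyms. The name+offset/size output only approximates what a real symbol lookup would print.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry in a made-up symbol table. */
struct toy_symbol {
    uintptr_t start;    /* first address covered by the symbol */
    size_t size;        /* number of bytes the symbol covers */
    const char *name;
};

static const struct toy_symbol toy_symbols[] = {
    { 0x1000, 0x40, "toy_init" },
    { 0x1040, 0x80, "toy_run"  },
};

/*
 * Write a symbolic form of `addr` into `buf`. If symbols are unavailable
 * or no entry covers the address, fall back to plain hexadecimal.
 * Returns snprintf()'s result.
 */
int format_symbol(char *buf, size_t len, uintptr_t addr, int have_symbols)
{
    size_t i;

    if (have_symbols) {
        for (i = 0; i < sizeof(toy_symbols) / sizeof(toy_symbols[0]); i++) {
            const struct toy_symbol *s = &toy_symbols[i];

            if (addr >= s->start && addr - s->start < s->size)
                return snprintf(buf, len, "%s+0x%lx/0x%lx", s->name,
                                (unsigned long)(addr - s->start),
                                (unsigned long)s->size);
        }
    }
    return snprintf(buf, len, "0x%lx", (unsigned long)addr);
}
```

Because the fallback sits inside the formatting routine, a caller can pass the same arguments whether or not symbol information is available.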

The clear intent is to allow additional extensions to printk() to support other kernel data structures. The change to vsprintf(), which underlies printk(), actually allows for any sequence of alphanumeric characters to appear after the %p. The new pointer() helper function currently only implements the two new specifiers, but others have been mentioned.
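A rough userspace sketch of that parsing approach follows. It is not the kernel's vsprintf(); toy_format() and toy_pointer() are hypothetical names, the formatter accepts a single pointer argument for brevity, and the bracketed output for S and F is invented to keep the example short. The point is only to show an extension character being dispatched on and the rest of the alphanumeric run being skipped.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one pointer according to the first extension character. */
static int toy_pointer(char ext, char *buf, size_t len, const void *ptr)
{
    unsigned long val = (unsigned long)(uintptr_t)ptr;

    switch (ext) {
    case 'S':
        return snprintf(buf, len, "<sym %lx>", val);
    case 'F':
        return snprintf(buf, len, "<func %lx>", val);
    default:
        /* Unknown or absent extension: plain hexadecimal. */
        return snprintf(buf, len, "%lx", val);
    }
}

/*
 * Expand `fmt` into `buf`. Only "%p" followed by zero or more alphanumeric
 * extension characters is understood; every other character is copied
 * through unchanged. Each "%p" formats the same `ptr`.
 * Returns the length written, or -1 if `buf` is too small.
 */
int toy_format(char *buf, size_t len, const char *fmt, const void *ptr)
{
    size_t pos = 0;

    while (*fmt) {
        if (fmt[0] == '%' && fmt[1] == 'p') {
            int n;

            fmt += 2;
            n = toy_pointer(*fmt, buf + pos, len - pos, ptr);
            if (n < 0 || (size_t)n >= len - pos)
                return -1;
            pos += (size_t)n;
            /* Consume the whole extension, understood or not. */
            while (isalnum((unsigned char)*fmt))
                fmt++;
            continue;
        }
        if (pos + 1 >= len)
            return -1;
        buf[pos++] = *fmt++;
    }
    if (pos >= len)
        return -1;
    buf[pos] = '\0';
    return (int)pos;
}
```

With this sketch, a format such as "%p6N end" prints the pointer in hexadecimal and silently swallows the unrecognized 6N, which illustrates both the room for future specifiers and why literal alphanumeric text directly after %p becomes a problem.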

The most likely additions are for things like IPv4, IPv6, and MAC addresses. Torvalds specifically mentions %p6N as a possibility for IPv6 addresses. Some would rather have seen a different syntax; %p{feature} was suggested, but that would conflict with some current uses of %p in the kernel. Torvalds is happy with his choice:

I _expressly_ chose '%p[alphanumeric]*' because it's basically totally insane to have that in a *real* printk() string: the end result would be totally unreadable.

The patch took an interesting route to the kernel, with much of the discussion evidently going on in private between Torvalds, Andrew Morton, and others before popping up on the linuxppc-dev and linux-ia64 mailing lists. The patch itself has not been posted to linux-kernel in its complete form, but was committed on July 6. While it is a bit strange to see such a change this late in the development cycle, it is a change that should have no impact as there are no plans to actually use the new specifiers in 2.6.26.


Enhanced printk() merged

Posted Jul 17, 2008 8:59 UTC (Thu) by meuh (guest, #22042)

Strange way to specify format.

According to printf() manual page:

Each conversion specification is introduced by the '%' character [...] after which the following appear in sequence:

Zero or more flags

[...]

A conversion specifier character that indicates the type of conversion to be applied.

So, the correct way to specify a struct inode pointer should be %Ip, where I is the flag and p the conversion specifier.

Did kernel developers read userland manual pages? :)

Enhanced printk() merged

Posted Jul 17, 2008 9:56 UTC (Thu) by nix (subscriber, #2304)

If they used that format, GCC would warn about it. The format chosen looks 
to GCC (as to userland printf()) like a %p with unrelated characters after 
it, so GCC doesn't check those unrelated characters because they're just 
literal text as far as it knows.

Enhanced printk() merged

Posted Jul 17, 2008 16:41 UTC (Thu) by meuh (guest, #22042)

Extending GCC to support some new kind of format string is also possible.
Sadly, this can't be done dynamically; patching GCC is required.

GCC already knows about other format strings, see:
http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Target-Format...
http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Function-Attr...

Enhanced printk() merged

Posted Jul 17, 2008 17:41 UTC (Thu) by nix (subscriber, #2304)

Yes indeed, and printk() is marked up with the printf attribute. The trick 
is to find a way to define new format characters that doesn't cause GCC to 
warn about all of them.

There were attempts in the past to make the format attributes dynamically 
redefinable, but if I recall correctly the consensus in the end was that 
this was simply too damn complicated.

(I wonder if what we need is loose versions of the format attribute's 
archetypes, which warn about incorrect numbers of parameters and type 
mismatches for format letters GCC knows about, but do not check format 
letters that GCC doesn't know? As long as nobody tries to reimplement 
something like .* which changes the number of arguments consumed, this 
should work fine.)

Unescaped characters confuse Konqueror

Posted Aug 20, 2008 14:39 UTC (Wed) by Robert (subscriber, #36811)

The string printk(" [<%016lx>] confuses Konqueror (3.5.9), which cuts off the page after the <. It would be better to convert these characters to HTML entities: &lt; and &gt;.

Unescaped characters confuse Konqueror

Posted Aug 20, 2008 14:46 UTC (Wed) by corbet (editor, #1)

Not just "better", it's required to make the page valid HTML. Sorry that one slipped through; it's fixed now.