A kernel message catalog

By Jonathan Corbet
August 4, 2008

Kernel developers will often use printk() to output a message when something goes wrong. Such messages tend to be helpful to kernel developers; if nothing else, they can be used to find the place in the source where the message is emitted, and that, in turn, is most useful for somebody trying to figure out what the message is really saying. So, if your kernel tells you, for example, "lguest is afraid of being a guest," a quick dig through the source turns up a comment reading "Lguest can't run under Xen, VMI or itself. It does Tricky Stuff." Problem solved - or, at least, understood.

But, for the bulk of Linux users and administrators, the act of printk() interpretation by recourse to the kernel source is, itself, Tricky Stuff. If the kernel cannot tell them directly what the problem is, they would much rather have a more straightforward means of translating messages into some sort of useful English.

Or maybe not: for many Linux users, English may not be much more helpful than straight kernel-speak. It would be really nice to translate those messages into some sort of useful French, or Chinese, etc. What it comes down to, in the end, is that printk() alone will never be able to provide sufficient information to users in a way which can be understood and used to solve problems.

Just over one year ago, LWN looked at some proposals for adding structure to kernel messages. After that, the discussion went quiet, to the point that it seemed like not much was happening in the messaging area. But one should not forget that we are dealing with companies like IBM which have been creating massive binders full of kernel message documentation for several decades. They're not going to give up so easily. So the posting (by Martin Schwidefsky) of a new kernel messaging proposal is not an entirely surprising event.

In the latest scheme, each source file which generates structured messages defines a macro KMSG_COMPONENT as a string naming the specific subsection. This name will often match the name of the module which is created from that code, but that is not necessarily the case. The name, once chosen, is supposed to remain fixed forevermore; it becomes, in essence, part of the user-space interface and should always match the documentation.

Then, each message is assigned an integer identification number. The combination of the component name and the message number should be unique throughout the kernel; it is used by various tools to associate a more detailed explanation of whatever the message is intended to communicate. The message number is used with one of a number of new printk()-like functions:

    kmsg_alert(id, format, args...);
    kmsg_err(id, format, args...);
    kmsg_warn(id, format, args...);
    kmsg_info(id, format, args...);
    kmsg_notice(id, format, args...);

    kmsg_dev_alert(id, dev, format, args...);
    /* ... */

The "_dev" versions take an additional struct device argument (like dev_printk()) and encode the device name in the resulting message. That message (for all variants) will include the component name and the message number in any output. So, for example, the S/390 "xpram" driver includes the following:

    #define KMSG_COMPONENT "xpram"

        /* ... */
        if (devs <= 0 || devs > XPRAM_MAX_DEVS) {
	    kmsg_err(1, "%d is not a valid number of XPRAM devices\n", devs);

Should this particular error check trigger, the resulting message will look like this:

    xpram.1: 42 is not a valid number of XPRAM devices

Thus far, our user is probably not feeling much better informed than before. But there is additional information which is made available and associated with that message tag. In this particular case, it looks like this:

/*?
 * Tag: xpram.1
 * Text: "%d is not a valid number of XPRAM devices"
 * Severity: Error
 * Parameter:
 *   @1: number of partitions
 * Description:
 * The number of XPRAM partitions specified for the 'devs' module parameter
 * or with the 'xpram.parts' kernel parameter must be an integer in the
 * range 1 to 32. The XPRAM device driver created a maximum of 32 partitions
 * that are probably not configured as intended.
 * User action:
 * If the XPRAM device driver has been compiled as a separate module,
 * unload the module and load it again with a correct value for the
 * 'devs' module parameter. If the XPRAM device driver has been compiled
 * into the kernel, correct the 'xpram.parts' parameter in the kernel
 * parameter line and restart Linux.
 */

Here, we have a more verbose description of the message. Even more helpfully (one hopes), there is a discussion of what can be done to make this message go away. This information can be provided within the source or in a separate documentation file; it can also, presumably, be nicely formatted and distributed to paying customers as a binder for the system administrator's bookshelf. It can be translated into other languages for Linux users worldwide (and beyond: one could have a lot of fun with the Klingon translation for this kind of material).

The patch includes a script (written in Perl with undocumented messages, of course) which (when invoked with make D=1) will go through the source and make sure that every kernel message has an associated description block; it can also format the descriptions into man pages if desired. There are checks for missing descriptions or overloaded message ID numbers; the script does not, at the moment, check for a change in the message text.

Martin's first posting made this work specific to the S/390 architecture; following a suggestion from Andrew Morton, he made it generic in later versions. The cost of this work is zero for those who do not use it, so there is a reasonable chance that it will find its way into the mainline eventually. Before the message catalog system can be truly useful, though, developers will have to go through and document a substantial portion of the messages created by the kernel - and keep that documentation current as the kernel evolves.

Index entries for this article
Kernel	Messages

A kernel message catalog

Posted Aug 7, 2008 9:20 UTC (Thu) by dark (guest, #8483) [Link] (2 responses)

I notice that "XPRAM_MAX_DEVS" in the check became "32" in the tag description. This means that if XPRAM_MAX_DEVS changes, someone will have to remember to update its value in any descriptions that use it. I think this will get painful.

In this case it's not that hard because XPRAM_MAX_DEVS is defined as a literal number 32 and is defined in the same file, so you could put a comment there saying what descriptions need to be updated if its value changes. In the general case, such limits might live in separate header files, their values might be architecture-dependent or depend on other limits, etc. It will be difficult to keep descriptions in sync with the code if they often contain such values.

I think the right thing to do for this particular message is to put XPRAM_MAX_DEVS into the printk itself ("%d is not a valid number of XPRAM devices, should be between 1 and %d") and then have the description say something like "must be in the specified range". Will this approach work in general?

A kernel message catalog

Posted Aug 7, 2008 23:15 UTC (Thu) by vomlehn (guest, #45588) [Link] (1 responses)

Yes, that would work, but note we are now raising the bar for kernel development. Rather than
just slamming out any old message in a printk, kernel developers will now need to carefully
consider the associated English (or Chinese or Klingon) text of the expanded message. I'd like
to believe that we can do this, but I'm concerned that past history suggests we are not that
good at this kind of thing. I see lots of reluctance from people to put many comments in code
based on how rarely they get updated when the code changes.

Note that I think that having good kernel error messages is a *very* good thing. But it's
harder than what we've been doing and I think doing it half-way may very well be worse than
not doing it all.

A kernel message catalog

Posted Aug 8, 2008 8:23 UTC (Fri) by dark (guest, #8483) [Link]

I think doing it half-way would still be an improvement, if the effect is that googling for "xpram.1" will always bring up relevant pages :)

A kernel message catalog

Posted Aug 8, 2008 3:56 UTC (Fri) by JohnNilsson (guest, #41242) [Link] (3 responses)

Instead of manually picking an id for each message why not just make a hash of the message
string? No need to manually maintain the ids. No possibility to change a text without updating
the id.

A kernel message catalog

Posted Aug 8, 2008 11:50 UTC (Fri) by jamesh (guest, #1159) [Link] (2 responses)

So if a spelling or grammar error gets missed, the message can't be fixed without breaking
(possibly printed) documentation?  That doesn't sound like an improvement.

That said, if there is a way of getting rid of the IDs entirely it might be worth pursuing.
Not having to maintain message IDs is one of the reasons why gettext won over competing message
translation frameworks.

A kernel message catalog

Posted Aug 8, 2008 13:41 UTC (Fri) by JohnNilsson (guest, #41242) [Link] (1 responses)

One could always maintain a mapping between id's that are the same. But yeah, that probably
isn't much of an improvement.

A kernel message catalog

Posted Aug 8, 2008 20:10 UTC (Fri) by giraffedata (guest, #1954) [Link]

Of all the code changes that break the documentation, grammar and typo corrections will be an insignificant minority.

There's no need to do something fancy like a hash -- just use the literal format string as the identifier. It's already in the documentation code anyway.

Another idea, more searchable than actual text but less of a development PITA than a number: an alphanumeric message ID. Have the macro turn it into a name that would make the compilation fail if it isn't unique.

A kernel message catalog

Posted Aug 8, 2008 20:47 UTC (Fri) by giraffedata (guest, #1954) [Link] (4 responses)

we are dealing with companies like IBM which have been creating massive binders full of kernel message documentation for several decades.

Do they still? That practice began when computer storage was too expensive to keep all that information online. Today, a message manual increases cost for everyone involved. And that's true even if the manual is stored online.

It irks me endlessly that software engineers seem to think they have to pay by the word for error messages, and that it's somehow undignified to come right out and tell the user what's going on. Here we have a proposal to send the user on a treasure hunt to find out why his module wouldn't load. The whole paragraph explaining the XPRAM devices problem could be right in the printk, and the second paragraph giving advice about it would not be unappreciated, right there in the log, either.

I understand the laziness that makes people not compose error messages, but in this case, the developer does write the message; he just sequesters it.

There are some messages that happen so normally that a full explanation would be an inappropriate use of message space, but the example isn't one of those.

A kernel message catalog

Posted Aug 9, 2008 1:05 UTC (Sat) by erwbgy (subscriber, #4104) [Link] (2 responses)

Embedding the text directly in the source files would make translations more difficult, which
would probably mean they didn't happen, so I don't think this is a good idea.

Instead, perhaps some post-processing of the source code could be done to embed the text for
your chosen language, so you do get the text directly in the log, as you suggest.

A kernel message catalog

Posted Aug 9, 2008 10:07 UTC (Sat) by dark (guest, #8483) [Link] (1 responses)

Embedding the text in the source code needn't make it harder to translate -- gettext solves exactly this problem and I'm sure the technique could be adapted for the kernel. The drawback is that if the kernel outputs translated messages directly then the English-language version is not available when submitting bug reports or asking kernel developers for help. For that, a unique identifier would help.

A kernel message catalog

Posted Aug 9, 2008 16:31 UTC (Sat) by giraffedata (guest, #1954) [Link]

The drawback is that if the kernel outputs translated messages directly then the English-language version is not available when submitting bug reports or asking kernel developers for help. For that, a unique identifier would help.

Note that the current proposal works the other way around: the kernel produces English and a unique identifier, then the user looks up the identifier separately and finds a more complete explanation possibly in another language. And in mine, the difference is that the English part is longer and many users don't need the additional step.

"harder to translate" might mean harder for someone to generate the other-language message repository, as compared to where the English version is in a separate file of message text. But in the current proposal, I believe the full English text is in the source file anyway; it's just in comments instead of in the actual message. A program extracts this text from throughout the kernel and a translator uses that extract to generate a message repository.

A kernel message catalog

Posted Aug 11, 2008 1:35 UTC (Mon) by ajf (guest, #10844) [Link]

They still rigorously define the error codes - more out of habit than a desire to be helpful,
I suspect, since almost all of the ones I've encountered since I first encountered WebSphere
Application Server seem to be documented as "Something didn't work. Don't do that.".