KS2011: Structured error logging

By Jonathan Corbet
October 24, 2011

2011 Kernel Summit coverage

The idea that the messages emitted by the kernel should be more structured is hardly new. There would be value in standardized messages: tools would be easier to write, translations would be facilitated, documentation of messages could be written, and so on. But every attempt to improve the kernel's logging has run aground. That may be the fate of an initiative described at the 2011 kernel summit, but, then, something may yet come of it.

Lennart Poettering and Kay Sievers got up to talk about their "journal" work. Their point is that syslog is getting quite old - it is essentially 1980's technology - and we need a better way to deal with messages. Output that is (only) human readable is not as useful as it could be.

Traditional syslog has a lot of problems. Messages are untrusted; they can come from anywhere. There are lot of different logging techniques. Parsing log files is a hard problem - tools that do so generally turn into "regex horrors." Kernel messages tend to have associated context which does not make it into the log file and must be reconstructed, unreliably, later on. Time zone information is missing, the log stream can be manipulated by an attacker, the use of UDP for network logging means that messages can be dropped, there is "no sane access control," and more. In response to all these problems, Lennart and Kay have been designing "the journal," which is intended to provide unified logging through the entire software stack. The user-space side of the journal exists now and is integrated with service management; the kernel code is not yet available.

The user-space code is able to augment kernel messages with associated context information; it can, for example, add the real device name that matches a message emitted from the kernel. Strings like "sda" make no sense when seen in isolation on today's dynamic systems, even one day later.

The most controversial part of the plan (so far as they were able to present it) was the idea of attaching a 128-bit randomized tag to each message. These tags would have no central management; developers are expected to simply generate a new tag when they create a new message. Tools could then be taught the meaning of specific tags and could quickly recognize relevant messages in the logs. Translators and documentation writers could attach information to specific tags. The tag database would be maintained on the net and run in user space; kernel developers would not have to care about the message tags at all.

Linus broke in to make the claim that UUIDs (which is essentially what these tags are) have never worked in the past; why, he asked, do people still try to use them? The IDs simply get copied around and reused; that will happen with kernel messages too, he predicted. Besides, kernel messages already have a random bit string attached to them - it's the string that is the message itself (or the format string from which it is created). Lennart's defense, which included the assertion that UUIDs are used successfully in the EFI BIOS interface, seemed to convince few people. From there, the conversation became rather loud and unstructured; those who have complained about the lack of contention in recent kernel summits should be less unhappy this time around.

Some of the pieces of a structured logging interface already exist, as Alan Cox pointed out; functions like dev_printk() and friends already encode associated device information in a set way. But Christoph Hellwig asserted that what users really care about is actions on objects; what should be done is to add a new interface catering to that need and leave printk() alone. As it happens, Lennart and Kay propose the addition of a journalk() logging function for just that purpose. Output from printk() would go to the journal, but structured messages with IDs would be created with journalk().

It was suggested that what we are really seeing here is a reinvention of tracepoints. Alan added that things have to be done carefully, since these messages, however they are created, become part of the ABI. He also noted that we have a good interface for logging now: netlink. All that is needed is to put everything together. It turns out that using netlink for the journal is part of the plan - but the code does not exist yet.

At the end of the session, Alan recommended that the developers not really even think about printk(); they should find a way to characterize what they really need, then come up with a mechanism to provide it. Kay and Lennart sat down without having had the time to present their full proposal; Kay promised that they would try again next year.

Next: Coming to love cgroups

Index entries for this article
Kernel	Kernel messages
Kernel	Messages

KS2011: Structured error logging

Posted Oct 25, 2011 2:34 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

I think that's one area where Windows did TheRightThing(tm). Messages are structured as XML with each message having a unique UUID (wonderful for event correlation).

XML might be replaced by something better, but the principle stands.

KS2011: Structured error logging

Posted Oct 25, 2011 2:49 UTC (Tue) by dlang (guest, #313) [Link] (17 responses)

to paraphrase

those who do not understand syslog are doomed to reinvent it, poorly

there are some real issues with syslog (the lack of any ability to know where the log message _really_ came from on a system for example), but most of the problems that are attributed to syslog are really failures in sane log formatting on the part of the application generating the log message, and what makes anyone thing that if you change to some new mechanism the application programmers will be consistent any more than they have in the past?

I am dealing with this at work as well where the application programmers declare syslog 'obsolete' and go on to create their own logging mechanism.

the end result after much effort?? someone finally writes a tool to get this custom log into syslog, and then things settle down.

KS2011: Structured error logging

Posted Oct 25, 2011 3:19 UTC (Tue) by dlang (guest, #313) [Link] (16 responses)

since I don't think I was clear enough in my prior message

don't invent a new logging mechanism, just clean up the formatting of your logs

if the kernel could just adopt a rule along the lines of:

All new log messages would start at the beginning of a line, all continuations of a log message would start with whitespace

it would be a huge win.

this would be enough to let log daemons figure out what a complete log is, and from there log parsers can take it reasonably.

log parsers frequently devolve into regex hell, but they don't have to, and it doesn't require creating a new logging protocol to solve the problem.

KS2011: Structured error logging

Posted Oct 25, 2011 7:08 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

So, let's see - a record from our code:
============
[Sat 26 Oct 2011] "Joe User" requested lmpp://myserver.dc=some.dc=com/service , "SUCCESS" has been returned.
============

How should we parse it? Well, let's start with the date. It can be parsed by regexps, but it's already some amount of code.

Then there's user name. It can't be parsed by regexps at all (because of quoting, for example "Joe \"the mad\" User"). Then there's URL, which also can not be reliably parsed by regexps. And then finally the exit code which luckily is just a pre-defined string.

It's _really_ _really_ easy to make log unparseable accidentally. And given that quite a lot of log messages are printed only during exceptional/error conditions you might not discover it until it's too late.

So something which just CAN NOT be misused is sorely needed.

KS2011: Structured error logging

Posted Oct 25, 2011 7:50 UTC (Tue) by dlang (guest, #313) [Link]

you can also botch rules for any structured format that you make as well

the fact that your program outputs bogus logs now is a problem with your program, not with the logging protocol.

KS2011: Structured error logging

Posted Oct 25, 2011 7:56 UTC (Tue) by liljencrantz (guest, #28458) [Link] (4 responses)

Minor nit: You can easily parse quoted strings using something like:

"([^"]|\\.)*"

This will work since regexps choose the longest mathing string. Or am I missing something?

KS2011: Structured error logging

Posted Oct 25, 2011 8:21 UTC (Tue) by l0b0 (guest, #80670) [Link] (3 responses)

You also need to account for the fact that you might have an even or odd number of backslashes before the quote:

echo '"foo \"bar\" baz"' | grep -E '"([^"]|\\.)*"' # Succeeds
echo '"foo \"bar\\" baz"' | grep -E '"([^"]|\\.)*"' # Ouch, that's a literal backslash, not an escaped quote!

To fix it, we would need to check that any quotes are preceded by an *odd* number of backslashes: "([^"]|(?<=\\(\\\\)*)")*" Unfortunately this doesn't work with grep -P ("lookbehind assertion is not fixed length"). I don't know if any other regex engines support this.

KS2011: Structured error logging

Posted Oct 25, 2011 8:53 UTC (Tue) by iq-0 (subscriber, #36655) [Link] (2 responses)

Try this one: "(\\\\|\\[^\\]|[^\\"])*"

KS2011: Structured error logging

Posted Oct 25, 2011 9:35 UTC (Tue) by liljencrantz (guest, #28458) [Link] (1 responses)

Or "(\\.|[^\\"])*"

KS2011: Structured error logging

Posted Oct 25, 2011 10:02 UTC (Tue) by nix (subscriber, #2304) [Link]

This I think is proof that you need a proper parser rather than just regex matching. Regexps are not a parser, they are (the core of) a tokenizer.

KS2011: Structured error logging

Posted Oct 25, 2011 10:00 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

There is something else that should be done. The syslog network protocol should be changed (or, rather, a new one defined) that contains *both* the formatted message *and* its facility, priority, format string and args, cleanly separated, and syslog() tweaked to generate that. Then syslog consumers can do proper classification without needing to bother with all this UUID nonsense, nor with (as now) some horrific scheme involving analyzing large numbers of messages (or by-hand work) to figure out which bit of them is the format string and which bit is not.

KS2011: Structured error logging

Posted Oct 25, 2011 16:34 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

have you looked at the latest syslog RFC? I think it supports exactly what you are looking for.

however you need to note that the problematic formatting isn't done by syslog, but by the applications that are generating the log.

even in the old syslog RFC, the portions under the control of the syslog daemon are well defined and specified (although there is a LOT of stuff out there that violates these specs)

KS2011: Structured error logging

Posted Oct 26, 2011 14:01 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

That's why I said syslog() needed to be tweaked. Not syslogd, the syslog() function in libc.

KS2011: Structured error logging

Posted Oct 27, 2011 1:24 UTC (Thu) by dlang (guest, #313) [Link]

Ok, that makes sense.

KS2011: Structured error logging

Posted Oct 25, 2011 16:53 UTC (Tue) by erwbgy (subscriber, #4104) [Link] (4 responses)

I have often wondered. Do you intentionally write like e.e.cummings ? :-)

KS2011: Structured error logging

Posted Oct 25, 2011 17:04 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

no, and I'm not sure what you are referring to about my writing style.

KS2011: Structured error logging

Posted Oct 26, 2011 13:56 UTC (Wed) by sorpigal (guest, #36106) [Link] (1 responses)

e. e. cummings is famous for using only lower-case characters

KS2011: Structured error logging

Posted Oct 27, 2011 1:19 UTC (Thu) by dlang (guest, #313) [Link]

Ahh, Nothing deliberate, just pure laziness. I do tend to be lazy about capitalisation except when writing formally.

KS2011: Structured error logging

Posted Oct 26, 2011 6:56 UTC (Wed) by sdalley (subscriber, #18550) [Link]

Hmm. He does seem to have a serene disregard for the rule that sentences should start with a capital letter. On the other hand, he doesn't seem an up so floating many bells down kind of guy, at all ...

KS2011: Structured error logging

Posted Oct 25, 2011 9:55 UTC (Tue) by nix (subscriber, #2304) [Link]

Parsing log files is a hard problem - tools that do so generally turn into [Kay Sievers and Lennart Poettering] "regex horrors."

This is why syslog-ng has had a proper parser, driven by a database of log message formats, for many years now. No regex horrors needed. It's really quite nice (and vastly underused).

KS2011: Structured error logging

Posted Oct 25, 2011 19:21 UTC (Tue) by raven667 (subscriber, #5198) [Link] (3 responses)

Here is a crazy thought; is it possible at compile time to get a hash of the contents of the printk() call (before format string replacements) to generate a unique ID? That should give every unique version of a message a unique identifier that could be a key for searching and parsing. Other kinds of conventions or standards for the contents of a message, especially for multi-line messages, would be useful as well to simplify parsing but anything that requires massive changes all over the kernel is probably no good.

KS2011: Structured error logging

Posted Oct 27, 2011 7:18 UTC (Thu) by epierre (subscriber, #60841) [Link]

If you change the format string to either correct a typo or add a little more information, the hash value changes, that makes the identifier very unstable.

KS2011: Structured error logging

Posted Oct 27, 2011 11:10 UTC (Thu) by Felix.Braun (guest, #3032) [Link] (1 responses)

The thought of translating error messages was also floated. That would obviously change the has value. But then, I think that nowadays every sysadmin has some basic understanding of english. So I'm not so sure translation would even be necessary.

KS2011: Structured error logging

Posted Nov 3, 2011 13:52 UTC (Thu) by renox (guest, #23785) [Link]

> But then, I think that nowadays every sysadmin has some basic understanding of english.

Totally wrong, just think about the use of Linux on desktops where each user is also a "sysadmin".

KS2011: Structured error logging

Posted Nov 1, 2011 5:04 UTC (Tue) by jcm (subscriber, #18262) [Link] (1 responses)

I'm fond of my 1980s technology, but I see the point here. Still, I would hope any such effort is properly standardized between distributions. If there's an opportunity for larger technology standardization between Linux and Unix-like systems out there, that should be done too. This kind of thing should not be brought to the attention of industry standards bodies after it's been done.

KS2011: Structured error logging

Posted Nov 1, 2011 14:32 UTC (Tue) by nix (subscriber, #2304) [Link]

But standardizing things before widespread implementation experience is available is a recipe for awful standards. You *must* implement first, or you end up with the export keyword all over again.