
Toward better kernel releases

It was asked recently: is the 2.6.10 release coming sometime soon? Andrew Morton replied that the latter part of December looked like when it might happen. He also noted that he is trying to produce a higher-quality release this time around:

We need to be achieving higher-quality major releases than we did in 2.6.8 and 2.6.9. Really the only tool we have to ensure this is longer stabilisation periods.

Andrew also noted that getting people to test anything other than the final releases is hard, with the result that many bugs are only reported after a new "stable" kernel is out. If things don't get better, says Andrew, it may be necessary to start doing point releases (e.g. 2.6.10.1) for the final stabilization steps. Alternatively, the kernel developers could switch to a new sort of even/odd scheme, so that 2.6.11 would be a new features release, and 2.6.12 would be bug fixes only.

Much of the discussion, however, centered around regression testing. If only there were more automated testing, the reasoning goes, fewer bugs would make it into final kernel releases. This wish may eventually come true, but, for now, it appears that regression testing is not as helpful as many would like.

OSDL has pointed out that it runs a whole set of tests every day. The problem, they say, is getting people to actually look at the results. It may be that not enough people know about OSDL's work, and, for that reason, the output is not being used. But it also may be that the testing results are simply not that useful.

Consider this posting from Andrew Morton on regression testing:

However I have my doubts about how useful it will end up being. These test suites don't seem to pick up many regressions.... We simply get far better coverage testing by releasing code, because of all the wild, whacky and weird things which people do with their computers. Bless them.

The test suites, it seems, are not testing for the right things. One could argue that the test suites simply have not yet been developed to the point where they perform comprehensive testing of the kernel. This gap could be slowly filled in by having kernel bug fixes be accompanied by new tests which verify that the bug remains fixed. Much of the code in the kernel, however, is hardware-specific, and that code is where a lot of bugs tend to be found. Hardware-specific code can only be tested in the presence of the hardware in question. Outfitting a testing lab with even a fraction of the hardware supported by Linux would be a massively expensive undertaking.
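
What might such a fix-accompanying test look like? As a purely illustrative sketch (the corner case checked here is an example of the genre, not a test taken from any real suite), a regression test can be as small as a program that exercises one documented behavior and exits nonzero if the kernel ever regresses:

    /*
     * Minimal regression-test sketch: verify one documented corner
     * case (a zero-length mmap() must fail with EINVAL) and exit
     * nonzero if the behaviour ever changes.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        void *p = mmap(NULL, 0, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED && errno == EINVAL) {
            printf("PASS: zero-length mmap rejected with EINVAL\n");
            return 0;
        }
        fprintf(stderr, "FAIL: zero-length mmap was not rejected\n");
        return 1;
    }

A suite built from hundreds of such tests, each added alongside the fix it guards, is how that gap could be filled in over time.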

So the wider Linux community is likely to remain the testing lab of last resort for the kernel; the community as a whole, after all, does have all that hardware. And the truth of the matter is that helping with testing is part of the cost of free software (and of the proprietary variety as well). So the best results might be had by trying to get more widespread testing earlier in the process. Getting Linus to distinguish between intermediate and release candidate kernels might help in that regard. If that can't be done, then, perhaps, going with point releases may be required.

Index entries for this article
Kernel: Development model/Kernel quality
Kernel: Regression testing
Kernel: Releases



Hardware driver testing should be done by manufacturers

Posted Dec 9, 2004 10:14 UTC (Thu) by walles (guest, #954) [Link] (1 responses)

Regression testing of hardware drivers should be done by the manufacturers of said hardware. But until they start cooperating, I agree with the article that it probably won't happen.

Hardware driver testing should be done by manufacturers

Posted Dec 9, 2004 10:47 UTC (Thu) by hensema (guest, #980) [Link]

Feature/bugfix releases should not be needed. -preX kernels are feature releases and -rcX kernels are bugfix releases. But Linus only releases -rcX kernels, so one never knows when a kernel is supposed to be stabilizing.

A -rcX kernel should really be an invitation for interested people to give the kernel a test drive. Those interested people might not be that interested in running kernels full of new and buggy features.

Test against virtual hardware?

Posted Dec 9, 2004 16:52 UTC (Thu) by AJWM (guest, #15888) [Link] (3 responses)

Hardware-specific code can only be tested in the presence of the hardware in question.

Or against a (damn good) emulation of it. This is, after all, how firmware and software get developed for hardware that is itself still in the design process.

Yes, it would no doubt be a rather large effort to develop suitable virtual hardware "devices" to be plugged into a virtual machine for testing, if those devices have to mirror exactly the idiosyncrasies of real-world hardware. But it's not impossible, and it's the kind of project that can be approached in a piecemeal and distributed way that's ideal for the bazaar.

Start with one of the existing open virtual systems, make the virtual devices pluggable modules, and then tweak the virtual devices to act like specific real-world hardware rather than some idealized hardware. Once a virtual gizmo is thoroughly tested against its real counterpart in terms of bug-compatible behaviour, you can then run regression tests against that hardware on a virtual machine.
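
In concrete terms (this interface is invented for illustration and is not taken from any existing emulator), "pluggable" could mean little more than a per-device table of I/O callbacks, so that a bug-compatible model of a specific chip replaces only the callbacks and never the dispatch code:

    #include <stdint.h>
    #include <stdio.h>

    /* One I/O callback table per virtual device model. */
    struct vdev_ops {
        const char *name;
        uint32_t (*io_read)(void *state, uint32_t port);
        void (*io_write)(void *state, uint32_t port, uint32_t val);
    };

    /* A trivial "scratch register" device as the idealized model. */
    struct scratch_state { uint32_t reg; };

    static uint32_t scratch_read(void *state, uint32_t port)
    {
        (void)port;
        return ((struct scratch_state *)state)->reg;
    }

    static void scratch_write(void *state, uint32_t port, uint32_t val)
    {
        (void)port;
        ((struct scratch_state *)state)->reg = val;
    }

    static const struct vdev_ops scratch_ops = {
        .name = "scratch",
        .io_read = scratch_read,
        .io_write = scratch_write,
    };

    int main(void)
    {
        struct scratch_state s = { 0 };

        /* The VM core would dispatch guest I/O through the table;
         * a quirkier, hardware-faithful model would swap in its
         * own callbacks here. */
        scratch_ops.io_write(&s, 0x60, 0xdeadbeef);
        printf("%s device read back: 0x%x\n",
               scratch_ops.name, scratch_ops.io_read(&s, 0x60));
        return 0;
    }

Tweaking a model toward a specific real-world chip then means changing only scratch_read() and scratch_write() equivalents, which is exactly the kind of small, independent task that distributes well.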

There are undoubtedly classes of bug that this won't catch, but they'll also be of the sort that are less likely to occur in the field anyway.

-- Alastair

Test against virtual hardware?

Posted Dec 9, 2004 19:25 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

Electronics companies treat their simulation/emulation models for their devices as their deepest, darkest secrets; often such models can't even be shared with other employees of the same company without considerable bureaucracy.

Maybe the best thing is to equip a testing laboratory (say OSDL) with a whole lot of oddball hardware, so they can do the tests. This will cost money, but money could be raised (and we might be able to persuade hardware manufacturers to contribute some hardware).

Test against virtual hardware?

Posted Dec 9, 2004 22:43 UTC (Thu) by khim (subscriber, #9252) [Link]

The fact is that first silicon is almost never sold, so there is a lot of stuff you just cannot test with emulation. And kernel bugs tend to be so subtle that I doubt you can do much with emulation anyway.

Test against virtual hardware?

Posted Dec 16, 2004 9:07 UTC (Thu) by alexs (guest, #13637) [Link]

just one word: FAUmachine

And some explanation: it is a user-mode machine emulator which is able to emulate miscellaneous hardware, from IDE devices (on top of the bus interface protocol) up to sound system emulations. The whole thing has an optional VHDL frontend(!) that allows rather in-depth automation, but it is usable as a PC emulator like VMware as well.

Let's see what is possible in the long term.

Distributed Kernel Regression Testing

Posted Dec 10, 2004 18:13 UTC (Fri) by jabby (guest, #2648) [Link]

Why not do what the internet and the community do best? If OSDL developed a platform for distributed kernel testing, I'd bet a bunch of people would sign up for it.

If I understand correctly, one can safely run a new kernel on top of a known stable kernel with User-Mode Linux (UML). I believe the test kernel still has access to the hardware via the host kernel. Since the UML-ized kernel is just a process under the host kernel, a fatal error will not bring down the whole system. Perhaps the host environment could even monitor the UML process and do some sort of base-level reporting when something goes wrong.
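
As a rough sketch of that base-level monitoring idea: UML kernels build as an ordinary userspace binary conventionally called "linux", so the host side could be as simple as a wrapper that launches the test kernel and reports how it exited. The path, command-line options, and printf-based reporting below are illustrative assumptions, not part of any existing tool:

    /*
     * Minimal monitor sketch: run the UML kernel as a child process
     * and report whether it exited cleanly or crashed.
     */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* Child: boot the test kernel under the host kernel.
             * Binary name and options are the usual UML style but
             * are assumptions here. */
            execlp("./linux", "./linux",
                   "ubd0=root_fs", "mem=128M", (char *)NULL);
            perror("execlp");
            _exit(127);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return 1;
        }
        if (WIFSIGNALED(status))
            printf("test kernel died with signal %d\n",
                   WTERMSIG(status));
        else
            printf("test kernel exited with status %d\n",
                   WEXITSTATUS(status));
        return 0;
    }

A volunteer-tester network would replace those printf() calls with a report sent back to a central collection point.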

This just needs to be packaged up in a nice way so that the willing can join a network of volunteer kernel testers. Put the UML folks in touch with the OSDL folks and I think we might have a solution.

Jason

I would like the point releases plus pre/rc clarification

Posted Dec 16, 2004 9:16 UTC (Thu) by alexs (guest, #13637) [Link]

Let people do point releases so that kernel users are served until the next kernel tree has undergone a reasonable testing time.

Point releases should be fixes for anything critical: security holes, kernel oopses, and regressions in previously working components. As a user, I want the kernel to work out of the box, with no risk, because I am not able to track down every needed patch from lkml, which is known to have a mailing rate of 100 messages a day.

Other than that, Linus and his core crew should give a bit more information to the outside world about when the kernel is open for feature additions and when it is only open for fixes for some period before the state changes. That would help massively in getting users into the testing cycle when it is needed, and out of it when large-scale testing makes no sense for production systems but only for test systems that were built to get broken!

-Alex.


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds