Testing for kernel performance regressions
Posted Aug 4, 2012 22:12 UTC (Sat) by drdabbles (guest, #48755)
Parent article: Testing for kernel performance regressions
The next problem is that there is little incentive for the hardware vendors that contribute directly to the kernel (I'm looking at you, Intel) to also contribute a test suite for their contributions. Moreover, it's quite conceivable that contributing a test suite would allow people to reverse engineer the hardware in question to some extent. I don't see this as a problem, but then again I don't make billions of dollars a year selling chipsets embedded in nearly every device in existence. There may also be regulatory concerns, much like those that open-source WiFi drivers had to contend with in the US several years ago. So it's a sticky situation.
Additionally, you have the problem of how to actually execute tests. Software bugs aren't always painfully obvious. You don't always get a panic or segfault when a program or driver messes up; in fact, sometimes the program runs on, perfectly unaware that it has completely ruined data. This problem of subtle bugs shows up frequently in audio chipset drivers. Sometimes an upgrade causes audio artifacts, sometimes the ports are reversed, and sometimes the power-saving mechanisms act wonky, but only after a suspend following a warm reboot where a particular bit in memory wasn't initialized to zero. These things are extremely hard to detect with a test suite, because the suite has no idea whether the audio is scratchy or whether the port a user expects to be enabled is working properly.
Finally, if you could overcome the issues above, you still have the case where suspend/resume, reboot, or long run time causes a problem. To test this, the test suite needs complete access to a computer at a very low level. Virtualization will only get you a small portion of the way there; things like PCI passthrough are making this kind of test easier, but that in itself invalidates the test at a very basic hardware-access level. This is where the idea of hardware vendors contributing resources to a test farm becomes a great idea, and as a comment above mentioned, the Linux Foundation could create some incentives for this. A platinum list of vendors and devices would be excellent! My organization ONLY uses Linux and *BSD in the field, so having that list to purchase from would be a pretty big win for us.
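To make the suspend/resume case concrete, here is a minimal sketch (mine, not from the comment) of the kind of low-level exercise a test box would have to run: it arms the RTC wake alarm and writes to /sys/power/state in a loop. The five-cycle count and the 30-second wake delay are arbitrary assumptions, and it needs root on a machine whose state you don't mind losing.

```c
/*
 * Sketch only: cycle suspend-to-RAM a few times using the standard
 * sysfs interfaces, waking via the RTC alarm.  Cycle count and delay
 * are arbitrary; run as root on a disposable test box.
 */
#include <stdio.h>
#include <stdlib.h>

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s", val);
	return fclose(f);
}

int main(void)
{
	for (int i = 0; i < 5; i++) {
		/* Clear any stale alarm, then arm the RTC to fire in 30 seconds. */
		write_str("/sys/class/rtc/rtc0/wakealarm", "0");
		write_str("/sys/class/rtc/rtc0/wakealarm", "+30");

		/* Enter suspend-to-RAM; the write blocks until resume. */
		if (write_str("/sys/power/state", "mem") != 0) {
			fprintf(stderr, "cycle %d: suspend failed\n", i);
			return EXIT_FAILURE;
		}
		printf("cycle %d: resumed\n", i);
	}
	return EXIT_SUCCESS;
}
```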
I think the solution will have to come in several layers. A social change will need to be made, so that developers don't dread writing test suites for their software. A policy change may be needed, such as: "No further major contributions will be accepted without a test suite attached as well." And finally, the technical infrastructure to actually execute these test suites will have to be built.
The good news is that a TESTSUITE=yes boot option that kicks the kernel into a mode where only test suites are executed would be pretty easy. The box would never boot the OS; it would just sit there running tests for every built component (module or in-kernel), hammering on the hardware to make drivers fail. Passing a further option that points at a storage device could be useful for saving results to a raw piece of hardware in a format that could be uploaded and analyzed easily.
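As a rough illustration of how such a boot option could hook in, the kernel already provides the __setup() mechanism for command-line parameters. The sketch below is purely hypothetical: the testsuite= name, run_builtin_selftests(), and the result-device handling are invented for the example; only __setup() and late_initcall() are existing kernel interfaces.

```c
/*
 * Hypothetical sketch of the "TESTSUITE=yes" idea, built on the kernel's
 * real __setup() boot-parameter mechanism.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

static bool testsuite_mode;

/* Parse "testsuite=yes" from the kernel command line. */
static int __init testsuite_setup(char *str)
{
	if (str && !strcmp(str, "yes"))
		testsuite_mode = true;
	return 1;
}
__setup("testsuite=", testsuite_setup);

static int __init testsuite_run(void)
{
	if (!testsuite_mode)
		return 0;

	pr_info("testsuite: running driver self-tests instead of booting\n");
	/* run_builtin_selftests();        -- hypothetical per-driver test hook */
	/* write_results_to_raw_device();  -- hypothetical result logging */
	return 0;
}
late_initcall(testsuite_run);
```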
Posted Aug 4, 2012 23:44 UTC (Sat) by pabs (subscriber, #43278)
Posted Aug 4, 2012 23:54 UTC (Sat) by drdabbles (guest, #48755)
Having to install 2, 3, or 4 test suite packages just to run the tests means nobody will ever actually run them.
Perhaps a solution like Git, something built specifically for the Linux kernel's use case, could be helpful.
Posted Aug 5, 2012 2:56 UTC (Sun) by shemminger (subscriber, #5739)
Random testing is often better than organized testing! Organized testing works for benchmarks, but the developer in Taiwan who boots on a new box and reports that the wireless doesn't work is priceless.