Updates on the KernelCI project
The kernelci.org project develops and operates a distributed testing infrastructure for the kernel. It continuously builds, boots, and tests multiple kernel trees on various types of boards. Kevin Hilman and Gustavo Padovan led a session in the Testing & Fuzzing microconference at the 2018 Linux Plumbers Conference (LPC) to describe the project, its goals, and its future.
KernelCI is a testing framework that is focused on actual hardware. Hilman is one of the developers of the project and he showed a picture of his shed where he has 80 different embedded boards all wired up as part of the framework. KernelCI came out of the embedded space and the Arm community; there are so many different hardware platforms, it became clear there was a need to ensure that the code being merged would actually work on all of them. Since then, it has expanded to more architectures.
Another goal of the project is to be distributed. No one lab is going to have all of the hardware that needs to be tested. There is a centralized build facility that tracks multiple trees, including the mainline, linux-next, stable trees, and maintainer trees, and builds kernels for the various labs. Hilman's is just one of around ten different labs currently, he said; all of the reporting is centralized at kernelci.org.
Right now, most of the testing is just building and booting the kernels, which actually breaks "quite often". There are more than 250 unique boards and systems that cover 37 unique system-on-chips (SoCs). Over the last few years, KernelCI has done more than four million boots.
If a kernel boots to a shell, that is considered a test "pass". For stable trees and the mainline outside of the merge window, roughly 98% of the kernels pass, but it is "much worse" for linux-next. In particular, linux-next for non-Intel hardware is not all that stable. Arm is getting much better, but there are still problems; generally, problems in linux-next are caused by some dependency that has been missed, Hilman said.
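The "boots to a shell" criterion amounts to watching console output for a shell or login prompt before anything fatal appears. A minimal sketch of that classification, not KernelCI's actual implementation; the prompt and panic patterns here are illustrative assumptions (real labs match the known prompts of their own root filesystems):

```python
import re

# Hypothetical patterns: a login prompt or a trailing "# "/"$ " shell
# prompt counts as reaching a shell; a panic/oops means the boot failed.
SHELL_PROMPT = re.compile(r"login:|[#$] $")
PANIC = re.compile(r"Kernel panic|Oops:|BUG:")

def boot_result(console_lines):
    """Classify a boot from captured serial-console output.

    Returns "pass" if a shell/login prompt is seen before any panic,
    "fail" otherwise (including a timeout with no prompt at all).
    """
    for line in console_lines:
        if PANIC.search(line):
            return "fail"
        if SHELL_PROMPT.search(line):
            return "pass"
    return "fail"  # console went quiet without reaching a shell
```

For example, `boot_result(["Booting Linux...", "buildroot login:"])` would count as a pass, while a log containing `Kernel panic - not syncing` would not.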
The current system will send mail to the architecture (or sub-architecture) maintainers when things break. There is work being done to bisect problems so that a particular commit (and thus its developer) can be identified and notified.
The kinds of problems that KernelCI finds are "all over the place", Hilman said. Many are dependency related, where the driver has changed but the device-tree changes did not make it into the tree, for example. Those are the kinds of problems that are expected to be caught in linux-next. Beyond that, the kernel size is increasing, so the change in memory layout that results can sometimes cause the boot to fail. There is a mix of lab infrastructure problems and kernel problems as might be expected.
Beyond build and boot
When the project got started, it was meant to help find problems where the default configuration (defconfig) for a particular SoC or board would not build. Once that part was mostly handled, KernelCI moved on to testing whether those kernels would boot. Now that is working well, so the project is starting to add testing after the kernel has booted. Basically, the developers wanted to handle a breadth of hardware first and now they are getting to the depth part by running things like kselftest and the Linux Test Project on a subset of the hardware.
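Running kselftest on a booted board typically comes down to invoking the kernel's selftest makefile for a chosen set of targets. A hedged sketch of how a lab harness might assemble that command, following the kernel's documented `make -C tools/testing/selftests TARGETS=... run_tests` usage (the function name and target list are illustrative, not KernelCI's configuration):

```python
def kselftest_command(kernel_tree, targets):
    """Build the command a harness might run to execute kselftest
    for specific targets in a kernel source tree."""
    return [
        "make",
        "-C", f"{kernel_tree}/tools/testing/selftests",
        f"TARGETS={' '.join(targets)}",
        "run_tests",
    ]
```

For instance, `kselftest_command("/usr/src/linux", ["timers", "net"])` yields the argument list for testing just the timer and networking selftests, which keeps run time manageable on slow embedded boards.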
Padovan said that his employer, Collabora, has been helping with KernelCI development recently. One of the areas that he and others have been working on is to add more test suites, including tests for video and display along with some tests that look at basic functionality of some subsystems (e.g. USB, suspend/resume). There has also been work on better reporting of the errors, both via email and on the web site. Hilman noted that getting a useful report to the right developer is a more difficult problem that is still being worked on.
An attendee asked about getting a custom kernel tree tested as part of KernelCI. Hilman said that can be done with a request to the project. KernelCI is not interested in testing vendor trees, but any upstream-focused tree can be added. In answer to another question, Hilman said that patches posted to the mailing lists are not being tested currently, but it is something he would like to see added—though it may still be a ways off.
A standardized Debian-based root filesystem for all architectures is also in progress, Padovan said. An attendee asked if any of the tests involved systemd, which tends to break more readily when the kernel does unexpected things. The root filesystem is fairly minimal, but there are some basic tests that involve systemd, Padovan replied. A lot of the build infrastructure for KernelCI is handled by Jenkins; that has recently moved to using Jenkins Pipelines. There has been a lot of work on documenting the project on its wiki as well.
Auto-bisection is under development too. The email report used to just say that the testing failed, but now auto-bisection tries to find the commit that caused the problem. It is similar to what the 0day testing infrastructure does, Padovan said, just on more hardware. Auto-bisection was in beta at the time of the microconference, but has since been announced on the kernel mailing list.
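At its core, such a bisection is the same binary search that git bisect performs: given a known-good and a known-bad revision, repeatedly build and boot-test the midpoint. A simplified pure-Python model of that search (the `is_good()` predicate stands in for a full build-and-boot cycle on real hardware; this is not KernelCI's actual code):

```python
def find_first_bad(commits, is_good):
    """Binary-search a linear commit history for the first bad commit.

    commits[0] is assumed to test good and commits[-1] bad;
    is_good(commit) stands in for a build-and-boot test.
    """
    lo, hi = 0, len(commits) - 1  # known-good, known-bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # breakage is later in the range
        else:
            hi = mid  # breakage is here or earlier
    return commits[hi]
```

Note that the search trusts every test result: a flaky board or lab-infrastructure hiccup that masquerades as a boot failure will steer it to the wrong commit, which is why a manual verification step is sometimes still needed.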
The reliability of the auto-bisection was the subject of an attendee query. Padovan said that it can certainly fail, for example by never ending or by pointing to a commit for a different architecture, so there is a manual step required at times. In addition, a lab infrastructure failure looks like a boot failure, which can lead to a bad bisection.
That led to a question about the reliability of the lab infrastructure. Hilman said that the reliability is not really dependent on whether it is a home lab versus a corporate lab; it has more to do with how closely the lab is monitored. His lab is well monitored because he sits right next to it most of the time; other labs have to get reports of problems before they get fixed. He wishes they had kept statistics on all of that. He did also note that the problem is sometimes the hardware under test itself: it might be flaky, need a firmware update, or the like.
There is a "decent mix" of new and old hardware being tested. When board companies come out with new boards, they often send them to one of the labs. If someone wants to start a new lab, instructions for setting that up have recently been added to the wiki, Hilman said. He suggested that those who are interested in the project ask questions on the mailing list or on the Freenode #kernelci IRC channel.
New Linux Foundation project
Hilman said that the project gets a lot of requests for new features, but does not have the ability to handle them all—more developers are needed. To that end, KernelCI is becoming a Linux Foundation project soon. Founding members are being recruited now. Once the project and its funding are established, there are plans to update the user interface as it is "getting a bit dated". It also does not provide ways to mine the data that is being collected. "We have a lot of data that we are not doing much with", such as boot time, he said.
Adding more architectures and toolchains is planned, as are more test suites. KernelCI is already doing a lot of testing on real hardware, but there is clearly room for more.
[I would like to thank LWN's travel sponsor, The Linux Foundation, for
assistance in traveling to Vancouver for LPC.]
Index entries for this article:
Conference: Linux Plumbers Conference/2018
Posted Nov 28, 2018 4:10 UTC (Wed) by ndesaulniers (subscriber, #110768)

Is the bisection based on git bisect?
Posted Nov 28, 2018 7:44 UTC (Wed) by mupuf (subscriber, #86890)
This is especially true when we are talking about performance bisection, where performance can go from 100fps to 90fps between Linux 4.18 and 4.19. But actually, the performance went more like this (in a linearized history): 100 -> 120 -> 30 -> 60 -> 100 -> 20 -> 90 fps. With git bisect, one would think that mapping "bad" to any revision with performance lower than 95fps would do the trick, but it would only find the first regression and leave the rest for humans to deal with... The root of the problem is that once we start a bisection job, we cannot change the "question".

For automated bisecting, we found that the most interesting types of bisections are recursive. Every time the kernel (or another project) would behave differently (for any sort of metric), we would just re-execute our tests on a version roughly mid-way between the differing versions. This allows us to bisect rendering, performance, or unit tests without issues. Also, when running test suites executing tens of thousands of tests, you do not want to bisect each regressing test individually, but instead group them together when they all differ between the same two versions. This saves a tremendous amount of time!

In the end, Petri Latvala and I ended up writing our own implementation of git bisect to support the above goals. This allows us to annotate the git history with the impact it had on the different tests of interest, which makes it super simple to see the commits of interest and what they changed across many machines, test suites, and benchmarks. This was presented at XDC 2016: https://www.x.org/wiki/Events/XDC2016/Program/peres_ezben... (https://youtu.be/KIHrjgZJHZA?t=5h41m55s)

Our repo is now hosted here: https://gitlab.freedesktop.org/ezbench/ezbench
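The recursive strategy described in the comment above can be sketched as follows: whenever the metric differs at the two ends of a range, measure the midpoint and recurse into both halves, collecting every change point rather than stopping at the first. This is a toy model under the comment's assumptions, not ezbench itself:

```python
def find_change_points(versions, measure):
    """Return (left, right) version pairs where the metric changes.

    measure(v) stands in for running a benchmark on version v; a
    version is only measured once, and only inside ranges whose
    endpoints already disagree.
    """
    cache = {}

    def m(i):
        if i not in cache:
            cache[i] = measure(versions[i])
        return cache[i]

    def recurse(lo, hi):
        if m(lo) == m(hi):
            return []  # endpoints agree: assume nothing changed inside
        if hi - lo == 1:
            return [(versions[lo], versions[hi])]
        mid = (lo + hi) // 2
        return recurse(lo, mid) + recurse(mid, hi)

    return recurse(0, len(versions) - 1)
```

Run against the 100 -> 120 -> 30 -> 60 -> 100 -> 20 -> 90 fps sequence from the comment, this reports every adjacent change, not just the first drop below 95fps. The "endpoints agree" shortcut is the model's main blind spot: a regression that is later fixed within a range (say 100 -> 20 -> 100) goes unnoticed.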
Posted Nov 29, 2018 22:43 UTC (Thu) by zoobab (guest, #9945)
Posted Dec 2, 2018 17:06 UTC (Sun) by bamse (subscriber, #105078)
But automating something only supporting SD-boot might be problematic...
Posted Dec 2, 2018 17:13 UTC (Sun) by zoobab (guest, #9945)
I also have seen an FTDI-controlled board:
https://hackaday.com/2014/06/08/the-in-circuit-sd-card-sw...