Kernel regression tracking, part 1
Leemhuis began by pointing out that he started doing this work even though he does not work for a Linux company; he is, instead, a journalist for the largest computer magazine in Germany. He saw a mention of the gap that was left after Rafael Wysocki stopped tracking regressions, and thought that he might be a good fit for the job. This work is being done in his spare time. When he started, he had thought that the job would be difficult and frustrating; in reality, it turned out to be even worse than he expected.
Why is it so hard? The first problem is that nobody actually tells him about regressions, so he has to hunt them down himself. That means digging through a lot of mailing lists and bug trackers. Wysocki noted that things are worse than they were years ago when he did the job; there are a lot more information sources. It is more, Wysocki said, than any one person can follow.
Leemhuis went on to say that a lot of regressions are also
fixed without him even noticing. Nobody tells him about progress toward
fixing regressions, so that, too, must be tracked manually. He had asked developers to include a special
identifier in discussions on regressions, but nobody has done it. That is
unfortunate, since he had thought it would be a useful mechanism; perhaps,
he said, he should have tried harder. Ben Herrenschmidt agreed, saying
that it can be hard to get people to change their established workflow to
incorporate a new mechanism. James Bottomley noted that maintainers would,
in general, rather avoid having their bugs termed "regressions", since that
increases the pressure for an immediate fix.
Leemhuis raised the idea of creating a dedicated mailing list for regressions, with reporters asked to copy their reports there. Wysocki agreed that this might be useful, but said that the information on how to report regressions properly needs to be better communicated. Laura Abbott concurred, saying that the documentation in this area should be improved.
Herrenschmidt noted that most bug reports come from distributor kernels rather than the mainline. For distributions like Fedora, which ships something close to a current mainline kernel, these reports can be relevant, though they are still a version or two behind the current development kernels. Reports of bugs in enterprise kernels, instead, have little value. Bottomley added that Linus Torvalds is mostly interested in mainline regressions; the resources just don't exist to track regressions in distributor kernels as well.
There was general agreement that only mainline regressions should be tracked, but Ted Ts'o said that the community could look for volunteers to track regressions in older kernel versions. The work is still useful, he said, and would train others to help with regression tracking. The problem with this idea, Bottomley replied, is that one has to be an idiot to want to do this work — an idea that Leemhuis seemed to concur with. There won't, Bottomley added, be a flood of volunteers in this area. Matthew Wilcox's suggestion that the situation could change because there are a lot of journalists being laid off was not seen as entirely helpful.
Abbott said that, in her role as a Fedora kernel maintainer, she sees a lot of bug reports, but many of them are of low quality. They need to be filtered before being passed on to any sort of core regression list. Arnd Bergmann added that Linaro has been doing more testing recently and finding regressions in linux-next. But Leemhuis said he is really only interested in regressions that make it to the mainline at this point.
Leemhuis went on to say that, while Wysocki used the kernel's Bugzilla tracker to handle regressions, it "looks like double-entry accounting" to him and he has avoided it. There is a lot of overhead associated with working in Bugzilla, and kernel developers tend not to like it. So he has been using the mailing lists instead, but perhaps that was the wrong decision?
Wysocki replied that he used Bugzilla because it was suitable for him; it provided a useful archive of the discussions around regressions. Ts'o said that the real problem is that Torvalds will not dictate a single bug-tracking system for the kernel, so the information is scattered around the community. The kernel Bugzilla is not perfect, he said, but it has the advantage of actually existing and being available. Wysocki added that there needs to be a database somewhere; it should be possible to point people to a definitive entry for a regression. Takashi Iwai said that, for distributors, the most important thing to have is an overview of the situation; that is missing now. There is no comprehensive list of problems, so distributors must go through the time-consuming task of polling a number of different bug trackers.
Wilcox asked if distributors use the regression list for decisions on which kernel versions to ship, or whether those decisions are purely based on time. Abbott replied that Fedora tries to ship the latest mainline kernel, but the decision on pushing a specific kernel does depend on the current regressions. A significant Intel or AMD graphics regression will cause a kernel to be held back, she said, while "an obscure USB dongle" problem will not.
Ben Hutchings said that the situation at Debian is similar, at least outside of the long-term support releases. Iwai said that openSUSE Tumbleweed ships the latest kernel, meaning that regression reports are relevant to the current mainline release, not the development kernel that the kernel developers are working on currently. There are, he said, not many people testing the -rc kernels. Jiri Kosina added that SUSE tracks the "Fixes" tags in patches to see which bug fixes are relevant to the kernels they have shipped; those fixes will be backported if needed. That has led to a reduction in the regressions reported with openSUSE kernels.
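The Fixes-tag approach that Kosina described lends itself to simple automation. The session did not go into how SUSE's tooling works, so what follows is only a minimal Python sketch under stated assumptions: commit messages are available as plain strings, the tags follow the usual `Fixes: <abbreviated hash> ("subject")` convention, and the function names `fixed_commits` and `needs_backport` are hypothetical.

```python
import re

# Matches the conventional trailer: Fixes: 0123456789ab ("subject line")
# Only the (possibly abbreviated) hash is captured; the subject is ignored.
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})\b", re.IGNORECASE | re.MULTILINE)


def fixed_commits(message):
    """Return the lowercased hashes named in the Fixes: tags of a commit message."""
    return [h.lower() for h in FIXES_RE.findall(message)]


def needs_backport(message, shipped_hashes):
    """True if the commit fixes something that is present in a shipped kernel.

    shipped_hashes is assumed to hold full 40-character hashes of the commits
    in the shipped tree; Fixes: tags usually carry an abbreviation, so the
    comparison is done by prefix.
    """
    return any(
        full.startswith(short)
        for short in fixed_commits(message)
        for full in shipped_hashes
    )
```

A distributor could run something like `needs_backport()` over each new mainline commit message, passing in the set of commit hashes contained in the kernel it has shipped; commits that return true would be candidates for backporting.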
Leemhuis asked if he should query developers via email more often the way Wysocki did; Wysocki replied that he didn't do that — his scripts did. Mark Brown said that was a good thing, since the scripts were more polite than their author. Overall, there didn't appear to be any opposition to more email if that's what is needed to improve regression tracking.
As the discussion came to a close, it was noted that regression reporting is hard for most users. They don't know where to send their reports, and there is little information out there to help them. The noise on the mailing lists does not help. The kernel Bugzilla is especially problematic since it is the wrong place to report many bugs, but it's not clear which ones or where they should actually go. Ts'o said that, if it were up to him, he would designate the Bugzilla as being for all kernel bugs, and that subsystem maintainers would simply be told to cope with it. In the absence of such a policy, users will continue to struggle.
The final suggestion came from Abbott, who said that perhaps users who send email to the linux-kernel list (and nobody else) should get an automatic response. That response would inform them that email sent only to the list is unlikely to be read by many people and would thus probably not get a response. It would include suggestions regarding how to more successfully report bugs. This idea was generally well received.
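The check at the heart of Abbott's suggestion, whether a message went to the list and nobody else, is easy to express. The following is a hedged Python sketch using only the standard library; no such auto-responder was said to exist, the function name `sent_only_to_list` is hypothetical, and the sketch assumes that only the To and Cc headers need to be examined.

```python
from email import message_from_string
from email.utils import getaddresses

# Address of the linux-kernel mailing list.
LKML = "linux-kernel@vger.kernel.org"


def sent_only_to_list(raw_message, list_address=LKML):
    """True if the only To/Cc recipient of the message is the list itself."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() yields (display name, address) pairs; compare addresses
    # case-insensitively and ignore entries that could not be parsed.
    recipients = {addr.lower() for _name, addr in getaddresses(headers) if addr}
    return recipients == {list_address.lower()}
```

A responder built on this test would reply only when the function returns true, leaving alone any report that was also copied to a maintainer or a subsystem list.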
This topic was revisited during the invitation-only Maintainers Summit two days later.
[Your editor would like to thank the Linux Foundation, LWN's travel
sponsor, for supporting his travel to this event].
Index entries for this article | |
---|---|
Kernel | Regression tracking |
Conference | Kernel Summit/2017 |
Posted Oct 31, 2017 21:05 UTC (Tue)
by roc (subscriber, #30627)
Requiring bug reporters to directly email maintainers about specific bugs seems really bad for everyone. It makes the bar for reporting bugs very high, since you need to track down the right maintainers, which is practically impossible for end users, and if you guess wrong then your bug is likely to be ignored. I assume it also sucks for maintainers, since it must be difficult to share the workload between maintainers or if the maintainers change, plus every maintainer has to implement their own issue tracking in their email client.
It's amazing how far Linux has come with such immature development practices across the board. This nonsense is tolerated because Linux is so successful, but in a less successful project it would be scorned. If a serious alternative open-source kernel ever arises with more rational development practices and takes share from Linux, people will look back and say "how did they ever think developing *that* way was a good idea?"
Posted Nov 1, 2017 10:37 UTC (Wed)
by pizza (subscriber, #46)
So is Linux successful in part due to this methodology, or in spite of it? Or both?
Posted Nov 1, 2017 10:50 UTC (Wed)
by roc (subscriber, #30627)
I just don't like to see questionable Linux development practices justified by "Linux is successful, so this must be right".
Posted Nov 1, 2017 12:22 UTC (Wed)
by Paf (subscriber, #91811)
Posted Nov 1, 2017 21:33 UTC (Wed)
by neilbrown (subscriber, #359)
Is it? Can you point to some evidence please.
Posted Nov 2, 2017 0:36 UTC (Thu)
by roc (subscriber, #30627)
Whether maintainers are tracking regressions in their heads, in a text file, writing notes on a napkin, or however, they are doing it, just in a way that's inaccessible to others.
Posted Nov 1, 2017 3:05 UTC (Wed)
by neilbrown (subscriber, #359)
How is centralized tracking meant to help? I don't imagine that its use as a management-metric would really matter to most people. Seriously: who cares?
Posted Nov 1, 2017 7:01 UTC (Wed)
by corbet (editor, #1)
Posted Nov 1, 2017 16:14 UTC (Wed)
by knurd (subscriber, #113424)
Posted Nov 1, 2017 22:14 UTC (Wed)
by neilbrown (subscriber, #359)
This is not an example of "maintaining a regression list is useful". This is an example of "members of the community supporting each other to encourage change". James reported a regression, the maintainer disagreed, someone else (Thorsten, and eventually Linus) joined in to make the case.
We always need more competent people to follow issues in various fora, to review not only patches but also bug reports and design discussions and anything else. In this case Thorsten Leemhuis joined in and pushed things along. This was a valuable contribution to be applauded, but it is not a contribution that needs to be centralized; it just needs to be done. Were you following the thread at the time? Maybe even you could have pushed things along.
Posted Nov 1, 2017 21:29 UTC (Wed)
by neilbrown (subscriber, #359)
Given that mainline is the only focus discussed, and given the current development model, this seems to mean "should Linus release an -rc7, or go straight to -final". If that is ever a hard decision, then just default to -rc7. In fact, I wonder why we don't have a fixed N-week cycle (with variation only if Linus' holidays require it).
> (2) increase the odds that regressions get fixed rather than falling through the cracks
Does it though? And is it the most beneficial way to achieve that?
My core point is that regression tracking is best done in a distributed fashion (like everything else in the community except "being Linus" which is centralized). If you find a regression then it is your responsibility to push for a solution, just as if I find a regression it is mine. If we both hit the same regression then we might end up working together and pushing harder for less individual effort.
Posted Nov 1, 2017 22:59 UTC (Wed)
by Paf (subscriber, #91811)
Not “we don’t need you”, but “we know you have little time, so here’s help”.
Posted Nov 1, 2017 9:44 UTC (Wed)
by roc (subscriber, #30627)
Posted Nov 1, 2017 10:52 UTC (Wed)
by Funcan (subscriber, #44209)
Posted Nov 1, 2017 12:27 UTC (Wed)
by Paf (subscriber, #91811)
That’s a slightly different argument, though. I, like you, don’t really understand tracking regressions vs new bugs. So little of the kernel is actually providing totally novel functionality, outside of drivers, that I think most bugs could be considered regressions in a certain light. I mean, when you do the next rewrite of path lookup, that’s great and useful work, but it doesn’t provide new end user functionality, just performance. So I guess any bugs in it are regressions...?
Posted Nov 1, 2017 18:39 UTC (Wed)
by jhoblitt (subscriber, #77733)
Posted Nov 1, 2017 21:51 UTC (Wed)
by neilbrown (subscriber, #359)
How else can you ever really know? It is standard-operating-procedure, when you hit a problem, to make sure you are running the latest version of everything - isn't it? It is only when you experience a bug on the latest software that any of this becomes important.
Posted Nov 2, 2017 14:24 UTC (Thu)
by jani (subscriber, #74547)
MAINTAINERS has a "B:" entry to specify the preferred channel for bug reports per subsystem. That's a start. But it needs to be used more, and then made more accessible to bug reporters.
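> Ts'o said that, if it were up to him, he would designate the
> Bugzilla as being for all kernel bugs, and that subsystem
> maintainers would simply be told to cope with it.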
I don't think you can "simply tell" maintainers to do anything.
For drm/i915 we prefer bug reports at https://bugs.freedesktop.org/ because it's much more likely the graphics bugs get reassigned between kernel and userspace components than between kernel components. We cope with https://bugzilla.kernel.org/ by telling people to file the bugs at fdo instead.
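As an illustration of how the "B:" entries in MAINTAINERS could be made more accessible to bug reporters, here is a minimal Python sketch that collects them per subsystem. It rests on simplifying assumptions: sections are separated by blank lines, the first line of a section is its title, and field lines start in the first column as a capital letter followed by a colon. The function name `bug_channels` is hypothetical, and the explanatory preamble of the real file is not handled specially.

```python
def bug_channels(maintainers_text):
    """Map each section title to the list of its "B:" (bug reporting) entries."""
    channels = {}
    title = None
    for line in maintainers_text.splitlines():
        if not line.strip():
            # A blank line ends the current section.
            title = None
        elif len(line) > 1 and line[0].isupper() and line[1] == ":":
            # A field line such as "M:", "L:" or "B:"; only "B:" is of interest.
            if line[0] == "B" and title is not None:
                channels.setdefault(title, []).append(line[2:].strip())
        elif title is None:
            # The first non-field line after a blank line is a section title.
            title = line.strip()
    return channels
```

Subsystems with no "B:" entry simply do not appear in the result, which makes it easy to see where a reporter would have no documented bug channel.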
Or maybe I should just say [citation needed].
As a developer or maintainer, my only interest in regressions is fixing them. Once fixed they don't need to be tracked - though they are usually recorded using Fixes: tags.
As a user, I still just want it to be fixed, though finding a work-around, or reverting to an old version are certainly options. So I look for someone to complain to and make a noise.
I do believe the point is to (1) have a sense for how ready a given kernel is, and (2) increase the odds that regressions get fixed rather than falling through the cracks.
You don't need a maintained list of regressions to get things fixed (and I doubt it helps much). You need people to care and report and contribute and persist.
The more people who take responsibility, the more data, testing, and expertise is available, and the more likely it is that a fix will be found. I think that reporting a bug and following through to a solution is a good way for people of any skill level to feel connected with the community. Giving people responsibility is an important first step to them taking it. If a "regression maintainer" takes that responsibility, we say to the community "we don't need you".
Regressions can sort of... sneak in. They are often unintended side-effects of changes, and can gradually accumulate because no one is looking at an area (notably of performance) any more. And then, gradually, something that used to be all tuned up doesn’t work well any more.
Surely the best way to find out if the regression is known is to use "your favorite search engine". That doesn't require a centralized regression list (except that one automatically maintained by the engine's web crawler).
> reporting is hard for most users. They don't know where to send
> their reports, and there is little information out there to help
> them. The noise on the mailing lists does not help. The kernel
> Bugzilla is especially problematic since it is the wrong place to
> report many bugs, but it's not clear which ones or where they
> should actually go.
> Bugzilla as being for all kernel bugs, and that subsystem
> maintainers would simply be told to cope with it. In the absence
> of such a policy, users will continue to struggle.