
Linking commits to reviews

By Jake Edge
September 20, 2017

Linux Plumbers Conference

In a talk in the refereed track of the 2017 Linux Plumbers Conference, Alexandre Courouble presented the email2git tool that links kernel commits to their review discussion on the mailing lists. Email2git is a plugin for cregit, which implements token-level history for a Git repository; we covered a talk on cregit just over one year ago. Email2git combines cregit with Patchwork to link the commit to a patch and its discussion threads from any of the mailing lists that are scanned by patchwork.kernel.org. The result is a way to easily find the discussion that led to a piece of code—or even just a token—changing in the kernel source tree.

Courouble began with a short demo of the tool. It can be accessed by typing (or pasting) in a commit ID on this web page, which brings up a list of postings of the patch to various mailing lists; following those links shows the thread where it was posted (and, often, discussed). Another way to get there is to use cregit; navigating to a particular file then clicking on a token will bring up a similar list that relates to the patch where the symbol was changed. Note that the Patchwork data only goes back to 2009, so commits before that time will not produce any results.

[Alexandre Courouble]

Email2git thus gives those interested a look into the design decisions behind a particular chunk of code; finding that discussion by hand is not particularly easy. He presented several use cases, starting with security researchers who want to understand the thinking behind a patch. The tool can also be used in bug fixing and by newcomers to the kernel community. In addition, email2git is being used as part of a recently announced Linux Foundation project: Community Health Analytics Open Source Software (CHAOSS).

Email2git takes commits from the mainline Git repository and tries to match them up to patches that Patchwork has picked up. Patchwork scans around 70 mailing lists to extract patches and the discussion threads that follow. It provides a user-friendly online interface, though email2git accesses the Patchwork database directly. Cregit is used to find changes at the token level, which is more accurate than git blame, he said.

There is no direct mapping from patches posted on a mailing list to commits in the mainline Git tree, so email2git needs to find those matches. Initially, he used a method from some research papers that effectively did an exhaustive search, comparing the diff output in a commit to that in all of the posted patches until a match was found. That did not scale once he started working with the entire data set, which is some 500K mainline commits and 1.4 million patches posted since 2009. Heuristics were needed to narrow down the search space.

Courouble ended up using three pieces of information from each patch to match it to a Git commit. The first is the subject of the email, which is often carried over into the Git commit summary. That heuristic alone finds 55% of the commits directly. Step two is to look at the patch author; he has created a map of all patches from a given author, so those can be tried next. The third heuristic is to match up the files that are affected by the patches. Each of the last two steps compares the diff in the patch with the diff in the commit to make a matching decision.
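The three-step cascade described above can be sketched roughly as follows. This is an illustration only, not email2git's actual code: the `Change` record, the helper names, and the subject normalization are all assumptions made for the example, and a linear scan stands in for the per-author map mentioned in the talk.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record for either a posted patch or a Git commit."""
    ident: str
    subject: str
    author: str
    files: frozenset
    diff: str


def normalize_subject(subject):
    """Drop leading bracketed tags like "[PATCH v2 1/3]"; fold case and spacing."""
    stripped = re.sub(r"^(\s*\[[^\]]*\])+", "", subject)
    return " ".join(stripped.lower().split())


def changed_lines(diff):
    """Keep added and removed lines; skip "+++"/"---" file headers and context.

    Simplification: a removed line whose own text starts with "--" is
    also skipped.
    """
    return [
        line.rstrip()
        for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def same_diff(a, b):
    """Compare two changes by their added and removed lines only."""
    return changed_lines(a.diff) == changed_lines(b.diff)


def match_commit(commit, patches):
    """Return ids of matching patches, trying the cheapest heuristic first."""
    # Step 1: email subject carried over into the commit summary.
    wanted = normalize_subject(commit.subject)
    found = [p.ident for p in patches if normalize_subject(p.subject) == wanted]
    if found:
        return found
    # Step 2: same author, confirmed by comparing the diffs.
    found = [p.ident for p in patches
             if p.author == commit.author and same_diff(p, commit)]
    if found:
        return found
    # Step 3: same set of touched files, confirmed by comparing the diffs.
    return [p.ident for p in patches
            if p.files == commit.files and same_diff(p, commit)]
```

The ordering matters for cost: the subject comparison needs no diff work at all, while the later steps only compare diffs for candidates that already share an author or a file set, rather than for all 1.4 million patches.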

The results for different kernel directories vary fairly widely. For some, like mm, kernel, tools, and virt, 60-90% of the commits can be matched. Those are likely to be subtrees where the email subject winds up in the commit, he said. On the other end of the scale, fewer than 30% of the commits in the net directory are matched; it turns out that Patchwork does not track the relevant mailing list. Other subtrees fall somewhere in between.

There are a number of limitations in the current system. For one thing, it only tracks the mainline; there may be other trees of interest. It is dependent on Patchwork, which is a great resource but only goes back to 2009, so some data is missing. The mbox format of the data can be inconsistent; there are patches with dates from 1970 and 2040, for example.
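One way to cope with the implausible dates mentioned above is to sanity-check the Date header before trusting it. The following is a minimal sketch using Python's standard email module; the function name is hypothetical, and the 2009 lower bound is an assumption borrowed from the start of the Patchwork data rather than anything email2git is known to do.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumed cutoff: the Patchwork data starts in 2009.
EARLIEST = datetime(2009, 1, 1, tzinfo=timezone.utc)


def plausible_date(raw_message, now=None):
    """Return the message's Date as an aware datetime, or None if unusable."""
    now = now or datetime.now(timezone.utc)
    header = message_from_string(raw_message)["Date"]
    if header is None:
        return None
    try:
        sent = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Unparseable header; which exception is raised depends on the
        # Python version.
        return None
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent if EARLIEST <= sent <= now else None
```

Messages that fail the check could still be matched by subject or diff; they simply cannot be ordered reliably against the commit date.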

For the future, the plan is to make the match data available through other means, such as via a REST interface. In addition, running an instance of Patchwork in-house would allow extracting more data from other lists and perhaps going further back in time. Adding tracking for linux-next, improving the algorithm to do incremental processing, and handling patch series that are discussed in multiple threads are all on the radar. Courouble is also interested in getting feedback and ideas for other features from kernel developers.

A few of those were offered up in the Q&A. Someone suggested that it would be nice to get the "0/X" patch associated with the thread. Courouble seemed surprised to hear that Patchwork did not track those, but thought it should be added. There were also suggestions that might make tracking easier: providing guidelines on how patches move from the mailing list into the Git repositories, or perhaps adding Git patch IDs into the mix.
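To illustrate why something like a patch ID could help, here is a small sketch of a content fingerprint that ignores line numbers and whitespace, so the same change hashes identically whether it comes from an email or from a commit. It is only loosely modelled on the idea behind git patch-id and does not reproduce Git's own algorithm or its output; the function name is made up for the example, and the input is assumed to be bare diff text with no commit message or email signature around it.

```python
import hashlib


def diff_fingerprint(diff_text):
    """Hash the added, removed, and file-header lines of a bare diff.

    Hunk headers (which carry line numbers) and context lines are
    skipped, and all whitespace is removed from the lines that are kept.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if not line.startswith(("+", "-")):
            continue  # hunk header, context line, or "diff --git" line
        digest.update("".join(line.split()).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

With such a fingerprint stored for every posted patch, finding the patches behind a commit becomes a dictionary lookup instead of a diff comparison, though a patch that was edited when it was applied would no longer match.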

[I would like to thank LWN's travel sponsor, The Linux Foundation, for assistance in traveling to Los Angeles for LPC.]

Index entries for this article
Conference: Linux Plumbers Conference/2017



Linking commits to reviews

Posted Sep 20, 2017 20:03 UTC (Wed) by johill (subscriber, #25196)

I guess he just needs to link the ozlabs patchwork in, for net/ in particular:

https://patchwork.ozlabs.org/project/netdev/

Linking commits to reviews

Posted Sep 20, 2017 20:06 UTC (Wed) by johill (subscriber, #25196)

Oh, also, later versions of patchwork do in fact track the cover letter, see e.g. https://patchwork.ozlabs.org/cover/816247/

Linking commits to reviews

Posted Sep 21, 2017 1:23 UTC (Thu) by ajdlinux (subscriber, #82125)

Patchwork 2.0 (very recently deployed on ozlabs.org) does full series tracking, which could be integrated into this I suspect.

(Hopefully we'll iron out the remaining wrinkles in pw 2.0 and then convince the kernel.org admins to migrate over. Be it known that patchwork.ozlabs.org is still the original and the best Patchwork. :)

Linking commits to reviews

Posted Sep 22, 2017 19:41 UTC (Fri) by jani (subscriber, #74547)

I guess I should note that in drm/i915 and in large parts of drm in general our tooling routinely adds a commit message Link: tag with a message-id based URL back to our patchwork instance at freedesktop.org. For any drm/i915 commit you should be able to immediately find the mailing list discussion, including previous versions of the patch, and the CI results on the series.


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds