CHAOSS project bringing order to open-source metrics
Providing meaningful metrics for open-source projects has long been a challenge, as simply measuring downloads, commits, or GitHub stars typically doesn't say much about the health or diversity of a project. It's a challenge the Linux Foundation's Community Health Analytics Open Source Software (CHAOSS) project is looking to help solve. At the 2019 Open Source Summit North America (OSSNA), Matt Germonprez, one of the founding members of CHAOSS, outlined what the group is currently doing and why its initial efforts didn't work out as expected.
Germonprez is an Associate Professor at the University of Nebraska at Omaha and helped to start CHAOSS, which was first announced at the 2017 OSSNA held in Los Angeles. When CHAOSS got started, he said, there was no bar as to what the project was interested in. "We developed a long list of metrics, they were really unfiltered and uncategorized, so it wasn't doing a lot of good for people," Germonprez admitted.
Learning from initial mistakes
The CHAOSS project team learned a number of lessons in the first year that have guided the project in the years since. Among the more obvious was that simply collecting metrics related to open-source development and dumping them into one bucket is an approach that doesn't work, he said.
One area where there is a lot of interest in metrics is the diversity and inclusion of open-source projects. Germonprez said that this type of data isn't found in digital trace data, which is the data that can be derived from a Git repository or even an email list. Rather, diversity and inclusion data requires a researcher to go out and ask questions in order to get the required answers. "So one of the things we started to realize is that some metrics are easy to get, and some are more challenging to get, but that shouldn't preclude you from wanting to get those metrics," Germonprez said.
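To make the "easy to get" side concrete, here is a minimal sketch of the kind of digital trace data that can be pulled out of a Git repository; the repository path and the choice of metric (commits per author) are illustrative, not something CHAOSS prescribes:

```python
# A minimal sketch of digital trace data: commit counts per author,
# derived from a repository's `git log` output. Any local clone works;
# "." (the current directory) is just a placeholder.
import subprocess
from collections import Counter

def commits_per_author(repo_path: str) -> Counter:
    """Count commits per author email from the repository's history."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return Counter(log.stdout.splitlines())

if __name__ == "__main__":
    for author, count in commits_per_author(".").most_common(5):
        print(f"{count:6d}  {author}")
```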
Perhaps even more confusing for the CHAOSS project was the realization that different people had different interpretations of what certain terms meant. For example, how different people defined "code commits" varied and, in general, he said that the concept of code contribution was understood in a number of ways. The CHAOSS project leaders came to realize that there was a need to standardize the way the project talked about metrics to make sure the terms are clearly articulated.
Fundamentally, though, the big takeaway from the initial CHAOSS metrics efforts was that there is more to learn; that listening to the community and asking for feedback, rather than just collecting metrics, is the right path forward. "We're a community that doesn't have all the answers, we really don't," Germonprez commented. "I think maybe some people thought we did and we were going to make this project and just provide software that you could push a button and say, green, it's all healthy. But that's not going to happen. So we spend a lot of our time listening."
CHAOSS in 2019
To help bring order to the (ahem) chaos of collecting metrics, CHAOSS now has five working groups, each of which represents an attempt to think about metrics in a more categorical way. The groups are: Diversity and Inclusion, which looks at participation; Evolution, which looks at how projects change over time; Risk, which focuses on metrics pertaining to the risk factors in using open-source software; Value, which looks at metrics for determining economic value; and, finally, Common, which combines metrics from the others in different ways. As he put it: "Common is a working group that looks at metrics that may have kind of a cross-cutting interest in a variety of different working groups. So for example, Common is looking at organizational affiliation and that may be a metric that you care to look at with respect to Risk or Evolution."
On August 6 the project released the first version of its metrics in a 105-page document [PDF]. Germonprez explained that the rationale for publishing the metrics document was to help make open-source metrics consumable and deployable. The overall goal is to help understand what the pain points are for open-source projects and provide the metrics that represent the information that a project needs to be able to make decisions. "These are the first metrics that we're putting forward, to try to provide better transparency and actionability inside of your organization's projects," Germonprez said.
After the experience of the first few years of CHAOSS, he became convinced that most projects had little or no understanding of their own metrics, with no real indication of the project's health. The CHAOSS metrics are an attempt to move a project from having zero metrics to a starting point where it can figure out what's needed. "When we started we were just collecting metrics for metrics' sake and we realized that was actually completely backwards," Germonprez admitted. "We weren't really taking any time to understand the goals and the questions and other metrics that address issues."
Each working group within CHAOSS has focus areas and defined goals. For example, under Diversity and Inclusion one focus area is governance, with the goal of identifying how diverse and inclusive governance is for a given project. One of the metrics being used for that goal is to look at the code of conduct for the project and identify how it can be used to support diversity and inclusion. "This is not about necessarily doing software contributions, it's not necessarily about helping people do deployments out in the field, it's really just us saying, these are the goals that we're trying to achieve, these are the questions to address those goals, and these are the metrics that we'd like to see to address those goals," Germonprez said.
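That goal-question-metric structure lends itself to a simple representation in code. The sketch below is only illustrative: the class names are invented here, and the goal, question, and metric text paraphrases the talk rather than quoting the CHAOSS metrics document.

```python
# A sketch of the goal -> question -> metric hierarchy the working groups
# use. The names below are paraphrased examples, not official CHAOSS
# definitions.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    description: str

@dataclass
class Goal:
    statement: str
    questions: dict[str, list[Metric]] = field(default_factory=dict)

governance_goal = Goal(
    statement="Identify how diverse and inclusive project governance is",
    questions={
        "Does the project have a code of conduct that supports "
        "diversity and inclusion?": [
            Metric("code-of-conduct",
                   "Presence and coverage of a project code of conduct"),
        ],
    },
)

for question, metrics in governance_goal.questions.items():
    print(question)
    for metric in metrics:
        print(f"  - {metric.name}: {metric.description}")
```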
While the focus of Germonprez's talk was the new metrics release, CHAOSS does in fact have several software projects as part of its portfolio. GrimoireLab provides software-development analytics capabilities to help collect and visually display data. The Augur project is a rapid-prototyping tool for metrics. "It's one thing to come up with a metric. It's another thing to deploy the metric," Germonprez said.
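For those curious what collecting data with GrimoireLab looks like in practice, its data-collection component, Perceval, can be driven from a few lines of Python. This is a minimal sketch, assuming Perceval has been installed from PyPI; the repository URL and local clone path are placeholders:

```python
# A minimal sketch of fetching raw commit records with Perceval, the
# data-collection component of GrimoireLab.
# Assumes: pip install perceval
from perceval.backends.core.git import Git

repo = Git(uri="https://github.com/chaoss/grimoirelab-perceval",
           gitpath="/tmp/perceval-clone")

# fetch() clones (or updates) the repository and yields one item per
# commit; the raw commit fields live under the "data" key.
for item in repo.fetch():
    commit = item["data"]
    print(commit["commit"][:8], commit["Author"])
```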
It is somewhat ironic that an effort with the name CHAOSS really is all about bringing order to the myriad variables and data points that make up any open-source effort. The metrics effort is still a work in progress, but it does serve to lay the groundwork to help organizations and project developers think about what metrics are and how they can be used to help support larger goals. It will be interesting to see in the years ahead how the metrics project continues to mature and, perhaps more importantly, how, and if, projects find ways to gain full value from them.
| Index entries for this article | |
| --- | --- |
| GuestArticles | Kerner, Sean |
| Conference | Open Source Summit North America/2019 |
Measuring the metrics

Posted Sep 4, 2019 6:16 UTC (Wed) by shemminger (subscriber, #5739) [Link] (1 responses)

It doesn't take fine-tuned metrics to find the dead ones. For example, the Linux kernel versus some of the critical infrastructure projects that are on life support.

Measuring the metrics

Posted Sep 13, 2019 21:09 UTC (Fri) by GeorgLink (guest, #134403) [Link]

Yes, that would be interesting. When I was at the University of Nebraska at Omaha, working with Matt Germonprez, we worked with a student who was doing that kind of analysis. If I recall correctly, it is more complicated and we could not determine any pattern. A possible reason might have been that open-source projects work very differently; although they use similar tools, the way they use them is different. Another explanation for why we couldn't find a pattern may have been that we were not looking at all of the relevant metrics. We also discussed comparing the data across months vs. releases vs. other ways to slice the data to try to determine the ebb and flow of contributions.

I'm sharing this experience not to discourage anyone from trying it again, but to let you know what I know has been tried so far. That said, if anyone is interested in pursuing this kind of analysis, I would love to have a conversation in the CHAOSS project and help with it as best we can.

Best,
Georg