User Story: As a PM/UX person, I want to use an inequality score as part of assessing community health.
Based on input from @GoranSMilovanovic, @guergana.tzatchkova and me, I suggest and document the following:
Data retrieval
To calculate how unequally edits are distributed across accounts, we need to count edits per account. This means we need a way to select the accounts and a way to count their edits.
Account selection: all accounts that have been active in a given timeframe. I suggest two levels of granularity: month and year.
Counting edits: I suggest two ways to count edits: the edits made within the timeframe, and the total edits ever made, measured at the end of the timeframe. These are analogous to measuring income and wealth, respectively, in economics.
This gives us 4 tables:
|  | Month | Year |
|---|---|---|
| Edits-over-time | edits | edits |
| Total-at-time | edits | edits |
These tables could either be a plain list with one edit count per account ("long form") or be aggregated into a two-column table that shows how many accounts have each edit count, i.e. "we counted X accounts with an edit count of Y" ("aggregated form").
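To make the two counting methods and the two table forms concrete, here is a minimal Python sketch. It assumes a hypothetical edit log given as `(account, date)` pairs, one per edit, and treats a timeframe as the half-open interval `[start, end)`; the function names are illustrative only and not an existing API. "Active" is taken to mean "made at least one edit in the timeframe".

```python
from collections import Counter
from datetime import date


def edits_over_time(log, start, end):
    """'Income': edits each account made within [start, end).

    Only accounts active in the timeframe appear, by construction.
    """
    return Counter(acct for acct, day in log if start <= day < end)


def total_at_time(log, start, end):
    """'Wealth': all edits ever made by each account active in [start, end),
    counted up to the end of the timeframe."""
    active = {acct for acct, day in log if start <= day < end}
    return Counter(acct for acct, day in log if day < end and acct in active)


def aggregate(per_account):
    """Long form {account: edit_count} -> aggregated form {edit_count: n_accounts}."""
    return dict(Counter(per_account.values()))


# Example: February 2020 on a tiny, made-up edit log (hypothetical format).
log = [
    ("A", date(2020, 1, 5)),
    ("A", date(2020, 2, 3)),
    ("B", date(2020, 2, 10)),
    ("A", date(2020, 2, 20)),
    ("C", date(2019, 12, 1)),
]
feb = (date(2020, 2, 1), date(2020, 3, 1))
income = edits_over_time(log, *feb)  # A: 2, B: 1
wealth = total_at_time(log, *feb)    # A: 3, B: 1 (C was not active in February)
```

Running the same two functions with monthly and yearly `(start, end)` pairs yields the four tables above. The aggregated form is presumably much smaller than the long form when many accounts share the same low edit counts.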
Calculate the Hoover score
The score could be calculated with existing R packages. If they turn out to be too slow, we might need to optimize. As far as I can tell, the Hoover formula is quite simple and vectorizable, so I expect it to run fine. Keep in mind that we only need to compute the scores monthly and yearly, so this is not a continuous load.
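For reference, the Hoover index (also called the Robin Hood index) of $N$ accounts with edit counts $x_1, \dots, x_N$ is

$$
H = \frac{1}{2}\sum_{i=1}^{N}\left|\frac{x_i}{T} - \frac{1}{N}\right|,
\qquad T = \sum_{i=1}^{N} x_i .
$$

It can be read as the share of all edits that would have to move between accounts to make every account's count equal. $H = 0$ means perfect equality, and the upper bound $1 - 1/N$ is reached when one account made all edits. In the aggregated form, with $n_k$ accounts having edit count $k$, the same value is $H = \frac{1}{2}\sum_k n_k \left|\frac{k}{T} - \frac{1}{N}\right|$ with $N = \sum_k n_k$ and $T = \sum_k k\,n_k$. The pure-Python sketch below only illustrates the formula (the function names are hypothetical, and it is not a replacement for the R packages mentioned above). It is a single pass over the data, which supports the expectation that performance will not be a problem.

```python
def hoover(edit_counts):
    """Hoover index of a long-form list of per-account edit counts.

    0 means every account made the same number of edits; the upper bound,
    1 - 1/N, means a single account made all edits.
    """
    n = len(edit_counts)
    total = sum(edit_counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one account and at least one edit")
    return 0.5 * sum(abs(x / total - 1 / n) for x in edit_counts)


def hoover_aggregated(aggregated):
    """Same index from the aggregated form {edit_count: n_accounts}:
    each distinct edit count contributes its term once per account."""
    n = sum(aggregated.values())
    total = sum(count * accounts for count, accounts in aggregated.items())
    if n == 0 or total == 0:
        raise ValueError("need at least one account and at least one edit")
    return 0.5 * sum(
        accounts * abs(count / total - 1 / n)
        for count, accounts in aggregated.items()
    )


# Three accounts with 1 edit each and one with 9: half of all edits would
# have to move to reach equality, so both calls give approximately 0.5.
h_long = hoover([1, 1, 1, 9])
h_agg = hoover_aggregated({1: 3, 9: 1})
```

Computing the score directly from the aggregated form means the long form never has to be materialized.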
The resulting Hoover scores should be appended to tables of Hoover scores. Again, these will be 4 tables:
|  | Monthly | Yearly |
|---|---|---|
| Edits-over-time | hoover | hoover |
| Total-at-time | hoover | hoover |
Each table has two columns: the point in time and the corresponding score at that moment. Which measurement a table holds needs to be stated in the table's name, in some metadata, or in an extra "type" cell in each row (which would be quite redundant).
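One possible layout, sketched under the assumption that each of the four score tables is a plain CSV file (which could also be opened directly in Excel or similar). The measurement type is encoded in the file name rather than in every row. The naming scheme `hoover_<measure>_<granularity>.csv`, the `period` labels such as `"2020-02"`, and the rounding to four decimals are all hypothetical choices for illustration.

```python
import csv
from pathlib import Path

MEASURES = ("edits-over-time", "total-at-time")
GRANULARITIES = ("monthly", "yearly")


def score_table_path(directory, measure, granularity):
    """Encode the measurement type in the file name (hypothetical scheme)."""
    if measure not in MEASURES or granularity not in GRANULARITIES:
        raise ValueError(f"unknown table: {measure}/{granularity}")
    return Path(directory) / f"hoover_{measure}_{granularity}.csv"


def append_score(directory, measure, granularity, period, score):
    """Append one (period, hoover) row; write the header if the file is new."""
    path = score_table_path(directory, measure, granularity)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["period", "hoover"])
        writer.writerow([period, f"{score:.4f}"])
    return path
```

A monthly job would then call, for example, `append_score("scores", "edits-over-time", "monthly", "2020-02", h)` once per table, so each file only ever grows by one row per run.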
Presenting the scores
In the simplest case, we only provide the data, and changes over time can be inspected by importing it into Excel or similar. Even better would be a tool that visualizes the data:
(The wireframe has no timeframe selection, as the data will be small enough to just scroll back.)