This is Release 1.0, a production version using naïve randomness. Version 1.1, using simulated annealing, is in development.
Used by: Harvard Consulting on Business and the Environment, Harvard Undergraduate Data Analytics Group, Harvard Computer Society, and other clubs.
Let's say that we are building teams, where
Each of the
For instance, consider a set of rankings like this (identifying information omitted):
- As you can see above, some teams are very popular (like Team 7), whereas others are less so (like Team 1). How can we assign teams such that every person will be relatively satisfied with their rankings?
- How can we make our algorithm robust, so that even if the number of teams and prospective people don't quite align, it still works?
Simply put, the algorithm here uses a stochastic greedy approach, which starts with the least favored teams upward:
- The total ranking sum for each team is computed.
- Starting from the highest sum (least favored) downward, the top
$c$ people who have pref'd that option, as well as anyone with the same ranking for this team as one of those$c$ people, are selected. - Out of that pool,
$c$ people are randomly selected for that team and taken out of contention for other teams' selections. - This is repeated for every team. Once that is done, a complete set of team assignments has been given.
- This repeated a set number of times (e.g. 100).
I'm not sure if this problem is NP-complete, although it certainly seems like some variation of an NP-complete linear programming problem. As such, the Team Selection Algorithm™ uses some simplex-like stochastic heuristics to work.
This algorithm works better for data that is at least somewhat well-distributed (e.g. not everyone ranked T1 as their first choice). In my testing, out of 48 people and 12 teams, in taking the best choice out of a 100-time simulation, everyone received their fourth choice or higher. The distribution of preferences in final team assignments for the sample data of 48 people is given below:
There are two sample files provided, one with teams of 4, and one with teams of 1.
To run each, you will have to adjust some of the run parameters at the top of the file:
- NUM = # of people in the team
- MINVAL = # of teams
- NUM_TRIES = # of iterations to run algorithm
- FILENAME = csv file name
- NUM_COLS_METADAT = number of columns before rankings (5 in both)
- PREFIX = output file prefix
Originally made as the team selection algorithm for Harvard Consulting on the Business and the Environment. Used for F2022 recruiting cycle. Available on an MIT license.