I’m trying to write an algorithm (specifically in Ruby) that ranks teams based on their records against each other. If team A and team B have won the same number of games against each other, the tie is broken by point differential.
Here’s an example:
A beats B two times
B beats C one time
A beats D three times
C beats D two times
D beats C one time
B beats A one time
This reduces to the number of wins for each winner/loser pair:
A[B] = 2
B[C] = 1
A[D] = 3
C[D] = 2
D[C] = 1
B[A] = 1
Netting out each pair's wins in the opposite direction, this reduces to:
A[B] = 1
B[C] = 1
A[D] = 3
C[D] = 1
D[C] = -1
B[A] = -1
That is about as far as I’ve got.
For this example, I think the resulting ranking would be:
A, B, C, D
But I’m stuck on how to get from my nested, hash-like structure to that ranking.
My pseudocode is as follows (I can post my Ruby code too if anyone wants it):
For each game g:
    hash[g.winner][g.loser] += 1
That leaves hash as the first reduction above.
hash2 = deep copy of hash
For each key winner, value losers_hash in hash:
    For each key loser, value losses in losers_hash:
        hash2[loser][winner] -= losses
That leaves hash2 as the second reduction.
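Here is a minimal Ruby sketch of the pseudocode above, plus one possible final step from hash2 to an ordering. `Game`, `RESULTS`, `wins`, `net`, and `point_diff` are illustrative names. Two parts of it are my own assumptions rather than anything stated in the question. First, the inner hashes are copied explicitly, because Ruby's `Hash#clone` and `Hash#dup` are shallow, and the subtraction would otherwise write through into the original counts while they are still being read. Second, the final ordering sorts by total net wins, then by head-to-head result, then by point differential. `point_diff` is left empty because the example gives no scores.

```ruby
Game = Struct.new(:winner, :loser)

# The example results as [winner, loser] => number of wins.
RESULTS = { %w[A B] => 2, %w[B C] => 1, %w[A D] => 3,
            %w[C D] => 2, %w[D C] => 1, %w[B A] => 1 }.freeze
games = RESULTS.flat_map { |(w, l), n| Array.new(n) { Game.new(w, l) } }

# First reduction: wins[winner][loser] += 1 for every game.
# The default block creates a zero-defaulting inner hash on first access.
wins = Hash.new { |h, team| h[team] = Hash.new(0) }
games.each { |g| wins[g.winner][g.loser] += 1 }

# Second reduction. Copy each inner hash (a shallow clone would share them
# with `wins`), then subtract every result in the opposite direction.
net = Hash.new { |h, team| h[team] = Hash.new(0) }
wins.each { |winner, losers| net[winner] = losers.dup }
wins.each do |winner, losers|
  losers.each { |loser, count| net[loser][winner] -= count }
end

# One possible last step (an assumption, not the only choice):
# total net wins, then head-to-head, then point differential, then name.
point_diff = Hash.new(0) # hypothetical tie-breaker data; no scores given
totals = net.transform_values { |opponents| opponents.values.sum }

ranking = net.keys.sort do |x, y|
  by_total = totals[y] <=> totals[x]          # more net wins first
  next by_total unless by_total.zero?
  head_to_head = 0 <=> net[x][y]              # x first if it won the series
  next head_to_head unless head_to_head.zero?
  by_points = point_diff[y] <=> point_diff[x] # bigger differential first
  by_points.zero? ? x <=> y : by_points       # finally, alphabetical
end

puts ranking.join(", ")
```

On the example, this prints `A, B, C, D`. A has +4 net wins, B and C both have 0 (B goes first because it won their series), and D has -4. The head-to-head comparison is not transitive in general, though. If three teams tie on totals and beat each other in a cycle, their relative order depends on the sort. That is the kind of inconsistency the answers below discuss.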
Feel free to ask me questions or edit this to make it clearer; I’m not sure how to put it eloquently. Thanks!
Rating the relative strengths of teams/contestants is sort of a solved problem, so why re-invent the wheel?
There is the Elo rating system, originally devised for chess, and there is also the Glicko rating system.
Both systems try to model the probability that one contestant will beat another, and produce a score representing each contestant's relative strength: a higher rating means a stronger player.
The Elo system is zero-sum, and has a few thorny edge cases.
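To make the Elo idea concrete, here is a minimal Ruby sketch of the standard Elo update applied to the example games. The K-factor of 32, the starting rating of 1500, and the game order are my assumptions. The question gives no order, and Elo results depend on it. `elo_expected` is a hypothetical helper name. Each update moves the winner up by exactly what the loser loses, which is the zero-sum property mentioned above.

```ruby
K_FACTOR = 32 # assumed; real leagues tune this

# Standard Elo expected score of a player rated `ra` against one rated `rb`.
def elo_expected(ra, rb)
  1.0 / (1 + 10**((rb - ra) / 400.0))
end

# The example games as [winner, loser], in the order the question lists them.
games = [%w[A B]] * 2 + [%w[B C]] + [%w[A D]] * 3 +
        [%w[C D]] * 2 + [%w[D C]] + [%w[B A]]

ratings = Hash.new(1500.0) # every team starts at an assumed 1500

games.each do |winner, loser|
  delta = K_FACTOR * (1 - elo_expected(ratings[winner], ratings[loser]))
  ratings[winner] += delta # winner scored 1 but was only expected to score E
  ratings[loser] -= delta  # zero-sum: the loser drops exactly what the winner gains
end

ratings.sort_by { |_, r| -r }.each { |team, r| puts format("%s %.1f", team, r) }
```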
The Glicko system is not zero-sum, and it also models how confident it is in each score (the rating deviation), so a player with a single data point isn’t weighted as heavily as a player with many data points. It tries to fix some of the downsides of the Elo system.
Based on your question you probably want one of these systems, even if you don’t realize it yet.
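You appear to be dealing with a minimum feedback arc set problem: finding an ordering of the teams in which as few results as possible point "backwards" (a lower-ranked team beating a higher-ranked one). Because each team plays more than once, you’re looking at a more complex situation than the unweighted case. Still, you could use the number of wins as an edge weight and end up with the weighted version of the problem, which has a polynomial-time approximation within a factor of O(log|V| log log|V|) of the optimum.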
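For a league as small as the example, the minimum weighted feedback arc set can simply be found by brute force. Try every ordering and add up the games whose winner is ranked below its loser, with each pair weighted by its number of wins as suggested above. This is an exact O(n!) search, not the approximation algorithm, so it only makes sense for a handful of teams. `WINS` and `upsets` are illustrative names.

```ruby
# Weighted results: WINS[[winner, loser]] = number of times winner beat loser.
WINS = { %w[A B] => 2, %w[B C] => 1, %w[A D] => 3,
         %w[C D] => 2, %w[D C] => 1, %w[B A] => 1 }.freeze

# Cost of an ordering: total weight of results whose winner is ranked below
# its loser, i.e. the backward arcs that would have to be removed.
def upsets(order, wins)
  position = order.each_with_index.to_h
  wins.sum { |(winner, loser), n| position[winner] > position[loser] ? n : 0 }
end

teams = WINS.keys.flatten.uniq.sort
# Exhaustive search over all n! orderings; fine for 4 teams, hopeless for 40.
best = teams.permutation.min_by { |order| upsets(order, WINS) }
puts "#{best.join(', ')} (#{upsets(best, WINS)} upset games)"
```

On the example, this selects A, B, C, D with 2 upset games: B's win over A and D's win over C. No ordering can do better, because the A-B and C-D series each force at least one upset.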
If this is a real-life problem, then you must expect inconsistent data, like A beats B, B beats C, and C beats A. Your example already has uneven amounts of information: A played B and D three times each, but didn’t play C at all.
I’d make a model describing how likely a team is to beat another team, and then find the strength values under which your observed results are most likely. For example: “every team has a strength from 0 to 10. If a team is stronger than another team by an amount x ≥ 0, then it will beat that team with probability 1 - exp(-x^2)/2”. With identical strengths the chances are even (probability 1/2), but a much stronger team is very likely to win most games.
You could start with an initial guess that every team has a strength of 5 and compute how likely your observed results are under the model. Then modify the strengths randomly and keep each change that makes your observed results more likely.
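Written out, the model above gives the following win probability and log-likelihood. Here s_i is team i’s strength, and the sum runs over every recorded game with winner w and loser l. The case s_i < s_j follows from P(j beats i) = 1 - P(i beats j). This notation is mine, not part of the original model description. As a worked example, a strength gap of 1 gives a win probability of 1 - e^{-1}/2 ≈ 0.816, and a gap of 2 gives about 0.991.

```latex
P(i \text{ beats } j) =
\begin{cases}
1 - \tfrac{1}{2}\, e^{-(s_i - s_j)^2} & \text{if } s_i \ge s_j \\
\tfrac{1}{2}\, e^{-(s_i - s_j)^2} & \text{if } s_i < s_j
\end{cases}
\qquad
\ell(s) = \sum_{(w,\,l) \in \text{games}} \ln P(w \text{ beats } l)
```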
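A minimal Ruby sketch of that random search, using the strength model described above, follows. It sums log-probabilities instead of multiplying probabilities, which avoids underflow once there are many games. The step size of 0.5, the 20,000 iterations, and the fixed seed are arbitrary assumptions. `win_probability`, `log_likelihood`, and `fit_strengths` are hypothetical names. This is plain hill climbing, so the result depends on the seed and the iteration count, and the search can stall in a local optimum.

```ruby
# The example as [winner, loser] pairs, one entry per game.
GAMES = [%w[A B]] * 2 + [%w[B C]] + [%w[A D]] * 3 +
        [%w[C D]] * 2 + [%w[D C]] + [%w[B A]]

# The stronger team wins with probability 1 - exp(-x^2)/2, where x is the
# strength gap; the weaker team gets the remaining exp(-x^2)/2.
def win_probability(winner_strength, loser_strength)
  x = winner_strength - loser_strength
  tail = Math.exp(-x * x) / 2
  x >= 0 ? 1 - tail : tail
end

# Log of the probability of seeing every recorded result under `strengths`.
def log_likelihood(strengths, games)
  games.sum { |winner, loser| Math.log(win_probability(strengths[winner], strengths[loser])) }
end

# Random hill climbing: nudge one team's strength and keep the nudge only if
# the observed results become more likely.
def fit_strengths(games, iterations: 20_000, step: 0.5, seed: 42)
  rng = Random.new(seed)
  teams = games.flatten.uniq
  strengths = teams.to_h { |team| [team, 5.0] } # initial guess: everyone at 5
  best = log_likelihood(strengths, games)
  iterations.times do
    candidate = strengths.dup
    team = teams.sample(random: rng)
    candidate[team] = (candidate[team] + rng.rand(-step..step)).clamp(0.0, 10.0)
    score = log_likelihood(candidate, games)
    next unless score > best
    strengths = candidate
    best = score
  end
  strengths
end

strengths = fit_strengths(GAMES)
strengths.sort_by { |_, s| -s }.each { |team, s| puts format("%s %.2f", team, s) }
```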