Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

Let's say I have an application that, from a pool of items, presents a random two to the user. The user selects one as the "winner" and the result would be recorded like this:

INSERT INTO results (winner_id, loser_id) VALUES (26, 45);

What would be the best way to use the results table to rank the items?

This question has a theoretical/subjective component--i.e., there's more than one way to think about ranking them.

For example, this isn't that much different from, say, a season of baseball. Baseball chooses a winner simply by comparing win percentages. But in baseball you could end up with Cleveland with a better winning percentage than Detroit, yet Detroit beat Cleveland every time they played this year. So from another point of view besides winning percentage, you could say that Detroit is better than Cleveland.

It's that other point of view I'm interested in. If you start with two items--A and B--and know from the results table that B beat A 30 times and lost 5 times, you know B should be ranked above A. Then you look at C, which was 8-10 against B and 12-6 against A, and place it between B and A. And so on.

It seems like I'm describing a fairly simple formula, but when you start extending it, it seems like it would get really computationally heavy for large sets of items, and it would be problematic for an item that doesn't fall between two items that it would be expected to fall between based on the rankings up to that point.

Do you think such a ranking system is feasible? Does this concept already have a name? Would it be possible to set up a system of queries that could reasonably provide results in realtime in a web application for, say, 10,000 items?

Questions on Database Administrators Stack Exchange are expected to relate to database administration within the scope defined by the community. Consider editing the question or leaving comments for improvement if you believe the question can be reworded to fit within the scope. Read more about reopening questions here.
If this question can be reworded to fit the rules in the help center, please edit the question.

This really should be moved to StackOverflow. Or better yet, CSTheory.stackexchange.com
–
RichardAug 16 '11 at 18:08

2 Answers
2

This will always have the problem where A beats B, B beats C and C beats A. I don't know of any deterministic ways to define who ranks top in this case. Rock, paper, scissors and similar games exist because of this.

You're likely to get some form of circular dependency like this for any real sport, particularly when looking at a small number of games. There may be some algorithm to resolve this, but I'm not aware of it.

Assuming there is some algorithm to do what you'd like, changing your data model slightly will make it easier to do:

RESULTS (team_a_id, team_b_id, team_a_wins, team_b_wins, ties);

This will reduce the number of records you have to inspect as there will only be one entry for each pair of teams, instead one for each game for each pair of teams.

I should add though, circular results wouldn't be a bad thing. In rock-paper-scissors, the result would be a 3-way tie for first, and that would be fine. On the other hand, the number of entrants and matches would be high enough that I would expect a perfect circular result to be unlikely.
–
Chris VandenHeuvelJun 29 '11 at 16:22

Bear in mind you could get a situation where team A wins all their games against teams B through Y. Team Z loses all their games against teams B through Y. Team Z always beats team A however. How does this rank?! How common this is depends on your specific data. Note that A->B->C->A ties may be resolved by storing the (cumulative) scores between the teams on your results table too.
–
Chris SaxonJun 30 '11 at 14:42

I would say A would not be my clear winner since it always loses to what would otherwise be the worst entrant. If B only ever loses to A, then B would be my winner. Somehow, A is "weighed down" by all the losses to what would otherwise be the worst team. At the same time, Z is bumped up through its wins against A. I suppose I would intuitively expect them to meet in the middle of the ranking. Or maybe at 25% and 75%? It's a great question because it illustrates a result that would be quite different than total win percentage ranking, and that difference is exactly what I'm after.
–
Chris VandenHeuvelAug 18 '11 at 12:55

Ranking algorithms is a very deep subject. What type of ranking you choose depends upon many, many factors. Factors like:

Do you want to rank the players/teams based solely on win/loss record?

Is this a ranking system that keeps score? If so, do you want to take into account how badly someone was defeated? (E.g. a team winning by 75 points is much better than a win by 2 points.)

Is a tie possible? If so, how does your ranking handle that?

Is a win/loss by default possible? If so, how does your ranking count that?

Do you need to consider anomalous wins/losses?

There's too many factors to consider to just say "Here's how to implement a ranking system". Each ranking system is unique to the sport/activity that it's trying to rank. And, at that, there are generally many ranking systems per sport/activity.

If you're trying to model something specific, you need to include more information. Otherwise, you're opening yourself up to the entire field of statistcal mathematics.

Regarding databases, the complexity of the database design will be directly dependent upon the ranking algorithm you use.

Let's say this system would be used to rank submissions in a logo creation contest. Judging takes place through a website that presents random matchups of two entries and asks for a vote. Assume a high number of votes. Then the answers to your first question would be: The primary factor is win-loss records between individual entries. Overall win percentage may be considered to make the algorithm work, but it's not the primary factor. And to the rest, the answer is no.
–
Chris VandenHeuvelAug 18 '11 at 12:27

In this scenario, I'd use a simple Win % attached to the individual entry rather than try to keep track of all the votes. Again, though, this is algorithm design, which really belongs elsewhere.
–
RichardAug 18 '11 at 12:42