How Efficient Is This Compared to the Alternatives?

The strategy I just showed you takes one pass through the table. That is, it's an O(n) algorithm in the size of the table (as I showed, the best case is O(1), but the worst case is bounded above by n). This is as good as you can get when you have to calculate a rank for each row.

The alternatives I know of are quadratic, or O(n2) in the size of the table. For example:

-- This query is illegal in MySQL, but even if it weren't,
-- it would take a very long time to run.
update score_ranked as o
set rank_in_game = (
select count(distinct score) from score_ranked as i
where i.score >= o.score and i.game = o.game);
-- This is legal, but creates an un-indexed temp table (even worse!)
-- inside the subquery. Don't try this at home!
update score_ranked as o
set rank_in_game = (
select count(distinct score) from (select * from score_ranked) as i
where i.score >= o.score and i.game = o.game);
-- This is still a cross join because of the <=
update score_ranked as o
inner join (
select l.game, l.score, count(distinct r.score) as rank
from score_ranked as l
left outer join score_ranked as r on l.game = r.game
and l.score <= r.score
group by l.game, l.score
) as i on o.game = i.game and o.score = i.score
set o.rank_in_game = i.rank;

If that were the best you could do in SQL, what are your other options? One is to simulate cursors in a programming language like PHP. This means selecting the entire table and looping through the rows, calculating each row's rank and sending an UPDATE back to the MySQL server. I advise against this. It's not quadratic, but it's death by 100,000 cuts.

Failing that, you have to abandon rank columns, and settle for all the COUNT() queries and limits with offsets. In my experience as a consultant, this is the path most people take, and it doesn't scale. If it did, they wouldn't have called me for help.

Think about it this way: the leaderboards (reads) for Design #1 are O(n) in the size of the table, and writes are O(1). If I eliminate COUNT and LIMIT with OFFSET queries, the reads are O(1) and writes are O(n). If I postulate that there are many more reads than writes, I've probably made the right decision.

Does the Rank Column Really Help?

It's not enough to just rely on gut feeling, look at query plans, or run queries and see how long they take. Profiling and benchmarking are essential to understanding whether "optimizations" actually help. The first thing I always do is profile the queries on an otherwise quiet machine so I can measure how much work they really do. This is why I wrote the MySQL Query Profiler.

I profiled the three sets of queries to test the leaderboards. Because gamer 1010 is fairly high in the rankings in those leaderboards, I also profiled gamer 755, who is very low in the rankings, to get a sense of the two extremes. Here is a sampling of the most relevant results:

Design

Design #1

Design #2

Design #3

Gamer

1010

755

1010

755

1010

755

Time

0.03

0.2

0.05

0.2

0.2

0.01

Queries

4

4

3

3

2

2

Optimizer Cost

21793

21793

5177

5163

2

2

Table locks

5

5

5

5

3

3

Table scans

1

1

2

2

1

0

Index range scans

2

2

1

1

0

1

Rows sorted

320

10000

40

18

20

8

Rows read

30943

59988

20403

20051

27482

39

Fixed reads

320

10000

20

9

20

8

Next-row reads

20002

20002

42

20

8728

9

Bookmark lookup

10010

10010

32

22

8734

14

Full scans initiated

1

1

0

0

0

Next in index

610

19975

20309

20000

10000

8

Temp table inserts

10000

10000

40

18

8727

8

It looks like some of the queries might not be using indexes the same way on the two gamers, but I don't want to get into early optimization. I just want to look for a general improvement with the ranking columns. The profiling data shows why it's not a good idea to rely only on query plans and execution times; the query times aren't that different, but the other numbers show how different the workload is.