P.S. Surely there's a timing problem with citing your data. Both goratings and Dr. Bae Taeil's data show Iyama getting even stronger in the past year or two. But his international record reflects more than 5 years of play.

That is true. Also note that 12-21 is not very statistically significant in terms of proving inferiority. Assuming a uniform prior over the win rate, the posterior probability that the player with 21 wins is stronger than the player with 12 wins is only about 94%.
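For anyone who wants to reproduce that figure: with a uniform Beta(1, 1) prior, a 21-12 record gives a Beta(22, 13) posterior for the 21-win player's head-to-head win rate, and the tail probability above 1/2 can be computed exactly with the standard beta-binomial identity. A quick Python sketch (variable names are mine):

```python
from math import comb

# 21-12 head-to-head record: with a uniform Beta(1, 1) prior, the
# posterior on the 21-win player's win rate p is Beta(21+1, 12+1).
a, b = 21 + 1, 12 + 1

# Identity for integer parameters:
#   P(Beta(a, b) > 1/2) = P(Binomial(a+b-1, 1/2) <= a-1)
n = a + b - 1  # 34 fair coin flips
p_stronger = sum(comb(n, k) for k in range(a)) / 2 ** n
print(round(p_stronger, 3))  # prints 0.939, i.e. about 94%
```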

Thanks, I understand. It would still be interesting to see whether the ex-ante winning probabilities are well calibrated, in the sense that Black wins about X% of the time when the ex-ante winning probability is X%. It is not obvious that this will be the case, even if the model is very good at "getting the sign right", i.e. predicting which player will win.

Just to be clear, this is not a criticism of the methodology or anything like that. I would be happy to do some analysis if you'd give me some data; PM me if you're interested.

The reason this is interesting is that if the ex-ante probabilities are sufficiently well calibrated, it is meaningful to look at those numbers --- otherwise it isn't, and they should just be thought of as intermediate quantities inside the model, used for calculating the final output.
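If you had the model's ex-ante probabilities and the corresponding outcomes, the calibration check described above is easy to sketch: bucket the predictions and compare each bucket's mean predicted probability with its empirical win frequency. A minimal Python illustration (the function name and the synthetic, perfectly calibrated demo data are mine, purely for illustration):

```python
import random

def calibration_table(preds, outcomes, n_bins=10):
    """Bucket ex-ante probabilities and compare each bucket's mean
    prediction with the empirical win frequency inside it."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(preds, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i].append((p, won))
    rows = []
    for i, bucket in enumerate(bins):
        if bucket:
            mean_pred = sum(p for p, _ in bucket) / len(bucket)
            win_rate = sum(w for _, w in bucket) / len(bucket)
            rows.append((i / n_bins, mean_pred, win_rate, len(bucket)))
    return rows

# Synthetic demo: outcomes drawn from the predictions themselves,
# so the table should show predicted ~ observed in every bucket.
random.seed(0)
preds = [random.random() for _ in range(20000)]
outcomes = [random.random() < p for p in preds]
for lo, mean_pred, win_rate, n in calibration_table(preds, outcomes):
    print(f"{lo:.1f}+  predicted {mean_pred:.2f}  observed {win_rate:.2f}  n={n}")
```

On real data, a systematic gap between the two columns in some bucket would mean the probabilities are informative but miscalibrated in that range.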

Amusingly, the US's Andy Liu is currently rated 175th--the power of winning four straight games with your only loss being in 2011. He'll surely drop a little once the Kansai tournament is over (and you can't read much into Western players' ratings anyway, because of the lack of games and possibility of selective inclusion).

It is amazing. The only 20th century player I know to compare it to is Go Seigen[1]. One difference is that Lee Changho was eventually surpassed as he aged and Lee Sedol came into his prime. Would Sakata have been able to surpass Go in the 1960s if the latter's career hadn't been cut short by an accident? We'll never be able to do more than guess (but check out this interview: http://www.andromeda.com/people/ddyer/a ... unter.html).

I see in your paper that you applied your algorithm to the KGS database. If you don't mind, I had a question about the evaluation of the results.

In your article, you say that "WHR significantly outperforms the other algorithms". The preceding table shows a prediction gain of 0.6% compared to a simple Elo system. It looks like a small gain to my untrained eye; nevertheless, I understand that the prediction rate cannot rise much above 50% (if a game is a true 50/50 coin flip, no algorithm can do better than 50%). Consequently, the prediction rate seems to depend on the database (many lopsided matchups, e.g. 90%/10%, would increase the prediction rate of every system). It therefore seems hard for me to judge how significant the prediction gain actually is.

I thought about the following: if we took a large "fake" database with win probabilities actually distributed as a Gaussian centered on 0.5 with some well-chosen variance (matching the apparent variance in the KGS database), what would be the prediction rate of the "perfect algorithm" that predicts the correct win probability every time? It seems to me this would be the best theoretically achievable prediction rate. I hope my question makes sense.
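For what it's worth, the ceiling described above is just E[max(p, 1 - p)] under the assumed distribution of true win probabilities, since the best any predictor can do is always pick the favorite. That is easy to estimate by Monte Carlo; here is a sketch, where the clipped-Gaussian assumption and the sigma values are illustrative, not fitted to the KGS data:

```python
import random

def prediction_ceiling(sigma, n_games=200_000, seed=1):
    """Estimate E[max(p, 1-p)] when each game's true win probability p
    is drawn from a Gaussian centered on 0.5, clipped to [0, 1].
    A 'perfect' predictor picks the favorite and wins with
    probability max(p, 1-p), so this is the best achievable rate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_games):
        p = min(1.0, max(0.0, rng.gauss(0.5, sigma)))
        total += max(p, 1.0 - p)
    return total / n_games

for sigma in (0.05, 0.10, 0.20):
    print(f"sigma={sigma:.2f}  ceiling ~ {prediction_ceiling(sigma):.3f}")
```

For an unclipped Gaussian the ceiling works out to 0.5 + sigma * sqrt(2/pi) analytically, so e.g. sigma = 0.10 gives a ceiling of only about 58%, which is the kind of cap the question is after.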

You can see that he's doing a bit better as time goes on. Still, his international results from 2013-2015 are mediocre relative to his current rating. If you split the games into 2002-2010 and 2011-2015, you get records of 3-12 and 9-9 respectively, though there's no principled reason why you should include 2011 and not 2010 (which gives you 10-12). 2010 is notable because it's the first year he plays someone rated lower than himself.

I'd like to calculate how he's performing relative to the model's predictions, but I actually have to look up something to do that.

Rating algorithms must be tested on real data. You can generate artificial data based on some model, and then the best rating system would be the rating system that assumes this model. But the fact that an algorithm is the best to predict the artificial data does not imply that it will be the best to predict the real data. The only way to measure the ability of an algorithm to predict real game outcomes is to measure how well it predicts real game outcomes.

Thank you for your answer, but I think you misunderstood my point. The goal would not be to test the algorithm itself; it would be to get a rough idea of the theoretical best prediction rate that could be achieved ("rough" because it would only be an approximation of the KGS database).

Going from 55.7% to 55.8% in prediction rate would look very different if you could give a convincing argument that the prediction rate is capped at around, say, 57%, than if you could (theoretically) reach 100%.

I can't see how a player who almost never plays international games can be ranked #3 in the world. I like Iyama, but he doesn't deserve that spot. All his competitors in Japan are outside the world's top 20 (most would actually say top 50), so how can his victories count for so much?

Continued victories against his domestic opponents will keep increasing his rating, even if only marginally, unless the algorithm caps the points one can gain from opponents who aren't strong enough. I haven't read the paper on the WHR algorithm.

He is on a hot streak. Unless I missed a game, he still might hold all 7 big titles in Japan at the same time. But I would like to see Japan compete better at the international level.

Keep in mind that every prediction needs to be based on data. Every rating algorithm I know of uses only past results, as that's ostensibly the most efficient option. A more elaborate model (adding variance, which is what allows better prediction in the first place) would have to take more data into account than pure W/L records and rating differences. These additional parameters could be game-related (winning margin - a terrible idea, I know), or individual characteristics (e.g. average thinking time) matched to the relevant game characteristics (here: time settings), and so on and so forth. Of course, you would first have to have those data available, then invest additional computing power... but there's only so much you can squeeze out of a single binary variable.

One fact I find remarkable: before 1988 all the top three were Japanese; since 1994 not a single Japanese player has been in the top 3. This is an amazingly quick turnaround.

Also interesting is that in 36 years only 5 players have been at the top.

Something has now changed: Cho Hunhyeon first hits 1st place in 1984. I suppose some older games might have been added? Unfortunately, there still aren't any of his games against Japanese players from the early 80s.
