Decision: case of using computer assistance in League A

Based on a protest against the PGETC League A Round 4 match Italy vs. Israel, Board 3 game Carlo Metta vs. Reem Ben David, and after an extensive study and consultation, the League A referee has decided that Carlo Metta used the Leela computer software to assist him in the game. This violates the rule of fair play (§3.2).

The penalty is that all of Carlo's games in the 8th PGETC League A are forfeited, and the player is banned from the 8th and 9th PGETC.

I have known Carlo for a very long time and we have played many, many games in tournaments. In the last two years he didn't play online; instead he played only against Leela, also studying and reviewing with it. I was at first against this idea, but when I saw him in live tournaments I had to change my mind: he improved enormously and his style was even more solid than usual. So if we now see similarities between him and Leela, I am not surprised at all; and those small differences can actually be extremely big, since one mistake is enough to lose a game. In some games of the league he was in fact losing and won because of an opponent's blunder (that is definitely not Leela's style ^^'). I think that even analyzing the indicted game, almost all the moves were nothing special, and even the special ones were totally "Carlo style" as I know him. The fact that many were among Leela's top 5 moves in that game just means Carlo is not a kyu player (and the same holds for his opponent in that game ^^' crazy)...

Above all the technical stuff (which actually cannot really prove anything), the Italian team was very excited about the miracle of reaching League A; we knew that the level was higher than ours, but for us it was a big honor and an occasion to play with strong players and learn (I lost all my games and am still very grateful for the experience). But the others managed to fight very well, especially the last boards, and especially Carlo. I would bet my life he didn't cheat, because I know him and he is exactly like all other go players: he wants to play and learn. I am sorry for the referees, because it is a difficult situation, and even after a long study they made a big blunder; in their place I doubt I could have done better. But I am even more sorry for Carlo, who after giving so much to studying go and working so hard for the Italian go federation as an organizer is now ashamed and feeling bad about this crazy story, and is also losing motivation in a period when he was finally having very nice results.

I and (I'm pretty sure) all of the Italian team will support Carlo with all we have, as he is the victim of a new system that is difficult to deal with, and he doesn't deserve this at all.

The situation actually disgusts me, but I can understand everyone. Please just try to be respectful to a great, polite and honest go player. Thx

The thing I find slightly odd is that the entire game (well, 98% of it) is said to be within Leela's top choices. I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

Out of interest in the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes; I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and moves 50-88 of my last PGETC game. In that small section Carlo scored 100% similarity and his Israeli opponent 67%. In my game I got 74% and my opponent 89%. I have no reason to believe my opponent used Leela (and I didn't), and all his moves seemed plausible 3-4d moves (and I won). So my tentative conclusion from this small test is that if it's possible for an innocent player to get 89% on 38 moves, then 98% on 100 moves when you've been studying with Leela is suspicious, but not good enough proof for punishment. Of the 38 moves I looked at in my game, I classified 13 of them as "only moves", meaning a dan player like myself can say it's the only move in less than a second, e.g. a capture after atari (when there is no other sensible choice like a connection). This classification is somewhat subjective, but excluding these moves from the count would give a higher-quality metric.
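A metric of this kind is easy to make concrete. The sketch below is not the referees' actual tool; it is a minimal illustration, on invented data, of the rule described in this thread: a move counts as "similar" if it is among the engine's top candidates, and forced "only moves" can optionally be excluded so they don't inflate the score.

```python
def similarity(moves, top_n=3, exclude_only_moves=True):
    """moves: list of dicts with keys
       'played'     - the move the human played, e.g. 'Q16'
       'candidates' - engine's candidate moves, best first
       'only_move'  - True if any dan player would find it instantly
    Returns the fraction of counted moves matching the engine."""
    counted = matched = 0
    for m in moves:
        if exclude_only_moves and m["only_move"]:
            continue  # skip forced moves so they don't pad the score
        counted += 1
        if m["played"] in m["candidates"][:top_n]:
            matched += 1
    return matched / counted if counted else 0.0

# Toy data: four moves, one of them a forced capture.
game = [
    {"played": "Q16", "candidates": ["Q16", "D4", "C3"], "only_move": False},
    {"played": "K10", "candidates": ["D4", "K10", "C3"], "only_move": False},
    {"played": "R5",  "candidates": ["R5"],              "only_move": True},
    {"played": "A1",  "candidates": ["D4", "C3", "Q3"],  "only_move": False},
]
print(similarity(game))  # 2 of 3 counted moves match -> 0.666...
```

Note how excluding the "only move" changes the score: with it included the same game scores 3 of 4, i.e. 0.75, which is exactly the inflation the exclusion is meant to remove.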

The thing I find slightly odd is that the entire game (well, 98% of it) is said to be within Leela's top choices. I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

In chess the answer has been shown to be an unequivocal yes. Perhaps Go players are smarter, though.

After Israel's protest, some extensive research on the games was done, and finally it was concluded that 98% similarity to the Leela Go-playing software is too much.

My inner scientist says that's not the right question. Confirmatory evidence (he plays like Leela) is weak. That could be the result, as mentioned by Alessandro Boh Pace, of training using Leela for two years. The main question is this: how different was his play in these games from his play in recent FTF tournaments? (If those game records are unavailable, there are other ways of testing how close his go decisions accord with Leela's.)

I took a look at the linked game record. Nothing seemed particularly distinctive until Black 109. Black's play from that point up to Black 153 does seem distinctive, however. Would Leela have played that whole sequence as Black? If so, that's pretty good evidence, but still confirmatory.

----

As I have said for quite a while now, humans can learn to imitate the strategy of the top bots, even though the bots cannot explain it. But the decision to play Black 109 rests upon fairly specific tactics, as well. The play is not particularly urgent on its face. That is why it seemed distinctive to me.

Out of interest in the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes; I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and moves 50-88 of my last PGETC game. In that small section Carlo scored 100% similarity and his Israeli opponent 67%. In my game I got 74% and my opponent 89%. I have no reason to believe my opponent used Leela (and I didn't), and all his moves seemed plausible 3-4d moves (and I won). So my tentative conclusion from this small test is that if it's possible for an innocent player to get 89% on 38 moves, then 98% on 100 moves when you've been studying with Leela is suspicious, but not good enough proof for punishment.

Minor point: it's 19 moves, not 38.

But yes, the fact that Carlo made exactly the same moves as Leela for 15 moves straight may be suspicious, but that's all. And, as I said, it is confirmatory evidence, which is weak, weak, weak. (Emphasis because most people overvalue confirmatory evidence.)

The thing I find slightly odd is that the entire game (well, 98% of it) is said to be within Leela's top choices. I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

In chess the answer has been shown to be an unequivocal yes. Perhaps Go players are smarter, though.

Except perhaps for AlphaZero, IIUC, the main difference between humans and chess engines lies in the realm of tactics and calculation of variations. Those skills are hard to imitate. But the top AI go bots use neural nets to come up with different evaluations and strategies. These can be imitated, especially if you devote a couple of years to trying to understand them.

My inner scientist says that's not the right question. Confirmatory evidence (he plays like Leela) is weak. That could be the result, as mentioned by Alessandro Boh Pace, of training using Leela for two years. The main question is this: how different was his play in these games from his play in recent FTF tournaments? (If those game records are unavailable, there are other ways of testing how close his go decisions accord with Leela's.)

As a comment on the linked Facebook thread, Jonas Egeberg wrote:


As the manager of League A in PGETC I have been in charge of dealing with this matter. I of course had help from other strong, non-biased players in analyzing the games etc. For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and counted a move as similar if it was within Leela's top 3 moves and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only differing move was Leela's fourth choice, but still within 1% winrate of its top move.

The thing I find slightly odd is that the entire game (well, 98% of it) is said to be within Leela's top choices. I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

The decision rests upon one game?????

That may be enough to require a replay or throw the result out. But confirmatory evidence is weak. For disciplinary action I would want evidence from at least 10 games. And further tests, besides. For instance, have a monitor for a game or two and see how many of Leela's moves Carlo makes.

As the manager of League A in PGETC I have been in charge of dealing with this matter. I of course had help from other strong, non-biased players in analyzing the games etc. For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and counted a move as similar if it was within Leela's top 3 moves and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only differing move was Leela's fourth choice, but still within 1% winrate of its top move.

The go world will probably have to spend some time looking over Ken Regan's work in chess. I can't vouch for it, though what I have read by him impresses me. The key thing I can say: he claims that it is well within the realm of possibility to substantiate cheating allegations from a single game, but you usually have to go well beyond "how many of the engine's preferred moves did the human play?" to do so.

If you want to continue meaningful online tournaments, a deep-learning approach to detecting cheaters is necessary nowadays.

What frightens me more: no more bathroom breaks at real-life tournaments (the only kind I take part in, and my bad results clearly show I am not cheating). I will have to cut my coffee-drinking routine at tournaments.

It really isn't, at least not on an ongoing basis. I can understand making a preliminary decision based on this, as it is one of the first (or is it the first?) major accusations of cheating in Western play, but we have to do better in the long term. Accusing a player of cheating will cast a cloud over that player forever. With a few thousand games played each year, we need the chance of a false positive to be minuscule. 1% false positives won't cut it.

By the judges' own statement, in several offline games played with no suspicion of cheating, this player played 35-40 moves that Leela picked out of a 50-move sequence. In this game, he played 49. That stinks, and if you force me to bet, my money is on cheating.

The shift from 40 moves to 49 is highly suggestive. But to make it stick, you need to know the range of similarity that different players show. To estimate that value accurately, you have to look at a lot of different players over a lot of different games (without a great number of players, you run the risk that some players are simply much more similar to a particular bot). You also run the risk that certain positions lend themselves to much higher scores.

While the league manager seems to have worked hard to come to a reliable decision, I do not think that the approach is good enough for the long term.

The go world will probably have to spend some time looking over Ken Regan's work in chess. I can't vouch for it, though what I have read by him impresses me. The key thing I can say: he claims that it is well within the realm of possibility to substantiate cheating allegations from a single game, but you usually have to go well beyond "how many of the engine's preferred moves did the human play?" to do so.

the statistical analysis can only be supporting evidence of cheating, in cases that have some other concrete distinguishing mark.

I am enough of a Bayesian to accept very strong statistical evidence by itself, especially when the question is one of throwing out a result. In that article he refers to the civil-court level of evidence. For disciplinary action I think that we should require much more. IMO there is enough evidence to have Carlo's play in future tournaments monitored.

As I understand it, AlphaGo Lee was trained on high-dan-level games to create a policy network that gives, from a given board position, the probability that a pro would play a given move. Couldn't one use a similar process to train "policy networks" for various dan levels? Then, given a board position, you could see the probability that a 3d player would play move X, which may differ from the probability that a 2d player would play X, etc.

If you had a reliable policy network for each level of play, then you could calculate the overall probability that a 2d player would play the sequence of moves that he played in the game.

At least at that point, we could say with some confidence that, according to this policy network, there's a 1 in <some-number> chance that this player with this rank played this game.
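The arithmetic behind that idea is simple even without a real network. The sketch below is purely illustrative: `stub_2d_policy` is an invented stand-in for a rank-calibrated policy model, and the probabilities are made up. The point is only that the chance of a whole game is the product of per-move probabilities, accumulated in log space to avoid floating-point underflow.

```python
import math

def sequence_log_prob(positions_and_moves, policy):
    """positions_and_moves: list of (position, played_move) pairs;
       policy(position) -> dict mapping moves to probabilities.
       Returns the log-probability of the whole move sequence."""
    total = 0.0
    for position, move in positions_and_moves:
        p = policy(position).get(move, 1e-9)  # tiny floor for unseen moves
        total += math.log(p)
    return total

# Hypothetical stub: pretend a 2d player picks the "expected" move 40%
# of the time regardless of position (illustrative numbers only).
def stub_2d_policy(position):
    return {"expected": 0.4, "other": 0.6}

game = [("pos%d" % i, "expected") for i in range(50)]
logp = sequence_log_prob(game, stub_2d_policy)
print(logp)  # 50 * log(0.4), roughly -45.8
```

With real per-rank networks, comparing this log-probability across rank models would give the "1 in &lt;some-number&gt; chance that a player of this rank played this game" figure suggested above.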

One can envision future iterations of software where you can calibrate moves that are good, but just under a predetermined detection threshold.

I haven't played on IGS in ages, but if I recall, KGS tracks *when* moves were made. If IGS does the same, couldn't one see how quickly he is clicking in the game where cheating is alleged, and compare to other league games? It seems that if he's using the computer there will be a lag between certain moves that won't be there without relying on Leela.
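A minimal sketch of that timing comparison, on entirely invented data (the timestamps and the `timing_summary` helper are hypothetical, not anything the servers actually provide in this form): compare the distribution of per-move thinking times in the suspect game against the same player's other games.

```python
from statistics import mean, stdev

def timing_summary(move_times):
    """move_times: seconds spent on each move in one game."""
    return {"mean": mean(move_times), "stdev": stdev(move_times)}

# Hypothetical per-move times (seconds). A player relaying engine
# output might show an unusually steady rhythm with a fixed overhead,
# while a human thinking for himself tends to be far more variable.
suspect_game = [32, 41, 35, 38, 44, 36, 40, 39]
other_games  = [12, 55, 3, 70, 8, 30, 5, 90]

print(timing_summary(suspect_game))
print(timing_summary(other_games))
```

A suspiciously uniform rhythm (low standard deviation) in only one game would be worth a closer look, though by itself it proves nothing.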

One can envision future iterations of software where you can calibrate moves that are good, but just under a predetermined detection threshold.

I haven't played on IGS in ages, but if I recall, KGS tracks *when* moves were made. If IGS does the same, couldn't one see how quickly he is clicking in the game where cheating is alleged, and compare to other league games? It seems that if he's using the computer there will be a lag between certain moves that won't be there without relying on Leela.

Based on the idea that your typical kyu player does not think?

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius, Meditations, VIII 21

the statistical analysis can only be supporting evidence of cheating, in cases that have some other concrete distinguishing mark.

I am enough of a Bayesian to accept very strong statistical evidence by itself, especially when the question is one of throwing out a result. In that article he refers to the civil-court level of evidence. For disciplinary action I think that we should require much more. IMO there is enough evidence to have Carlo's play in future tournaments monitored.

In this case shouldn't the Parable of the Golfers be modified to ask how many golfers we need in order to observe someone sink their drive on all 4 par 3's on a typical course?

As a Bayesian, if you start with an expectation of 80% accuracy (the high end of what was observed in FTF games), how do you interpret 98 out of 100 (or 49 out of 50?)? [This is not argumentative; I simply would like to know!]

