Although I load the whole game SGF into Leela, when I ask it what it wants to play for move X I haven't done any analysis for moves after X (I used a separate SGF replayer to see what the human played), so I don't think Leela uses the fact that the SGF contains that information; but I will check with a truncated SGF. (It's a manual position-by-position analysis rather than bulk analysis of the whole game as Go Review Partner does.) If you go forward from X and analyse, those simulations of the game tree are reused if you move back to X and continue analysing.

I do not know much about the Leela interface, but logically your approach to game analysis looks good. Still, I would prefer to use a truncated SGF to be 100% sure that there is no influence from the following moves.

I checked with a truncated SGF; the results are essentially (but not exactly, see Herman's point below) the same. The Leela interface (or my ineptitude with it) makes it difficult to input both black and white moves into an unfinished SGF and have Leela only offer analysis instead of actually playing moves of the opposite colour in reply.

HermanHiddema wrote:

So, given that Leela's preferred moves are non-deterministic like this, it is possible that the same move might on one run be Leela's top choice, and on another be outside the top 3 or outside the 5% margin?

Definitely possible, though my feeling from the analysis I've done so far is that it would be rare for a #1 to drop so far. Shuffling around of #2/#3/#4, and the win% crossing the 5% mark, are more common.

HermanHiddema wrote:

Given one of your test games, for every position between moves 50-150, let Leela analyse the position five times, independently (i.e. close and reopen the position between runs). Then record if the human move played was ever Leela's top choice.

Too much work for me to do manually, though! As a little test, here's a picture of three runs (50k, 50k, 150k nodes) on the same position with the full SGF, and three on a snipped one, to also test Dmytro's point. In this position Leela has a strong preference for the #1 move of d15 and didn't put much effort into analysing the other choices. In other positions I've seen a much flatter distribution of effort, so I'd expect more variance between runs (and also with the number of nodes). In all 6 runs d15 is #1 and has by far the most simulations. d14 and d16 always take the next two positions, but d14 is #2 in 4 of the 6 and is always within 5% of #1, even when in 3rd. In 2 of the 4 runs where d16 is 3rd, it is more than 5% worse than #1. The order of moves outside the top 3 changes a bit, but with so few simulations it is basically noise.

Attachment:

snipped samples.png
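To give a feel for why moves with few simulations are "basically noise": here is a minimal back-of-the-envelope sketch (my own illustration, nothing to do with Leela's internals) of the standard error of a winrate estimated from n playouts, treating playouts as independent coin flips. Real MCTS playouts are correlated, so actual run-to-run variance will differ, but the trend with n is the point.

```python
from math import sqrt

def winrate_stderr(p: float, n: int) -> float:
    """Standard error of a winrate estimate from n independent playouts."""
    return sqrt(p * (1 - p) / n)

# Illustration only: a move near 50% winrate, sampled with few vs many playouts.
for n in (500, 5000, 50000):
    se = winrate_stderr(0.5, n)
    print(f"n={n:6d}: winrate uncertainty about +/- {2 * se:.3f} (2 SE)")
```

With only a few hundred playouts the 2-SE band is several percentage points wide, easily enough to reshuffle the ordering of closely ranked moves between runs.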

John Fairbairn wrote:

Maybe we could try an electronic vote here, too.

I don't think that's a good idea, unless you can first ensure voters understand the evidence. Otherwise a lot of naive people will think "98% similarity to Leela => 98% is big, almost 100% => he cheated with Leela". If you can only vote after reading a detailed report on the evidence, doing a mini-course in statistics, reading an essay from Bill on Bayesianism, etc., and passing a mini-exam on them, then I'd be happier with a vote. Then again, we let uninformed people vote on much more important matters.

Kirby wrote:

* I don't think punitive measures can fairly be taken without absolute proof of cheating.

I think "absolute" is too strong; "beyond reasonable doubt" is good enough for me in this case (though I have oodles of doubt). For less important things, like regular KGS games, even weaker evidence is OK. Edit: skim-reading some of drmwc's links from the bridge case, I see "comfortable satisfaction" used as an intermediate standard of proof between "balance of probabilities" and "beyond reasonable doubt".

In bridge, Fantoni and Nunes were accused of cheating. The site BridgeWinners was involved in analysing the evidence and bringing the case. They were world-champion-level players.

Bridge is attempting to become an Olympic sport, and so the ultimate arbitration body for European bridge is the Court of Arbitration for Sport (CAS). Their appeal ultimately ended up at CAS, and they won the appeal earlier this year.

I don't think that's a good idea, unless you can first ensure voters understand the evidence. Otherwise a lot of naive people will think "98% similarity to Leela => 98% is big, almost 100% => he cheated with Leela". If you can only vote after reading a detailed report on the evidence, doing a mini-course in statistics, reading an essay from Bill on Bayesianism, etc., and passing a mini-exam on them, then I'd be happier with a vote. Then again, we let uninformed people vote on much more important matters.

But that's my point. In so many areas of our lives, all of us (even those who think they are rational) make decisions and assessments in a naïve, uninformed way - and that is usually how we expect other decisions to be made, as you seem to concede.

I think most of us realise the process is flawed and does (not just "is likely to", but "does") lead to rather serious mistaken consequences: criminal convictions, bombing Iraq, destroyed reputations, etc., etc. Still, we accept that as a necessary compromise for practical reasons.

Drug-busting in sport has tried to go the other, "scientific" way, at enormous cost, with major inconvenience to the lives of athletes and terminal boredom or frustration for sports fans, yet it still doesn't work - unless you're a greedy or hungry lawyer.

A game like go can't afford much in the way of anti-cheating and anti-drugs measures anyway, so the most sensible regime seems to be that we tolerate unscientific suspicions but we try to alleviate the inevitable but occasional malign consequences by (1) not doling out treatments as harsh as Carlo's and (2) making people aware that they have to take their own steps to appear to be above suspicion.

A hard-nosed extension of the last point, for example, might be to say that Carlo was unwise to limit his study to copying Leela and unwise to tell other people about it, so he was the victim of his own mistakes even if he didn't cheat. The phrase "justice needs to be seen to be done" works two ways, after all.

It could be instructive to see how an average amateur go audience would vote on this, i.e. to see what the normal real-world expectations would be. It may not be a "good idea", but is it the "best idea"?

I don't think that's a good idea, unless you can first ensure voters understand the evidence. Otherwise a lot of naive people will think "98% similarity to Leela => 98% is big, almost 100% => he cheated with Leela".

Leaving aside my opposition to matching (confirmatory) evidence as well as matching in only one game, matching one of the top three Leela choices (unless it is not a good play) is a lousy metric. Why? Because, as Uberdude has shown, non-cheaters at that level also match quite frequently. What you would like is a more sensitive test, such as one by which non-cheaters match around 50% of the time. Matching Leela's top choice looks like it fits the bill pretty well. In Uberdude's admittedly non-random, insufficiently large sample, the matching to Leela's top pick among supposed non-cheaters has a median value of 48%. Using that metric Carlo matched 72% of the time.

Tell people this:

Quote:

In this one game, out of 50 moves in the middle (Black 51 - Black 99), 36 of Carlo's plays matched Leela's top choice for that play.

Research has shown that people are not so good at judging percentages or fractions; they are better at judging integers.

My guess is that most people would say, 36 matches out of 50? That's suspicious, all right, but not convincing.
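For those who prefer a number: here is a rough back-of-the-envelope check (my own illustration, not part of the original analysis), assuming, hypothetically, that Uberdude's ~48% median top-choice match rate is a fair base rate for honest players at that level, and that moves match independently. Neither assumption really holds - moves within a game are correlated, and the sample was small and non-random - which is why "suspicious but not convincing" remains a reasonable reading even though the naive tail probability is tiny.

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k matches in n moves."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# How surprising are 36 matches in 50 moves if the honest base rate were 48%?
p_value = binom_tail(n=50, k=36, p=0.48)
print(f"P(>= 36 matches out of 50 | match rate 0.48) = {p_value:.2g}")
```

The point of the sketch is only that "36 of 50" is far above what a 48% base rate would typically produce; turning that into a cheating verdict would still require honest priors and a model of the dependence between moves.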

_________________The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

I think it's a great idea to talk during sex, as long as it's about snooker.

If you can only vote after reading a detailed report on the evidence, doing a mini-course in statistics, reading an essay from Bill on Bayesianism etc and passing a mini-exam on them then I'd be happier with a vote.

Would it suffice to restrict the vote to amateur dan players? A "jury of his peers", as it were.

After all, *those* players are the stakeholders, here. Those are the players that will be being paired against Carlo in future matches and *those* are the players who stand to lose the most should this incident lead to either an atmosphere of mistrust and suspicion, to lax anti-cheater policies, or to overly draconian anti-cheater measures that lead to many false positives in the future.

Personally, I feel that this decision and the way it was handed down was far too focused on the local situation -- a single game in a single, online tournament -- than the whole board position. In my opinion, even if the evidence for cheating had been much, much stronger than it really is, one could *still* argue that this incident should serve only to open the conversation about anti-cheating measures and proper, responsible punishments for convicted cheaters.

The reason why cheating is against the rules is to prevent players from gaining an unfair advantage, after all, not to dole out punishments. If a player is suspected of cheating, surely future scrutiny and long-term observation achieves this far more elegantly than wrecking that player's reputation and tarring that player with a conviction?

Sure, you might never get the chance to convict because the player might never cheat again, foiling your efforts to catch him or her. I'd call that a resounding success, not a failure.

Even if I wasn't (also) a Bayesian, even if I didn't know enough about statistics to know what that means in the first place, I would still vote not-guilty, should I be asked, simply because the outcome of the vote has consequences and an unqualified "guilty" verdict must be the worst outcome for everyone. It does not create an environment where cheating is not a dominant strategy because cheaters will, ultimately, be caught. It creates a minefield: don't, whatever you do, resemble a bot or your Go-life is over.

It creates a minefield: don't, whatever you do, resemble a bot or your Go-life is over.

So to avoid being accused of cheating, maybe for my PGETC game next week I should run Leela to make sure I don't play too many moves similar to her suggestions. (I'll likely be playing Gilles van Eeden 6d, who scored 82% similarity to Leela's top 3 vs Viktor Lin; my best results against stronger players typically happen when I succeed in playing a solid honte style, which probably has a relatively high Leela match rate.)

It seems that even with chess engines being much more advanced than anything we can do with Leela or CrazyStone, conclusions are hard to reach. Even tournament performances that are worlds apart (2000 Elo to 2500 Elo in this case, although one was blitz, the other rapid) are apparently no smoking gun.


I'll likely be playing Gilles van Eeden 6d, who scored 82% similarity to Leela's top 3 vs Viktor Lin; my best results against stronger players typically happen when I succeed in playing a solid honte style

Reading over some of the posts in this topic, it seems that we expect someone to use a computer for much of the game. All it takes to get an unfair advantage by cheating is to use it once, in a difficult position, to come up with a move or combination that you would not come up with yourself, or to verify your thoughts. It seems, with the technology here now, that serious online tournaments will have to become a thing of the past, because of the difficulty of monitoring the integrity of the players.

_________________Walla Walla GO Club -(on FB)

We play because we enjoy the beauty of the game, the snap and feel of real stones, and meeting interesting people. Hope to see ya there! お願いします!

Reading over some of the posts in this topic it seems that we expect someone to use a computer for much of the game.

It turns out that that has been, and still is, a pattern of cheating in chess. You have players whose every move is, according to top chess engines, neither a blunder nor a mistake, and who play only a few inaccuracies per game. Their ratings never topped 2200, and suddenly they are playing like super grandmasters.

Such moves will typically be among the top three choices of any given chess engine, which is, I suppose, where the "match one of the top three choices" heuristic came from.


If I am not wrong, Carlo just won against a European 6d in the World Amateur Go Championship. He was board 3 of Italy, now represents his country in the WAGC, and won against a 6d.

Yes, he beat Ben DG 6d, the current French champion - a good win. Ben also lost to a 4d from Thailand, so maybe he's jetlagged. I will be following Carlo in the thread about the WAGC (maybe people missed it in the usually quiet Tournaments subforum).
