Since you obviously have not bothered to read my paper: on the one hand you have two online games similar to Leela, and on the other two live games with a much lower level of play. Comparing the two shows a large difference in play. Draw your conclusion...

What if someone later finds a live game similar to Leela and an online game at a lower level (the latter is already available, maybe even the former)? Will you draw the same conclusion?

As many have told you before, human ability can fluctuate so much that you need to consider thousands of games from hundreds of players, in different settings, to establish a solid method. A single game can tell you anything you want if properly chosen.

...I think that go will experience a flowering in the coming years. I would not at all be surprised if the pros 20 years from now are two or three stones stronger than the pros of today. And they will get there in large part through imitating bots...

I too think that go will experience a flowering in the coming years, but I think that those pros will get there by having implants and essentially being cyborgs. (I look forward to this, although by that time I will probably be too old for a surgeon to take the risk.)

When cyborgs become the norm - and they will! - the whole question of cheating with bots will disappear. We will look back on this thread as a quaint reminder of the days when people were unaugmented.

Not only will go experience a renaissance, but other fields will too: art and music especially. (The greatest benefit to humanity, however, will be that nobody will wander out into traffic while looking down at their phone.)

If you meant to prove that others cheated, it would be good that you make a proper analysis.

(Spelling corrected by me.)

And you were referring to this quote:

theoldway wrote:

Actually there are other PGETC players with several games that are almost completely Leela-like (even some famous and distinguished players). Are they all cheaters? Or maybe, in hundreds of PGETC games, it is possible to observe these coincidences from time to time?

This is the main question we need to answer in the future.

You responded:

Bojanic wrote:

First, "others did it" is not an excuse.

I interpret "did it" as "cheated". But theoldway did not say or even imply that others cheated. He did say that they played like Leela and questioned whether they cheated.

Your comment suggests that you are taking similarity to Leela's play as evidence, even as a statement, that these other players "did it", i.e., cheated.

You continued:

Bojanic wrote:

I have found several more games in which the deviations histogram is close to Leela's. In some short games, one player dominated the other. Since it was mainly fighting, there were a lot of moves similar to Leela's, but also some different moves. I have one game I am very suspicious of, but in it some tenuki moves are different.

Those two games of Carlo's are the closest to Leela of all the games. And since they are two games by one player, it is even more suspicious.

It certainly sounds like you are taking playing like Leela as evidence of cheating. And in that context, you are exhorting theoldway to make a "proper analysis". To which I responded that equating cheating with playing like Leela is not proper.

Bojanic wrote:

Since you obviously have not bothered to read my paper: on the one hand you have two online games similar to Leela, and on the other two live games with a much lower level of play. Comparing the two shows a large difference in play. Draw your conclusion...

I did read your PDF file, if that is the paper you are referring to. I was disappointed in how much of the paper was devoted to similarity to Leela's play. I also pointed out that you had found an important piece of evidence, the mistake that both Leela and Metta made. And I praised you for focusing on tenuki.

In earlier discussions I had emphasized the importance of the differences between play in presumably non-cheating games and in possible cheating games, and you seemed to agree that that is important. In your recent posts, such as the one I quoted, you seem to be laying emphasis on similarity to Leela's play, instead.

_________________The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

I think it's a great idea to talk during sex, as long as it's about snooker.

Show us the game and your analysis; it could be an interesting example to study. What else is the sgf tag for here?

The game:

Winrate evolution graph:

Attachment: WhiteWinrate.png

I wanted to attach the GRP analysis, but it says "The extension rsgf is not allowed.". I did the analysis with 5k playouts and LZ #147 (if I remember correctly; it's at least #145 or newer).

Note that:

1) the game was no komi, while LZ assumes 7.5 komi, so it's not totally accurate (but I don't know how to correct this easily)

2) there were inaccuracies after move 44, but considering they didn't drop my winrate under 97%, I considered them irrelevant

3) Leela 11 would give a different evaluation

4) as the game was basically over early, I had little opportunity to make real mistakes.

Note that 3 & 4 cannot be used as arguments to prove that I didn't cheat with LZ. After all, if I used LZ, Leela 11's evaluations are meaningless; I could just have verified with LZ that my planned moves don't lose too much winrate. And even if the game was over early, I could have cheated too; after all, I didn't make serious mistakes later either.

As I discovered back in the 1980s, the internet is a hot medium, in McLuhan's terms. Back then, one of the most frequent online sentences was "You didn't read what I wrote." We can cut each other some slack.


First of all, beautifully played and excellent work for a 3 kyu. I would not have guessed that White is a kyu player, for sure.

Since the game was 'over' so quickly for the bot, but not over until much later for a human, I think the graph is useless in this case and we cannot draw any conclusion from it. In other words, we do not know whether you made mistakes after move 40, because you may have made many that were simply not relevant enough. Does that make sense?

What we can tell is that within 40 moves, White made 3 (probable) mistakes. If we were to scale that up to 100 or 150 moves (which is not really statistically sound), that's around 8 to 12 mistakes - a lot more than we saw in the infamous other games. Also, when even the 5th-best move has a winrate close to 100, a cheater would not need to play the best move. So just from that I would say your game is no indicator of whether the methods used so far are right or wrong. But you wrote most of that yourself already.
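The scaling described here is just a constant per-move mistake rate, extrapolated linearly. A minimal Python sketch (the function name is mine, and as noted above, the constant-rate assumption is not statistically sound once the game is effectively decided):

```python
# Rough sketch of the extrapolation above: scale an observed mistake count
# up to a full game length, assuming a constant per-move mistake rate.
def expected_mistakes(observed, moves_observed, moves_total):
    return observed * moves_total / moves_observed

print(expected_mistakes(3, 40, 100))  # 7.5
print(expected_mistakes(3, 40, 150))  # 11.25
```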

maf wrote:

First of all, beautifully played and excellent work for a 3 kyu. I would not have guessed that White is a kyu player, for sure.

It's hard to guess the strength of a player from a single game. I mean, there are some indicators, but it can often be tricky. Here I think it looks like I played stronger than 3k because I dominated this game. It's easier to play well when you've got momentum (but it's not necessarily easy).

maf wrote:

Since the game was 'over' so quickly for the bot, but not over until much later for a human, I think the graph is useless in this case and we cannot draw any conclusion from it. In other words, we do not know whether you made mistakes after move 40, because you may have made many that were simply not relevant enough. Does that make sense?

Correct. And that was an important part of the point I wanted to make: winrate changes don't exactly measure move quality.

Quote:

What we can tell is that within 40 moves, White made 3 (probable) mistakes. If we were to scale that up to 100 or 150 moves (which is not really statistically sound), that's around 8 to 12 mistakes - a lot more than we saw in the infamous other games. Also, when even the 5th-best move has a winrate close to 100, a cheater would not need to play the best move. So just from that I would say your game is no indicator of whether the methods used so far are right or wrong.

Note that, while the first one (the approach to the lower 3-4) was not considered by LZ, the two other mistakes were second choices by LZ. So before move 48, I played only one move that was not in LZ's top 3 choices, and only 4 that were not its top choice. And after that, as you say, it doesn't really matter.

This was just a warning against using the "similarity to bot play" metric without doing a statistical analysis of a significant number of games, and without considering the state of the game.

Another warning: if a game is not lost by a blunder, you can always find a way the winner could have cheated to play exactly like in the game (or a way to argue he didn't cheat, unless the method is "always play the engine's top choice"). That's very problematic and hard to avoid in this case, especially if you consider that a cheater actively tries to avoid detection.

As I mentioned before, LZ is stronger than 0.11 and also more opinionated, so if one player gets a big lead according to LZ (which isn't so big according to the weaker 0.11, or to humans), then subsequent winrate changes are less useful for identifying player mistakes. And indeed here we can see Leela 0.11 not going into the high 90s so fast, so more mistakes (according to Leela) are visible.

For comparison, here is Leela 0.11's 200k winrate histogram for Carlo vs Reem. It is interesting to compare this to the one in Bojanic's pdf (which I think was about 30-50k) to examine reproducibility and the effect of more playouts. Overall we see smaller red bars (and no green) in generally the same places, but some differences (e.g. mine shows Carlo making several moderate mistakes in a row around move 120, which Bojanic's does not).

Bojanic found games from other players in the league this year with not much red on their GRP Leela histograms, and his suspicion/conclusion was that they were cheating with Leela too. My interpretation is that it more likely shows non-cheating players can be similar to Leela. An obvious way to resolve this is to examine games we know for sure people didn't cheat with Leela; the early seasons of PGETC before Leela existed are perfect for this: http://pandanet-igs.com/communities/eur ... rounds/1#1. I'm still setting up pnprog's analysis kit for a more rigorous analysis with lots more games, but for tasters here are some histograms of games from league A round 1 back in 2010 (at 50k nodes, I'll do 200k later).

I also think we need to remember that just because Leela 0.11 says a move is bad (red), it doesn't mean it really is: so 7/8ds having more red doesn't mean they played worse than Carlo did with less red; it could be that they played better moves that Leela just doesn't evaluate correctly (in the pro game I analysed, Lee Sedol had a Leela top-3 matching rate at the low end of our mid-high amateur dan range, and Park Junghwan in the middle). Using a bot that is clearly much stronger (LZ) could help us distinguish these cases. My guess is LZ will agree that most Leela 0.11 reds were bad too, but with a sizeable minority actually ok/better. Another of my todos…
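The pre-Leela baseline idea can be sketched in a few lines: collect top-choice match rates from games that predate Leela, then ask where a suspect game falls in that distribution. All numbers below are invented for illustration; the real analysis would use pnprog's kit over many games:

```python
# Hedged sketch of the baseline comparison described above.
# The match rates are invented; real ones would come from
# analysing pre-Leela PGETC games.
def match_rate(moves_matched, moves_total):
    return moves_matched / moves_total

def fraction_at_or_above(baseline_rates, suspect_rate):
    # How many known-clean games matched Leela at least this often?
    hits = sum(1 for r in baseline_rates if r >= suspect_rate)
    return hits / len(baseline_rates)

baseline = [0.42, 0.48, 0.51, 0.55, 0.58, 0.61, 0.66]  # hypothetical clean games
suspect = match_rate(33, 50)
print(fraction_at_or_above(baseline, suspect))
```

Whatever the real fraction turns out to be, "how alarming is this match rate?" then becomes an explicit statistical question instead of an eyeball judgement on a single histogram.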

Edit: I now have 2 histograms at both 50k and 200k to compare.

These are 50k

Attachment: Hora winrate vs Eeden.PNG

Attachment: Eeden winrate vs Hora.PNG

Attachment: N Mitic winrate vs Bogatsky.PNG

Attachment: Bogatsky winrate vs N Mitic.PNG

Attachment: Audouard winrate vs Burzo.PNG

(Burzo actually won this game, despite Leela thinking he was losing when I stopped the analysis at move 180.)

Attachment: Burzo winrate vs Audouard.PNG

Attachment: Pei winrate vs Tormanen 50k.PNG

Attachment: Tormanen winrate vs Pei 50k.PNG

These are 200k. (Both players making consecutive equal-sized mistakes around moves 40-50 was interesting; Leela thinks both should have been playing tenuki to a more important point in the c7/c8 area.)

It seems like an enormous task to produce a tool that everyone can trust. One metric can be similarity to Leela(N), but that alone is probably not enough. You need to be able to show that the player is finding moves above his level, and given our different strengths within go, I think that is very difficult. One player may invade well; another uses influence well; another is a god of shape; etc. Is this task easier in chess? No idea. We do not even have an agreed definition of tournament performance rating yet. Plus GoR's winning percentages are suspect. Ultimately I consider it frighteningly hard to say definitively that a 2400 player + Leela(N) played at 2550 for X moves in N.

Uberdude, it is a bit more complex than just looking at histograms with the same number of variations. First, it is better for analysis to look at how move suggestions evolve in Leela after some time and number of variations. Important moves I looked at in Leela directly, and watched how they evolved. I noticed that after some 50k variations, changes are rare and rather slow. Sometimes a suggestion would come up after some time and stay there. In my paper I listed when I first noticed appearances of variations; if it is after 2k, it is basically immediate. Sometimes a suggestion appears early, stays on top for several tens of thousands of variations, and then changes. If the game is already decided, you can choose an early move suggestion and skip the waiting.

In the game Metta-Ben David, black's move 139 is very interesting. It is a very strong attack on white, and cuts off part of his group. It is interesting that it appears only after quite a lot of variations. Now, this was an important and difficult move, and it is expected in any case that a player in this situation would want to do more calculation on it.

-----

The whole-game histograms that I inserted are actually less than 50k variations. Analysis with 50k or 200k I had to do in several sessions, so I don't have them in one piece. When doing them I noticed, as you have now, that faster histograms differ slightly from more detailed ones, and that is why I kept them.

Now, regarding the deviations histogram: it shows how much Leela thinks a move is better or worse than hers. It is not the difference from her moves; moves can be totally outside her suggestions but have a similar chance of winning. After one side's chance of winning reaches more than 80%, you can play almost every possible move and it would still be pretty much the same as Leela's chance. For the same reason, in the game Master vs Alpha Zero, one program played a completely stupid move inside the other's territory - and the other program equally stupidly replied there. It is not bad by their calculation.
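The deviation Bojanic describes could be computed roughly like this (my own sketch, not his code; "deviation" here is the winrate of the played move minus the winrate of Leela's preferred move, and all the numbers are invented):

```python
# Sketch of a per-move "deviation": played move's winrate minus the winrate
# of Leela's own top choice. Near 50% a deviation is meaningful; once one
# side is past ~80-90%, almost every move has a deviation near zero.
def deviations(played_winrates, best_winrates):
    return [round(p - b, 3) for p, b in zip(played_winrates, best_winrates)]

best   = [0.52, 0.55, 0.60, 0.93, 0.95, 0.97]  # invented winrates
played = [0.50, 0.49, 0.60, 0.92, 0.95, 0.96]
print(deviations(played, best))  # early moves register; late ones barely do
```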

-----

Here is a histogram from a game of one European pro in PGETC R4:

Attachment: LFHXVHTDDN.png

It is clearly very similar to Leela, but as I wrote earlier, it has to be examined in more detail. That is why in the paper I compared move after move, and wrote the differences in an xls file to be more visible. In this game fighting started early, and there were a lot of forced moves on both sides, not surprisingly resulting in a lot of Leela's top choices appearing. The tenukis also matched Leela's, but they were obvious even to me. Also something interesting: white had some 5 moves that were not among Leela's suggestions at all - and they are not listed as bad moves. Please note that in Metta's two online games there was no middle-game move that was not in Leela's suggestions, actually in the top suggestions. Overall, I don't think that this game is as similar to Leela's as one might think at first. If it had lasted longer and had not been so forced, more differences would be visible.

I have found more similar short games; those are the games mentioned as having a similar percentage to Metta's, but they are much different. Overall, short fighting games are not so good for comparison.

Last edited by Bojanic on Fri Jun 15, 2018 1:24 am, edited 1 time in total.

Uberdude wrote:

Bojanic found games from other players in the league this year with not much red on their GRP Leela histograms, and his suspicion/conclusion was that they were cheating with Leela too. My interpretation is that it more likely shows non-cheating players can be similar to Leela.

As explained in the previous post, that is not my conclusion. The game we mentioned in PM I analyzed in more detail. It is not just histogram similarity - most of the fighting moves were Leela's top choice. Quite interestingly, the only moves that were not its top choices were two tenukis (both not so bad moves, IMO). I analyzed two older similar games from the same player, and in the fighting they also contain most of Leela's choices (not unusual, since it was forced), but there are fewer top choices and more mistakes.

Overall in this case, I am very suspicious of program assistance being used in the fighting sequences, but it would be much more difficult to make an analysis than for Carlo's games, since more games would have to be analyzed. Therefore I decided to wait and see what happens with the analysis of Metta's games.

Tryss, after move 44 Leela thinks you have a 100% chance of winning, and you can play almost anything; she will not consider it a mistake. But if you open the GRP file and go move by move, you will see how much your moves differ from Leela's.

Bojanic, could you update your PDF so that it reflects the discussion since it was first published? I found it very hard to understand exactly what work you did and did not do, and exactly which data you used (also for the comparison with other players). That is very important. It would also be good to base it on a null hypothesis (i.e. no cheating) and work from that to your hypothesis that cheating did in fact occur.

You have probably read the rebuttal from the Italian professor, which was made public a while ago and which highlighted a lot of valid and very important ideas on how an analysis can and cannot be done. If you could work that into your PDF, I believe that would make it a lot more solid.

Bojanic, could you update your PDF so that it reflects the discussion since it was first published? I found it very hard to understand exactly what work you did and did not do, and exactly which data you used (also for the comparison with other players). That is very important. It would also be good to base it on a null hypothesis (i.e. no cheating) and work from that to your hypothesis that cheating did in fact occur.

I will try to do it today. Although there was a lot of time wasting, some of the members made good contributions.

maf wrote:

You have probably read the rebuttal from the Italian professor, which was made public a while ago and which highlighted a lot of valid and very important ideas on how an analysis can and cannot be done. If you could work that into your PDF, I believe that would make it a lot more solid.

I consider simple statistical methods, either his or Cieply's, not accurate enough. In statistics a forced move has the same value as an important middle-game move, which makes no sense. They can only be a basis for additional research. It is better to examine move by move.
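One way to act on this objection is to down-weight forced moves when computing a match rate. This is only my sketch of the idea, not a method anyone in the thread has implemented, and the "within ~5% of the best move" candidate threshold is an arbitrary choice:

```python
# Sketch: weight each Leela-match by how un-forced the position was.
# n_candidates[i] = how many of Leela's candidate moves were within ~5%
# winrate of her best; a forced move (one candidate) gets weight zero.
def weighted_match_rate(matches, n_candidates):
    weights = [max(c - 1, 0) for c in n_candidates]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for m, w in zip(matches, weights) if m) / total

# A matching forced move counts for nothing; matching a free choice counts.
print(weighted_match_rate([True, True, False], [1, 4, 4]))  # 0.5
```

A plain match rate on the same data would be 2/3; the weighted version ignores the forced move entirely, which is closer to the move-by-move judgement Bojanic argues for.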

@Bojanic - in that first game you give from Carlo, Leela 0.11 was not even available.

20K simulations is only 4 seconds on a modest GPU. Those are not the moves Leela would suggest when the thinking time is normal and you run Leela 0.11 in analysis mode. On my laptop with a 1050 GPU it will usually be over 300K.

What you call "tenuki" in game 2 is very subjective.

The evaluation diagrams for other AIs are very different. After move 139 in game 2, I let AQ take white against Leela 0.11 and it wins 3-0.

Tryss, after move 44 Leela thinks you have a 100% chance of winning, and you can play almost anything; she will not consider it a mistake. But if you open the GRP file and go move by move, you will see how much your moves differ from Leela's.

And? If I cheated using LZ, why wouldn't I play moves different from LZ, as long as it doesn't make me lose?

It's strange that I stop playing like LZ when I get close to a 100% winrate!
