AlphaZ beat SF by using a 'simple trick': a learn file combined with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago, beating Rybka, the world's strongest chess engine at the time.

It has been established that A0 has a learn file in which it saves all its training games, storing wins, losses, draws and a percentage chance to win. RomiChess does exactly the same thing. Here are the fields of a record from Romi's learn file:

Record Number
First Sibling Record
First Child Record
From Square
To Square
Type of Move
Flags
Depth
Status
Score, reinforcement learning rewards/penalties
White Wins
Black Wins
Draws
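To make the record layout concrete, here is a minimal Python sketch. The field names come from the list above; the types, the `games()` helper, and the win-percentage formula are my own assumptions for illustration, not RomiChess's actual code:

```python
from dataclasses import dataclass

@dataclass
class LearnRecord:
    record_number: int     # index of this record in the learn file
    first_sibling: int     # record number of the first sibling (alternative move)
    first_child: int       # record number of the first child (reply move)
    from_square: int       # 0-63 origin square
    to_square: int         # 0-63 destination square
    move_type: int         # e.g. normal, capture, castle, promotion
    flags: int
    depth: int
    status: int
    score: int             # reinforcement-learning reward/penalty
    white_wins: int
    black_wins: int
    draws: int

    def games(self) -> int:
        """Total games recorded through this node."""
        return self.white_wins + self.black_wins + self.draws

    def white_score_pct(self) -> float:
        """White's percentage score, counting a draw as half a point."""
        n = self.games()
        return 100.0 * (self.white_wins + 0.5 * self.draws) / n if n else 0.0
```

With sibling and child links, records of this shape form a tree of all stored games, so the stats can be walked move by move from the start position.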

Store a million complete games that have been guided by the stats in the learn file, and tactics of unlimited ply depth can be found, stored, and played back, or the search can be guided to find them. It is just a 'simple trick'.

I put 'simple trick' in single quotes because it is a valid trick, not some swindle. If an engine is programmed to do this, then more power to it! The wins are legit, and if an engine like SF, K or H loses because it doesn't have this type of learning, then tough cookies!

If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

Besides, did RomiChess get its learn file by self-play? Did you really not code anything into RomiChess other than the rules of the game, and let the engine figure out the rest by itself?

You are dealing with selective reporting, not the whole picture. Yes, A0 has an NN that performs the function of guiding the search using the information in the learn file. That is its function. The miracle is stored in the learn file.

Romi's learn file causes RomiChess to go against its normal evaluation function and play something different, something learned. So your quip, "Did you really not code anything in RomiChess than the rules of the game and let the engine figure out the rest by itself?", is really a non sequitur. Anyway, I did not say AlphaZ is identical to RomiChess. I said that they both use the same learn-file trick to win games that they could not win otherwise. I have 11+ years of experience with reinforcement learning. How many years of experience do you have?
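The mechanism described here (stored stats pulling the engine away from its static evaluation) might look something like this rough Python sketch; the bonus scale, function names, and move selection are invented for illustration and are not RomiChess's actual code:

```python
def learn_bonus(wins, losses, draws):
    """Reward/penalty bonus, in centipawns, from the stored game stats
    for the side to move. The +/-40 scale is illustrative only."""
    n = wins + losses + draws
    if n == 0:
        return 0
    pct = (wins + 0.5 * draws) / n          # percentage score, draws = half
    return round(80 * (pct - 0.5))          # +40 at a 100% score, -40 at 0%

def pick_move(moves, eval_fn, stats):
    """Pick the move with the best static eval plus learn-file bonus.
    `stats` maps a move to its (wins, losses, draws); unknown moves get 0."""
    return max(moves,
               key=lambda m: eval_fn(m) + learn_bonus(*stats.get(m, (0, 0, 0))))
```

With a scheme like this, a move that scored well in past games can outrank a move the static evaluation prefers, which is exactly the "play something learned" behavior described above.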

This is not what the paper describes. They used each game to adjust the weights in the neural network a tiny bit, in a direction determined by the game result. After that, they discarded the game and never looked at it again. The eventual NN that results depends on all the game positions, for sure, but that result is not stored by position. It only contains knowledge of the kind "If you can capture a Queen with a Knight, it is on average a good idea to do so". There is no way whatsoever to trace back which positions this knowledge came from. The NN doesn't even remember which positions it has seen, just the notable characteristics averaged over all the positions it has seen.
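A toy sketch of the training scheme described in this paragraph: each game nudges the weights a tiny bit in the direction of its result, and the game itself is then forgotten. This is a deliberate caricature of gradient-based self-play training, not DeepMind's actual update rule:

```python
def update_weights(weights, positions, result, lr=0.001):
    """Nudge each weight slightly in the direction given by the game
    result (+1 win, -1 loss, 0 draw). `positions` is a list of feature
    vectors, one per position in the game. After this call the game can
    be discarded: only its averaged influence on the weights remains,
    and no individual position is recoverable from `weights`."""
    for pos in positions:
        for i, feature in enumerate(pos):
            weights[i] += lr * result * feature
    return weights
```

After millions of such tiny updates, the weights encode statistical tendencies of all the games combined, not any single game, which is why nothing per-position is stored.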

But is that correct? There is evidence that they keep a learn file with wins, losses, draws and a percentage chance of winning. Are there enough neurons to remember millions of these stats, or is there more to it? In a human brain, data storage and neurons are synonymous, but in a computer "brain" neurons and data storage are not. Yet if one wants to use the human brain as a parable, then memory and neurons would be spoken of as though they were synonymous. So yes, the games as separate entities have been discarded, but the 'memory' of the games is stored: in a learn file.

That is how an NN works. The only common factor with RomiChess is that there is some form of reinforcement learning, but the rest has nothing in common. That is probably what confused so many people.

Are there enough neurons to remember millions of these stats

That's not how an NN works. Memorising is one technique we humans can do with our brains, but it's actually the least powerful way even we humans learn. An NN is about pattern recognition without a precise position match, which I guess is exactly what RomiChess does not perform.

Dirt wrote: Would this work to beat top professionals at Go? I don't think so. I very much doubt that Google switched to RomiChess's methods when training for chess.

In Go, they would have to break the board up into chunks with recognizable patterns and play the highest-percentage moves on those formations, but yes, it would work.

And here are some quotes from people who seem to be in the know.

Truls Edvard Stokke
"Hey Michael, very interesting stuff, this seems like a table-based monte carlo policy evaluation. Impressive that you would independently discover such a thing on your own." " However this is indeed a first step towards the policy evaluation used in A0. " Then in his simulation of Ao on a pc he publishes a chart of a search tree with backed up values. And then in other subjects it is mentioned by more than one that A0 stores wins, losses, draws and a winning percentage and you guys don't argue against it. It can't store all that data in the NN. It has to be storing w,l,d,p data somewhere either in memory or on a hard drive. And to say NN does not work that way is ridiculous. NN can analyze stored data. I might not be 100% correct but what you guys are saying is, it is like those that tell me God does not work like that. Well I got news for you, God can work anyway he likes and so can NN. You might be right but don't say stupid things like NN does not work that way, lol. Is there an emoji for frustration?

Indeed. But we are not dealing with academics. We are dealing with a commercial company with a bad reputation. They have their mouths full of ethics, but their actions are criminal: copying books, buying YouTube while knowing about its massive amount of illegally copied material, and only God knows what they do with the data we publicly entrust to the internet and thus to them. My name might as well be colored red after this post.

Ranting aside, let's talk about what the paper doesn't reveal: the start positions of the 100 games. Those 50 start positions certainly could be learned as Mike described.