Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root to leaf.
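For concreteness, here is a minimal Python sketch of one such simulation, following the PUCT selection rule described in the paper. The Node class, the game/net interfaces, and the C_PUCT value are illustrative assumptions, not DeepMind's actual code.

Code:

    import math

    C_PUCT = 1.5  # exploration constant; illustrative value, not from the paper

    class Node:
        def __init__(self, prior):
            self.prior = prior      # P(s, a) from the policy head
            self.visits = 0         # N(s, a)
            self.value_sum = 0.0    # W(s, a)
            self.children = {}      # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node):
        # Pick the child maximizing Q(s, a) + U(s, a) (the PUCT rule).
        total = sum(c.visits for c in node.children.values())
        def puct(child):
            u = C_PUCT * child.prior * math.sqrt(total) / (1 + child.visits)
            return child.q() + u
        return max(node.children.items(), key=lambda kv: puct(kv[1]))

    def simulate(root, game, net):
        # One simulation: select root-to-leaf, expand the leaf with the
        # network's priors, then back the value up the visited path.
        # (A real implementation would search on a copy of `game`.)
        path, node = [root], root
        while node.children:
            move, node = select_child(node)
            game.play(move)
            path.append(node)
        priors, value = net.evaluate(game)    # policy + value heads
        for move, p in priors.items():
            node.children[move] = Node(p)
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value                    # flip perspective each ply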

...

At the end of the game, the terminal position is scored according to the rules of the game to compute the game outcome: -1 for a loss, 0 for a draw, and +1 for a win.
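In code, that convention amounts to something like this (the game.winner() interface is an assumed placeholder, not from the paper):

Code:

    def terminal_score(game, player):
        # Score a finished game from `player`'s perspective:
        # -1 for a loss, 0 for a draw, +1 for a win.
        # `game.winner()` returns the winning side, or None for a draw.
        w = game.winner()
        if w is None:
            return 0
        return 1 if w == player else -1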

This is done for training, not for play, so it is not actually a search, but a tuning method.
Why does everyone confuse tuning with search?
They are still doing alpha-beta.

Wrong. Unless stated otherwise, the player is identical to the trainer. That is one of the reasons this is such an interesting advance: it is both a new eval-and-tuning method and a previously discarded search method.

Your statement is just based on their claims.
Could you explain this MCTS to me in a bit more detail, so I can tell you why it is simply alpha-beta?
Alpha-beta is synonymous with picking the best move; an approach that does not pick the best move is simply impossible theoretically.
Again, why don't you think a bit: 2800 Elo on a single core, what kind of a breakthrough is that?

Lyudmil Tsvetkov wrote: Would not SF on the very same hardware, if adapted, still be 400 Elo stronger?

No, it wouldn't, for the same reason that Stockfish doesn't harness the power of available GPUs. These TPUs are quite a different design from CPUs: they are a bit like GPUs, modified in hardware to be more efficient at neural networks instead of graphics.

The big thing here is that they have a truckload of simple modules that can perform identical operations on different data at the same time. Perfect for neural networks - and for graphics, which is why graphics cards have long been used for neural network purposes.

By contrast, conventional CPUs are good at performing different and complex operations on input data at a high rate. That is good for if/then/else branching, which is essentially what chess engines do.

Software that was designed for CPUs cannot take advantage of TPUs. The TPU hardware would be useless for Stockfish. The other way round is also not that promising: trying to run a neural network on a CPU doesn't yield performant results.
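To make the contrast concrete, here is an illustrative sketch of the two workload shapes (toy code, not Stockfish or AlphaZero): a dense network layer that is pure data-parallel arithmetic, next to a branchy toy evaluation of the kind CPUs handle well.

Code:

    import numpy as np

    # (a) Neural-network style: one big dense operation, identical
    # multiply-accumulates over large arrays. This is exactly what the
    # matrix units in a TPU or GPU are built for.
    def nn_layer(x, w, b):
        return np.maximum(0.0, x @ w + b)   # matmul + ReLU, fully data-parallel

    # (b) Chess-engine style: data-dependent branching on every position,
    # which suits a CPU's branch predictor but leaves a matrix unit idle.
    def toy_eval(pos):
        score = pos["material"]
        if pos["in_check"]:
            score -= 50
        elif pos["passed_pawns"] > 0:
            score += 20 * pos["passed_pawns"]
        if pos["king_exposed"] and pos["queens_on"]:
            score -= 80
        return score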

Is there a problem with emulating AlphaZero on a CPU (the same algorithm, just on a CPU)?

I do not care if it is going to be 10 times slower; most engines cannot beat Stockfish even with a 10:1 time handicap.

Lyudmil Tsvetkov wrote: Eelco, how can you buy into the SCAM too?
The hardware advantage was 50:1.
It plays 1850-Elo chess on a single core.

The diagram above is already completely won for Black; Stockfish blundered in the opening with Nce5, and the game is already lost.

Alpha beating me? Gosh, I will shred it to pieces.
It understands absolutely nothing about closed positions; none were encountered in the sample.

It is all about the hardware: 2 or 3 beautiful games, with the d4-e5-f6 pawn chain outperforming a whole black minor piece, one great attack on the bare SF king and one more; all the rest is just excessive computation.
Nothing special about its eval.

Just another crappy, misinformed, false-bravado post from a guy who quit his job to pursue the quixotic dream of finding the perfect old-school endpoint evaluation for an alpha-beta searcher. He can't accept the fact that his years of painstaking effort have been rendered moot by a project that was just a "meh, let's spend a few hours and see what happens" lark by the team that already conquered Go, a much more complex game than chess.

Only that "adapted" would mean "complete rewrite", and then it wouldn't be Stockfish anymore. The revolution in computing is both having the self-learning software framework and the hardware to meaningfully run it.

Guenther wrote: Rasmus, do you really think he will understand that after all he said before?

I've never seen Lyudmil posting derailing stuff. The main point isn't even technical understanding, it's the perspective that equates computing with doing stuff on x86-like CPUs (or ARMs or whatever). It takes a while to let it sink in that this is a completely different way of computing.

I'm also somewhat astonished about the whole discussion whether the hardware was fair or not. As if Stockfish's defeat had been the point here. We're looking at nothing less than a revolution in computing, and people discuss whether some change here and there might have given a handful of Elo more to Stockfish.

Those are two completely different things: a revolution in computing and a revolution in artificial intelligence.
You rightly state this is a revolution in computing, but why do they only stress the AI part, then?

It may be because of this, Lyudmil ...

QML 35 days ago
What makes this different from a minimax algorithm with alpha-beta pruning?

gwern 35 days ago

Aside from MCTS being a different tree search method, the difference is the 'closing of the loop'. In regular MCTS it is far from unheard of to replace the random playouts with some 'heavier' heuristic, to make the playouts slightly better estimators of the node value, but those heavy playouts do not do any kind of learning: the heuristic you start with is the heuristic you end with. What makes this analogous to policy iteration (hence the names 'tree iteration' or 'expert iteration' for the Zero algorithm) is that the refined estimates from the multiple heavy playouts are then used to improve the heavy playout heuristic itself, i.e. a NN which can be optimized via backpropagation as supervised learning of board position -> value. Then, in a self-play setting, the MCTS continually refines its heavy heuristic (the NN) until the NN+MCTS combination is superhuman. At play time you can then drop the MCTS entirely and just use the heavy heuristic to do a very simple tree search to choose a move (which I think might actually be a minimax with a fixed depth, but I forget).
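As gwern presents it, that closed loop can be sketched in a few lines (the helper names mcts_policy, sample_move and the net interface are assumed placeholders, not actual AlphaZero code):

Code:

    def self_play_game(net, new_game, mcts_policy, sample_move):
        # Play one game in which MCTS (guided by `net`) picks every move,
        # recording (position, search policy) pairs as training targets.
        game, records = new_game(), []
        while not game.over():
            pi = mcts_policy(net, game)         # visit-count distribution
            records.append((game.encode(), pi))
            game.play(sample_move(pi))
        z = game.outcome()                      # +1/0/-1 for the first player
        # Label each stored position with the final result, flipped so it
        # is from the perspective of the side to move at that position.
        return [(s, pi, z if ply % 2 == 0 else -z)
                for ply, (s, pi) in enumerate(records)]

    def training_iteration(net, new_game, mcts_policy, sample_move, n_games=100):
        # One 'policy iteration' step: generate self-play data with the
        # current net, then fit the policy head to the MCTS visit counts
        # and the value head to the game outcomes.
        data = []
        for _ in range(n_games):
            data.extend(self_play_game(net, new_game, mcts_policy, sample_move))
        net.fit(data)                           # supervised update
        return net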

I don't know who wrote this, but this is just a load of bollocks. It has absolutely nothing to do with how Alpha0 is trained or run.
I really wonder why it is so hard to read the AGZ Nature paper. Or are some simply incapable of comprehending it?

Mr. Kaufman, in the chess.com article on AlphaZero, you were quoted as saying "...AlphaZero had effectively built its own opening book...". Could you elaborate on that statement? Did it build its opening book in the sense that it stored moves in a lookup table for opening positions? Or did its training session compute values for its neural network parameters that were used in its search of the opening positions, in the same way that they would be applied to any other position?