This is an incredible demonstration that the AG Zero expert iteration method is a general method. If you go back to the discussions of AG Zero from a month ago, there was a lot of skepticism that NNs would ever challenge Stockfish et al - they are just too good, too close to perfection, and chess is not well suited for MCTS and NNs. Well, it turns out that AG Zero doesn't work as well in chess: it works better, as it only takes 4 hours of training to beat Stockfish. This is going to be an impetus for researchers to explore solving many more MDPs than just chess or Go using expert iteration... ("There is no fire alarm.")

See, the thing is, though, Giraffe's evaluation actually was better than Stockfish's evaluation function, but it took much longer to compute, and thus Giraffe wasn't able to search as deep as Stockfish et al. So in a way, the real triumph of the AlphaGo series was the TPU and GPU army.

Unlike in most algorithms where correctness and performance are independent, chess engines can't be evaluated without testing performance at the same time; faster is not just faster, it changes the results.

So there is a tradeoff between the depth of the search and quality of evaluation. For traditional chess algorithms, better evaluation was rarely worth the cost; it would slow down the search so much that it didn't pay for itself.
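
As a toy illustration (all numbers here are invented): with a fixed time budget, a slower evaluation means fewer nodes visited, and search depth only grows with the logarithm of the node count, so a 100x slower evaluator costs you roughly two ply at an effective branching factor of 10.

    import math

    def reachable_depth(time_budget_s, eval_cost_s, branching_factor):
        # Depth reachable by a plain fixed-depth search under a time budget:
        # nodes ~= branching_factor ** depth, so depth ~= log(nodes)/log(b).
        nodes = time_budget_s / eval_cost_s
        return math.log(nodes) / math.log(branching_factor)

    print(reachable_depth(60, 1e-6, 10))  # fast, crude eval  -> ~7.8 ply
    print(reachable_depth(60, 1e-4, 10))  # 100x slower eval  -> ~5.8 ply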

But this performance tradeoff (like all optimizations) critically depends on hardware. Change the hardware and you change which optimizations are "worth it".

AlphaZero is clearly good at using TPUs to maximum effect. But what would its performance be in a CPU-only environment? Maybe dumb but deeper searches still win there? This evaluation hasn't been done.

This isn't to say that the AlphaZero evaluation is "unfair". Rather that chess engines evolved to be too dependent on their environment. Getting maximum use out of CPUs is a strength, but not being able to use TPUs or even GPUs is a weakness.

That's not clear. Each (second generation) TPU delivers roughly 45 TFLOPs of loosely specified FP16-ish compute. A single board consists of 4 TPUs at 180 TFLOPs total. This is similar to the dual-P100 NVLinked Quadro, which is an absolute killer HPC/DL card. I believe they have a similar Volta option, but that kind of HW is above my pay grade these days.

Further, they used 5,000 (first generation) TPUs at 90 INT8 TOPS each (page 4) to run the network during MCTS, and 64 (second generation) TPUs to train this thing, according to the methods. That's a nice mix of using INT8 for inference and FP16-ish for training, IMO.

In contrast, I personally own 8 GTX Titan XP class GPUs and 8 more GTX Titan XM GPUs across 4 desktops in my home network. I'd love to experiment with algorithms like this, but I suspect I'd get just about nowhere due to insufficient sampling. These algorithms are insanely inefficient at sampling at the beginning. So I guess I will seed the network with expert training data to see if that speeds things up.

That said, more brilliant work from David Silver's group! But not all of us have 5,000 TPUs/GPUs just sitting around so there's still a lot more work/research to make this more accessible to less sexy problems.

And to make things simple, let's do it all in FP16, because INT8 on Volta ~= 1/2 a first generation TPU, but Volta FP16 ~= 3 first generation TPUs at INT8 (sad, right?) - an accident that occurred because P100 didn't support INT8, but the consumer variants did.

So, 5,064/3 = 1,688 Volta GPUs ~= $5000 per hour, probably half that reserved, a quarter of that in spot.

Say you need a week to train this, so $200K-$800K...

You can buy DGX-1Vs off-label for about $75K. Say one costs $20K annually to host, and you use it for 3 years: total TCO is ~$135K, which comes to ~$5.14/hour for the 8-GPU box, or ~$0.64 per GPU-hour.

Conclusion: p3.8xl spot instances are currently a steal! But I don't have ~$200K burning a hole in my pocket, so I guess I'm out of luck.
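
For what it's worth, here is the arithmetic above as a script; every number is a rough estimate from this thread, not a quote:

    tpu_equiv_gpus = 5064 / 3                  # ~1688 Volta GPUs at FP16
    cloud_rate_per_hr = 5000                   # on-demand; ~1/2 reserved, ~1/4 spot
    week_hours = 7 * 24
    print(cloud_rate_per_hr * week_hours)      # ~$840K on-demand for a week
    print(cloud_rate_per_hr / 4 * week_hours)  # ~$210K on spot

    # Owning instead: DGX-1V (8 Voltas) at ~$75K plus ~$20K/yr hosting, 3 years.
    tco = 75_000 + 3 * 20_000                  # ~$135K
    per_box_hour = tco / (3 * 365 * 24)        # ~$5.14/hr for the whole box
    print(per_box_hour / 8)                    # ~$0.64 per GPU-hour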

I think his point is that if you devote X FLOPs to something, then a fair comparison would be to also give X FLOPs to the competitor. The specifics of how an algorithm works don't matter as much as the total resources used and the outcome.

A more fair comparison would be to cap the hardware used at a certain cost. That's much more reflective of the real world. There are plenty of tasks that perhaps you could do more efficiently on a CPU for a given number of operations, e.g. maybe some graphics operations, but in practice it's completely irrelevant because a GPU gives so much more performance for the given cost. There's nothing special about an operation, but dollars do matter.

Sure, but the whole point of the above idea is to compare our 20W computers to what we can build that eats 20W. And don't give Silicon Valley ideas about disrupting the lucrative Mechanical Turk ecosystem by scaling it up with ideas borrowed from growing veal because some VC sociopath will take it seriously. Just sayin'...

And I'm saying that this 20W limitation isn't particularly meaningful, as many organizations have way more power at their disposal to throw at a problem than that. The economics of a given solution, on the other hand, is applicable at all scales.

This argument is about state-of-the-art chess, not chess as a mobile phone game. Humans are so bad at chess compared to the best programs now that even a smartphone app can't be defeated by people.

Also, mobile phones have Internet access, so there's no reason the algorithm has to run on the phone itself. It could run on TPUs in the cloud. It's common for many games to have server-side components. Though this isn't even necessary except maybe if Magnus Carlsen wants to play it.

I think you misunderstood. Sure, if you are willing to deal with the increased costs and lowered reliability you could write a chess program that required massive server resources.

But I don't think a lot of people would pay for that vs. having a program that just runs on their phone and still beats them. So, in practice, without a significant subscription fee you are going to be limited to cellphone hardware.

PS: In practice, most games take about as much computing power from a server as a chat app, because companies need to pay for that hardware. Remember, 1,000,000+ times X gets big unless you keep X very low.

Again, this entire article and discussion is about state-of-the-art chess. As in, literally working to "solve" the game and develop optimal strategy. I don't understand what relevance casual mobile chess games have. Computer chess is already very far beyond human capabilities, and it can't be pressed further just using mobile phone hardware (nor is that a reasonable restriction).

It'd be like in a discussion about SpaceX's BFR designs to colonize Mars, someone comes in and questions why they're using retropropulsion since the requisite control systems are infeasibly expensive for amateur model rockets. It's a completely different discussion.

That's not why this is relevant. Given equivalent hardware, it's still a worse solution for chess. The value is that, given vastly more compute power, you can get results of similar quality even without 1,000+ years of human analysis.

Otherwise the only takeaway is this failed to improve the state of the art.

"Equivalent hardware" is only relevant if we're talking about cost. When measured by that metric, the TPUs are indeed superior. Raw operations is an irrelevant metric given the existence of economic purpose-specific hardware that can perform a lot more of the operations required for matrix multiplication than for general computation. GPUs work exactly the same.

Again, cost is relative to the hardware you have. If you already own a supercomputer and you want to run chess on it for whatever reason, what matters is the performance you get from each algorithm on that hardware. If you're going to buy new hardware, its design depends on performance across every algorithm you expect to use.

So, the only case where chess performance per $ matters is if you are only ever going to use that hardware to run chess. In every other case, which is the vast majority of the time, you care about different metrics.

Some iPhones are manufactured with this, but again if you have paid for the hardware you care about performance on that hardware. If you have yet to buy anything then theoretical performance per $ becomes the meaningful metric.

Same with the Pixel 2, though the Pixel 2 appears to be a bit more powerful than the iPhone's neural chip. The PVC is able to do 3 TOPS, but we'd really need the supported instructions and word sizes to truly compare.

But better evaluation gives you asymptotic speedups. You can give Stockfish several times its computation (which is already a lot, I mean, 64 threads, come on) and it doesn't make good use of it since it just runs into the search wall. If you gave Stockfish the equivalent in CPU power (and I'm not sure this is a fair hypothetical since part of the appeal of NNs is that they have such efficient hardware implementations, so it seems unfair to then grant a less efficient algorithm equivalent computing power by fiat), I'm not sure it would be restored to parity or superiority.

Compute absolutely matters. With tree search, there's a tradeoff between scoring cost and positions evaluated. AlphaZero can evaluate fewer positions because it uses a huge amount of compute to accurately score each position.

On starving the competitor of computing power: if you compare A and B, you can't give A 10+x the compute power and assume a fair comparison. What's interesting is a demonstration that enough compute power lets NNs reach beyond human-level play. Though I don't think that was ever really in doubt.

While I'm glad to see you're excited about this, take note that this is still an approach which requires that an exact model is known, the state is fully visible, and the reward is perfectly definable and known. Progress in this setup isn't necessarily correlated with the kind of AI for which we'd need a fire alarm.

More seriously, it seems Deepmind and the AI community in general is having a Streetlight effect problem, i.e. looking for AI in what works now, rather than coming to terms with the hard challenges. This explains why there are so many papers on GANs. People are just doubling down on what works (where the streetlight is), rather than acknowledging that where we need to look for AI is dark. Since it's become such a cut-throat race to be the next one to say "we made a breakthrough!", it makes much more economic sense to solve simple problems and advertise them as huge challenges.

I wouldn't dismiss GANs so easily. Yann LeCun was singing odes to GANs - as the most interesting idea in the last decade. The interesting thing about GANs is that they don't use a predefined loss, but instead the discriminator acts as the loss function for the generator - thus, it is learning a loss fn instead of using human guesswork to create it. That's quite a powerful new idea. Applications of GANs include making simulated images look more real, which is essential for RL, generating 'artificial' training images for other tasks and using the discriminator as an image embedding generator or classifier.
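
A minimal sketch of that idea in PyTorch (sizes and hyperparameters are placeholders, not anything from a paper): note that the generator's loss is literally the discriminator's current opinion, not a hand-designed metric.

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
    D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real):                     # real: (batch, 784) images
        z = torch.randn(real.size(0), 64)
        fake = G(z)
        # Discriminator learns to tell real from fake...
        loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
                 bce(D(fake.detach()), torch.zeros(real.size(0), 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # ...and then acts as the learned loss function for the generator.
        loss_g = bce(D(fake), torch.ones(real.size(0), 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()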

I agree that the average Joe will misinterpret the significance of AlphaGo, to Google's benefit.

But most people in the research community already know how amazing it would be to make an affordable household robot or a search-and-rescue robot or a self-driving car. Many labs (including mine) are working on it. The streetlight adds a small bias, but the bigger problem is that we have no idea how to build human-level AI.

Vision is part of the puzzle--a large part in the case of self-driving cars. But blind people are way better than computers at everyday tasks, so I don't think that it's the Big Problem.

Translating to 3D is low-level and relatively easy. That's not the reason why we don't have household robots/self-driving cars.

Framing vision as "object attributes" and "correlate to prior knowledge" might be a good approach for current research. But humans do more--we understand what we look at. We form concepts and models of the world that allow us to adapt to very novel situations.

The main reason why we haven't solved vision, language, playing chess like a human, etc is that NNs are a poor approximation of human concepts. I agree that we probably need more compute and better compute.

Yes, but it doesn't seem like much of a problem? Exploiting a breakthrough before moving on to harder problems isn't cheating, it's the smart thing to do. It might even turn out to be the fastest way to make progress on the harder problems.

Let's break this down and consider things carefully. To informed researchers, what is most surprising here is not that the AlphaGo Zero algorithm beat Stockfish, but that MCTS managed to outperform alpha-beta search. I'll venture a hypothesis as to why this was.

Informed skepticism would have discounted MCTS against alpha-beta search but wouldn't have put much stock in the idea that neural networks couldn't learn better features than what has been painstakingly handcrafted. We know that, given sufficient data and an appropriate architecture, neural nets have achieved better local minima than humans. This shouldn't be surprising anymore. A structurally adapted searcher will always do better in the domain it is adapted to. A cat is so good at being a cat, it doesn't even have to think about how to cat. Choice of optimization method, input pre-processing, loss function, hyper-parameters and architecture together define a search space, a structural prior, and how to navigate it.

Returning to alpha-beta vs MCTS, my view is that earlier work on the chess search space being ill-suited to MCTS has not been invalidated once you account for the synergy between the neural net and search method brought about by the imitation learning approach. What might be happening here is the neural net not only learns to correct when it goes out of bounds, it also learns to account for missteps of MCTS!

The AlphaGo Zero chess program is clearly smarter than Stockfish from the perspective of its ability to better navigate the search space, but before talking about fire alarms there are some things to note.

Going by the paper, AlphaGo Zero does well if you hold compute fixed and adjust time, but how does it do as you move along both compute and time? This is of relevance to the general community, especially if AlphaGo Zero's skill degrades gracefully enough to allow it to be a better tutor than current engines.

Contrary to the no-fire-alarm claim, we should see sudden improvements everywhere due to how close joint, structured prediction, reinforcement and imitation learning are to each other. Unexpected improvement across a broad class of problems is a fire alarm. Right now, POMDPs, or games with hidden information and multiple interacting agents, are still very difficult. Structured prediction is still difficult. Granted, this was before AGZ, but Neural Nets+MCTS had to be modified to neural self-play before it could work just OK in poker-like games.

What we should take away is the power of combining searching and learning. I'll argue that what is now being called expert iteration was presaged in an antique 2006 paper [1] where Hal Daume et al. discuss the power of a learning algorithm trained to imitate a search-computed policy. Even with limited compute and data, you can use similar ideas under the learning-to-search framework. The imitation approach is what has consistently yielded great results, whether applied to neural nets or logistic regression.
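
For anyone who hasn't seen it, the expert-iteration loop has a very simple shape. A sketch (all helpers here are assumed placeholders, not DeepMind's code): the search acts as the "expert", and the network is trained to imitate the search-improved policy, which in turn guides the next round of search.

    def expert_iteration(network, n_iters, n_games):
        for _ in range(n_iters):
            examples = []
            for _ in range(n_games):
                game, trajectory = new_game(), []   # new_game: assumed helper
                while not game.over():
                    # Search (e.g. MCTS) guided by the current network yields
                    # a sharper policy than the raw network output.
                    pi = mcts(game, network)        # mcts: assumed helper
                    trajectory.append((game.state(), pi))
                    game.play(sample_move(pi))      # assumed helper
                z = game.result()                   # label with final outcome
                examples += [(s, pi, z) for (s, pi) in trajectory]
            # Imitate the search policy and predict the outcome.
            network.train_on(examples)
        return network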

Correction to the above: I stated Deepmind applied Neural Nets+MCTS and achieved ok results. I was actually misremembering two David Silver (Deepmind) papers as one. Smooth UCT modified UCT (popular brand of MCTS) to be able to handle imperfect information games. MCTS does not converge under imperfect information. Smooth UCT is strong at limit poker. Limit is much simpler than no-limit.

Neural Fictitious Self-Play, based on fictitious play (invented in the 1950s), is an approach to reinforcement learning using neural nets for function approximation. Typical RL methods like DQN are highly exploitable. Against strong programs, NFSP did okay, with a win rate of -50 mbb/h against the best bot it played against.

Looking not just at Deepmind, there's DeepStack. It's similar to AlphaGo OG, combining CFR + neural nets. DeepStack did not win convincingly against humans at 2-player no-limit hold 'em.

The general point I'm trying to make here is that Chess and Go are closer to checkers than to poker, which is itself a constrained game with known rules. I mention all this and this Deepmind paper: https://arxiv.org/pdf/1711.00832.pdf, to provide a sense of scale to those talking about smoke and fire alarms.

Probably the wrong engine to test this with, then. Although it's interesting nonetheless. It's pretty well known that chess engines have this trade-off between searching and evaluating. Among the consistent top 3, I suppose Stockfish is the easiest to test, being open source and all. It's generally regarded that Komodo has the best evaluation function, though. Even if it doesn't keep up with the nodes/sec of Houdini and Stockfish, it's consistently up there in the top 3. The other chess engines don't even come close. (Fire is probably number 4, but in a league of its own - not quite good enough to challenge the top 3, but it eats everything else.)

I know it's complicated, between the hardware differences, search method used, etc. But when claiming that NNs beat hand-crafted evaluation functions, keep in mind that Stockfish is probably not the best engine to compare against, since it has made different tradeoffs to get more depth (which goes back to search method and hardware choices).

Yeah, I'm quite confused that there's no mention of SEARN or LOLS or similar imitation learning algorithms in the references of the AlphaZero paper. The learning algorithm looks heavily derived from that 10-year-old idea.

It's certainly not the first NN chess program. You may remember Giraffe (https://arxiv.org/abs/1509.01549), an NN engine by one of the paper's authors, which was essentially 'AlphaGo for chess'. But like the original AG, it struggled to learn, and Lai had a lot less computation as a student than he does now at DM. What they're doing is applying AlphaGo Zero expert iteration with some simplifications and TPUs. And that pwns previous work like Giraffe the way AlphaGo Zero pwns AlphaGo. Quantity becomes a quality all its own.

Look at Figure 2, and remember that DM has access to a lot of hardware. At short thinking times, AlphaZero is weaker than Stockfish. This is equivalent to longer thinking times with weaker hardware, and it is likely that earlier applications of NNs to chess ran on hardware a thousand-fold slower than what DM has access to. This means that even if their approach had been identical to DM's, they would not have seen better performance from NNs than from the classical alpha-beta approach.

They are a huge company (Google) with access to top, top, top talent (experts) and near-infinite hardware resources. I don't know why it would be surprising that they achieved performance that hadn't been achieved before.

The best metric is total cost, including the cost of the hardware as well as the electricity. It might be worth prorating the hardware by the amount of time it spends on the task, too, assuming the hardware is general enough for many purposes (like TPUs are), vs say something like EFF's DES cracker which was not.

I don't see how this is relevant though. A GPU also provides graphics rendering performance equivalent to some boatload of CPU-hours, but who cares? GPUs exist and are used for the tasks they are good at. TPU hardware isn't theoretical; it does exist and it is being mass-produced.

Yes, it needs a boatload of very simple compute (8 bit operations), the kind that CPUs are not even close to ideal at providing economically.

Yes, it is; while the idiom standing on its own implicitly includes a leading “at least”, it is also idiomatic to use it in exactly the way used by the grandparent post, in an explicit contrast with better, where it comes with an implicit (or sometimes explicit) leading “merely” instead of “at least”.

It's unnecessary, though, and makes the point harder to read. "It works even better" would be a perfectly sufficient description. "It works not as well as but better" is an unnecessary rhetorical flourish.

One impressive statistic from the paper: AlphaZero analyzes 80,000 chess positions per second, while Stockfish looks at 70,000,000. Seventy million - roughly three orders of magnitude more. Yet AlphaZero wins half its games against Stockfish as White and never loses with either color.

It would be interesting to see if there were some way to extract a couple of new heuristics from AlphaZero that could be implemented fast enough to incorporate in Stockfish's evaluator though. I suppose this is the age old problem of black-box models: _why_ does it think this?

I dunno, seems like Google would just do this instead of keep around the pesky neural net at runtime. There's an _awful_ lot of computation going on inside, and it's necessarily hugely interconnected. I'd be impressed if someone had already done it, but it seems a great avenue of research if not. I suppose it goes hand in hand with models for which you can actually _explain_ their results, which certainly is an active area of research.

There are well-known techniques that work pretty well to shrink neural nets a lot while keeping almost all of their performance. See Geoffrey Hinton's model distillation papers.
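
The core trick, sketched in PyTorch (the temperature and weighting are illustrative hyperparameters): train the small net against the big net's softened output distribution rather than against hard labels alone.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Match the teacher's softened distribution (the "dark knowledge")...
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)                      # rescale gradients, per the paper
        # ...plus a small amount of ordinary cross-entropy on true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard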

The first AlphaGo paper had a system that used tons of computation, and was followed up by one that used much less and worked even better. Not speaking for Google, but I think it's a bit of a race to publish great results first. I wouldn't be surprised to see something better than this that uses 1,000 times fewer resources published in a year or two, just like what happened with Go. First prove it's possible, then figure out how to make it much more efficient.

A really good example of model distillation also comes from DM: their new realtime WaveNet used in Google Assistant. The first WaveNet was ungodly slow due to redundant computation; but even after that, it still was not realtime simply because the CNN is too deep and slow. But you need the CNN to be deep & big in order to train good audio generation. Model distillation to the rescue: take a wide fast small CNN and train it to imitate the slow deep WaveNet. Result: WaveNet quality realtime voice generation which can be deployed to the masses.

"We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon." <- Amazing!

I think being able to play tactically perfect chess over 20 or so moves will often look weird to human strategic sensibilities. The computer sees every tiny exception to the patterns and heuristics you've incorporated into your gut feel about positions. In a way these moves are right just because they're right, and that's what's jarring - there's no _principle_ behind them that can be learned and generalised, which is something humans struggle with in all walks of life.

Except AlphaZero doesn't evaluate nearly as many moves as Stockfish (80Knps vs 70Mnps), so in a sense, it has exactly generalized a principle (or likely a whole lot of principles) that allows it to estimate positions much better than Stockfish.

Of course you are right about perfect play, but the human-like aspect is part of what is exciting about these new Alpha engines.

There's definitely nothing fishy going on, although it'd be nice to see a fully loaded Stockfish on its full complement of 512 cores and a proper endgame tablebase to really slog it out with AlphaZero.

The whole thread is pretty hilarious. In another part of the same thread there is this comment:

> we're in a similar space -- http://www.getdropbox.com (and part of the yc summer 07 program) basically, sync and backup done right (but for windows and os x). i had the same frustrations as you with existing solutions.
>
> let me know if it's something you're interested in, or if you want to chat about it sometime.

My perspective as a FIDE master who has played Ruy Lopez Exchange-type positions for 30+ years.

9. Qe1 is a pretty normal maneuvering move

13. Ncxe5??! looks like a major howler.

Ask 100 strong chess players and 99 of them would completely ignore it.
You are giving up a piece for two pawns in an open position and black has no real weaknesses. There is no real basis for a sacrifice.

This shouldn't work. The crazy thing is that Stockfish almost makes it work.

It is the kind of move you play when you absolutely must win and must win now.

The only reason Stockfish considered it is the white pawn on a5, which gives additional tactics for breaking up black's pawn chain with a6 a couple of moves down. With the pawn on a4, Ncxe5 wouldn't be worth attempting.

The crazy thing is that being such a bully almost worked!

At move 28, White looks very solid, with 3 perfect pawns for the piece, plus black has horrible weaknesses.
29. g3 is a bit suspect, but the next super computer move is

31. Qxc7. This has to be losing, but it is a typical computer bully move.

If you look at the Stockfish project, you will see many hardcoded weights in the configuration, found through experimentation. All these adjustments probably took years to achieve... and now AlphaGo Zero just self-learns everything and surpasses it.
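
To make the contrast concrete, here is a toy handcrafted evaluator in the spirit of those hardcoded weights; the numbers below are illustrative stand-ins, not Stockfish's actual tuned values.

    # Toy handcrafted evaluation; weights are illustrative, not Stockfish's.
    PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

    def evaluate(board):   # board: assumed mapping square -> (piece, side)
        score = 0
        for piece, side in board.values():
            score += PIECE_VALUE.get(piece, 0) * (1 if side == "white" else -1)
        # ...plus dozens more hand-tuned terms (mobility, king safety, pawn
        # structure), each weight found through years of experimentation.
        return score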

Would be good to see Deepmind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.

Eventually this tech will make it into military strategy simulators and that's where things will get really messed up. 4 star generals will be replaced by bots.

I don't think this technique immediately applies to Stratego because it's not a perfect information game.

I suspect it would exceed the state of the art in Arimaa, since Arimaa is specifically designed to have a high branching factor (~17,281, compared to ~35 for chess), and this technique was designed to work well in high-branching-factor games (Go being one, though its branching factor is much lower than Arimaa's).

I wanted to contact the authors directly but can't seem to find contact info at the moment, with a question. I hope some of you might know enough to answer it.

I'm interested in applying this method, or a similar neural-network / tabula rasa method, to the game of Scrabble. I read the original AlphaGo Zero paper, and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone with more insight into MCTS and NNs would be able to talk me through how to apply this to Scrabble, or whether it even makes sense. One of the issues I can see currently would be very slow convergence; since Scrabble has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".
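
(For context, the "simple Monte Carlo simulation" approach I'm describing looks roughly like this; every helper here is an assumed placeholder:)

    def evaluate_move(state, move, n_sims=100, lookahead=2):
        total = 0.0
        for _ in range(n_sims):
            sim = state.copy()
            sim.play(move)
            # The opponent's rack is hidden: sample it from the unseen tiles.
            sim.deal_random_opponent_rack()
            for _ in range(lookahead):       # short greedy rollout
                sim.play(sim.best_greedy_move())
            total += sim.score_differential()
        return total / n_sims  # pick the candidate with the best average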

2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2,
14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently
be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions
(5)"

Shogi is a fun game; it always feels a little sad that it doesn't get more exposure outside of Japan (and my understanding is that, by and large, in Japan it is considered an "old person's" game).

Because captured pieces change sides, there is less of an "endgame" scenario, and as a beginner (like me) it is very easy to put too many captured pieces back into play, which makes it hard to defend everything, so essentially you end up giving them back to your opponent.

I've been interested in learning both shogi and xiangqi for a while. If anyone knows a nice engine with graphical frontend for either game, I'd love to know. Wasn't able to find much the last time I looked.

I'm curious to see if "Sangatsu no Lion" (known in English as "March Comes in Like a Lion") will spark interest. I highly recommend it to anyone interested in more slice-of-life/drama kinds of things. It's quite a beautiful anime/manga, even if the shogi isn't quite centre stage.

Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).

I can’t see any reference to whether Stockfish was configured with an endgame tablebase. It’d be interesting to see results then, as you’d expect AlphaZero’s superior evaluation to give it an advantage out of the opening, but later in the game Stockfish would have access to perfect evaluations. Obviously there’s nothing stopping you from plugging a tablebase into AlphaZero but that feels wrong.

I'm not sure it's really fair to compare Stockfish to AlphaZero; AlphaZero used 24 hours of 5,000 TPUs in training compute, and still needed 4 TPUs in real play, while Stockfish ran on just 64 threads and 1GB of hash. Nonetheless, still an impressive achievement.

This is definitely a scientific paper. Pretty much no scientific paper comes with source code and the majority of scientific papers are not reproducible without an entire university department of resources anyway.

My main thing about source code and scientific papers is that it would be so easy to release the source code along with the paper. Even if people don't reproduce the work, source code often helps in understanding it: I'm frequently a little unclear on implementation details, which the source would greatly clarify.

Why? How are any of the factors you mention related to verifiability? How does being supported by public funds, with academic personnel from multiple universities, make LIGO any more verifiable for me at home? At least I can run these games against my Stockfish, thus verifying the result. The method I cannot verify, but being able to verify the results is already more than most of science.

This raises an interesting concept. If you cannot reproduce an experiment because of lack of resources, can you believe it? Or is this the equivalent of 'photoshopping your results'?

A similar problem exists in cosmology. Can you verify the multiverse model if you only have one universe to experiment in?

As the RAM and TPU power requirements to run certain models/algorithms increase, machine learning is becoming more obscure. Not only can we not understand how an AI is reaching its conclusions (inscrutability), we cannot even probe it (by tweaking parameters, etc.) to find weak points (inaccessibility). This is actually a good thing. Where humans cannot tread, there can be no evil?

Stockfish plays like an ambitious amateur in the first game, giving away a piece for two pawns on move 13.

Perhaps this move was justified though, as later in the same game Stockfish gets a position which is at worst drawn, likely winning. Moves later, however, around move 40, Stockfish gets its own knight trapped and the game is over.

Yeah, that game was kind of different from the others - in the other games, the feeling I got was that over time AlphaGo's pieces got increasingly effective while Stockfish's pieces would get bottled up and lose their mobility.

Very happy to see this result. It's like a moral victory for humans, as AlphaGo is more human-like (discounting the Monte Carlo search) than Stockfish. Maybe deep learning will give us the next Euler, Newton, or Einstein.

Shogi, chess and Go are "perfect information games", meaning you can see the whole game state. It's a whole different thing to be able to solve games where you don't see everything (based on uncertainty).

A big class of imperfect information games can be modeled by having a record of everything the agent has seen so far. Then it has exactly the same, if not more, information available than a human player in the same position. We know that with equal information AIs can make better decisions than humans (see also, AlphaGo :] ) so at that point the AI could reasonably be expected to achieve superhuman performance.

The "imperfect information games are harder for AI" crowd are going to be surprised by just how badly humans deal with imperfect information. AIs have a much better memory than humans do, and much more potential to use actual probability which humans are truly shocking at utilising (although neural networks don't seem to utilise this edge; so far).

The difficulty of imperfect information comes from cross-cutting information sets and partial observability. With perfect information games like chess or Go, one can solve subgames with guarantees that the equilibrium is the same as for the full game. This is not the case for games like poker, which is why they have been difficult. In addition, for n > 2 players there are no longer theoretical guarantees about converging to a Nash equilibrium, which makes designing theory-guided algorithms harder. Though the empirical performance of CFR with n=3 is encouraging, I know of no results for n > 3.

Earlier this year, DeepStack, a system combining neural nets with search, competed live against humans without any side being dominant. Search policy guided training might improve its results, which are impressive compared to even 5 years ago, but this highlights how much more demanding imperfect information games are.

Yep, this. Btw, there are some encouraging results for n=4 using sequence-form replicator dynamics (which implement a form of CFR) in Kuhn poker. A toy example, but the game gets large fast with n=4. I don't know of any results with n > 4.
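
Since Kuhn poker came up: for anyone curious what CFR actually looks like, here is a minimal chance-sampled vanilla CFR for two-player Kuhn poker, a sketch following the standard textbook formulation (e.g. Neller & Lanctot's tutorial); the variable names are mine.

    import random
    from collections import defaultdict

    ACTIONS = ["p", "b"]  # pass, bet

    class Node:
        def __init__(self):
            self.regret_sum = [0.0, 0.0]
            self.strategy_sum = [0.0, 0.0]

        def strategy(self, reach_weight):
            s = [max(r, 0.0) for r in self.regret_sum]   # regret matching
            total = sum(s)
            s = [x / total for x in s] if total > 0 else [0.5, 0.5]
            for i in range(2):
                self.strategy_sum[i] += reach_weight * s[i]
            return s

        def average_strategy(self):
            total = sum(self.strategy_sum)
            return ([x / total for x in self.strategy_sum]
                    if total > 0 else [0.5, 0.5])

    nodes = defaultdict(Node)

    def payoff(cards, history):
        # Utility for the player to act, or None if the hand isn't over.
        if history in ("pp", "bb", "pbb"):       # showdown
            pot = 1 if history == "pp" else 2
            me = len(history) % 2
            return pot if cards[me] > cards[1 - me] else -pot
        if history.endswith("bp"):               # opponent folded to a bet
            return 1
        return None

    def cfr(cards, history, p0, p1):
        player = len(history) % 2
        u = payoff(cards, history)
        if u is not None:
            return u
        node = nodes[str(cards[player]) + history]       # information set
        strat = node.strategy(p0 if player == 0 else p1)
        util, node_util = [0.0, 0.0], 0.0
        for i, a in enumerate(ACTIONS):
            if player == 0:
                util[i] = -cfr(cards, history + a, p0 * strat[i], p1)
            else:
                util[i] = -cfr(cards, history + a, p0, p1 * strat[i])
            node_util += strat[i] * util[i]
        opp_reach = p1 if player == 0 else p0
        for i in range(2):   # counterfactual regret update
            node.regret_sum[i] += opp_reach * (util[i] - node_util)
        return node_util

    for _ in range(100000):
        cfr(random.sample([0, 1, 2], 2), "", 1.0, 1.0)   # sampled deal

    for key in sorted(nodes):
        print(key, [round(p, 2) for p in nodes[key].average_strategy()])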

I'm not sure DeepMind would publish a paper in which they describe a winning high-stakes online no-limit hold 'em player. The ethics would be quite shady. For all we know, they might have already done that just to see if it works.

I disagree. Computers have been outplaying the best humans at chess for two decades, but they only recently beat the top players at 2-player NLHE and only with the aid of massive computational power during training.

Furthermore, techniques like the Monte Carlo tree search used in AlphaGo don't work very well for poker - you can't just try to find the "best move" from the current game state, or you will end up playing a highly exploitable strategy. You essentially have to solve the entire game every time (or completely in advance) to make sure you are playing a balanced strategy.

Only the Counterfactual Regret Minimization (CFR) algorithm has been able to achieve this level of play in heads-up, and right now it looks hard to scale to poker games with more players, like the full-ring games you see at the World Series of Poker, for example. We still have a ways to go in poker AI.

What an amazing result! Evaluating 1,000 times fewer positions, AlphaZero still beats Stockfish.

In the figure on its preferred openings I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump but that is transient). I am hardly a chess expert but I know that it was very favored at the world championships so maybe the chess world will be turned upside down by this result now?

Positing that the chess world is bigger than the Go world (in terms of interest and finances) there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)

What would be a good starting point to learn about the AI behind that for a "normal" programmer? There seem to be so many resources now that it's hard to choose. Combination of hands-on plus theory would be good.

For reinforcement learning, I hear Sutton and Barto is very readable, but I haven't read it myself. You can just pick the concepts up by reading papers. The introduction in the Deep Q-Learning paper is not great, but it's how I first learned the concept.

If I run SF on my desktop computer it will kill SF running on my phone. That doesn't prove anything. Comparing TPUs and CPUs is hard, but they could've at least let SF run on what is considered a top-of-the-line setup with sensible settings (1GB of hash memory is very limited; 8GB is standard for rapid games on a quad-core CPU, let alone a 64-core one).

Back when AlphaGo was playing Lee Sedol I was thinking about a chess playing version in TCEC.

The interesting thing is TCEC assumes a bit about the structure of the chess program. That is, the TCEC win-adjudication rule says that if both programs agree that one program is 6.5 pawns ahead for 8 turns in a row, they judge that program to be the winner.

But programs like Alpha don't have an evaluation function that operates in conventional units (like centipawns).

> We also measured the head-to-head performance of AlphaZero against each baseline player. Settings were chosen to correspond with computer chess tournament conditions: each player was allowed 1 minute per move, resignation was enabled for all players (-900 centipawns for 10 consecutive moves for Stockfish and Elmo, 5% winrate for AlphaZero). Pondering was disabled for all players.

Houdini, for example, tries to make it so that a +1.00 evaluation is a win in 75% of cases in blitz games and +1.50 represents a 90% chance of winning (http://www.cruxis.com/chess/houdini.htm). Anyway, this is not a problem at all; the rule was introduced so less electricity is wasted when the position is a clear win/loss.
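
That mapping is essentially a logistic curve from evaluation to expected score. A sketch, with an illustrative constant rather than Houdini's actual calibration:

    import math

    def win_probability(centipawns, k=0.011):
        # Logistic map from evaluation to expected score; k is illustrative,
        # not any engine's real calibration, chosen so +1.00 -> ~75%.
        return 1.0 / (1.0 + math.exp(-k * centipawns))

    print(win_probability(100))  # ~0.75, matching the +1.00 -> 75% figure
    print(win_probability(150))  # ~0.84 (one logistic can't also hit 90%)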

I wonder if being an expert at one game makes it easier to become an expert at another. If so, then maybe individual games act like example datasets, and a converged network would be able to complete new tasks after just a few examples.

Really interesting question. Some strategic concepts may transfer, say, from chess to chess variants. However, a simple change in the rules can have a huge impact in the game mechanics as anyone who has tried chess variants [1] knows.

Well, it's not doing anything like that for now. Even though the algorithm, in an abstract sense, is the same for all three games, in fact it's a new network for each of the three games, with architecture and input features adapted to the game, and then trained from scratch.

It looks as if it doesn't play 1.e4 much as white. Since these statistics are for self-play games, that means it won't get a lot of opportunities to play 1.e4 c5 as black. Still, it does seem as if it likes the Ruy Lopez and French better as black than it does the Sicilian. (It would be nice to see a little opening "tree" with move probabilities, rather than this list of 12 most-popular-among-humans openings.)

[EDITED to add:] A couple of other remarks:

Playing against Stockfish, the Sicilian seems to give it more wins as white and more losses as black than any of the other openings listed here.

What's shown here are two particular versions of the Sicilian; for all we know there's a lot more 1.e4 c5 in its self-play than the graphs suggest (e.g., maybe as white it prefers 2.c3 or 2.Nc3 or something). Eyeballing those graphs, these 12 openings account for substantially less than half of AlphaZero's self-play games.

So when are they going to apply this to Atari games, or, well, anything? The next step: have one AI figure out the rules by building a GAN that imitates player behavior, and have the other AI be AlphaGo, tweaking the GAN's inputs to generate different moves to win. Voila... almost-general-purpose AI that can learn to play any game.

The main problem is that we still lack good generative models and good ways of interrogating them. GANs are unstable and difficult to apply to time series, VAEs suffer from posterior collapse, WaveNet/PixelRNN grow with the input size and overemphasize details, and RNNs are hard to train because we lack good training algorithms. Generally, small errors tend to compound in step-wise predictions because NNs do not generalize very well and gradients tend to vanish and shatter. Just in terms of the computation time needed to roll out the future, domains in which the rules are simple enough to be hand-coded and evaluated quickly (such as Go and chess) are probably a million times more suitable for MCTS than domains in which you need a complex learned model.

To expand on eref's comment a little: you absolutely could apply this or MCTS to ALE (and Guo et al 2014 did it very nicely). After all, the ALE is deterministic and simulatable by definition, so of course you can explore the game tree and reset the simulation as necessary. But people aren't much interested in this approach because using the ALE as a 'simulator' is cheating as far as testing full-strength AI techniques (we don't have simulators of the real world, after all), and the ALE games themselves (unlike Go) are of little intrinsic interest so there's no real benefit to engaging in cheating.

No. DM only occasionally releases software. Expert iteration is simple enough that someone can code it up on their own and there's already a few clones, so if anyone cares to train their own, it's doable, although it may take a while.

Leela Zero (the main AlphaGo Zero replication project) is a crowdsourced computation effort that's going to take a fairly long time to get anywhere.

And from this paper:
> "Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters,
using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks."

You don't have to start from zero, though. It's cool that it works with Google-scale resources, but it seems like it would be faster to initialize with a neural net first trained to mimic the moves of an existing chess or Go AI, and then improve it from there.
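
A sketch of that warm start in PyTorch (the encoding sizes and data pipeline are assumptions, not from the paper): plain supervised behavioral cloning on (position, engine move) pairs, after which the self-play loop takes over.

    import torch
    import torch.nn as nn

    policy_net = nn.Sequential(
        nn.Linear(773, 512), nn.ReLU(),   # 773: one toy board-encoding size
        nn.Linear(512, 4672),             # 4672: AlphaZero-style move space
    )
    opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def pretrain(batches):   # batches of (positions, engine_moves) tensors
        for positions, engine_moves in batches:
            loss = loss_fn(policy_net(positions), engine_moves)
            opt.zero_grad(); loss.backward(); opt.step()
    # ...then continue with self-play/expert iteration from this initialization.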

>"Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.

I don't think it's definitely true that will work well. AlphaZero did significantly better than the original versions of AlphaGo (which did learn from existing human games). However, even training those nets will still take a fairly intensive amount of computational resources.

As for that koan, I'm not convinced it's very applicable here. My interpretation of the koan is that the entire setup (training process, structure, etc.) all encode domain knowledge. In this case, I think AlphaZero's domain knowledge is transferable enough that I don't think it's relevant.

I'm only a fairly pedestrian chess player, but I looked at one of these games between AGZ and SF, and aside from the endgame, AGZ played in a manner that almost seemed alien. It seemed to completely ignore various little rules of thumb, which is to be expected in hindsight, but is fairly mind-blowing when you actually watch a game.

The more interesting metric going forward is performance at a given power budget (not unlike with motorsports). The TPUs are consuming sooo much power here! Most interesting real-world problems are power-limited, including in nature (e.g. metabolic limits).

This paper compares AlphaZero to the 20 block version of AlphaGo Zero that was trained for 3 days. Am I right in thinking that this version was significantly less strong than the 40 block version? If so, does it matter?

It would be interesting in one way though: Magnus says he hates playing against computers because "it's like being beaten by an idiot". Modern chess engines still make moves that are somewhat strategically weak, but they make up for it with amazing tactics.

It would be interesting to hear if Magnus thought AlphaZero played less like an idiot.

A lot of the graphs in the paper seem to level out as they hit the level of the opponent. It makes me wonder to what extent AlphaGo Zero is merely optimizing to beat flaws in existing opponents' current implementations (even if "existing opponents" == all available opponents' data and algorithms today) rather than generalizable insights into the underlying game. Because wouldn't you expect that unless we are at the theoretical limit of perfect chess that a tabula rasa approach might exceed existing best practice significantly, especially with the massive computation advantage it has?

Not that there's anything wrong with that; AlphaGo Zero supposedly optimized for the "just enough" win rather than the crushing win. It doesn't even mean Stockfish is doomed--I suspect Stockfish could beat it in a future heads up match provided that Zero didn't have time to retrain, but that a retrained Zero (having the benefit of optimizing against a new Stockfish) would be able to supersede it once again.

> A lot of the graphs in the paper seem to level out as they hit the level of the opponent.

DM is no longer investing much in the AG research program; Silver said the team has been disbanded already. If you look at the Go graph in this or the first AG0 paper, Zero was still getting better at Go when they shut it down, it hadn't converged. They just didn't want to tie up the TPUs. I don't think it's a coincidence that the graphs tend to stop after they reach superiority.

(Also, as Houshalter says, one of the critical aspects is that this is pure self-play ie the NNs never play against the existing engines except for evaluation. So it's all independent from-scratch reinvention.)

It's not. It learns entirely through self-play and never learns from playing its opponent. Diminishing returns aren't unusual; they happen in every domain. These AIs are probably playing close to the limit of what is possible, just not quite there yet.

Are there popular games where the best human players are not near the limit of what is possible? Obviously you can construct one to be hard for humans (large 3SAT problems, or even big arithmetic problems), but I wonder if there is one that people enjoy.

I'd assume that for pretty much any nontrivial game the best human players are nowhere near the limit of what's possible. Humans can play a perfect tic-tac-toe, but for everything in the realm of go, chess, poker, bridge, etc the theoretical ideal is far beyond currently best human players.

ELO ratings level out eventually for a given pool of opponents. If a player already wins every game against all available opponents, there's no evidence that can tell you if they suddenly got twice as good.

If tracking improvements past the state of the art is important I think they'd have to freeze the algorithm every 400 ELO or so and rate the improved versions against the last snapshot.

(Doesn't really apply to the Stockfish case, but it does to the other two games.)
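
For reference, the Elo model behind this: expected score depends only on the rating difference and saturates, which is why ratings level out once one side wins essentially every game.

    def expected_score(r_a, r_b):
        # Standard Elo expected score for player A against player B.
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    print(expected_score(3400, 3000))  # ~0.91: +400 points ~= 10:1 odds
    print(expected_score(3800, 3000))  # ~0.99: further gains barely register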

Certainly a significant achievement. Also, it's kind of interesting that the AlphaGo team spent a lot of energy convincing us Go is much harder than chess, only to turn around and tell us that it is amazing that it can also win at chess.

The point is that real problem domains are not neatly partitioned and labeled.

I don't know what kind of input the NN itself gets, but computer vision is enough to translate a photo of a chessboard to a usable symbolic representation. But it would be nice to already have a black box-ish computer program that figures out what's the game at hand and how to play it.

The next variation: have the adversary start playing a chess variant, and have the machine recognize it (assuming honesty) and play it with significant skill. Then "real-life Pong", where the size and aerodynamics of the ball are unknown to it. This is the gist of human intelligence: answering questions is significantly easier than figuring out what the question is.