Just monitor one box and add time to the other player when the timer is still.

One minor problem is when Lee takes breaks and leaves the table. 1) It shouldn't count towards his thinking time, and 2) if he leaves the table during AlphaGo's turn he can ask for the clock to be paused until he returns. (This happened once during game 4, when he was in overtime.)

Two times. The third time he'd be out of time. The way the timing works is they have 3 periods of 1 minute each. If they use up an entire period, it's gone. If they take, say, 30 seconds, then the period isn't consumed and restarts on their next turn.

They used the byo-yomi system. Essentially you get the main time (2 hours each in this case) followed by overtime (1 min per move). You also get overtime periods, in this case three periods, in which you're allowed to go over your overtime. The idea behind this is that once in overtime you'll still need to make 1 move per minute in general, but if needed you get two chances to go over that and spend an extra minute if a particularly complex situation arises.

You only lose a period if you go over one minute. So it goes like this:

2 hours main time is used up. You are now in overtime. You have one minute per move with 3 overtime periods. As long as you use up less than one minute per move you won't lose any overtime periods.

You go over one minute for the first time - you now have two overtime periods.

You go over one minute again - you now have one period (you're on your last period).

You go over again - you lose.

Edit: so looking at the graph it seems Lee used up his two hours around turn 43 (step 1). He went over one minute for the first time, losing a period on turn 46 (step 2) and again on turn 54 (step 3). Thankfully he didn't do it again after that :P

Yea that's right, if you lose a period you still only have one minute per move. If you go over that you'll lose another period. You can think of each period as like a "bonus minute" you can use up whenever you want.

Yes it keeps counting down after you go over a minute. If you use up your first overtime minute, you will lose it, but you might as well use the next 59 seconds to think some more, then make your move before you lose the second overtime minute. Then it will reset back to 1:00 with 2 overtime minutes.

You have three periods of 1 minute for overtime. If you don't use the full minute, it resets. If you use a full minute you lose one period and start to tick into your next period. If you use all your time you lose.
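As a sanity check on the rules described above, here's a minimal sketch of such a byo-yomi clock in Python. The class name and the boundary behaviour at exactly 60 seconds are my own simplifications, not any official implementation:

```python
# A minimal byo-yomi clock sketch: 2 hours main time, then three
# 1-minute overtime periods (the settings from this match).

class ByoyomiClock:
    def __init__(self, main_seconds=7200, periods=3, period_seconds=60):
        self.main = main_seconds      # remaining main time
        self.periods = periods        # overtime periods left
        self.period_len = period_seconds

    def spend(self, seconds):
        """Charge one move's thinking time; return False if the player loses on time."""
        # First drain main time.
        if self.main > 0:
            if seconds <= self.main:
                self.main -= seconds
                return True
            seconds -= self.main
            self.main = 0
        # Now in overtime: each fully used period is gone for good,
        # but an unfinished period resets on the next move.
        while seconds > self.period_len:
            seconds -= self.period_len
            self.periods -= 1
            if self.periods <= 0:
                return False          # out of periods: loss on time
        return True

clock = ByoyomiClock(main_seconds=0, periods=3)  # already in overtime
clock.spend(45)       # under a minute: no period lost
print(clock.periods)  # 3
clock.spend(90)       # over a minute: one period consumed
print(clock.periods)  # 2
```

Note that going over one minute still leaves the full next period available on the following move, matching the "bonus minute" description above.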

Analyzing variations is pretty standard stuff for commentary. The idea is to give you insight into what future possibilities the players are thinking about when they make their moves. Speaking as an (amateur) chess commentator, it can be difficult to do this in real time and still return back to the game for crucial moments. Overall I felt they did a good job.

keep in mind that he was analyzing the game in real time. The brilliance of the move may not have been as apparent at that stage of the game, but afterwards people would analyze and agree that that was a brilliant move.

The point is less about winning and more about how the winner does it. To that extent, playing more games tells you more about the AI.

For example, the mistake it made in this game is going to get a lot of analysis to figure out why it happened (and if the root cause can be fixed/exploited). Among other things, we might learn something about general AI. We will definitely learn something about Go, and about these two players.

From this, we can see that Lee Sedol's strategy was to perfect his midgame, so that the time constraints during the endgame didn't matter. AlphaGo, however, doesn't really have a good time management policy, spending more or less the same time on each move.

I think the point about time management was not that AlphaGo was running out, but that for critical portions of the game, it may be wise to spend a long time considering your options. AlphaGo's usage of time is much more consistent than Lee's is. Perhaps there is a tweak to the time management strategy to allow AlphaGo to give itself more time to consider its move at critical junctures in the game.

I'm sure they did. I'm observing that the game was effectively won whilst AlphaGo still had significant time remaining. So at least for this game, it was too conservative with time. Whether that's a common problem or just a one-off in this game, you can't tell from so few games. But it's just an observation about how time usage played out in this game, and that the developers may go back and consider if this is a more common issue. I'm not claiming that it's novel. Optimization is all.

AlphaGo is based on Deep Neural Networks. It does not have hand-coded features that can be "tweaked" in the way you might imagine from traditional AI. In order to change this behavior, developers would need to train the model in a way that guides AlphaGo towards finding these "critical junctures" by itself. It's easy to look at one game in hindsight and say AlphaGo didn't think long enough but you cannot assume that the strategy of thinking longer in the mid-game would generalize to all games.

I am not a Go player, but am heavily into games. In a good game, it seems like a common pattern that mid-game I might notice something, and take a long time analyzing how it can be turned to a long-term advantage, hopefully leading to a move that looks unthreatening but begins the stratagem. So, I'd look specifically at the sections of code that recognize patterns in, and pick responses to, the current scenario. Change it with the idea that sometimes you might have 10-20 times as much time as usual to calculate possibilities forward.

How to improve the pattern-recognition to decide when? Well, that's the fun one, isn't it?

So, I'd look specifically at the sections of code that recognize patterns in, and pick responses to, the current scenario.

This illustrates a really important point. You largely can't do that with a system like AlphaGo. Your ability to introspect neural networks is incredibly limited. It's very, very hard, and in most situations functionally impossible, to find "the section of the code" you're talking about. The "code" in question is a bunch of higher order properties of a bunch of matrices. They're almost completely opaque. It may not even be appropriate to talk about a "section of the code" that does this - the decisions it's making are probably not local (i.e., it isn't like one particular matrix or configuration of matrices deals with one particular type or category of configurations).

It's not just one big black box though, is it? I imagine there must be design considerations that relate to how patterns in the current scenario are recognized, and deciding when/how to go how deep exploring possible branchings.

The design considerations are considerably more abstracted than figuring out how patterns in the current scenario are recognized or how deeply to explore a possible position - the choices you make as a designer are things like the learning rate and the number of hidden layers.

You don't ever actually program it to recognize situations or to explore certain branchings to certain depths* - you create a sort of "blank slate" and you can tweak the dimensions of the slate and how easily it can be written on and things like that, but that's it. You create the blank slate and then feed it a huge number of games of Go (first actual professional games and then it starts playing itself to generate more data) and it tries to optimize how deep to explore branchings, how to allocate its time, etc. Trying to actually figure out what optimizations it came up with is usually impossible - they're often very, very nonlinear and they depend on the distributed behavior of a potentially huge number of opaque elements that are ultimately calculating really high-order statistics.

* You can, and sometimes people do when designing things like this, but then you still build out the network on top of those prescriptions, the "logic" behind the network's decisions is still largely inscrutable, and you run the risk that your prescription is actually suboptimal compared to the function the network would have computed itself.

Thanks! I have a little out-of-date familiarity with neural nets and AI, doesn't sound too different from what I recall.

This is probably obvious, but I thought of depth as a variable to influence time investment. Sounds like number of hidden layers would be another variable that would hit that note. (I vaguely recall some people in the 1990s saying they had proven that anything you can do with more than three layers you can do with three; did that turn out to be incorrect?)

I guess I'd have to look at the several black boxes and what they do and how they relate to see if I see anything that hit the same note as the attention to heuristics I was going for. Do you have links with their design or are you going from other Go/game-playing code? Is AlphaGo's code public?

I was basing what I said on what I've seen in interviews and blogs so far, but I just went ahead and read the Nature paper.

What they're doing is using two (absolutely gigantic) convolutional neural nets and a Monte Carlo tree search.

One neural network is a "policy" network that they trained by simply having it try to predict the next move in a dataset of professional games from an image of the board and the move history (they mention other inputs, but they don't say what they were, and they apparently increased accuracy by 1.7 percentage points, from 55.3% to 57.0%).

Then they did reinforcement learning by having the policy network play itself (each new iteration played a random earlier iteration), optimizing over game outcome (so it gains the ability to learn novel strategies).

(There's actually a second, faster, smaller policy network involved too that they use to implement a mixture of MCTS rollouts with their novel algorithm.)

Apparently that alone got them an algorithm that could pretty reliably beat several of the best Go programs without even doing any kind of search (without reinforcement learning, the network very reliably loses).

Then they used the policy network's gameplay to train a "value" network that attempts to model the probability of a given game state leading to a win (i.e., it approximates a value function). The innovation here was to use the policy network's gameplay to generate a huge number of games and sample only one position from each (if you instead naively try to train it on whole games you get massive overfitting thanks to the correlation between board states in the same game).

Then they use those to guide an MCTS (basically, the policy network prunes the tree and the value network lets you avoid having to search to full depth before you get an estimate of outcome).
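The way a policy prior, a value estimate, and visit counts can be combined during tree search can be sketched in a few lines. This is a loose illustration in the spirit of the paper's prior-over-visit-count exploration bonus, not DeepMind's actual code; the constant `c_puct` and all the numbers are made up for illustration:

```python
import math

# Toy sketch of policy/value-guided move selection in MCTS: each candidate
# move a has a prior P(a) from the policy network, a running mean value Q(a)
# from value-network/rollout evaluations, and a visit count N(a). The search
# picks the move maximizing Q plus an exploration bonus proportional to the
# prior and decaying with visits.

def select_move(stats, c_puct=1.0):
    total_visits = sum(s["N"] for s in stats.values())
    def score(a):
        s = stats[a]
        u = c_puct * s["P"] * math.sqrt(total_visits) / (1 + s["N"])
        return s["Q"] + u
    return max(stats, key=score)

stats = {
    "a": {"P": 0.6, "Q": 0.50, "N": 10},  # high prior, already well explored
    "b": {"P": 0.3, "Q": 0.55, "N": 2},   # better value so far, less explored
    "c": {"P": 0.1, "Q": 0.20, "N": 1},
}
print(select_move(stats))  # "b": good value and still under-explored
```

The point of the bonus term is that moves the policy network likes get explored first, but as their visit counts grow the search shifts toward moves whose measured value holds up.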

They don't actually mention in the paper how they adjust the timing - in the games they use in the methods they just give it a flat 5 seconds per move. Clearly that's not what's going on in the game against Lee Sedol since its move time varied. I'm not sure how they did it, but it's still the case that neither of the networks they're using can be easily introspected to programmatically assign more time to particular classes of positions that have been a priori determined by the designers.

Also bear in mind that AlphaGo has seen an almost unimaginable number of professional games of Go and an even more unimaginable number of games it's discovered itself. Where a human might think AlphaGo's opponent's move was "surprising" and that it should have spent more time evaluating the move, AlphaGo may not necessarily find it very surprising at all. It also may not be the case that it actually gets as much value out of spending vastly different amounts of time on a move the way a player like Lee Sedol does.

Bonus: This is probably the most wonderful sentence in the paper: "To provide a greater challenge to AlphaGo, some programs (pale upper bars) were given four handicap stones (that is, free moves at the start of every game) against all opponents."

The innovation here was to use the policy network's gameplay to generate a huge number of games and sample only one position from each (if you instead naively try to train it on whole games you get massive overfitting thanks to the correlation between board states in the same game).

Nice - did they say how it chose which position to sample from each game?

it's still the case that neither of the networks they're using can be easily introspected to programmatically assign more time to particular classes of positions that have been a priori determined by the designers.

Maybe so. But sounds like they didn't describe all of the relevant pieces involved (e.g. how they got from flat 5 seconds to the current AlphaGo).

Where a human might think its opponent's move was "surprising" and that it should have spent more time evaluating the move, AlphaGo may not necessarily find it very surprising at all.

I'm not sure if it makes a difference to anything, but what happened here was the reverse - the commentators apparently did not see it as a big deal when Sedol made the "brilliant attack" - and while "surprise" is obviously anthropomorphizing, AlphaGo lost, and quite a few turns went by after Sedol's attack before it started taking extra time on turns, so it clearly missed something.

They can't "look at the code and change this if statement". AlphaGo is a product of its own doing. The software engineers have simply given it some input data to work with and sent it on its way to "train" itself, with some minor tweaks about what is "good" and what is "not good".

You can't alter the AI in the way you suggested. If they were to re-train it for a few months (even with the same data, might I add) then maybe this "bug" wouldn't appear, because of the random variations in which it trained itself.

What I'm trying to say really is that the AI is essentially let loose after a point, and we've ended up with a system that can play Baduk pretty well. But there is no way that we can "tweak" its behaviour like you could in a normal program, because the engineers are not the ones that created its "decision" engine - AlphaGo is.

My understanding is that it is not a single big neural network, but multiple ones. So, depending on what each of those boxes do, wouldn't tweakability of any given variable depend on how they relate to each other?

For example, at least some of the time when a given neural network is evaluated and updated, isn't it a person who has picked the criteria under which connections are strengthened or weakened?

Normally that might be the case, but the analogy amounts to altering a cog in a well-oiled machine. Even if you change what you think is the right thing, you cannot predict what will have the desired effect and what will not. I personally think it's a bad idea to go 'poking around' like this, since the whole of the AI is built upon its own training and its own investment in the way the program works.

This, however, doesn't mean that its time usage is as strategic as possible, especially against humans. Since AlphaGo is so consistent about its time usage, Lee Sedol could count on having at least 1 minute of thinking time on AlphaGo's turn. Once Lee Sedol was in overtime, he had only 1 minute to make a move himself, so AlphaGo taking 1-2 minute turns was at least doubling the time that Lee Sedol had to think.

So there's a question of how much benefit AlphaGo gets from taking that minute every time. To use a pretty drastic example, let's say it could make a 99% correct move in 5 seconds, but it uses the extra 55 seconds to get to 99.9% correct. Perhaps it would be better off just going with the 99% correct move, halving Lee Sedol's time to think, and therefore pressuring Lee Sedol into making a move that's much worse than just 0.9% incorrect.

I don't know if this is the sort of thing that AlphaGo could figure out from playing against itself; perhaps it is the sort of thing where it would need to play specifically against humans to understand what is optimal.

Exactly. Playing against a human as opposed to itself is hugely different. Lots of considerations outside the basic rules, like the time pressure you mention. On a similar note, I was interested that those strange tenuki mid-game could potentially derail the train of thought of a human while being essentially 'free' for a machine that can just instantly pop back to what it was last thinking...

Well of course AlphaGo can rush very well at the end to avoid running out of time, but that doesn't mean it's making optimal use of time overall or that it would be making good moves when it does rush.

This topic has popped up a lot on these threads, and the answer to this is much, much more complicated than that. There are fundamental differences between MTG and Go/Chess etc., namely that Magic has hidden information that the computer cannot see, and that the cards have completely different strengths dependent on the other cards in your deck and the cards in your opponent's deck.

I would be very impressed if you could build an AI that was capable of building better decks than players can.

I think building an AI to actually play a deck as well as a human would be fairly easy, overall. At least with certain decks - you could set the AI up with a deck that basically just 'plays its own game'.

The hard bit with Go vs chess is the crazy number of possibilities. If you add deck building into it (hell, even without deck building), imagine how many more possibilities there are with cards that all have unique(ish) effects.

Just like poker, computers playing games with incomplete information are generally not a great test of their ability. You don't know what is in your opponent's hand and neither player controls the shuffle. What this means is that there is a large component of luck.

It's pretty easy for a computer to just use a lookup table for all the statistical odds based on the information at hand, and even humans memorize all the main odds. After that it's all bluffing and statistical odds. MtG is similar: even the top players in the world don't have win rates above 60% against other players, due to random chance in getting certain cards.

Go, chess and checkers are perfect for AI vs humans as there is perfect information available to both players and no statistical chance or random probability in outcomes.

Of course in reality, most of the interesting questions involve imperfect information.

So do we think that a computer doing as well as or better than humans at a game like MtG or poker, would be evidence of general AI? I guess there's a good chance it would at least show some social or emotional intelligence.

I mean poker bots exist, I don't know how they fare against top poker players, but there's a reason you rarely see the same guys win those top poker championships twice... because so much is due to what hands you get dealt and chance.

Actually Lee's use of time is normal for a go game, AlphaGo is the unusual one. Usually as the game goes on you'll read out endgame patterns long before they actually happen during earlier moves. 1 minute per move is often enough to play well by the time you reach later in the game.

Still, is it so complex that it takes so long for even a computer with hundreds and hundreds of processing cores, and software specifically designed for that task? I'm not saying that I don't believe it, it's just mind-boggling.

Yes. In chess you have pieces on the board that have movement rules giving you a limited set of moves. In go, you are placing a piece on the board and there are hundreds of spots that you can put it on. Then you have to consider the outcomes of each spot. So go is much much more complex than chess for a computer.

In chess, there are 20 possible opening moves - you have eight pawns that can move one or two spaces forward, and you have two knights that can move to two different spaces each. In Go, there are 361 opening moves, because it's a 19x19 board and you can place pieces everywhere.

Also, in games like Go and chess, you need to think several moves ahead. But the complexity goes up exponentially each time you think one more move ahead. With the large number of moves in Go, you have hundreds of times more calculations to do with every move ahead.

For example, the top computer in the world runs at a peak of 50 petaflops. Given two minutes to think, that means it can do 6×10^18 calculations. If each calculation was the computer thinking about the placement of a single piece (in reality, it takes more than one calculation, because it has to assess the placement too, etc.), then that means it can think ahead about seven or eight turns.

If Moore's law applies for computer speed, computers double in speed every 18 months or so. That means that every twelve years we will have advanced our computers far enough to do think one more turn ahead, using this brute force method.

So, we can't use a brute force method any time soon. We have to write really clever algorithms that "learn" from previous games, try to match patterns, and actually "think" about the game, rather than just brute-force calculating every possible outcome.
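The back-of-envelope numbers above are easy to check directly, using the figures from the comment (50 petaflops peak, a two-minute think, a rough branching factor of 300 for Go, and Moore's-law doubling every 18 months):

```python
import math

flops = 50e15            # 50 petaflops, peak
seconds = 120            # two minutes of thinking
ops = flops * seconds    # 6e18 calculations

branching = 300          # rough average number of legal Go moves
depth = math.log(ops) / math.log(branching)
print(round(depth, 1))   # ~7.6 plies: "seven or eight turns" is about right

# Gaining one extra ply means a factor of ~300 more computation,
# i.e. log2(300) doublings at 18 months each.
years = math.log2(branching) * 1.5
print(round(years, 1))   # ~12.3 years per additional ply of brute force
```

This is of course only the crude brute-force estimate the comment describes; real engines prune aggressively rather than expanding every move.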

We have a rough guess - around 10^120. The number of possible games in go is around 10^761. That means for every possible game in chess, there are 10^641 possible games of go. For comparison: If you gave every atom in the universe its own universe and counted all the atoms in each of those universes, you would have enough atoms to match the total possible games of chess (plus an awful lot more), but nowhere near enough to match the total possible games of go - you'd have to give each atom in each of those universes its own universe, and each atom in each of those universes its own universe, and so on. You'd have to do this nine times before you had more atoms than games of Go.

from /u/mr_yogurt above, it puts chess and Go in mathematical comparison and perspective. It's a shit ton.

The number of go games is 10^761 and one core can process about 10^10 operations per second. You can calculate yourself how many cores and/or how much time you would need to solve it. The values are just mind-boggling. And for scale, there are about 10^80 atoms in the universe.

GPUs are at their core many, many processors running in parallel. Software from all 3 major vendors exists that lets you leverage these cores to do massively parallel computing at a fraction of the cost of equivalent traditional CPUs.

GPUs are capable of doing light computations such as addition or multiplication really fast. Each neuron in a neural network needs to do such light calculations. Therefore GPUs can be quite useful for neural networks, and there are various software packages, such as Nvidia's CUDA, that make use of GPU power. Surely AlphaGo, or DeepMind in general, makes use of GPU power for their NN architecture.

Technically, computing 3D effects (for example a rotation) is just mathematical operations on matrices and vectors (if you're not familiar with the math, a matrix is a kind of super-vector). Advanced statistics (like those needed for "machine learning") also use matrices.

Nvidia (but ATI does it too) even sells GPUs dedicated to scientific computing (don't dream too much, they cost a lot) and provides libraries to do scientific computing on GPUs.

Nvidia (but ATI does it too) even sells GPUs dedicated to scientific computing

Nvidia is actually doing more than selling them. It's gracious enough to give them free of charge to university institutions and research labs involved in machine learning and GPGPU computing. We are talking about hardware with a price tag of around 1k dollars.

Let's be clear, Nvidia actually gets a lot of incredibly brilliant people giving them feedback, using them and developing new solutions on the GPUs.

So GPUs aren't necessarily specific to graphical computing, they're just really good at linear algebra? I've done a bit of games programming, and I guess the libraries I've used abstracted me from the low level stuff. I never really thought about why GPUs are good at what they do. Also, your description of matrices made me chuckle

GPUs are frequently used in the data sciences as dedicated processing units for certain functions. In the end, CPUs and GPUs are both doing math; you can use GPUs in ways other than visual processing.

GPUs are just complex parallel processors with less versatility but more raw strength than CPUs. You can leverage them for some pretty powerful computing.

Side fact: gaming CPUs are a bit different from high-performance computing CPUs; gaming requires more raw execution speed, while scientific calculations require more accuracy. We don't care if a pixel every couple thousand is a bit grey, but that matters for scientists.

Go is a board game that is hard to brute-force analyse because it has a lot of possibilities. Lee Sedol is one of the world's best Go players. AlphaGo is a computer program created by Google DeepMind that plays Go. Until now, computers were not as good at Go as the best humans, but AlphaGo challenged Lee Sedol to a best-of-five match and won the first three games. AlphaGo just lost the fourth game to the human.

Go is played with a clock, which is set to a pre-agreed time allocation for each player, usually the same for both. If you run out of time, you lose. After each move, a Go player stops their own clock and starts their opponent's. This lets them allocate their thinking time as they see fit, within the parameters. Here they had two hours to spend however they liked, and when that ran out they got 1 minute per move with two chances to take a bonus minute.

This graph shows the time taken at each move ("turn index 1" = first move, etc) by both AlphaGo and Lee Sedol during their fourth match.

Going across is the moves in a game. For each turn going from left to right, the blue dot is how many minutes Lee Sedol, the human player, took before making a play. And the beige dot is how long AlphaGo, the computer, took.

Mr. Sedol spent a lot of time thinking in the turns just before his "brilliant attack" (typo on the chart). A few turns later the computer makes a mistake that falls into the attack, and after a few more turns it is taking a longer-than-usual time each turn as it tries to figure out how to escape the trouble it is in.

Sedol had actually run out of time at that point, but the rules allow him a small amount of time each turn. He used almost all of it, and managed to win. (AlphaGo had already won the match by winning the first three games, so it's fun & interesting to see the human win one.)

Just to add on to this: chess has an incredibly large number of potential outcomes (not nearly as many as Go), but because of the rules for the way the different pieces can move, it is much easier to do raw calculations for potential moves. This is impossible to do in Go, because the simplicity of the rules actually makes the possibilities more complex.

Also, in chess it's easier to evaluate a position to figure out who is winning and by how much (taking into account material advantage and positional advantages). From what I understand about Go, it's actually incredibly difficult to tell who is winning and by how much, because of how abstract it is. Because of this, the computer doesn't evaluate and maximise 'how much am I winning or losing by' but rather 'what is my probability of winning'.

All true. Chess also has a tiny tiny tiny tiny tiny fraction of the potential outcomes of Go. The gap is such a large number it is, by itself expressed as a multiple of chess, also more than the number of atoms in billions and billions and billions of universes. The number is best expressed in exponents of exponents of exponents. Just FYI.

We have a rough guess - around 10^120. The number of possible games in Go is around 10^761. That means for every possible game in chess, there are 10^641 possible games of Go. For comparison: If you gave every atom in the universe its own universe and counted all the atoms in each of those universes, you would have enough atoms to match the total possible games of chess (plus an awful lot more), but nowhere near enough to match the total possible games of Go - you'd have to give each atom in each of those universes its own universe, and each atom in each of those universes its own universe, and so on. You'd have to do this nine times before you had more atoms than games of Go.

Right, but think of the number of potential moves starting from an initial configuration. Now think of the number of unique configurations you can reach with two moves, and so on. This exponentially blows up with the more and more moves you add, so you can see why this would take a lot of time. Moreover, Go is an example of one of the many PSPACE-Complete problems, which in the theory of computational complexity effectively means it's extremely unlikely an efficient method to solve the problem exactly exists. Now, systems like AlphaGo etc rely heavily on AI methods and heuristics which do not exactly "solve" the problem, but approximate an optimal strategy. That aspect of approximating an exact solution makes these systems extremely complex and is why this is such a significant AI achievement.

Many efficient methods exist and will be discovered, all heuristic. A perfect programmatic solution, however, possibly does not. It would take a scalable parallel computer to do so - none exists nor is it necessarily physically possible. Although the universe itself is a finite example of such a computer.

Each time you look one move deeper, you multiply the number of leaves in your search tree by the average number of responses... It grows exponentially. It's (relatively) easy to look a few ply deep, but the number of leaves in your search tree spirals off towards infinity and you can't look deeper very well.

Chess has a branching factor of 35-40. So say 40 possible moves, 40 leaves. 40 responses, 1,600 leaves. 40 responses to that, 64,000 leaves. 40 responses to that, 2.5 million leaves. 40 responses to those, 100 million leaves... It grows fast. And a typical game has ~80 ply start to finish, and can easily go over 200.

Go has a branching factor wayyy higher, like 300 (so leaves are more like 300, 90 thousand, 27 million, 8.1 billion, etc.) . That means you can only search to a very low depth in an exhaustive search.
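The leaf counts quoted in the last two comments are plain exponentiation, so they're easy to tabulate (40 for chess and ~300 for Go are the rough branching factors used above):

```python
# Cumulative leaf counts in an exhaustive search tree for the
# branching factors quoted above; purely illustrative arithmetic.

def leaves(branching, depth):
    return branching ** depth

for d in range(1, 6):
    print(f"depth {d}: chess {leaves(40, d):>13,}  go {leaves(300, d):>18,}")
```

At depth 4 chess is already at 2,560,000 leaves while Go is at 8.1 billion, which is why exhaustive search bottoms out so much earlier in Go.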

For some games it is easy or at least possible to actually solve them.

For example tic-tac-toe has a very low number of possible states that the game can be at and you could relatively easily make a list of all possible states and write down the best move for each of those states. At that point the game is solved and playing the game with the 'solution' database available it becomes just a simple matter of looking up the correct move at each point. A player who has the database can never lose.

With tic-tac-toe you have nine fields and three potential states: empty, X and O. That gives you in theory 3^9 states, but in practice, after removing illegal ones and ones that are just mirrored or rotated versions of others, you end up with 765 potential states.

That is not very much. You can write that list down and carry it around with you.
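The 765 figure can be reproduced with a short brute-force search: enumerate every position reachable from the empty board (play stops once someone has three in a row) and deduplicate under the board's eight symmetries. A sketch:

```python
# Reproduce the tic-tac-toe counts by brute force: BFS over reachable
# positions, then deduplicate under the 8 symmetries of the square.

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != '.' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def symmetries():
    # Build the 8 index permutations (4 rotations, each optionally mirrored).
    m = [[0,1,2],[3,4,5],[6,7,8]]
    perms = []
    for _ in range(4):
        m = [list(row) for row in zip(*m[::-1])]  # rotate 90 degrees
        perms.append([v for row in m for v in row])
        perms.append([v for row in (r[::-1] for r in m) for v in row])  # mirrored
    return perms

SYMS = symmetries()

def canon(b):
    # Canonical representative of a position's symmetry class.
    return min(''.join(b[i] for i in p) for p in SYMS)

def count_states():
    start = '.' * 9
    seen = {start}
    frontier = [start]
    while frontier:
        nxt = []
        for b in frontier:
            if winner(b):
                continue              # game over: no further moves
            player = 'X' if b.count('X') == b.count('O') else 'O'
            for i, c in enumerate(b):
                if c == '.':
                    nb = b[:i] + player + b[i+1:]
                    if nb not in seen:
                        seen.add(nb)
                        nxt.append(nb)
        frontier = nxt
    return len(seen), len({canon(b) for b in seen})

total, reduced = count_states()
print(total, reduced)  # 5478 reachable positions, 765 up to symmetry
```

Filtering out unreachable positions happens for free here, since the search only ever generates legal continuations from the empty board.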

With Nine Men's Morris, even after you remove mirror images, rotations and other topologically equivalent versions, you are still left with something like 10^10 different states. That is a whole lot, but as long as you describe each state with less than 100 characters it will still fit on a terabyte HDD that you can carry around with you. Nine Men's Morris has been solved, and a computer that has access to the solution will never lose a game of it.

Chess is far more complex, with a number of states somewhere around 10^40. We have no computer that can hold a database that big and no way to create one if we could. If we wanted to create one, we would have to use up a small moon or asteroid just to write each state down in some sort of memory. Currently our smallest memory takes up about 100 atoms per byte, which would mean that, depending on compression and overhead, we would need to turn something like one of the moons of Mars into computer storage somehow (best use both of them at once to create a RAID 1 array).

Go has an estimated number of 10^170 legal states for a board with 19x19 fields. The entire Universe is thought to have about 10^80 atoms.

If you had an entire universe for each atom in the universe, and for each atom in each of those universes you had a terabyte of memory, then maybe you could write down all possible states a Go board can have.

Solving games this complex would be somewhat impracticable for any civilization as far down the Kardashev Scale as humanity currently is.