In March of 2016, the computer program AlphaGo defeated Lee Sedol, one of the top 10 Go players in the world, in a five-game match. Never before had a computer Go program beaten a professional Go player on the full-size board. In January of 2017, AlphaGo won 60 consecutive online Go games against many of the best Go players in the world using the online pseudonym Master. During these games, AlphaGo (Master) played many non-traditional moves—moves that most professional Go players would have considered bad before AlphaGo appeared. These moves are changing the Go community as professional players adopt them into their own play.

Michael Redmond, one of the highest-ranked Go players in the world outside of Asia, reviews most of these games on YouTube. I have played Go maybe 10 times in my life, but for some reason, I enjoy watching these videos and seeing how AlphaGo is changing the way Go is played. Here are some links to the videos by Redmond.

Two Randomly Selected Games from the series of 60 AlphaGo games played in January 2017

The algorithms used by AlphaGo (deep learning, Monte Carlo Tree Search, and convolutional neural nets) are similar to the algorithms that I used at Penn State for autonomous vehicle path planning in a dynamic environment. These algorithms are not specific to Go. Deep learning and Monte Carlo Tree Search can be used in any game. Google DeepMind has had a lot of success applying these algorithms to Atari video games, where the computer learns strategy through self-play. Very similar algorithms created AlphaGo through self-play and analysis of professional and amateur Go games.

I often wonder what we can learn about other board games from computers. We will learn more about Go from AlphaGo in two weeks. From May 23rd to 27th, AlphaGo will play against several top Go professionals at the “Future of Go Summit” conference.

Dr. Xu at Penn State introduced me to Richardson extrapolation back in the early 1990s. We used it to increase the speed of convergence of finite element analysis algorithms for partial differential equations, but it can be used for many numerical methods.
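To make the idea concrete, here is a minimal Python sketch of Richardson extrapolation applied to numerical differentiation (the test function and step size are illustrative, not from our finite element work). The trick: if an estimate A(h) has leading error c·h^k, then a weighted combination of A(h) and A(h/2) cancels that term.

```python
import math

def central_diff(f, x, h):
    """Central-difference estimate of f'(x); leading error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h, k=2):
    """Combine estimates at h and h/2 to cancel the O(h^k) error term:
    (2^k * A(h/2) - A(h)) / (2^k - 1)."""
    return (2**k * central_diff(f, x, h / 2) - central_diff(f, x, h)) / (2**k - 1)

# Example: d/dx sin(x) at x = 1; the exact answer is cos(1).
x, h = 1.0, 0.1
print(abs(central_diff(math.sin, x, h) - math.cos(x)))  # error of the plain estimate
print(abs(richardson(math.sin, x, h) - math.cos(x)))    # several orders of magnitude smaller
```

One extrapolation step turns a second-order estimate into a fourth-order one, which is exactly the kind of convergence speedup we were after.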

Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, and Riedmiller authored the paper “Playing Atari with Deep Reinforcement Learning”, which describes an Atari game playing program created by the company DeepMind (recently acquired by Google). The AI did not just learn how to play one game. It learned to play seven Atari games without game-specific direction from the programmers. (The same learning parameters, neural network topologies, and algorithms were used for every game.)

Various papers have been written on how computers can learn to play the Atari games, but most of them used abstract representations of the objects on the screen taken from within the emulator. The Mnih et al. AI learned to play the games using only the raw 210 x 160 video and the score. It seems to be the first successful attempt to learn arcade gaming from raw video.

To learn from raw video, they first converted the video to grayscale and then downsampled and cropped it to 84 x 84 images. The last four frames were used to determine actions. The 28,224 input pixels (84 x 84 x 4) were run through two hidden convolutional neural net layers and one fully connected (no convolution) 256-node hidden layer, with a single output for each possible action. Training was done with stochastic gradient descent using random samples drawn from a historical database of previous games played by the AI to improve convergence. (This technique, known as “experience replay”, is described in “Reinforcement Learning for Robots Using Neural Networks” by Long-Ji Lin, 1993.)
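For readers who want to see the shape of the network, here is a minimal PyTorch sketch matching the description above (the filter counts and kernel sizes follow the 2013 paper; `num_actions` varies by game):

```python
import torch
import torch.nn as nn

class AtariQNet(nn.Module):
    """Maps 4 stacked 84x84 grayscale frames to one Q-value per action."""
    def __init__(self, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 1st hidden conv layer
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 2nd hidden conv layer
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # fully connected 256-node layer
            nn.ReLU(),
            nn.Linear(256, num_actions),                 # a single output per action
        )

    def forward(self, frames):                 # frames: (batch, 4, 84, 84)
        return self.net(frames)

q = AtariQNet(num_actions=6)                   # Pong, for example, has 6 actions
print(q(torch.zeros(1, 4, 84, 84)).shape)      # torch.Size([1, 6])
```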

The objective function for supervised learning is usually a loss function representing the difference between the predicted label and the actual label. For these games the correct action is unknown, so reinforcement learning is used instead of supervised learning. The authors used a variant of Q-learning to train the weights in their neural network. They describe their algorithm in detail and compare it to several historical reinforcement learning algorithms, so this section of the paper can serve as a brief introduction to reinforcement learning.
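Here is a rough sketch of that Q-learning variant expressed as a loss function, assuming the replay-buffer batch layout shown in the comments (the variable names are mine, not the paper's):

```python
import torch
import torch.nn.functional as F

def q_learning_loss(q_net, batch, gamma=0.99):
    """DQN-style loss on a batch sampled from the replay buffer.

    Target: y = r                              if the episode ended
            y = r + gamma * max_a' Q(s', a')   otherwise
    Loss:   mean of (y - Q(s, a))^2 over the batch.
    """
    states, actions, rewards, next_states, done = batch  # actions: int64 indices
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                      # targets are treated as constants
        next_max = q_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_max * (1.0 - done.float())
    return F.mse_loss(q_sa, targets)
```

Because the correct action is never given, the network bootstraps: its own estimate of the next state's value stands in for the missing label.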

The AI was trained to play seven games: Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, and Space Invaders. In six of the seven games, this general game learning algorithm outperformed all previously known reinforcement learning algorithms tested on those games and “surpasses a human expert on three” of the seven games.

The KDD 2014 article, written by Dong, Gabrilovich, Heitz, Horn, Lao, Murphy, Strohmann, Sun, and Zhang, describes in detail Google’s knowledge database Knowledge Vault. Knowledge Vault is a probabilistic triple store: each entry in the database is of the form subject-predicate-object-probability, with predicates restricted to about 4500 relations such as “born in”, “married to”, “held at”, or “authored by”. The database was built by combining the knowledge base Freebase with Wikipedia and approximately one billion web pages. Knowledge Vault appears to be the largest knowledge base ever created. Dong et al. compared Knowledge Vault to NELL, YAGO2, Freebase, and the related project Knowledge Graph. (Knowledge Vault is probabilistic and contains many facts with less than 50% certainty. Knowledge Graph consists of high confidence knowledge.)
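Conceptually, each entry is a four-tuple, as in this Python sketch (the actual storage format is internal to Google, and the example facts and threshold here are purely illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One Knowledge Vault entry: subject-predicate-object plus a probability."""
    subject: str
    predicate: str      # one of ~4500 predicates such as "born in"
    obj: str
    probability: float  # confidence that the fact is true

kv = [
    Triple("Barack Obama", "born in", "Honolulu", 0.99),
    Triple("Barack Obama", "born in", "Kenya", 0.05),  # kept, despite low confidence
]

# A Knowledge Graph-style view would keep only high-confidence facts:
confident = [t for t in kv if t.probability > 0.9]
```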

The information from Wikipedia and the Web was extracted using standard natural language processing (NLP) tools including: “named entity recognition, part of speech tagging, dependency parsing, co-reference resolution (within each document), and entity linkage (which maps mentions of proper nouns and their co-references to the corresponding entities in the KB).” The text in these sources is mined using “distant supervision” (see Mintz, Bills, Snow, and Jurafsky, “Distant Supervision for relation extraction without labeled data”, 2009). Probabilities for each triple are calculated using logistic regression (via MapReduce). Further information is extracted from internet tables (over 570 million tables) using the techniques in “Recovering semantics of tables on the web” by Venetis, Halevy, Madhavan, Pasca, Shen, Wu, Miao, and Wu, 2012.
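The scoring step can be pictured as follows; this sketch uses scikit-learn, and the features (how often each extractor found the triple, plus a mean extractor confidence) are invented for illustration rather than taken from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per candidate triple: counts of how often the text, table, and
# annotation extractors found it, plus the mean confidence they reported.
X = np.array([
    [12, 3, 1, 0.9],   # found often, with high extractor confidence
    [ 1, 0, 0, 0.3],   # found once, with low confidence
    [ 5, 2, 0, 0.7],
    [ 0, 1, 0, 0.2],
])
y = np.array([1, 0, 1, 0])   # training labels judged against Freebase

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])   # per-triple probability of being true
```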

The facts produced by the various extraction techniques are fused with logistic regression and boosted decision stumps (see “How boosting the margin can also boost classifier complexity” by Reyzin and Schapire, 2006). Additional facts are then inferred from the extracted knowledge using two techniques: the path ranking algorithm and a modified tensor decomposition.
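Here is a toy version of the fusion step (scikit-learn's AdaBoostClassifier uses depth-1 decision trees, i.e. stumps, as its default base learner; the scores and labels are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# One row per candidate triple: the confidence assigned by each of three
# extraction systems (e.g., text, tables, and a prior model).
scores = np.array([
    [0.9, 0.8, 0.7],
    [0.2, 0.0, 0.4],
    [0.7, 0.6, 0.1],
    [0.1, 0.3, 0.2],
])
labels = np.array([1, 0, 1, 0])   # true/false, again judged against Freebase

fuser = AdaBoostClassifier(n_estimators=50).fit(scores, labels)
print(fuser.predict_proba(scores)[:, 1])   # fused probabilities
```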

Tensor decomposition is just a generalization of singular value decomposition, a well-known machine learning technique. The authors used a “more powerful” modified version of tensor decomposition to derive additional facts. (See “Reasoning with Neural Tensor Networks for Knowledge Base Completion” by Socher, Chen, Manning, and Ng 2013.)
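The flavor of these methods can be seen in a simple bilinear scorer, where each entity gets a vector and each predicate a matrix; this is a simplified sketch of the general idea, not the exact model from the Socher et al. paper (which combines a bilinear tensor term with a standard neural network layer):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # entity embedding dimension

# In practice these are learned from the known facts; random values here
# just show the shapes involved.
entities = {"Obama": rng.normal(size=d), "Honolulu": rng.normal(size=d)}
predicates = {"born in": rng.normal(size=(d, d))}

def score(subj, pred, obj):
    """Bilinear score e_s^T W_p e_o; higher means the triple is more plausible."""
    return entities[subj] @ predicates[pred] @ entities[obj]

print(score("Obama", "born in", "Honolulu"))
```

Missing facts are then proposed by scoring candidate triples that are not yet in the knowledge base.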

The article is very detailed and provides extensive references to knowledge base construction techniques. It, along with the references, can serve as a great introduction to modern knowledge engineering.

Scott Young recently returned from a year abroad and wrote up “Seven Principles of Learning Better From Cognitive Science”, which is a review/summary of the book “Why Don’t Students Like School?” by Daniel Willingham. Here are the seven principles:

Factual knowledge precedes skill.

Memory is the residue of thought.

We understand new things in the context of what we already know.

Proficiency requires practice.

Cognition is fundamentally different early and late in training.

People are more alike than different in how we learn.

Intelligence can be changed through sustained hard work.

Read the full five-page article by clicking here! Get the great $10 book here.