Inspired by a recent newsletter from Heinz Kabutz as well as
Scala's Futures which I investigated in
my recent book,
I set about using Java 8 to write an example of how to submit work to an execution service and respond to its results asynchronously,
using callbacks so that there is no need to block any threads waiting for the results from the execution service.

In theory, calling blocking methods like get
on a java.util.concurrent.Future is bad, because the system then needs more than the optimum number of threads to keep doing work continuously,
and that wastes time on context switches.

In the Scala world, frameworks like Akka use programming models in which the framework itself never blocks. The only time a
thread blocks is when a user programs something that blocks, and users are discouraged from doing that. By never blocking, the framework can get away with
about one thread per core, which is far fewer than, say, a standard JBoss Java EE application server, which has as many as 400 threads just after startup.
Due largely to the work on the Akka framework, Scala 2.10 added Futures and Promises.
Java's java.util.concurrent.Future has no comparable callback methods, although Java 8's CompletableFuture covers similar ground.

The following code shows the goal I had in mind. It has three parts to it. Firstly, new tasks are added to the execution service using the
static future method found in the class ch.maxant.async.Future. It returns a Future, but not one from the
java.util.concurrent package, rather a subclass thereof from the ch.maxant.async package. Secondly, that Future has a method called
map, following the functional style from Scala or the new Java 8 Stream class.
The map method lets you register a callback or, more precisely, lets you map (convert) the value that the first future contains into a new value.
The mapping is carried out at some other time in the future, after the first Future is completed, and so it results in a new Future.
Thirdly, we use another method in the Future
class to register a callback to be run once all the futures we create are complete. At no time are any blocking methods of the Future API used!
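
For readers who want to try the same three-part shape without the ch.maxant.async classes, here is a minimal sketch using the JDK's own java.util.concurrent.CompletableFuture (available since Java 8). It is a stand-in and not the API described in this article: the class name NonBlockingSketch, the method sumAsync, the default value of -1 and the idea of letting the work fail on unparsable input are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical demo class; it uses the JDK's CompletableFuture, not ch.maxant.async.Future.
class NonBlockingSketch {

    /** Runs parts 1 to 3 on the given pool and returns a future holding the sum. */
    static CompletableFuture<Integer> sumAsync(List<String> inputs, ExecutorService pool) {
        // Part 1: submit the work. Part 2: map String -> Integer, recovering with -1 on failure.
        List<CompletableFuture<Integer>> futures = inputs.stream()
            .map(s -> CompletableFuture
                .supplyAsync(() -> s, pool)      // the "work" simply hands back its input
                .thenApply(Integer::parseInt)    // fails for input that is not a number
                .exceptionally(e -> -1))         // assumed default value
            .collect(Collectors.toList());

        // Part 3: a callback that only runs once every future above has completed.
        return CompletableFuture
            .allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join)    // all futures are complete here, so join returns at once
                .reduce(0, Integer::sum));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        sumAsync(Arrays.asList("20", "oops", "22"), pool)
            .thenAccept(sum -> System.out.println("sum = " + sum))
            .thenRun(pool::shutdown);            // let the JVM exit once the callback has run
        System.out.println("main thread has nothing left to do");
    }
}
```

With the inputs used in main, the reduced value is 41 (20 - 1 + 22). The order of the two printed lines is not fixed, because the main thread does not wait for the pool.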

Line 11 calls the future method to register a new Task, which is constructed using a Work instance,
constructed here using a Java 8 lambda. The work sleeps for a little time and then either returns the number 20, as a string, or throws
an exception, just to demonstrate how errors are handled.

Using the Future that line 11 gets back from the execution service, line 25 maps its value from a string into an integer, resulting in
a Future<Integer> rather than a Future<String>. That result is added to a list of Futures on line 35, which part 3
uses on line 40. The registerCallback method will ensure that the given callback is called after the last future is completed.

The mapping on lines 25-33 is done using a lambda which is passed a Try object. A Try is a little like a Java 8
Optional and is an abstraction (the common supertype) of the Success
and Failure classes, which I implemented based on my knowledge of their Scala counterparts. It allows programmers to handle failure more easily
than having to explicitly check for errors. My implementation of the Try interface is as follows:

public interface Try<T> {
    /** Returns the value, or throws an exception if it's a failure. */
    T get() throws Exception;

    /** Converts the value using the given function, resulting in a new Try. */
    <S> Try<S> map(Function1<T, S> func);

    /** Can be used to handle recovery by converting the exception into a {@link Try}. */
    Try<T> recover(Recovery<T> r);
}
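
To make the interface more concrete, here is one way the two implementations could look. This is a sketch under assumptions, not the code from the ch.maxant.async package: the shape of Recovery (a single recover method taking the exception) is guessed from the Javadoc comment above, and TryDemo with its parseDoubled method exists only for illustration.

```java
// Hypothetical demo of chaining map and recover.
class TryDemo {
    static int parseDoubled(String s) throws Exception {
        Try<String> start = new Success<>(s);
        return start
            .map(Integer::parseInt)          // becomes a Failure if s is not a number
            .recover(e -> new Success<>(-1)) // assumed default value; skipped on Success
            .map(i -> i * 2)
            .get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseDoubled("20"));   // 40
        System.out.println(parseDoubled("oops")); // -2
    }
}

/** Like java.util.function.Function, but apply may throw a checked exception. */
interface Function1<T, S> { S apply(T t) throws Exception; }

/** Assumed shape: turns the exception of a Failure into a new Try. */
interface Recovery<T> { Try<T> recover(Exception e); }

interface Try<T> {
    T get() throws Exception;
    <S> Try<S> map(Function1<T, S> func);
    Try<T> recover(Recovery<T> r);
}

final class Success<T> implements Try<T> {
    private final T value;
    Success(T value) { this.value = value; }
    @Override public T get() { return value; }
    @Override public <S> Try<S> map(Function1<T, S> func) {
        try {
            return new Success<>(func.apply(value));
        } catch (Exception e) {
            return new Failure<>(e);  // an exception in the function turns into a Failure
        }
    }
    @Override public Try<T> recover(Recovery<T> r) { return this; }  // nothing to recover from
}

final class Failure<T> implements Try<T> {
    private final Exception exception;
    Failure(Exception exception) { this.exception = exception; }
    @Override public T get() throws Exception { throw exception; }
    @Override public <S> Try<S> map(Function1<T, S> func) {
        return new Failure<>(exception);  // the function is never called
    }
    @Override public Try<T> recover(Recovery<T> r) { return r.recover(exception); }
}
```

The point of the design is that the caller never writes a try/catch block until the final get(): a failure simply travels through every map untouched until a recover picks it up.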

What happens is that the implementations of Success and Failure handle errors gracefully. For example, if the Future
on line 11 of the first listing is completed with an exception, then the lambda on line 25 of the first listing is passed a Failure object,
and calling the map method on a Failure does absolutely nothing. No exception is raised, nothing. To compensate, you can call the
recover method, for example on line 29 of the first listing, which allows you to handle the exception and return a value with which your program
can continue, for example a default value.

The Success class on the other hand implements the map and recover methods of the Try interface differently,
such that calling map leads to the given function being called, but calling recover does absolutely nothing. Instead of
explicitly coding a try/catch block, the map and recover methods allow for a nicer syntax, one which is more easily validated when
reading or reviewing code (which happens to code more often than writing it).

Since the map and recover methods wrap the results of the functions in Trys, you can chain the calls together, as on
lines 26, 29 and 32. The Try API from Scala has many more methods than the three that I have implemented here. Note that I chose not to use a
java.util.function.Function
in my Try API because its apply method isn't declared to throw Exception, which meant
that the code shown in the first listing wasn't as nice as it is now. Instead I wrote the Function1 interface.
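
The difference is easiest to see side by side. The sketch below is an illustration only: the Function1 shown here is an assumed shape (a single apply method declared to throw Exception), and CheckedLambdaDemo and risky are hypothetical names.

```java
import java.util.function.Function;

// Hypothetical demo class comparing the two functional interfaces.
class CheckedLambdaDemo {

    /** Assumed shape of Function1: like Function, but apply may throw a checked exception. */
    interface Function1<T, S> { S apply(T t) throws Exception; }

    /** Some work that declares a checked exception. */
    static String risky(String s) throws Exception {
        if (s.isEmpty()) throw new Exception("empty input");
        return s.toUpperCase();
    }

    // With java.util.function.Function the lambda has to catch and wrap the exception itself.
    static final Function<String, String> WRAPPED = s -> {
        try {
            return risky(s);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    };

    // With Function1 a plain method reference is enough.
    static final Function1<String, String> DIRECT = CheckedLambdaDemo::risky;

    public static void main(String[] args) throws Exception {
        System.out.println(WRAPPED.apply("abc")); // ABC
        System.out.println(DIRECT.apply("abc"));  // ABC
    }
}
```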

Part 3 of the puzzle is how to get the program to do something useful after all the Futures are complete, without nasty blocking calls like those
to the Future#get() method. The solution is to register a callback as shown on line 40. That callback is, like all the others shown here, submitted
to the execution service. That means we have no guarantee which thread will run it, and that has a side effect, namely that thread local storage (TLS) no longer
works - some frameworks like (older versions of?) Hibernate relied on TLS, and they just won't work here. Scala has a nice way of solving that problem using the
implicit keyword, which Java doesn't have (yet...?), so some other mechanism needs to be used. I'm mentioning it, just so that you are aware of it.
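
The thread-local problem can be reproduced in a few lines. The sketch below uses hypothetical names (ThreadLocalDemo, USER) and, purely to get the result back for display, calls the blocking get method that the rest of this article avoids.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a value stored in a ThreadLocal is not visible on a pool thread.
class ThreadLocalDemo {
    static final ThreadLocal<String> USER = new ThreadLocal<>();

    static String readOnPoolThread() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            USER.set("alice");  // set on the calling thread only
            Callable<String> task = () -> String.valueOf(USER.get());
            return pool.submit(task).get();  // blocking here only to show the result
        } finally {
            pool.shutdown();
            USER.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readOnPoolThread()); // prints "null", not "alice"
    }
}
```

Anything that a callback needs therefore has to be passed to it explicitly, for example by capturing it in the lambda.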

So, when the last future completes, lines 40-60 are called, and passed a List of Trys containing Integers, rather than
Futures. The registerCallback method converts the futures into appropriate Successes or Failures.
But how can we convert those into something useful? With a simple map/reduce
of course, and luckily, Java 8 now supports that with the
Stream class, which is instantiated from the collection of Trys
on line 42 by calling the stream() method. First I map (convert) the Trys into their values, and then I reduce the stream to a single value on line 49. Instead of passing my
own implementation of a lambda that sums values, I could have used Integer::sum, for example someStream.reduce(0, Integer::sum).
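
A minimal sketch of that map/reduce step follows. It is not the callback from the first listing: the Try used here is a cut-down stand-in with only a get method, ReduceDemo is a hypothetical name, and counting a failed Try as 0 is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of reducing a list of Trys to a single sum.
class ReduceDemo {

    /** Cut-down stand-in for the Try interface; only get() is needed here. */
    interface Try<T> { T get() throws Exception; }

    static int sum(List<Try<Integer>> results) {
        return results.stream()
            .map(t -> {                      // map: Try<Integer> -> Integer
                try {
                    return t.get();
                } catch (Exception e) {
                    return 0;                // assumption: a failure contributes nothing
                }
            })
            .reduce(0, Integer::sum);        // reduce: add everything up
    }

    public static void main(String[] args) {
        List<Try<Integer>> results = Arrays.asList(
            () -> 20,
            () -> { throw new Exception("failed task"); },
            () -> 22);
        System.out.println(sum(results)); // 42
    }
}
```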

As you can see, the main thread adds all the tasks and registers all the mapping functions (lines 1-20). It then registers the callback (line 21 of the output which
corresponds to line 39 of the listing), and finally outputs the text from line 63 in the listing, after which it dies, because it has nothing else to do.
Line 22 and lines 24-42 of the output then show the various threads in the pool (which contained 5 threads)
processing the work as well as mapping from String to Integer, or recovering from an exception. This is the code in parts 1 and 2 of the first
listing. You can see that it is entirely asynchronous, with some mappings / recoveries occurring before all the initial work is complete (compare lines 38 or 40
which are a mapping and recovery respectively, to line 41 of the output, which occurs afterwards and is the last of the initial work). Lines 43-52 are the output
of the map/reduce which is part 3 of the main listing. Note that no reduce is logged, because the code I ran, and which is on Github, uses the Integer::sum
shortcut mentioned above, rather than lines 50-51 of the first listing shown above.

While all of this is possible using Java 6 (or even 5?), for example by getting the tasks which are submitted to the pool to submit the callback themselves once
they are finished, the amount of code needed to do that is larger and the code itself would be uglier than that shown here. Java 8 lambdas, Futures
which can be mapped using callbacks, and the Try API with its neat error handling all make the solution shown here arguably more maintainable.
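
For comparison, here is roughly what the pre-lambda approach could look like, with the task itself submitting the callback once it has finished. It is a sketch only: Callback, PreLambdaCallbacks and submit are hypothetical names, and a real version would also need the mapping and "all complete" handling discussed above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo written without lambdas, as it would have to be before Java 8.
class PreLambdaCallbacks {

    /** Hypothetical callback type; not part of the JDK or of ch.maxant.async. */
    interface Callback<T> {
        void onSuccess(T value);
        void onFailure(Exception e);
    }

    /** The task hands its own result to the callback via the pool, so nothing blocks. */
    static <T> void submit(final ExecutorService pool, final Callable<T> work, final Callback<T> callback) {
        pool.execute(new Runnable() {
            @Override public void run() {
                try {
                    final T value = work.call();
                    pool.execute(new Runnable() {
                        @Override public void run() { callback.onSuccess(value); }
                    });
                } catch (final Exception e) {
                    pool.execute(new Runnable() {
                        @Override public void run() { callback.onFailure(e); }
                    });
                }
            }
        });
    }

    public static void main(String[] args) {
        final ExecutorService pool = Executors.newFixedThreadPool(2);
        submit(pool, new Callable<String>() {
            @Override public String call() { return "20"; }
        }, new Callback<String>() {
            @Override public void onSuccess(String value) {
                System.out.println("got " + value);
                pool.shutdown();
            }
            @Override public void onFailure(Exception e) {
                System.out.println("failed: " + e);
                pool.shutdown();
            }
        });
    }
}
```

Even this single task with a single callback needs four anonymous classes, which gives a feel for how quickly the pre-lambda version grows.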

The code shown above, as well as the code for the classes in the ch.maxant.async package,
is available under the Apache License Version 2.0 and can be downloaded from
my Github account.