Maybe it’ll be part of JDK 9, maybe it won’t… but people are working hard on creating a REPL tool/environment for the Java Development Kit (JDK). More information on the code and the project is available as part of OpenJDK: project Kulla.

Some of you might not be familiar with the term ‘REPL’: it is short for Read-Evaluate-Print Loop. It is hard to explain exactly what it does, but very easy to demonstrate:

The idea is that you have a place where you can enter Java code and run it, without a main method or class structure. You can build instances and alter them as you type. For example, you’ll be able to do Swing development as you type:
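Something like this hypothetical session gives the idea (the exact prompt and syntax may well differ in the final tool):

```
-> import javax.swing.*;
-> JFrame frame = new JFrame("REPL demo");
-> JButton button = new JButton("Click me");
-> frame.add(button); frame.setSize(300, 100); frame.setVisible(true);
-> button.addActionListener(e -> button.setText("Hello World!"));
```

Every line is evaluated immediately, so the frame appears and changes while you type.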

And there we go, a final simple lambda creates a working “Hello World!”-button.

It is also possible to load a repl script from file, allowing you to share, store and run scripts. This is done using the ‘/load’ and ‘/save’ commands. You can even ‘/dump’ all the created classes in a directory.

I’m very curious how people will be using the REPL in the future, some use cases:

If you want to try out Kulla, it took me literally 20 minutes to get up and running on my MacBook. Just follow the instructions on AdoptOpenJDK, but instead use http://hg.openjdk.java.net/kulla/dev/ as codebase. After building the JDK, go to ./langtools/repl and look at the README.

We all do testing

In this day and age you aren’t considered a real Java developer if you are not writing proper unit tests.
We all know why this is important:

Instant verification that our code works.

Automatic regression tests for the future.

But how do we know we are writing proper tests? Well, most people use code coverage to measure this: if the coverage percentage is high enough, you are supposedly doing a good job.

What is a test?

First let’s look at what a test actually is:

Instantiate classes, setup mocks.

Invoke something.

Assert and verify the outcome.
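The three steps can be sketched in plain Java (a made-up Calculator class and a plain assertion, standing in for a real test framework like JUnit):

```java
// Hypothetical class under test.
class Calculator {
    int add(int a, int b) {
        return a + b;
    }
}

public class CalculatorTest {
    public static void main(String[] args) {
        // Step 1: instantiate classes, set up mocks (none needed here).
        Calculator calculator = new Calculator();

        // Step 2: invoke something.
        int result = calculator.add(2, 3);

        // Step 3: assert and verify the outcome.
        if (result != 5) {
            throw new AssertionError("Expected 5, got " + result);
        }
        System.out.println("test passed");
    }
}
```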

Which steps are measured with code coverage? Only steps 1 and 2. And what is the most important thing for a test? It is the third and final step, the assertion, the place where you actually check if the code is working. This is completely ignored by code coverage!

I’ve seen companies where management looks at code coverage reports and demands that programmers achieve 80+ or 90+% coverage, because this supposedly proves the quality is good enough. What else is common in these organisations? Tests without any real assertion: tests written purely to boost coverage and please management.

So code coverage says absolutely nothing about the quality of our tests? Well, it does tell you one thing: if you have 0% coverage, you have no tests at all; if you have 100% coverage, you might still have some very bad tests.

Mutation testing

Luckily there is help around the corner, in the form of mutation testing. In mutation testing you create thousands of ‘mutants’ of your codebase. What is a mutant, you might ask? A mutant is a copy of your codebase with one tiny change applied to it.

For each mutant the unit tests are run and there are a couple of possible outcomes:

Killed

If you are lucky a test will fail. This means we have ‘killed’ our mutant. This is a positive thing, we’ve actually checked that the mutated line of code is correctly asserted by a test. Now we immediately see the advantage of using mutation testing, we actually verify the assertions in our tests.

Survived

Another possible outcome is that our mutant has survived, meaning no test fails. This is scary: it means the logic we’ve changed isn’t verified by a test. If someone were to (accidentally) make this change in your codebase, the automated build wouldn’t break.
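A made-up example shows both outcomes at once. The 'test' below only checks values far away from the boundary, so a conditional-boundary mutant (> changed into >=) survives, while a test at the boundary would kill it:

```java
public class MutationDemo {

    // Original production code: free shipping for orders over 100.
    static boolean freeShipping(int amount) {
        return amount > 100;
    }

    // What a mutation testing tool would generate: > becomes >=.
    static boolean freeShippingMutant(int amount) {
        return amount >= 100;
    }

    public static void main(String[] args) {
        // A weak test checks only values far from the boundary,
        // where original and mutant agree: the mutant SURVIVES.
        System.out.println(freeShipping(50) == freeShippingMutant(50));   // prints "true"
        System.out.println(freeShipping(200) == freeShippingMutant(200)); // prints "true"

        // At the boundary the two differ; asserting freeShipping(100) == false
        // would KILL this mutant.
        System.out.println(freeShipping(100) == freeShippingMutant(100)); // prints "false"
    }
}
```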

Tooling

In Java (and other languages as well) there are frameworks for doing mutation testing. One of the most complete and modern mutation testing frameworks for Java is called PIT. The generation of mutants and the running of the tests are fully automated and easy to use, just as easy as code coverage. There are Maven, Ant, Gradle, Sonarqube, Eclipse and IntelliJ plugins available!
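With Maven, for example, a minimal setup looks roughly like this (a sketch; check the PIT website for the current version number):

```xml
<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version><!-- latest version from the PIT site --></version>
</plugin>
```

After that, `mvn org.pitest:pitest-maven:mutationCoverage` runs the analysis and writes an HTML report listing every killed and survived mutant.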

What about performance?

Using mutation testing isn’t a silver bullet and it doesn’t come without drawbacks. The major disadvantage is performance; this is the reason it never took off in the 1980s. At that time it could take an entire evening to run all your unit tests, so people could only dream of creating thousands of mutants and running the tests again and again. Luckily CPUs have become a lot faster, and PIT has other tricks to speed up the process.

One thing PIT does is use code coverage! Not as a measure of the quality of your tests, but as a tool. Before creating the mutants, PIT calculates the code coverage of all unit tests. When PIT then creates a mutant of a particular line of code, it looks at the tests covering that line. If a line is covered by only three unit tests, it runs only those three tests. This greatly decreases the number of tests that need to run for each mutant.

There are other tricks as well; for example, PIT can track the changes in your codebase, so it doesn’t need to create mutants for code that hasn’t been edited.

Summary

Code coverage is a horrible way of measuring the quality of your tests: it says something about the invocations but nothing about the actual assertions. Mutation testing is much better; it gives an accurate report on the quality, and you can’t ‘game’ the statistics. The only way to fake mutation coverage is to write real tests with good assertions.

Install TASM:
Copy all the contents of the directories DISK1/DISK2/DISK3 from tasm5.zip to \TEMP

Step 5:

Install Borland C++ 3:
Copy all the contents to \BORLANDC.

Step 6:

Copy all the source files from Commander Keen to \KEEN.

Step 7:

Fire up DOSBox and mount the working folder as drive C:.
Add the following paths to PATH:
SET PATH=C:\BORLANDC\BIN;C:\TASM\BIN;C:\;Z:\

Go into C:\TEMP and run the installer to install TASM.

Go into the directory C:\KEEN\STATIC and run ‘make.bat’.
Then go into the directory C:\KEEN and run ‘BC’ to start Borland C++.

To change the Borland directories to the correct path go to: Options > Directories and change the paths to C:\BORLANDC\*.

Step 8:

Compile and run! For me this created the binary KDREAMS.EXE.

But sadly, when I run the executable it says “Can’t open KDREAMS.MAP!” :-(
It turns out you’ll need to own the game’s actual content before you can run this source code.
Thankfully a shareware version can be downloaded here: keendm.zip. This corresponds with the 1.01S version of the released source code, which is also released here.

Copy the missing files (SHL, MAP, AUD, etc) from the shareware version and play your own compiled Commander Keen!

Browsing the Ludum Dare (see previous post) website I found this post from a friend Will. He made the following mosaic:

Pretty interesting to see; I’ve worked with him before on improving the algorithm that generates these mosaics. But then he set me a challenge: find your own game thumbnail, it is in there somewhere!

This is the screenshot of my game, used as thumbnail:

So I went through the thumbnails once… and a second time… and then I decided to solve it like a real programmer:

How does it work? Well it is pretty simple:

Input #1: mosaic.jpg

Input #2: The number of thumbnails across the width and height

Input #3: screenshot.png

The program resizes my game screenshot to the thumbnail size.

Next it loops over all sub-images of the mosaic.

For every sub-thumbnail: Calculate the error (+=Math.abs(mosaicPixelValue - screenshotPixelValue) for each color, for each pixel)

Store the location of the thumbnail with the smallest error!
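Those steps can be sketched in Java roughly like this (class and method names are mine, not the actual program; the real version first resizes the screenshot to the thumbnail size):

```java
import java.awt.image.BufferedImage;

public class MosaicSearch {

    // Sum of absolute per-channel differences between two equally sized images.
    static long error(BufferedImage tile, BufferedImage target) {
        long error = 0;
        for (int y = 0; y < target.getHeight(); y++) {
            for (int x = 0; x < target.getWidth(); x++) {
                int p = tile.getRGB(x, y);
                int q = target.getRGB(x, y);
                error += Math.abs(((p >> 16) & 0xFF) - ((q >> 16) & 0xFF)); // red
                error += Math.abs(((p >> 8) & 0xFF) - ((q >> 8) & 0xFF));   // green
                error += Math.abs((p & 0xFF) - (q & 0xFF));                 // blue
            }
        }
        return error;
    }

    // Loop over all sub-images of the mosaic; return {column, row} of the best match.
    static int[] findBestTile(BufferedImage mosaic, BufferedImage screenshot,
                              int cols, int rows) {
        int tileW = mosaic.getWidth() / cols;
        int tileH = mosaic.getHeight() / rows;
        long bestError = Long.MAX_VALUE;
        int[] best = {-1, -1};
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                BufferedImage tile =
                        mosaic.getSubimage(col * tileW, row * tileH, tileW, tileH);
                long e = error(tile, screenshot);
                if (e < bestError) {
                    bestError = e;
                    best = new int[]{col, row};
                }
            }
        }
        return best;
    }
}
```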

That is it, solved it in 10 minutes of coding!
(and another 10 minutes to make the visual confirmation and the animated gif).

Last weekend the 30th Ludum Dare competition took place. For those of you unfamiliar with Ludum Dare: this is a very short international game programming contest. You are allowed to use any tool or language, but there are strict rules:

The theme is revealed at the start (and the game must match this theme).

You get 48 hours, nothing more or less.

Every image, sprite, song and/or sound effect in the game must be made within these 48 hours.

The result is open source (but you pick the license).

You work alone.

(There is also a ‘Jam’ version where you can work in teams, can do closed source, can use images/sounds and you have 72 hours)

Connected Worlds

The theme this year was ‘Connected Worlds’. This is a pretty broad term, so I started to think. How about a world where the main character is on one planet, and his love is on another planet. The planets are tantalisingly close (nearly touching) but out of reach. Our hero has to build a rocket to reach his love.

Game style

For the last Ludum Dare (LD) I worked on my JavaScript skills and produced a little framework. This allows me to easily implement an old-school ‘point and click’-style adventure game. The resulting game should look and feel like the old Monkey Island and Day of the Tentacle games.

Drawing drawing drawing…

The decision to make a point-and-click game had a huge impact on how I spent my time during the contest. This type of game needs a lot of images, sprites and of course fun puzzles. In the end I think I spent 90% of my time drawing (on paper and in GIMP) and maybe 10% actually programming. After 48 hours my hands were cramped up from all the drawing instead of typing heh (I need a digital drawing tablet).

The result

First off, I’ve learned a lot about game design and JavaScript programming, and I’ve also learned how to draw and edit images much faster. At the start, the process of getting an image from paper to a coloured digital sprite took a long time; by the end of the contest it was all automatic.

Anyway, I’m pretty pleased with the result, please play it here.
Also, voting is still open, if you have a Ludum Dare account (or sign up) you can cast a vote here.

In the future somebody will inevitably invent a teleport, no doubt about that.

But how will it work?

Digital teleportation

The most likely way to teleport would be to digitalise yourself. Some as-yet-undiscovered, very high resolution MRI/CT scanner will scan every atom in your body and send the data over to the receiver, where an atom printer will build your entire body up again.

However, during a work lunch discussion, I came up with some scary fundamental problems with teleportation.

Transactions

What would happen if, during transmission, we get a failure? We don’t want to end up with a failed teleport… which would mean the person getting teleported is dead.

That is why we need some kind of two-phase commit. First we digitalise the person and send the data over the line; the person is built up on the other side. Once this process has completed, we ‘delete’ the original copy, because we don’t want to end up with thousands of clones.

Wait… what? Delete?

What would it feel like stepping into the teleporter? First a copy of you is made, and this copy eventually walks out of the receiving end. But what happens to you? You step into a machine which makes a copy and then disposes of you! You’ll be exterminated, killed, pushing up the daisies, your metabolic processes will be history, you’ll kick the bucket, you’ll be an ex-parrot.

You…

Let’s not dwell too long on the loss of the old you. Of course YOU are also the one walking out of the teleporter, for whom nothing has happened but a successful teleport.

But is that really… you?
What defines you?
Are you just a selection of atoms clumped together?

If we make an exact copy, is that still you?

Did you know that (according to some research) almost 98% of the atoms currently in your body will be replaced within a year? That would mean that a year from now, you will be just 2%… you!

A year ago Devoxx crossed an ocean for the first time. After all the events in Belgium (Antwerp) and France (Paris) a new satellite event was launched in the UK (London). And here I am again, in a London hotel room, after two days of Devoxx UK.

Day 1: Getting started

With a silly one-hour jetlag (which shouldn’t be a thing, but is…) I was awake and at the venue very early. Slowly but surely people started flooding onto the exhibition floor. It quickly became clear there were more visitors than last year. A lot of smaller local startups were present on the exhibition floor, but surprisingly I didn’t get a single sales pitch; this is something other conferences can learn from. Only genuinely interesting people talking about content.

There was also a corner where Devoxx UK had invited cool upcoming hardware projects like the NFC ring and the tiny Crazyflie quadcopter. These projects quickly drew a crowd of people shouting: “Shut up and take my money!” The problem was, they didn’t have anything for sale… just a “you can order online”. That is a bit of a shame; I’m pretty sure they missed out on at least a dozen impulse purchases. They should do something about this next year.

Day 1: The keynote

The awesome thing about Devoxx is… the lack of sponsored talks and keynotes. At least, that is how it seems to be. I really haven’t seen any session that even had a hint of sponsor.

This year the keynote speaker was Dan North and he talked about some of his personal experiences. He talked about moments where colleagues did or said things that really affected him and his career. Things you say and do at work might have a lot more influence than you can imagine. The best part of his story (in my opinion) was how a colleague tricked him into pair programming. If you keep asking “hey buddy, can you help me with this?” eventually your ‘buddy’ is just sitting next to you all the time.

During the keynote two amazing artists from Smartup Visuals created a drawing of Dan; later they hand-customised conference t-shirts for the visitors.

Day 1: The sessions

As ‘Devoxx videographer’ I can’t always just pick a session and settle down. Most of the time I walk around and only settle down where there is a good opportunity. The first session I attended was by Venkat Subramaniam, who talked about the new lambda expressions in Java. He is a great, clear speaker who gets straight to the point.

After filming some more I settled in room 2 with Dick Wall for the second session. His talk was titled “What have the monads ever done for us?”… and as you can guess it was about lambdas as well. His talk was a bit more theoretical, naming all the different theoretical objects and patterns (like monoids, monads and functors). Good talk, great speaker, the best lambda-related talk I’ve seen yet.

The third session I had the chance to see was by James McGivern. He’s a programmer with a math background and has interests very similar to mine. His talk was about ECC (elliptic curve cryptography) versus RSA, explaining how these security algorithms and the math behind them work. I absolutely loved his talk; everything was explained so clearly it reminded me of Numberphile and Computerphile (two related YouTube channels I adore).

Instead of following more tracks I got distracted filming the Crazyflie quadcopter and shooting it out of the sky with an automatic NERF gun that could only be fired while wearing the NFC ring. I also had a nice discussion with Dick Wall about the future of affordable hardware and 3D printing. That evening we attended the IBM sensor hacking challenge.

In this hands-on session we had to form teams of 4-6 people; each team got an Arduino and a set of 30+ sensors that could be attached to it. The challenge was to build the coolest piece of hardware with this. There was only one rule: we had to use WebSphere to communicate with the Arduino. This made absolutely no sense to me… we ended up writing JSP pages that sent signals to the Arduino to read from and write to the sensors. Even worse was the deploy cycle (which shouldn’t have been needed, but was): stop WebSphere, kill hanging processes, restart Eclipse (!), start WebSphere again, deploy the new pages.

The winning team was the Crazyflie crew. They took their quadcopter, fired up the Arduino IDE (boo!) and wrote the firmware in the Arduino language. In the end they could fly the quadcopter with a joystick attached to the Arduino. Clearly this didn’t follow the only rule we had, but it was just too cool and had to win.

Day 2: Sessions

The second day of Devoxx UK started with a session by two fellow Dutchies, Regina and Linda. They talked about a pattern they found for changing things in projects. They did this in a “Punch and Judy show” style, which didn’t quite work in my opinion; it was very rusty and read out. And although the underlying message itself is simple and sound, they started calling everything a pattern. Organising brown bag lunch sessions is a good thing, sure, but please don’t call it a “brown bag pattern”.

The second session of the day was “Is Your Code Parallel-Ready?” by Maurice Naftalin. His session was also about the new lambdas in Java 8. It had a nice buildup and sketched a good problem; the only thing missing was the code of the solution. The solution was described and hinted at, but I’d have loved to see the actual code there for clarity. The pace of this session was very slow and I didn’t learn anything new, which felt like a bit of a waste of time.

Next I went to a panel with Martijn Verburg, Ben Snipe, Stephen Colebourne and Ted Neward. They talked about the patent wars currently going on between Google and Oracle. The final conclusion, by Stephen, was probably the best: Oracle and Google are dumb, dumber and dumbest; neither can stop now and they’ll likely settle.

The title “Modern Web Architecture, 2014 Edition”, by Ted Neward, is a bit misleading, because 80% of the talk was a big history lesson on the origins of ‘the web’. The final 20% was more about common sense: don’t get locked in to one vendor, and think about creating a platform with multiple possible entry points, not just a website. Not really what I expected to hear, but interesting nonetheless.

The final session was by Arun Gupta and Antonio Goncalves, who quickly went through 50 new features in Java EE 7. For some reason I’m not fond of the direction Java EE is heading: all the logic is being put into annotations. I predict a new term, ‘annotation hell’, which is going to replace the ‘XML hell’ we had a couple of years ago. I’ve been warning about this since 2008, and it is getting worse and worse.

Final keynote

In the final keynote Martijn Verburg summarised the things he learned at this Devoxx UK and some of the trends he noticed. There were a lot of lambda talks, maybe a few too many. A couple of years ago there would have been a lot of alternative-language talks (JRuby/Groovy/Scala/etc.), but there weren’t many of those anymore.

Then Dick Wall hit the stage and continued from what we’d been talking about on the first day: cheaper electronics. The Arduino is cool and fairly cheap, as is the Raspberry Pi… but for a real Internet of Things we need devices that are much smaller and much cheaper. Such a device doesn’t have to do graphics, for example, so it can probably be made at a fraction of the cost. Oh, and did you know Dick Wall’s dog has a Fitbit? (True story.)

Finally Arun Gupta and Audrey Neveu talked about Devoxx4Kids, which is still gaining a lot of popularity all over the world. But we can always use more volunteers and new events!

Also: Links in the article below seem to be broken and it might not work anymore.

Original (outdated) article:

Today a colleague and I wanted to install GNU Radio on a Raspberry Pi. This can then be combined with the amazing RTL-SDR dongle: a DVB-T USB stick that can be turned into a full software-defined radio.

If you want to create an object in Java there is only one way: use a constructor.

Constructors

Constructors in Java are different from all other methods. They don’t have a return type and can’t be invoked directly, only through the new keyword. Other tasks performed by constructors are:

Initialisation of class variables

Calling the default constructor of the superclass if no explicit super call is present

Initialisation of instance variables

Execution of constructor body

Here is a common example of a constructor in Java:

public class Person {

    // Two mandatory immutable fields
    private final String firstname;
    private final String lastname;

    // Constructor to set the final fields:
    public Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }
    ...

If you want to create an immutable object in Java you need a constructor to initialise the final fields. Let’s look at another example:

Person person = new Person("Roy", "van Rijn", 1983, 200, 95.0, 12.5);

Immediately a problem becomes clear: reading the code, you have absolutely no idea what all the arguments mean. What is 200? What is 95.0? We have to look at the API or open the code of Person to see which arguments we can supply or have supplied. Thankfully there is a design pattern that solves this problem: the builder pattern.

Builder pattern

The builder pattern is an ‘object creation’ software design pattern. It can be used to create immutable objects in a fluent, readable way.

Person me = new Person.PersonBuilder("Roy", "van Rijn")
    .birthYear(1983)
    .length(200)
    .weight(95.0) // This is what all these numbers mean!
    .shoeSize(12.5)
    .build();

Reading this code is much clearer; we don’t have to guess what the arguments mean, it is crystal clear. So we’ve got a good solution, right? Well, not really: I’ve got a big problem with the builder pattern. For a simple immutable object like the Person above, with just six fields, we need the following code:

public class Person {

    // The immutable fields
    private final String firstname;
    private final String lastname;
    private final int birthYear;
    private final int length;
    private final double weight;
    private final double shoeSize;

    // The private hidden constructor with all fields:
    private Person(String firstname, String lastname, int birthYear,
                   int length, double weight, double shoeSize) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.birthYear = birthYear;
        this.length = length;
        this.weight = weight;
        this.shoeSize = shoeSize;
    }

    // A second Builder class with all the fields listed again:
    public static class PersonBuilder {

        private String firstname;
        private String lastname;
        private int birthYear;
        private int length;
        private double weight;
        private double shoeSize;

        public PersonBuilder(String firstname, String lastname) {
            this.firstname = firstname;
            this.lastname = lastname;
        }

        // All 'setters' with a fluent interface:
        public PersonBuilder birthYear(int birthYear) {
            this.birthYear = birthYear;
            return this;
        }

        public PersonBuilder length(int length) {
            this.length = length;
            return this;
        }

        public PersonBuilder weight(double d) {
            this.weight = d;
            return this;
        }

        public PersonBuilder shoeSize(double shoeSize) {
            this.shoeSize = shoeSize;
            return this;
        }

        // Finally a build method that calls the hidden constructor:
        public Person build() {
            return new Person(firstname, lastname, birthYear, length, weight, shoeSize);
        }
    }
}

Woah… that is a stupendous amount of code just for a more readable way of constructing an immutable object. I don’t have a good solution for this problem; the Java language simply doesn’t have a good way to construct objects (especially immutable objects) in a readable manner. Constructors end up having a lot of confusing nameless arguments, or you’ll have to write a huge Builder class with a nice fluent interface.

The Java language is constantly evolving and there are now proposals to add ‘value types’ to Java. Reading through the proposal, it seems the only way to construct a value type will be through its constructor, but I’m afraid this will quickly become a burden again. I’d love to have a better way to construct objects and (in the future) values, although I have no idea what it should look like: a fluent way of object creation without having to code a big Builder class, preferably in the language itself.

Would it be possible to change the language in a backwards-compatible way to allow this?

One possibility would be to ‘steal’ from other languages, for example Moose Perl:

public class Person {

    private final String firstname;
    private final String lastname;
    private int birthYear;
    private int length;
    private double weight;
    private double shoeSize;

    // Full constructor
    public Person(String firstname, String lastname, int birthYear,
                  int length, double weight, double shoeSize) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.birthYear = birthYear;
        this.length = length;
        this.weight = weight;
        this.shoeSize = shoeSize;
    }
    ...

// Made up syntax, similar to Moose, providing syntactic sugar.
// Leaves the other fields null, compile error if final fields aren't set.
Person me = new Person(firstname => "Roy",
                       lastname  => "van Rijn",
                       birthYear => 1983);

This has the readability advantage and the flexibility of the builder pattern (no need for dozens of overloaded constructors).

As some readers might know, I sometimes play a programming game called Core War. This afternoon I was browsing through some old ‘Core Warrior’ newsletters which John Metcalf has collected here. When reading an article about an old successful warrior called ‘Thermite II’ I came across this tiny unsolved mystery:

-- Bug?
John K.W. mailed me with a tiny, suicidal warrior which managed to beat
Thermite once in two hundred times. All I can imagine is that, that one
time, I somehow scanned his code within its few cycles of life, and that
then I crashed as a result. But how?
Program "Thermite II" (length 100) by "Robert Macrae"
;strategy Same old strategy, but nastier...
Killing Hazy Shade Of Winter III wins: 1
Thermite II wins: 199
Ties: 0
How did I get a win against Thermite?!?!
Here is an EXACT copy of what I sent to Pizza...
Thermite would've had to die within 3 cycles! ... :/
;redcode-94
;name Killing Hazy Shade Of Winter III
;kill Hazy Shade Of Winter III
;author John K W
p: ldp.b #0, #0
jmp <-1
Any ideas, anyone :-?

When executing the code of the second warrior, “Killing Hazy Shade Of Winter III”, I noticed there is a false assumption in the email: the warrior doesn’t need to die within three cycles!

The first round it’ll execute:

LDP.B #0,#0 (and become LDP.B #0,#-1)

Then it executes:

JMP <-1 (to a location two higher than the warrior, which is usually empty code… but not always!)

The final line in Thermite’s code is a JMP instruction to the launch instruction, which loads Thermite’s bomber ‘Brand’. If John K W’s warrior was loaded at position 100 or 101 relative to Thermite, it would parasitise the bomber and become Brand before Thermite itself does. This is how the small, seemingly suicidal warrior can kill Thermite.

As input he is using thumbnails from the Ludum Dare game programming contest.

To make such a program you basically need four things:

A collection of thumbnails/photos/images to place

A way to divide the target mosaic image into a grid of tiles

A scoring/measuring method for each tile

A placement algorithm

Greedy

Will is using all the thumbnails from the contest and a given target image. He then divides the target image into a grid. Next comes the important part: how do you measure how well a thumbnail matches a target tile? He is using a clever per-pixel RGB matching technique: the closer the colours match, the higher the score.

The final (important) step is placement. For each tile he takes the highest scoring thumbnail and assigns it to the tile. If you keep doing this (greedy) you’ll get a good recognisable result:

Random swapping

When he explained his algorithm (in gtalk) it reminded me of the problem I encounter in almost all the Al Zimmermann programming competitions. Eventually all these algorithms boil down to search algorithms: you are looking for the combination of images with the highest overall matching score (correctness/likeness). Instead of using just the greedy algorithm, I suggested he try randomly swapping a couple of images and checking the score for improvements. This turned out to make a difference of night and day; check it out:
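The swap step can be sketched as a simple hill climber over tile assignments (names and the precomputed score table are my own simplification, not Will’s actual code):

```java
import java.util.Random;

public class SwapOptimizer {

    // assignment[tile] = index of the thumbnail placed on that tile.
    // score[tile][thumbnail] = precomputed matching score (higher is better).
    // Randomly pick two tiles and swap their thumbnails, keeping the swap
    // only when the total score improves.
    static int[] optimize(int[] assignment, long[][] score,
                          int iterations, Random random) {
        int[] current = assignment.clone();
        for (int i = 0; i < iterations; i++) {
            int a = random.nextInt(current.length);
            int b = random.nextInt(current.length);
            long before = score[a][current[a]] + score[b][current[b]];
            long after = score[a][current[b]] + score[b][current[a]];
            if (after > before) { // improvement: keep the swap
                int tmp = current[a];
                current[a] = current[b];
                current[b] = tmp;
            }
        }
        return current;
    }
}
```

Because only the two swapped tiles change, each step only needs to re-score those two tiles instead of the whole mosaic.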

Of course there are still problems with this method. For example, you’ll quickly run into a local optimum. Maybe you’d get a much better image if A -> B and B -> C and C -> A, but this will never be reached by swapping two tiles if the individual steps don’t improve the score. This can be countered by swapping multiple pairs at once and hoping you’ll hit the right combination.

There are a lot of other ‘smarter’ things you could do, for example always try to put a ‘best match’ on a particular tile and trying to fill the hole it created…. But for now adding this simple random swap is perfect!

Be sure to check out William’s other tech-related blog posts; he’s also involved in the development of the amazing Mill CPU.

Ever since the first time I heard about the Turing Test I’ve wanted to make my own chatbot. It started probably twenty plus years ago, when the only language I could program in was QBASIC. At that time I never got further than:

Hi computer!
Hello.

…. and now? ….

Current state

Since that first try (aged 10 or something) I’d never attempted another chatbot. But a couple of days ago I read a news article about the Loebner Prize, an annual event that pits the best chatbots in the world against human judges, and it sparked my interest again.

I started researching the Loebner Prize winners, and there seem to be three distinct groups/types of chatbots and algorithms:

Template based bots

Crowdsourced bots

Markov Chain bots

Let me quickly describe how they work.

Template based bots

A lot of popular bots (like A.L.I.C.E., winner of the 2000, 2001 and 2004 Loebner Prize) use pre-defined templates. Most of them use a certain XML format called AIML. This is a short example:
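A typical AIML category pairs a pattern with a templated response; a minimal sketch:

```xml
<category>
    <pattern>WHAT IS YOUR NAME</pattern>
    <template>My name is Chatbot.</template>
</category>
```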

When somebody talks to the bot it quickly goes through all the templates and finds matches. This particular pattern will for example match:

What is your name?
My name is Chatbot.

These kinds of bots have enormous databases of AIML templates, mostly hand-crafted by skilled bot configurers. Although they work very well, they don’t seem very ‘smart’ or AI-like to me, so I’m not going to make another AIML-like bot.

Crowdsourced bots

One of the best (most human-like) chatbots is Cleverbot. But in my opinion it is also one of the simplest bots around. It uses a nice trick and a huge database. When it encounters a question it doesn’t know or understand, it just ignores it… and stores it. In another chat session it’ll mimic and repeat that question to another human, and the response is stored in the huge database for future conversations. As a result all the answers it gives are very human-like (because, well, they are human answers).

But there is obviously a huge drawback: one moment the bot is pretending to be an 18-year-old male, the next it claims to be a 40-year-old female. One moment it talks about how much it loves horses, the next it says it hates animals…

"Let's store this sentence, let's store it in a markov chain."
1> Let's store this
2> store this sentence
3> this sentence let's
4> sentence let's store
5> let's store it
6> store it in
7> it in a
8> in a markov
9> a markov chain

Now let’s imagine we are building a valid reply to a question and we already have “let’s store”… what can we do next? We go into the chain and walk the nodes until we find the two matching results (1) and (5). So the sentence can continue with “this” or “it”.
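A minimal sketch of such a chain in Java (my own toy version, not how MegaHAL is actually implemented):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MarkovChain {

    // Maps a two-word prefix to every word that followed it in the input.
    private final Map<String, List<String>> chain = new HashMap<>();

    void learn(String sentence) {
        String[] words = sentence.toLowerCase().split("\\s+");
        for (int i = 0; i + 2 < words.length; i++) {
            chain.computeIfAbsent(words[i] + " " + words[i + 1],
                    key -> new ArrayList<>()).add(words[i + 2]);
        }
    }

    // All possible continuations after a two-word prefix.
    List<String> nextWords(String prefix) {
        return chain.getOrDefault(prefix.toLowerCase(), Collections.emptyList());
    }
}
```

Feeding it the example sentence (punctuation stripped), nextWords("let's store") returns both continuations, "this" and "it", exactly the matches found at nodes (1) and (5).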

Famous bots like MegaHAL work kind of like this (see their explanation). Although this feels much more like real AI/knowledge to me, it also has drawbacks. For example, you can’t say these bots reason; they don’t understand the environment/context they are in.

A new attempt

I’ve made a list of what my ideal chatbot should be:

Learn through conversation/reading text

Not just repeat, but understand relations and concepts

Have different scopes, global knowledge, conversational scope

Two days ago these ideas started to take shape in my head and I started writing the first code. The first goal is to make the bot able to read text and extract ‘knowledge’ from it.

The first thing I had to do was break the input text up into pieces; for this I found a great open source framework called Apache OpenNLP. It recognises words and sentences, and detects verbs, nouns, pronouns, adjectives, adverbs, etc.

The next thing I wanted to do was turn all the nouns and verbs into their ‘base’ form. When storing relations in the bot’s memory I want to avoid duplicate entries, so for example “fish are animals” and “fish is an animal” are the same. For that purpose I’m using WordNet® in combination with the Java library JWNL.
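To illustrate what this normalization buys you, here is a toy sketch. WordNet/JWNL need dictionary files, so this example fakes the lookup with a tiny hand-rolled table (the class, method names, and the table itself are made up for illustration; the real bot gets these mappings from WordNet):

```java
import java.util.*;

public class BaseFormToy {

    // Toy stand-in for a WordNet lookup: inflected form -> base form.
    private static final Map<String, String> BASE_FORMS = new HashMap<>();
    static {
        BASE_FORMS.put("are", "be");
        BASE_FORMS.put("is", "be");
        BASE_FORMS.put("animals", "animal");
    }

    public static String toBaseForm(String word) {
        return BASE_FORMS.getOrDefault(word.toLowerCase(), word.toLowerCase());
    }

    public static String normalize(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String word : sentence.toLowerCase().split("\\s+")) {
            // Skip articles so "an animal" and "animals" normalize alike:
            if (word.equals("a") || word.equals("an") || word.equals("the")) continue;
            if (out.length() > 0) out.append(' ');
            out.append(toBaseForm(word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both sentences reduce to the same normalized fact:
        System.out.println(normalize("fish are animals"));  // fish be animal
        System.out.println(normalize("fish is an animal")); // fish be animal
    }
}
```

Both input sentences collapse to the single fact “fish be animal”, so the knowledge base only needs to store it once.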

Understanding what kind of words are used and being able to transform them into the base-form will make it easier to store ‘knowledge’ and make sense of the world in the future.

Graph database

Instead of learning how to use a real graph database (like Neo4j) I decided to build something myself. Normally this is a horrible choice, but I’m in it for the fun and to learn new skills. Although it is yet another distraction from the actual bot, after a couple of hours I had a first unit test working.

It is already possible to extract more facts from this ‘knowledge database’ than just the plain information I put in.
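The actual unit test isn’t shown here, but the idea can be sketched as follows. This is my own minimal sketch (all names hypothetical, not the real implementation): store direct “is-a” edges, and let transitive facts fall out of a graph walk.

```java
import java.util.*;

public class KnowledgeGraph {

    // Adjacency list: concept -> directly related categories ("X is a Y").
    private final Map<String, Set<String>> isAEdges = new HashMap<>();

    public void learn(String concept, String category) {
        isAEdges.computeIfAbsent(concept, k -> new LinkedHashSet<>()).add(category);
    }

    // Walks the graph, so facts that were never stored directly are inferred.
    public boolean isA(String concept, String category) {
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.push(concept);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (current.equals(category)) return true;
            if (seen.add(current)) {
                todo.addAll(isAEdges.getOrDefault(current, Collections.emptySet()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        KnowledgeGraph graph = new KnowledgeGraph();
        graph.learn("fish", "animal");
        graph.learn("animal", "living thing");
        // "fish is a living thing" was never stored directly, it is inferred:
        System.out.println(graph.isA("fish", "living thing")); // true
    }
}
```

Just two stored facts already yield a third one (“fish is a living thing”), which is exactly the “more facts than I put in” effect.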

Data extraction

The next step in making my bot is going to be data extraction. I’m probably going to make a small template language that (might!) look like this:

Pattern: [noun] {be} (adj) [noun]
Store: %1 is %2

The template should match sentences like “fish are animals”, “roses are red”, “Roy is a handsome dude” and “Roy is an obvious liar”.

The bot should be able to store all these ‘facts’ and put them in my graph/relation database. With these data extraction templates it should be possible to build a large knowledge base with facts for the bot, for example just by parsing a Wikipedia dump.
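One possible way such a template could work, sketched with a plain regular expression (in the real bot OpenNLP’s POS tags would decide what counts as a noun or adjective; here the word classes are faked with `\w+`, and all names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FactExtractor {

    // Rough stand-in for the "[noun] {be} (adj) [noun]" template:
    // subject, a form of "to be", optional article, optional adjective, object.
    private static final Pattern TEMPLATE = Pattern.compile(
            "(\\w+) (?:is|are) (?:an? )?(?:(\\w+) )?(\\w+)",
            Pattern.CASE_INSENSITIVE);

    // Returns "subject is object" (the "%1 is %2" store rule),
    // or null when the sentence doesn't match the template.
    public static String extract(String sentence) {
        Matcher m = TEMPLATE.matcher(sentence);
        if (!m.matches()) return null;
        return m.group(1) + " is " + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(extract("fish are animals"));       // fish is animals
        System.out.println(extract("Roy is a handsome dude")); // Roy is dude
        System.out.println(extract("Roy is an obvious liar")); // Roy is liar
    }
}
```

Combined with the base-form normalization above, “fish are animals” would end up in the knowledge base as a clean “fish is animal” relation.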

Now I’m going to dive back into the code and continue my chatbot adventure, keep an eye out for part 2!

This week I’ve started running/jogging, and I’m using RunKeeper on my iPhone to track my progress.

RunKeeper stores all the GPS data from your runs. This data is displayed per run, including a nice map of your route. The most important data can be reached through the web interface: things like pace (min/km or min/mile) and distance. The most rewarding thing in running is breaking your own records, and RunKeeper tracks a couple of records:

Longest run (distance)

Longest run (time)

Most calories burned in a run

Fastest average pace

etc

As you can see, all those statistics are about single runs. What I’m missing are the following records:

Fastest 1 km, 3 km, 5 km, 10 km, etc.

For example, when I’ve run a very fast 5.5 km run, I’d love to see this reflected as my 5K personal record, but right now it is lost because I’ve already done a slow 6 km run and a very fast 1 km sprint.

Export data

But luckily RunKeeper has a very useful option for us developers: Settings > Export data.
This results in a ZIP file with GPX files, raw GPS data with time and location!

JAXB

The first thing I did was download the GPX XSD and generate the JAXB Java classes from it.

The next thing I did was translate the lat+long+time waypoints into ‘legs’ consisting of meters+seconds.
Those legs can be used to check for a record during longer runs, like this:

@Override
public void analyze(List<Leg> legs) {
    List<Leg> cache = new ArrayList<Leg>();
    double runningDistance = 0.0;
    for (Leg leg : legs) {
        // Add next leg to the tail:
        cache.add(leg);
        runningDistance += leg.getMeters();
        // If possible remove from head:
        while ((runningDistance - cache.get(0).getMeters()) > targetDistance) {
            runningDistance -= cache.remove(0).getMeters();
        }
        if (runningDistance > targetDistance) {
            // Check if the current target distance time is a record:
            checkTime(cache);
        }
    }
}
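The lat+long+time to meters+seconds translation mentioned above boils down to computing the great-circle distance between consecutive waypoints. A minimal sketch using the haversine formula (the `Leg` class here is my own reconstruction from the snippet above, not the original code):

```java
public class LegBuilder {

    // Minimal Leg as implied by the analyze() snippet:
    // distance in meters, duration in seconds.
    public static class Leg {
        private final double meters;
        private final long seconds;
        public Leg(double meters, long seconds) {
            this.meters = meters;
            this.seconds = seconds;
        }
        public double getMeters() { return meters; }
        public long getSeconds() { return seconds; }
    }

    private static final double EARTH_RADIUS_M = 6371000.0;

    // Haversine formula: great-circle distance between two GPS waypoints.
    public static double distanceMeters(double lat1, double lon1,
                                        double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Two waypoints 0.001 degree of latitude apart (roughly 111 meters),
        // recorded 30 seconds apart:
        double meters = distanceMeters(52.0000, 4.0000, 52.0010, 4.0000);
        Leg leg = new Leg(meters, 30);
        System.out.printf("%.0f meters in %d seconds%n",
                leg.getMeters(), leg.getSeconds());
    }
}
```

Feeding each consecutive waypoint pair through `distanceMeters` (and subtracting their timestamps) produces the list of legs that the sliding-window analysis above consumes.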

For example using a target of 5000 meters in a 5070 meter run, this analysis finds the following 5K times:

I couldn’t believe this feature wasn’t available in RunKeeper… but after a lot of Googling it turns out a lot of other people are looking for this ‘most requested’ feature! With a little bit of Java (100 lines of code) you can get a pretty good result:

I’m now thinking of making a simple JavaScript application (no backend) that uses the RunKeeper export zip file as input and displays a lot of additional data. Are you interested in this? Please drop a comment and convince me to make this for you guys and girls!