We've developed a new syntax for message expectations in RSpec, bringing
message expectations in line with the new expect syntax
for state-based expectations. This change required us to make a number of
internal refactorings, so we're cutting a release candidate of RSpec, 2.14rc1,
so that early adopters can try this new feature out. We encourage you to try
out the release candidate and report any issues to https://github.com/rspec/rspec-mocks/issues.

In 2.14rc1 both the new syntax and the old syntax are enabled by default.
This matches state-based expectations, where both syntaxes are also
enabled by default.

Myron's post
explains why the expect syntax is preferable to the should syntax in some
cases. The problem is that the should syntax works by defining methods on
every object in the system, and so it can fail for proxy objects and for
objects that include the delegate module. Because RSpec doesn't own every
object, it cannot ensure this works in a consistent
manner. The expect syntax gets around this problem by not relying on
RSpec-specific methods being defined on every object in the system.

We think that the new expect syntax for message expectations is an
improvement, and that you should try it in both new and existing spec suites.
You can get the new syntax by upgrading to RSpec 2.14rc1, and we encourage you
to do so.

I made a tiny Mac app that checks GitHub Status. It
lives in your menu bar as a status item, turns orange for a minor outage, red for
a major outage, and stays deliciously black if GitHub is all systems go.
Clicking it shows you the current status message, if any. You can get it
here.

We are the music makers, and we are the dreamers of dreams. Since the dawn of consciousness, humanity has strived to reach those faint glimmers of light in the night sky, to satisfy our curiosity, to know what we can find in the great encompassing dark. Few are the men and women who have taken the reins and dared to go to those faraway places that we can only dream of.

History is largely told as a chronicle of great people doing great things, but for most of us life is not made up of big moments; it's made of small moments. When Neil Armstrong uttered those all-too-famous words and took his giant leap, we all took part in a big moment; we all felt pride in what we had achieved, finally dipping our toes into the ocean of the universe.

It was Neil deGrasse Tyson who said that "we stopped dreaming", that the end of the space exploration programme has taken something away from us. With the passing of the first man on the moon, I hope that we can return to dreaming, and maybe look toward tomorrow.

So today marks the last day of my internship at
Opposable Games, where I've spent the summer hacking
on stuff for their iOS-based games. I've learnt an awful lot about how to
develop for iOS and had a really great time doing it.

My development background is largely web-based: I've spent most of my time
working in scripting languages, working with databases, et cetera. It's been
refreshing to change pace to high-performance, low-level programming,
although a certain amount of pain has been involved.

I've been working with the Cocos2d library, which
is very good for rapid game development, but as newer versions of Xcode were
released a small number of bugs seemed to creep into the project. Given that
little else was changing, my guess is that the library hadn't been updated
to work with the newer compilers, or something of that ilk. Either
way, I'd very much recommend that anyone who wants to do iOS game development take a
look at Cocos.

Speaking of Xcode, I have to say that I'm not a fan. It's crashed on me many times,
run slowly, and forced me to reboot both my Mac and my iOS devices to get it to work.
It's a shame that it's the only development platform for iOS, because
as an IDE I'm really not impressed.

Another problem with working in a native programming language (Objective-C) is
that operations are memory-unsafe and there's no garbage collector. As much as
that's great from a performance standpoint, when you're used to programming with
the "safeties" on, it can be a weird phase shift to stop using them. Manual
memory management is always going to be more hit-and-miss than having a garbage
collector, but for the most part I think we've done a pretty OK job in this
department.

One of the games we're working on requires a more or less completely real-time
sound engine. I was tasked with building the prototype that plays the
sounds in (as close as possible to) perfect sync. This led to me learning about
the lowest level of Core Audio, which I would describe as being like a jet fighter:
insanely powerful, but get anything wrong and you're going to crash and
burn horribly (in most cases getting an earful of audio noise for your trouble).

If you're a programmer interested in getting into the games industry, I'd very
much advise you to do a short internship with a company like Opposable, where
you can learn what it's like to build the sorts of technologies required for
modern video games. It's nothing like any other type of system you're
ever going to build, you'll learn loads, and you'll probably have a really
great time doing it.

For my part, I've had a great time this summer and am now looking forward to a
well-earned break before heading back to university for my final year.

The event

Last weekend I participated in the Data Science London group's hackathon.
The challenge was to take some data provided by EMI and use it to build a
recommender system that could predict how much a user would like a track
based on previous ratings, demographic data and some interview responses.

When I arrived at the event I grouped up with some guys from a company called
Musicmetric. The team eventually split into
two groups: a guy called Ben and I worked on the recommender system problem, while the
rest of the Musicmetric team worked on building visualisations with the data.

The hackathon officially started at 1pm on Saturday, London time, and went on until 1pm the
next day. I was one of a small group of people who survived the entire 24 hours; most of
the participants went home late on Saturday evening or early on Sunday morning. Excellent food
was provided, which allowed us to focus entirely on the problem. As a tea
drinker I was slightly disappointed by the quality of the tea, but everything else was really good.

The hackathon took place in The HUB Westminster, which is a really nice work space. It is light and airy,
and there were even some rooms left intentionally dark for crashing in (I slept on a beanbag for about 2
hours; if you go to a future hackathon, I'd recommend taking a Therm-a-Rest or camping mattress).

Our solutions

The problem was hosted on the Kaggle platform, which provides training and test data,
takes your predictions for the test data, and evaluates them behind the scenes, giving you a score. You can
see the scores of all the other participants, and within seconds of the competition starting a very good
solution had been posted. This was probably due to the data set being released before the
competition started: someone had trained a really strong classifier ahead of time, tested it in
cross-validation, then run their solution against the data and submitted. The evaluation
criterion for the problem was RMSE, which means we
had to focus on minimising the overall distance between our predictions and the correct answers, as opposed
to the number of instances we got exactly right.
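
For anyone unfamiliar with the metric, RMSE is only a few lines of plain Python; this sketch shows why it rewards avoiding large misses:

```python
import math

def rmse(predictions, targets):
    """Root mean squared error: squaring the differences means one
    large miss costs the score more than several small ones."""
    total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(total / len(predictions))

# A single prediction that misses by 40 points scores worse than
# two predictions that each miss by 10.
```
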

Our first solution was to apply simple collaborative filtering:
this seemed like an obvious approach, because we're trying to build a recommendation system given
a bunch of input (user, item, rating) triples and a bunch of (user, item) pairs to predict. The RMSE of
this approach in cross-validation was about 22 (out of 100), with a result of roughly 18 on the actual
test set.
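
To illustrate the flavour of approach (this is a baseline in the same spirit, not our exact implementation), you can predict a rating as the global mean plus per-user and per-item offsets learned from the triples:

```python
from collections import defaultdict

def build_model(triples):
    """Fit a baseline recommender from (user, item, rating) triples:
    the global mean rating plus per-user and per-item offsets."""
    global_mean = sum(r for _, _, r in triples) / len(triples)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in triples:
        by_user[user].append(rating)
        by_item[item].append(rating)
    user_bias = {u: sum(rs) / len(rs) - global_mean for u, rs in by_user.items()}
    item_bias = {i: sum(rs) / len(rs) - global_mean for i, rs in by_item.items()}
    return global_mean, user_bias, item_bias

def predict(model, user, item):
    """Predict a 0-100 rating; unseen users or items fall back to the mean."""
    global_mean, user_bias, item_bias = model
    raw = global_mean + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)
    return max(0.0, min(100.0, raw))
```
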

We were given a lot of demographic information for each of the users, and it seemed to make sense to
break our approach down by demographic bins. Trying various combinations of the demographic
information, however, yielded no gain in cross-validation or against the actual test data.

After racking our brains for a while we came up with the idea of using a random forest ensemble method to
solve the problem: shoving all the demographic, interview response and other data in and having the forest
classify in a brute-force manner. This solution was implemented with roughly 2 hours to go until the end of
the competition. Knowing we did not have long to run our solutions, we started with a very rough-and-ready
approach and jumped several places in the rankings. Excited, we ran a number of different random forest solutions
with different parameters to find which parameter gave us the best jump. After determining that tree
height was going to give the best results, we set two classifiers running with different tree heights, one on each of our laptops.
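
With scikit-learn, the shape of that brute-force approach looks roughly like this (toy random data stands in for the EMI features, and the parameter values are illustrative, not the ones we used):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the real feature matrix: each row is a (user, track)
# pair's numeric features, each target a 0-100 rating.
rng = np.random.default_rng(0)
features = rng.random((200, 8))
ratings = features[:, 0] * 100

forest = RandomForestRegressor(
    n_estimators=100,  # number of trees in the ensemble
    max_depth=10,      # the "tree height" parameter we ended up tuning
    n_jobs=-1,         # use every core; we were short on time
    random_state=0,
)
forest.fit(features, ratings)
predictions = forest.predict(features)
```
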

They both finished, and we submitted them with a minute and 20 seconds to go until the end of the competition. We jumped
all the way to third place, which was really exciting. The person who won the competition used the exact same approach as we
did, but had been running it since the start of the competition, which suggests that we may well have been able to win
if we'd had more time to fiddle with the solution parameters.

Thoughts about the data

The data we were provided with by EMI contained a lot of information. We found, however, that the demographic
information did not improve our classification accuracy at all. There are a couple of conclusions we could
draw from this. The first is that music taste is not affected by age, gender, region or any of the other
information we were provided with. I'm not sure I believe that 94-year-old males have the same
listening tastes as 16-year-old females, so I'm going to reject this conclusion.

The more likely conclusion is that there wasn't enough data
provided for the demographic information to help. Every time you split by demographic you reduce the size
of your training and validation sets. This means that the accuracy of the individual classifiers is
reduced, and as such the accuracy of the overall classifier across all the bins is also reduced. Given a couple
of orders of magnitude more data, it might well have been the case that we could produce an accurate
classifier based on demographic information.

I had a great time at the Data Science Hackathon, and I would very much like to participate in another one in the future.
There were prizes, free t-shirts, free good food, and really excellent people who understand a lot more about
machine learning and data mining than I do. I'm really, really glad that I went. I'd like to give a special shout-out
to Ben for being an awesome teammate, Greg for being supportive overnight when I began to burn out, and Carlos for running
things and just being a generally awesome dude.

Week 1

TL;DR: Opposable Games good, Sam learns things, program games for iOS not Android

So this week I started my summer internship at Opposable Games. Opposable is a small
independent game development team in Bristol. I'm really enjoying working with the team, spending 3 days a week in the office with the other programmers and games designers. The week started with a meeting, in which progress on current projects was discussed and it was decided what was to be done this week. Whilst I didn't have much to contribute in the first meeting, it was good to see exactly what the rest of the team was working on and to get to know the people I'm working with.

After the meeting I dove head first into building an iOS game from the ground up. Ben suggested I use the Cocos2d framework to build the game. Having never coded for iOS before, I made quite slow progress at the beginning of the week. I tried to focus mainly on learning how to write the code and use the framework. We're now at the end of the week and I feel that I've made excellent progress with the project, having integrated graphics, physics, networking and sound into it. There are a number of features yet to be implemented which don't require crazy technical gymnastics, but will require lots of building and polish.

Working with Opposable so far has been great fun. I've learnt a lot, and I've also been asked for creative input on the project I'm working on and others. I really enjoy writing code and seeing results come out the other end, and interfacing with the projects that the group is building has been really satisfying. In particular, they have a controller system that connects via a network, and my iOS program has had to integrate with it. For this I needed to learn socket programming, which is incredibly complex, but it was very satisfying to see the working result.
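
The socket round-trip involved is easier to see in a few lines than to describe; here's a minimal sketch (in Python, rather than the Objective-C the game actually uses) of a client exchanging one message with a server:

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one connection and echo the first message back."""
    conn, _ = server_sock.accept()
    conn.sendall(conn.recv(1024))
    conn.close()

# Bind to port 0 so the OS picks a free port for us.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=echo_once, args=(server,)).start()

# The "controller" side: connect, send an event, read the reply.
client = socket.socket()
client.connect(("127.0.0.1", server.getsockname()[1]))
client.sendall(b"button pressed")
reply = client.recv(1024)
client.close()
server.close()
```
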

Whilst I am only an intern, there have been times when my technical knowledge has been called upon to help the other programmers in the team, particularly related to the use of the Git revision control system. We're also continually discussing which technologies I am familiar with to see if there's a good place where I can deploy my expertise.

I often build scripts that need some kind of network persistence layer, or
tiny web services that munge files or JSON or whatever. When I have to do this I
don't immediately reach for Rails or any of the other super-heavyweight frameworks,
because I don't need all the extra superpowers those
frameworks come with, and I can instead deal with a little more of the manual stuff,
since I'm not going to be spending much time doing any of that anyway. This article
will try to serve as a guide to setting up tiny Python projects on Heroku, using the
Notely server as an example.

Change something

In app.py, you can see a route that matches "/" and returns the text 'Hello World!'.
This is the base point for our app. Use the Flask docs to
change something, run the server with foreman start, and see what it does locally
before pushing back to Heroku.
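
As a sketch of what "changing something" might look like (the second route and its path are made up for illustration):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello World!"

# A second route, added by way of example: Flask converts the
# <int:note_id> segment of the path into an integer argument.
@app.route("/notes/<int:note_id>")
def show_note(note_id):
    return "This is note number %d" % note_id

if __name__ == "__main__":
    app.run()
```
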

Persist stuff

When you ran the giant blob of commands up there, you added a Postgres database to your
Heroku app. You can interface with this database using a psycopg connection. To create
one you can use the following Python snippet:
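
Heroku puts the connection details in the DATABASE_URL environment variable (something like postgres://user:pass@host:5432/dbname), so the snippet boils down to parsing that and handing the pieces to psycopg2:

```python
from urllib.parse import urlparse  # on Python 2 this was the urlparse module

def connection_params(database_url):
    """Split a Heroku-style DATABASE_URL into psycopg2.connect() kwargs."""
    url = urlparse(database_url)
    return {
        "database": url.path.lstrip("/"),
        "user": url.username,
        "password": url.password,
        "host": url.hostname,
        "port": url.port,
    }

# With psycopg2 installed:
#   import os, psycopg2
#   conn = psycopg2.connect(**connection_params(os.environ["DATABASE_URL"]))
#   cur = conn.cursor()
```
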

Conclusions

This is, I'm pretty sure, the fastest way currently in existence to get from nothing to a running web service
with a database that you can use to build stuff. For me
it's been incredibly useful to be able to throw these services up. I wouldn't have been
able to do that without Heroku.

Redis is a key/value store that I've recently used for the
Student Robotics competition. I really like it, but
I think it's got some flaws.

Redis has a bunch of datatypes: strings, hashes, lists, sets and sorted sets.
Firstly, you'll note the lack of an integer data type. Redis does have an
INCR command, which operates on the string data type: if that string
is actually an integer, the integer is atomically incremented. Whilst I know that
you can store integers (and floats) in strings, it doesn't seem to me to be a good way
of storing these commonly used data types. Additionally, if you're using a Redis binding
and you do something like this:

>>> redis.set("my_key",0)
True
>>> redis.get("my_key")
'0'

The binding has no way of figuring out whether the data it gets back should be
an integer or the string "0". This means that in any code that
sets integer values for keys, you have to add extra code when
you pull the data out of Redis, so that it can be treated
as an integer. Alternative key/value stores and databases have had
the ability to store values in integer data types for a
long time. (Redis does the same thing with strings for booleans and nulls.)
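
In practice that means a small coercion helper wherever integers come back out. A sketch (the helper is our own; the commented usage assumes the redis-py binding):

```python
def as_int(value, default=0):
    """Redis hands values back as strings (or None for a missing key);
    the caller has to decide it was 'really' an integer and convert it."""
    if value is None:
        return default
    return int(value)

# Typical usage with a redis-py client:
#   redis.incr("my_key")                # atomic, but stores a string
#   count = as_int(redis.get("my_key"))
```
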

You'd think that with Redis's more advanced data structures (hashes and lists, for
example) you'd be able to do some nesting, so that, for example, you could have a list
of hashes. Unfortunately this is not the case. When we were working with Redis we spent
a little while trying to come up with a solution, and we came up with two alternatives.

A list of JSON strings: Redis's list structure can only store strings (or integer-ish
strings), so we nested our data structures using JSON strings. This meant that when
we took items out of the list they had to be JSON-parsed and JSON-encoded. This wasn't
too much of a pain, but it wasn't particularly elegant.
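
The JSON approach fits in a pair of small wrappers around the list commands (sketched against a redis-py-style client with rpush/lpop; the function names are ours):

```python
import json

def push_item(client, key, item):
    """Append a nested structure to a Redis list by JSON-encoding it."""
    client.rpush(key, json.dumps(item))

def pop_item(client, key):
    """Take the first item off the list and decode it again."""
    raw = client.lpop(key)
    return None if raw is None else json.loads(raw)
```
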

Make keys hierarchical: for Student Robotics we decided that we'd namespace our
keys in a consistent way, prefixed with "org.srobo". For our list of teams we had keys
of the form "org.srobo.teams.n.thing", where n was the team number. This meant that
we could nest our data structures by using a tree of variables, storing things in
some nodes and nothing in others.
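
A small helper keeps those key names consistent (the field names below are illustrative, not the real SR schema):

```python
PREFIX = "org.srobo"

def team_key(team_number, *fields):
    """Build a dotted key under the org.srobo.teams namespace,
    e.g. team_key(7, "name") -> "org.srobo.teams.7.name"."""
    return ".".join([PREFIX, "teams", str(team_number)] + list(fields))
```
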

Of these solutions I tend to prefer the first one. Whilst it's slightly more horrible, it
does mean that all your data is conceptually stored in one place in Redis. Redis makes
no distinction between keys, so there's nothing in Redis that allows it to interact
directly with our structured hierarchy; instead that was dealt with in Python scripts.

Redis has a publish/subscribe mechanism which is extremely useful. The basic idea is
that you can subscribe to or publish on a "channel". There isn't anything that
particularly relates the data you've got stored in Redis to the way output occurs on any
given channel; in fact, you could store no data in Redis at all and just use it as a
publish/subscribe mechanism. I can think of many strategies for combining uses of
variables and keys, but for our project we came up with a pretty good solution.

In our solution we use the Redis MONITOR command, which sends an update any time a Redis
command is executed. We read the output of that, and any time a variable is modified
we publish a message on a channel with the same name as the variable, letting any
subscribed programs know that that variable has been updated. We don't publish the value,
just the fact that an update has occurred.
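
The heart of that notifier is deciding, from one line of MONITOR output, whether a key was written and which channel to publish on. A rough sketch (the command list here is deliberately incomplete, and the line format is what redis-cli monitor printed for us):

```python
# Commands we treat as "writes"; a real notifier would list many more.
WRITE_COMMANDS = {"SET", "DEL", "INCR", "DECR", "LPUSH", "RPUSH", "HSET"}

def channel_for(monitor_line):
    """Given one line of MONITOR output, e.g.
        1343509961.17 [0 127.0.0.1:52391] "SET" "my_key" "5"
    return the key to publish on if the command modified it, else None."""
    parts = monitor_line.split('"')
    if len(parts) < 4:
        return None
    command, key = parts[1].upper(), parts[3]
    return key if command in WRITE_COMMANDS else None
```

The main loop then just reads monitor output and calls publish on whatever channel_for returns.
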

Redis is a very cool piece of technology, and I think it's definitely worth having a
play around with. We used it for a production system over the weekend, with about 20
updates a second, and it seemed to work fairly stably. I'm not convinced I prefer it
over SQL or other non-relational stores (like MongoDB), but
I've met people who use it in production, and they all say that they love it.

So yesterday (trust me, it was yesterday from my point of view) I created an application called notely (github). I've just finished creating the sync component of the notely software. You can now "pair" your notely instance with another notely instance and sync it by typing "notely sync". The notely server is a tiny Python app, which I plan to blog a little about the construction of at some point in the future. It's hosted on Heroku, because Heroku's cool. You can get the source here: github

I'm at an airport with nothing but a laptop and wifi, so I built a little command-line utility to let me quickly save small text notes for myself. I'll probably extend this to have note sections. The tool is called notely and it has a really simple command-line interface. I mostly made this because I often want to keep a list of a few text items for things like to-do lists, or to leave reminders for myself. I think something super fast like this is exactly what I need. Todo: sync, and a webpage that makes the data available on tablets/phones/whatever. The code's available here

London is a great place for tourists, and people often talk about the "hottest" places in London. I wanted to build some way to visualise this, and I'm really happy with the results.

By mining the 4square api I was able to determine where people were within London; places like coffee shops, tourist attractions, clubs and gig venues are all unified into 4square. By getting a map from Open Street Map and overlaying the data from 4square using a simple lighting equation, I got really nice results.
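
The "lighting equation" is nothing exotic: each check-in contributes light to nearby pixels, attenuated by distance. A sketch of that idea (the falloff constant and normalisation here are made up for illustration, not the map's actual values):

```python
def brightness(px, py, checkins, falloff=0.001):
    """Light reaching map pixel (px, py): each (x, y, count) check-in
    contributes its count, attenuated by squared distance, and the
    total is clamped to the 0.0-1.0 range."""
    total = 0.0
    for cx, cy, count in checkins:
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        total += count / (1.0 + falloff * dist_sq)
    return min(1.0, total / 255.0)
```

Evaluating this for every pixel and using it to scale the map's colour gives the glowing "hot spots" effect.
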

Check the map out here and the source on github; feel free to fork me :)

Every time you write an open source project, you should license it so that other people can use the code. I'm not going to tell you which license you should use, because that's a matter of great flamewars. I've written a very simple utility called licenseme to put a license in your projects.

It's really easy to run: just pass it the license type and a list of contributor names and email addresses, and it'll generate a license for you. I use this in all the projects I open source because, like I said, people need to know under what terms they can re-use your code.

In StarCraft we measure "apm": the number of actions you
perform in a minute, used to gauge how fast you're playing.
I wanted to do the same thing on my command line, so I wrote
a tool called command-line-apm.
It's a really simple Python script which you can put in your PS1 to show you how
many actions (commands) you're running in a minute.
I max out at about 18; see if you can do better.
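
The counting itself is just a sliding one-minute window; a sketch of the idea (not the tool's actual code):

```python
import time
from collections import deque

class ActionCounter:
    """Count actions seen in a sliding 60-second window."""

    def __init__(self, window=60.0, clock=time.time):
        self.window = window
        self.clock = clock      # injectable, so the window is testable
        self.stamps = deque()

    def record(self):
        """Call this once per command executed."""
        self.stamps.append(self.clock())

    def apm(self):
        """Actions recorded in the last `window` seconds."""
        cutoff = self.clock() - self.window
        while self.stamps and self.stamps[0] < cutoff:
            self.stamps.popleft()
        return len(self.stamps)
```

Hooked into your PS1, every prompt redraw would call record() and print apm().
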