Stupid Software for Clever People

Around two and a half years ago I wrote a blog post looking at the use of words like computing, code and programming in primary school Ofsted reports. I thought it might be a good way to track whether computing was on the rise in this age group.

I’ve repeated the exercise with a sample of recent reports and a sample from 2011/12. The disappointing news is that there is no significant difference in the frequency of computing-related words.

Clearly it would have been a better story if there had been a difference… but given that I’d gone to the effort of scraping the text of almost 6,000 reports from Ofsted’s website, I thought I should at least look at what the differences actually are. How has the language changed over the last couple of years?

Here are the 75 most significant differences. The first chart shows words that appear more often in recent reports; the second shows words that appear more often in older reports. There’s some technical stuff below about the method, but what is graphed here is something called the log-likelihood of a difference: basically, the bigger the line, the more significant the difference.

Firstly, I was at least encouraged by the fact that “Mathematics” is mentioned a lot more now (at least that’s computational). The other thing that jumped out at me was the change from the word “disabilities” to “disabled”… a reflection of a general shift in language in this area?

As for the more educational aspects, I’m not really qualified to comment on how meaningful this analysis is, but would be interested to hear views from teachers.

The technical bits

The idea to use a log-likelihood score came from this paper by Paul Rayson and Roger Garside. However, as always with these things, there was a lot of munging required before I could use the method.
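For anyone curious about the maths: as I read the Rayson and Garside paper, each word gets expected frequencies E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), where a and b are its counts in the two corpora and c and d are the corpus sizes, and the score is LL = 2(a ln(a/E1) + b ln(b/E2)). The Python below is a minimal sketch of that calculation, not the code used for this post; `log_likelihood` and `compare_corpora` are made-up names, and treating a zero count as contributing nothing is my assumption about that edge case.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood score for one word across two corpora.

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: total number of words in corpus 1 and corpus 2
    """
    # Expected frequencies if the word were equally common in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero count contributes nothing (0 * ln 0 is taken as 0)
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def compare_corpora(freq1: Counter, freq2: Counter, top_n: int = 75):
    """Rank words by log-likelihood; returns (word, score, corpus that overuses it)."""
    c = sum(freq1.values())
    d = sum(freq2.values())
    scores = []
    for word in set(freq1) | set(freq2):
        a, b = freq1[word], freq2[word]  # a Counter returns 0 for missing words
        more_in = 1 if a / c > b / d else 2
        scores.append((word, log_likelihood(a, b, c, d), more_in))
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_n]
```

Calling `compare_corpora(recent_counts, older_counts)` with two word-frequency `Counter`s gives a ranked list like the one charted above, with `more_in` saying which of the two charts a word would belong on.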

Ofsted publish reports as PDFs. These were scraped (painfully) from their site and converted to text with Python and pyPdf. The reports contain a lot of boilerplate text, and this has evolved over time. To prevent that from influencing the final results, I wrote some code to remove the 1000 most often repeated lines from each corpus. Not perfect, but it seemed pretty effective. NLTK was used to tokenise the text, remove stop words and do the basic frequency counts, and then I coded up the log-likelihood scores directly from the paper referenced above. Graphs were done in R.
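A rough sketch of the clean-up and counting steps follows. It swaps NLTK for the standard library (a regex tokeniser and a tiny hand-written stop-word list) so it runs without extra downloads, and counting each distinct line once per report when ranking boilerplate is my assumption rather than something described above. The function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; NLTK's English list is far longer
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "are", "that", "their"}


def strip_boilerplate(reports: list[str], n_lines: int = 1000) -> list[str]:
    """Remove the n most often repeated lines across a corpus of report texts."""
    line_counts = Counter()
    for text in reports:
        # Count each distinct non-blank line once per report
        line_counts.update({line.strip() for line in text.splitlines() if line.strip()})
    common = {line for line, count in line_counts.most_common(n_lines) if count > 1}
    return [
        "\n".join(line for line in text.splitlines() if line.strip() not in common)
        for text in reports
    ]


def word_frequencies(reports: list[str]) -> Counter:
    """Lower-case, tokenise on runs of letters, drop stop words and count."""
    freq = Counter()
    for text in reports:
        tokens = re.findall(r"[a-z]+", text.lower())  # stand-in for nltk.word_tokenize
        freq.update(t for t in tokens if t not in STOP_WORDS)
    return freq
```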

Just over four years ago I found myself having the same conversation over and over again: wouldn’t it be good to have a regular meet-up for coder/designer/maker/startup-type people in the town I live in?

After having this conversation one too many times I realised that there was only one way to make it happen. I bought ReadingGeekNight.com and wrote a blog post along the lines that if ten people said they would attend and one person agreed to do a talk, then I would organise it.

On that first night four people spoke to an audience of forty or so. We’ve repeated the formula every month since, and four years later we’ve clocked up almost fifty Reading Geek Nights.

However, lots of things have changed for me since we started, and organising the event every month has become more difficult. So I feel it’s time now for me to step aside.

We’ve had an amazing range of speakers talking about an eclectic bunch of topics: interface design, cybercrime, equality in tech, 3D printing and hundreds more. I’m humbled that so many people have given up their time to stand up and share things with us.

It’s a great event that regularly attracts fifty to sixty people with pretty minimal marketing effort, and I’d love to think that Reading Geek Night will carry on without me. Of course, I’ll do whatever I can to help whoever comes forward get up and running, and I’ll always be a supporter from the sidelines (and enjoy sitting in the audience!).

My last Geek Night (as organiser) will be on 12th November – I’m hoping that I’m overwhelmed on the night with people who are keen to carry on the tradition and to breathe new life into what’s become an established event in the Reading tech community calendar.

Thanks to everyone who ever came along and especially thanks to everyone who ever spoke at Reading Geek Night. You are all awesome.

I’ve been collecting tweets about BBC Question Time to produce these graphs of twitter reaction. As a summary of how twitter users reacted to the programme, they work fairly well.

For a while I’ve been wondering about overlaying information gleaned from social media onto the video from the TV programmes as an experiment. Will it add a useful level of analysis? How easy is it to do? Does it make sense when you watch it?

So here’s my first stab, based on data I collected for the programme on 21st March. It shows a rolling graph of positive, negative and neutral sentiment and a dynamic graph of the relative frequencies of the most mentioned words.
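The post doesn’t describe how the rolling graph is computed, so here is one plausible approach, sketched in Python under stated assumptions: each tweet already carries a positive/negative/neutral label from some classifier, tweets are bucketed by minute, and each point on the graph sums the trailing few minutes (the three-minute default window is an arbitrary choice). The function names are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def per_minute_counts(tweets, start_ts: float) -> dict:
    """tweets: iterable of (unix_ts, label). Returns {minute_index: Counter of labels}."""
    buckets = {}
    for ts, label in tweets:
        minute = int((ts - start_ts) // 60)
        buckets.setdefault(minute, Counter())[label] += 1
    return buckets


def rolling_sentiment(buckets: dict, n_minutes: int, window: int = 3) -> list:
    """For each minute, total each label over the trailing `window` minutes."""
    series = []
    for m in range(n_minutes):
        total = Counter()
        for k in range(max(0, m - window + 1), m + 1):
            total.update(buckets.get(k, Counter()))
        series.append({label: total[label] for label in LABELS})
    return series
```

A wider window gives a smoother line at the cost of reacting more slowly to a sudden burst of tweets.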

Naturally it’s far from perfect. However (as always with these things), it’s the process of building it, getting feedback and iterating that ultimately improves it and makes it into something that’s actually useful to someone.

Comments / Questions / Observations etc welcome!

I thought it would be interesting to compare yesterday’s UK Budget speech with the reaction to it on twitter. It’s one of those events where a message is ‘broadcast’ and you can then judge how it was ‘received’ by analysing relevant tweets.

People often use wordclouds for this kind of thing, but there are usually better ways to compare the information. Here is a wordcloud showing what the Chancellor actually said in the House yesterday…

Chancellor’s Budget Speech

…and here’s one showing all of the tweets using the #budget hashtag made while the Chancellor was speaking.

Twitter Budget Reaction

It’s hard to see the difference. If you spend a long time with it you can pick out words that are larger in one than the other, but it’s hard work. In these cases a simple bar graph is much easier to interpret. Here’s one that looks at the top twenty or so words (having removed ones that aren’t useful for a comparison).
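One way to build the data behind a bar graph like this (a sketch, not necessarily how the graph was made) is to convert raw counts into occurrences per 1,000 words, so that a single speech and thousands of tweets can be compared directly. The `ignore` set stands in for the words removed as not useful; the function names are hypothetical.

```python
import re
from collections import Counter


def relative_frequencies(text: str, ignore: set[str]) -> dict[str, float]:
    """Occurrences of each word per 1,000 words, skipping words in `ignore`."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in ignore]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: 1000 * n / total for w, n in counts.items()}


def top_words_table(speech: str, tweets: str, ignore: set[str], n: int = 20):
    """Rows of (word, per-1,000 in speech, per-1,000 in tweets) for the top n words."""
    s = relative_frequencies(speech, ignore)
    t = relative_frequencies(tweets, ignore)
    # Rank by the larger of the two rates so words prominent in either source appear
    words = sorted(set(s) | set(t), key=lambda w: max(s.get(w, 0.0), t.get(w, 0.0)), reverse=True)
    return [(w, round(s.get(w, 0.0), 1), round(t.get(w, 0.0), 1)) for w in words[:n]]
```

Each row maps directly onto a pair of bars, one for the speech and one for the tweets.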

This time it’s much easier to ‘spot the difference’. On twitter the words “Duty” and “Cut” featured much more heavily than in the Budget speech, and the Chancellor didn’t use the word “Beer” at all. Conversely, Osborne used the word “Billion” many times when referring to figures, but it didn’t feature much on twitter.

So can we draw any useful insight from the relative word frequencies? If there is a difference between the message sent and the message received, it’s that people* respond more strongly to changes in duty and to cuts than to business and figures (even if they are in the billions). No surprise there then.

*more accurately… people who tweet about budgets

I’ve known Alan Bradburne (@alanb) and Matt Mower (@sandbags) for a couple of years now. We’ve often put the world to rights over coffee, but until now, I’ve never had the chance to work with either of them. Thinking that our skills might complement each other, we agreed that we’d hack something together as an experiment… and this is the result…

You know that getting feedback from your friends / peers / random-people-from-the-internet will help you.

You record a video of you doing your thing (be it a 30 second elevator speech, a product demo, a song & dance… whatever) and upload it to YouTube (or maybe you point people towards someone else’s thing).

You ask friends (or whoever you want) to help by visiting the TubeInsight website.

They watch your video, while moving a slider up and down to indicate how much they like or dislike what they see as your video plays. Effectively they highlight the bits where you do well, and the bits where you don’t do so well. If you want to have a go at giving some feedback now, click here.

TubeInsight records and aggregates the real-time feedback from everyone.

You go to your results page on TubeInsight. There you can watch your video with an animated graph overlay that shows everyone’s feedback. To see an example, click here.

At this point (hopefully) you’ve learnt something. You can use your new-found knowledge however you like!
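The post doesn’t go into how TubeInsight stores or combines the slider readings. As a hypothetical sketch, assuming the slider runs from -1.0 (dislike) to 1.0 (like) and each reading is tagged with the playback position, the aggregation could average every viewer’s last reading within each second of the video:

```python
from collections import defaultdict
from statistics import mean


def aggregate_feedback(sessions, video_length_s: int) -> list:
    """Average everyone's slider position for each second of the video.

    sessions: one list per viewer of (seconds_into_video, slider_value) readings,
              with slider_value from -1.0 (dislike) to 1.0 (like).
    Returns one average per second, or None where nobody gave a reading.
    """
    by_second = defaultdict(list)
    for readings in sessions:
        per_viewer = {}
        for t, value in readings:
            sec = int(t)
            if 0 <= sec < video_length_s:
                # Keep each viewer's last reading within a given second
                per_viewer[sec] = value
        for sec, value in per_viewer.items():
            by_second[sec].append(value)
    return [mean(by_second[s]) if by_second[s] else None for s in range(video_length_s)]
```

Seconds that nobody rated come back as None rather than zero, so a gap in the graph isn’t mistaken for neutral feedback.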

Obviously it’s a rough prototype. Maybe Like/Dislike is the wrong question to ask? Maybe the interface isn’t intuitive enough? Maybe people should be able to review just a small portion of a video, etc.

However, even though it’s pretty basic right now, we know from the initial feedback we’ve had that there are lots of directions it could go in (if any jump out at you then do feel free to tell us, we’d love to know!)

If you know of any communities where real-time anonymous feedback on the sort of activity that can be videoed (no, let’s not go there) is valuable, then we’d love to talk to them, so put us in touch.

Chris Hadfield is one of the astronauts currently on board the International Space Station. He tweets a lot about life above the Earth as he orbits our planet at just under 8 km per second.

As I’m interested in astronomy and space-related stuff, I was idly wondering how easy it would be to map where Chris has tweeted from… he must surely be one of the most pan-global twitter users there is.

Many people choose to geo-code their tweets, showing readers where they were when they hit the tweet button. Chris’s tweets don’t include a position, but being on board a spacecraft with a regular orbit means it should be possible to reverse-engineer his location from the time each tweet was sent.

Here’s a little web app I built which shows 800 of Commander Hadfield’s tweets and where the ISS was at the time each tweet was sent. (It requires a WebGL-capable browser; if you don’t have one, here’s a YouTube video of what you would have seen.)

(For the technical amongst you…)

There were three components to building this…

1. Request a bunch of tweets from the twitter api – when these are returned, they include a UTC time which represents when the tweet was created.

2. Work out where the ISS was at the time the tweet was created. This turned out to be somewhat more difficult than I hoped, but I learned a huge amount about NORAD, orbital data, and spaceflight mathematics in the process.

3. Plot the positions on a spinning globe. To do this I used the excellent open source WebGL Earth library, which made the actual animation part of the project relatively simple.
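For step 1, the `created_at` field in Twitter’s v1.1 API responses is a string such as `Wed Aug 27 13:08:45 +0000 2008`. A minimal Python sketch of turning that into a Unix timestamp, ready to hand to whatever works out the ISS position, might look like this. `position_at` is a hypothetical placeholder for step 2, not a real library function, and because `%a` and `%b` are locale-dependent the parsing assumes the default C locale.

```python
from datetime import datetime, timezone

# Format of the created_at field in Twitter's v1.1 API, e.g. "Wed Aug 27 13:08:45 +0000 2008"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_time_to_unix(created_at: str) -> int:
    """Convert a tweet's created_at string into Unix epoch seconds (UTC)."""
    dt = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return int(dt.astimezone(timezone.utc).timestamp())


def tweet_positions(tweets, position_at):
    """Pair each tweet's text with the position returned by `position_at`.

    tweets: dicts with "created_at" and "text" keys (as in the v1.1 API)
    position_at: hypothetical function mapping a Unix time to (lat, lon)
    """
    return [(t["text"], *position_at(tweet_time_to_unix(t["created_at"]))) for t in tweets]
```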

So… is it useful?… probably not, but it made for a nice little side project.

For a side project I’m doing, I needed to be able to find out the historical position (as a latitude/longitude) of the International Space Station. Given the number of ISS tracker sites available, I’d hoped there would be an API somewhere for it. However, after much searching, I couldn’t find a single one (Wolfram Alpha’s website will give you the info, but you can’t get at it using their API, and even if you could, their terms don’t let you store the data).

Given that I needed to build something to calculate the information, I thought I may as well also publish it as a freely available API – hopefully it may save someone some work.

How it works

NORAD publishes data for Earth-orbiting objects which you can use to calculate their positions. The data comes in the form of TLEs (two-line element sets), which you can retrieve from an API at space-track.org if you sign up for an account (if you are old-school, you can use a NASA JPL telnet interface to query their database). Once you have a TLE, you can calculate positions from it using a public domain algorithm. Each TLE is only accurate for a point in time, so the further you get from that time, the further out your prediction will be (around 3 km of error after 24 hours). For this reason, TLEs are published several times a day.
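To make the TLE format a little more concrete, the sketch below reads two fields directly: the epoch (the moment the TLE describes, in columns 19 to 32 of line 1) and the mean motion (revolutions per day, columns 53 to 63 of line 2), from which the orbital period follows. It does not propagate an orbit; that is the job of the algorithm mentioned above. The example lines are a widely reproduced ISS TLE from September 2008, and the function names are my own.

```python
from datetime import datetime, timedelta, timezone

# A widely reproduced example TLE for the ISS, from September 2008
EXAMPLE_LINE1 = "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927"
EXAMPLE_LINE2 = "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537"


def tle_epoch(line1: str) -> datetime:
    """Epoch of a TLE, from columns 19-32 of line 1 (two-digit year + day of year)."""
    yy = int(line1[18:20])
    # Two-digit years 57-99 mean 1957-1999; 00-56 mean 2000-2056
    year = 1900 + yy if yy >= 57 else 2000 + yy
    day_of_year = float(line1[20:32])  # 1.0 is 00:00 UTC on 1 January
    return datetime(year, 1, 1, tzinfo=timezone.utc) + timedelta(days=day_of_year - 1)


def orbital_period_minutes(line2: str) -> float:
    """Orbital period from the mean motion (revolutions per day), columns 53-63 of line 2."""
    mean_motion = float(line2[52:63])
    return 24 * 60 / mean_motion
```

For the example TLE the epoch comes out as 20 September 2008, a little after 12:25 UTC, and the period as roughly 91.6 minutes.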

The API works by maintaining a database of all of the published TLEs for the ISS from late 1998 up to the present. When you make a request, the API finds the nearest valid TLE and uses that to make its calculations. Thankfully the astrophysics number-crunching side of things is handled by a library.
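Here is a hypothetical sketch of the nearest-TLE lookup using a sorted list and binary search. I’ve interpreted “nearest” as the smallest absolute time difference; a real implementation might instead prefer the most recent TLE published before the requested time. `TLEStore` is an illustrative name, not part of the actual API.

```python
import bisect


class TLEStore:
    """Hypothetical in-memory stand-in for a database of ISS TLEs."""

    def __init__(self, tles):
        # tles: list of (epoch_unix_seconds, line1, line2), kept sorted by epoch
        self.tles = sorted(tles)
        self.epochs = [t[0] for t in self.tles]

    def nearest(self, unixts: float):
        """Return the TLE whose epoch is closest to unixts, before or after it."""
        if not self.tles:
            raise LookupError("no TLEs loaded")
        i = bisect.bisect_left(self.epochs, unixts)
        # Only the neighbours either side of the insertion point can be closest
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.tles)]
        best = min(candidates, key=lambda j: abs(self.epochs[j] - unixts))
        return self.tles[best]
```

Binary search keeps each lookup logarithmic in the number of stored TLEs, which matters when there are several per day going back to 1998.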

API details

Note: the mechanism that pulls the TLE information for this API stopped working in July 2014. I haven’t had time to fix it, so positions are accurate before that date, but not after.

You can access the API as follows…

http://jimanning.com/issapi/?unixts=1359548643

…where unixts is a Unix epoch time in seconds. If you omit unixts, it will return the current position of the ISS. If you specify a time in the future, it will still make a calculation, but it won’t be accurate.
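A small Python client for the endpoint above might look like the following. It’s a sketch: the response format isn’t documented here, so `fetch_position` just returns the raw body, and given the July 2014 note the service may no longer respond at all. Pass a timezone-aware `datetime`; a naive one would be interpreted as local time. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://jimanning.com/issapi/"


def iss_api_url(when: Optional[datetime] = None) -> str:
    """Build a request URL; with no time given, the API returns the current position."""
    if when is None:
        return API_BASE
    # unixts is Unix epoch seconds; `when` should be timezone-aware
    return API_BASE + "?" + urlencode({"unixts": int(when.timestamp())})


def fetch_position(when: Optional[datetime] = None) -> str:
    """Fetch the raw response body (its format isn't specified in the post)."""
    with urlopen(iss_api_url(when), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `iss_api_url(datetime(2013, 1, 30, 12, 24, 3, tzinfo=timezone.utc))` reproduces the example URL above.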

BBC Question Time has become one of those TV programmes that I now rarely watch without also reading and interacting with the #bbcqt hashtag on twitter. Clearly I’m not alone – last night there were approximately 36,000 tweets on the hashtag over the hour or so that the programme was on. That’s a lot of data about a TV programme – and given the programme’s political nature there must be some really interesting information in there about politicians and the way people react to what they say.

Last night I used the twitter api to capture every tweet on the #bbcqt hashtag made between 10.30pm and 11.45pm (the programme runs for an hour from 10.35pm). With this volume of tweets you need to be sneaky to avoid crashing into the api limits… but it’s possible.

Before the programme I wrote a quick bit of code so that during the show I could capture which person was speaking when.
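The post doesn’t say how the speaker capture worked. One simple way to do it, sketched here with hypothetical names, is to record a timestamp whenever someone starts speaking and later use a binary search to find who had the floor at any given moment (for example, when each tweet was sent):

```python
import bisect
import time
from typing import Optional


class SpeakerLog:
    """Record when each person starts speaking, then look up who spoke at a given time."""

    def __init__(self):
        self.times = []     # start times, kept sorted
        self.speakers = []  # speaker names, aligned with self.times

    def mark(self, speaker: str, ts: Optional[float] = None) -> None:
        """Call this live (e.g. from a key press) when someone starts talking."""
        ts = time.time() if ts is None else ts
        i = bisect.bisect_right(self.times, ts)
        self.times.insert(i, ts)
        self.speakers.insert(i, speaker)

    def speaker_at(self, ts: float) -> Optional[str]:
        """Return whoever most recently started speaking at or before ts."""
        i = bisect.bisect_right(self.times, ts) - 1
        return self.speakers[i] if i >= 0 else None
```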

Afterwards I put together some code to…

divide the tweets up into ones that were obviously about the panellists and ones that were just generic and then further divide them up into one-minute chunks

remove all of the rubbish bits (punctuation, inconsequential words etc) from each tweet
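As an illustration of those two steps (a sketch, not the original code), the Python below cleans each tweet with a regular expression and a small stop-word list, then files it under a panellist if it mentions exactly one of them, otherwise under “generic”, together with the minute of the programme. The panel names, handles and stop words are made up.

```python
import re
from collections import defaultdict

# Hypothetical panel: each panellist and the names/handles tweets might use
PANEL = {
    "Panellist A": {"alice", "@alice_mp"},
    "Panellist B": {"bob", "@bobsmith"},
}
# A few inconsequential words, including the hashtag itself and retweet markers
STOP_WORDS = {"the", "a", "and", "is", "to", "of", "rt", "bbcqt"}
TOKEN_RE = re.compile(r"@\w+|[a-z']+")


def clean(text: str) -> list[str]:
    """Lower-case, drop URLs and punctuation, and remove inconsequential words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOP_WORDS]


def bucket_tweets(tweets, start_ts: float) -> dict:
    """Group cleaned tweets by (panellist or "generic", minute of the programme).

    tweets: iterable of (unix_ts, text)
    """
    buckets = defaultdict(list)
    for ts, text in tweets:
        tokens = clean(text)
        mentioned = [name for name, aliases in PANEL.items() if aliases & set(tokens)]
        # Only tweets naming exactly one panellist are attributed to them
        who = mentioned[0] if len(mentioned) == 1 else "generic"
        minute = int((ts - start_ts) // 60)
        buckets[(who, minute)].append(tokens)
    return buckets
```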

With the data cleaned up and analysed, I then coded up a front end to display the information (for the technical people, it uses D3.js and rickshaw.js as the graphing libraries).

Good things

I like how you can clearly see how twitter reacts just after someone has spoken – obvious really – but nice to see the data doing what you would expect it to.

There are some interesting points where clearly one of the panellists has struck a chord on a particular topic – more positive sentiment than negative after particular comments.

Things to Improve

The classifier is trained on some generic good word/bad word data. I reckon a much more accurate sentiment would be gained by training it on actual #bbcqt data (especially as there’s some quite choice Anglo-Saxon swearing that the current classifier doesn’t recognise).
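To give a flavour of what training on #bbcqt data could involve, here is a minimal multinomial naive Bayes classifier with add-one smoothing, in plain Python. Naive Bayes is my choice for the sketch; the post doesn’t say which classifier produced the graphs, and the four labelled tweets in the test data are invented.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        # Log prior: how common each label is in the training data
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens: list[str]) -> str:
        v = len(self.vocab)

        def score(c: str) -> float:
            # Words never seen in training carry no evidence, so they are skipped
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens
                if t in self.vocab
            )

        return max(self.classes, key=score)
```

Hand-labelling a few hundred real #bbcqt tweets and passing their cleaned tokens to `fit` would let the classifier pick up the vocabulary (and swearing) that generic word lists miss.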

I ran out of time, but there’s some really interesting information in the word frequencies within the tweets; maybe one to develop later.

What’s next

I’m interested to find out if there is an appetite for this kind of (very niche, I know) analysis. Do political parties monitor this stuff? Is there some valuable feedback in there for them?

A little over three years ago, I co-founded a company called SocialOptic. It’s a fantastic company, with some great products, but I’ve decided the time is right to move on.

Throughout my career I have always worked on project-based things – things with a start and an end – from building news production software at the BBC to creating a new project services team in an Oil & Gas company – from rescuing failing projects to writing business cases for future technology investments. It’s always been about starting from a concept; being creative; defining the why, who, what, when and how much; getting people on side and finances approved; managing a team building something new and then handing that on to an operations team to run with.

About three and a half years ago I started mulling over some ideas I’d had for a Project Management software product. I knew that it would be a useful tool, and could also see that no-one had built it yet… So I left my job (at the time I was contracting as a Projects and Programme Manager), rented a house in the foothills of the Sierra Nevada mountains in southern Spain, and went on an extended family holiday. In between family stuff and walks in the mountains, I taught myself to write code. It was quite a challenge, but by the end of our three months in Spain a very basic version of the product was ready and soon it was up and running and available to use on the web. I called it Milestone Planner and started to watch the sign up stats with interest.

My original plan had been to go back to contracting when we returned, but while we were away the economy tanked and pretty much every company I’d worked with before had put all new projects on hold. No new projects, no need for Project Managers. In the meantime I had met Benjamin (my soon-to-be business partner) and together we plotted and worked out how we might turn Milestone Planner from a prototype product into a fully fledged business.

We took the plunge and incorporated SocialOptic Ltd… and what a ride it’s been.

Over the last three years we have built two products from the ground up. We’ve had the satisfaction of seeing users become customers and watching them put our software right at the centre of their own project processes. I’m fantastically proud of Milestone Planner and what we have achieved. As the number of customers increases, the business is moving into a new phase, one where the key activities need to be focussed on operational and support matters rather than building *new stuff*. I’m not an operations person and never will be, so I have decided it’s the right time for me to move on. It’s tough when a co-founder leaves a business, but we’ve been able to structure things so that Benjamin can continue to run the ship and steer SocialOptic towards the solid operational success that I’m confident it will become.

So… if you know anyone who’s looking for a creative professional who can speak business and code and has a track record of getting things off the ground… I’m available (here’s my LinkedIn profile for a potted career history).

I’ve been thinking recently about how I can help introduce more young people to the joys of coding. So far I’ve scored a minor success teaching a group of nine-year-olds MIT’s Scratch, but I’d like to try something a bit more scalable, aimed at older learners.

One concept I’ve been mulling over is an easily accessible, online ‘coding game’. I have some basic thoughts, which I’ll outline below, but want to make it a collaborative, open source project. I’m putting the idea out there for comment / feedback etc and if you want to get involved in any way at all then get in touch.

So here’s my initial thinking…

You play the game by writing code to guide a character/robot/thing around the screen to solve a series of increasingly complex challenges.

When you complete the challenge, you get to see (and play with) the code that other people who also solved the challenge used.

There would be a ‘compete’ mode where you could play in real time, against another coder. At the end of the challenge we ‘swap’ the code, so both people can learn from the way the other has constructed their code.
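To make the first idea a bit more concrete, here is a tiny hypothetical challenge engine in Python: a grid, a robot, some walls, and a player’s program expressed as a list of commands. In the real game players would write actual code (perhaps in the browser via Skulpt); the command names, grid rules and step limit here are all placeholders.

```python
from dataclasses import dataclass, field

# Headings in clockwise order as (dx, dy); y grows downwards, like screen coordinates
DIRECTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # north, east, south, west


@dataclass
class Challenge:
    width: int
    height: int
    start: tuple[int, int]
    goal: tuple[int, int]
    walls: set[tuple[int, int]] = field(default_factory=set)
    max_steps: int = 100


def run(challenge: Challenge, program: list[str]) -> bool:
    """Execute a player's commands; True if the robot finishes on the goal square."""
    if len(program) > challenge.max_steps:
        return False
    x, y = challenge.start
    facing = 1  # start facing east
    for command in program:
        if command == "left":
            facing = (facing - 1) % 4
        elif command == "right":
            facing = (facing + 1) % 4
        elif command == "forward":
            dx, dy = DIRECTIONS[facing]
            nx, ny = x + dx, y + dy
            # Moving into a wall or off the grid leaves the robot where it is
            inside = 0 <= nx < challenge.width and 0 <= ny < challenge.height
            if inside and (nx, ny) not in challenge.walls:
                x, y = nx, ny
        else:
            raise ValueError(f"unknown command: {command!r}")
    return (x, y) == challenge.goal
```

Keeping the engine a pure function of the challenge and the program makes it easy to replay someone else’s solution once a challenge is complete, which is what the code-sharing idea needs.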

My theory is that by giving the coding an ‘objective’ (i.e. completing the challenge or beating your opponent) and then sharing the code, we will encourage people to learn from each other (I’d be very interested in any educationalist’s take on this approach: how would we improve it?). Of course, it’s not a completely original principle, but I haven’t seen anything web-based that uses this combination of competition and code sharing as a learning tool before.

I’ve expanded on some of my initial thoughts in this video.

I’m keen to make this happen, but I can’t make it happen on my own. What I can do is co-ordinate stuff; contribute ideas; do some of the code etc and generally move things forward. If you can help in any way then please leave a comment on the post and I’ll be in touch.

Some links for things mentioned in the video…

Robocode is a good example of a code-based game, although my view is that it’s a bit complex for the age group we’re targeting.

Raphael is a JavaScript vector graphics library. I’ve used it for a few projects, and it’s good on the cross-browser front.

Skulpt is an in-browser implementation of Python. I think there are lots of things about Python that make it a good language for kids to code in, but of course maybe there’s a better choice.

Node seems like the obvious choice to provide any real-time element for the competitive challenges