~ Clojure, Kafka, Spark, Hadoop and AlgoAutomation

Monthly Archives: October 2014

Let’s be honest: whether we call ourselves web designers, solutions architects or whatever title you want to use, when a customer comes knocking for a piece of software they don’t ask what it will be coded in. Doing this BigData thing has made me realise more and more that there’s a serious oversell of the technology at the expense of the actual requirement at hand.

I’m usually cynical when a prospect comes to me saying, “We have to do Hadoop!” It’s often the first sign that they’ve been reading the usual articles telling them how much a technology is going to change their business. “Having to do Hadoop” is usually the fourth or fifth question along the chain, and it’s my job to make sure we get back to the first question: “What is the problem you are trying to solve?”

Sometimes our choices are bound by necessity. I couldn’t see the likes of Learning Pool or Synergy eLearning getting off the ground quickly without Moodle, so straight away you’re bound to PHP as the language, because that’s what Moodle is written in. There are no long arguments or deep discussions there.

We need prospective clients to have a clear understanding of their business at three points: before a solution is delivered, when it is delivered, and what they expect once it is in place. So for a “BigData” project (I’m under no illusion, most BigData projects aren’t that big) that means a very clear definition of the business problem or question: “We’re trying to find out X from data Y” or “We can’t scale this current solution, can you help?”

Most clients I know don’t care what’s under the bonnet. It only becomes an issue when hiring talent, especially developers. Developers are a finite resource, and for all the programmes to get people interested in development (no bad thing, do not get me wrong), the shortage is a problem now, not a problem three years from now.

If a developer starts waving their arms in the air saying it *has* to be done in a particular language (Go, PHP, Python, Java, Ruby, Lisp, I could go on), there are usually two reasons behind the statement. The first is that it’s probably the language that person is most comfortable with. The second, and more dangerous, is that it’s the new language the developer is desperate to learn, and they’re willing to use you as the test project (it will take longer and you’ll hit more issues along the way).

I help tech companies figure out the best way to go; it’s my job and one I love doing. Most things can be answered in a fairly straightforward way too, though it all revolves around one simple and rational question: “What are you trying to achieve?”

Everything else is under the bonnet and most times the client won’t be that bothered what it is, as long as it does the job it was commissioned to do.

Air travel makes it easy for any one of us to hop from country to country, and most of the time that’s fine. When the H1N1 flu virus started to spread, though, it wasn’t much of a surprise that the far and wide reach of reports was caused by travel. With data being readily available it was easy to see how it was moving. A good example of this was the “Just Landed In” post, which used Twitter data, Metacarta and Processing to visualise the movements (http://blog.blprnt.com/blog/blprnt/just-landed-processing-twitter-metacarta-hidden-data).

The Ebola virus has the potential to cause an awful lot of problems. So the question is: where are the first connection points?

Gathering Route Data

The actual query to find out which flights go where is easy enough, and I’ll show you that in a moment. First, though, we need data: mainly airports, airlines and routes.

As they are CSV files they can be imported into a database such as MySQL easily. You can download the SQL files here.
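As a minimal sketch of that import step, the Python below loads CSV text into an in-memory SQLite database rather than MySQL, purely so it runs without a server. The table and column names are taken from the query used later in this post. The CSV layout (a header row with those column names) and the two sample rows are assumptions for illustration, not the real data files.

```python
import csv
import io
import sqlite3

# Assumed schema: column names match the query later in the post.
SCHEMA = """
CREATE TABLE airports (iatacode TEXT, longname TEXT, country TEXT);
CREATE TABLE airlines (iatacode TEXT, airlinename TEXT);
CREATE TABLE routes (airlinecode TEXT, depaircode TEXT, arraircode TEXT);
"""


def load_csv(conn, table, columns, csv_file):
    """Insert every row of a CSV file (with a header row) into `table`."""
    reader = csv.DictReader(csv_file)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, ([row[c] for c in columns] for row in reader))
    conn.commit()


# Illustrative sample standing in for the real airports file (assumption).
sample_airports = io.StringIO(
    "iatacode,longname,country\n"
    "LOS,Murtala Muhammed International Airport,NIGERIA\n"
    "LHR,London Heathrow Airport,UNITED KINGDOM\n"
)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_csv(conn, "airports", ["iatacode", "longname", "country"], sample_airports)

airport_count = conn.execute("SELECT COUNT(*) FROM airports").fetchone()[0]
print(airport_count)
```

With real files you would pass `open("airports.csv", newline="")` (a hypothetical filename) instead of the `StringIO` sample, and repeat the call for the airlines and routes tables.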

Crafting The Query

As it stands, the Ebola outbreak is concentrated in only four countries in West Africa: Guinea, Liberia, Sierra Leone and Nigeria. The United States and Senegal have travel-related cases but I’m not looking at those just yet; one step at a time.

The SQL query is basic but it gives us good info.

SELECT
  r.airlinecode,
  al.airlinename,
  a.longname AS depairport,
  a.country AS depcountry,
  a.iatacode AS depcode,
  b.longname AS arrairport,
  b.iatacode AS arrcode,
  b.country AS arrcountry
FROM routes r
JOIN airports a ON a.iatacode = r.depaircode    -- a = departure airport
JOIN airports b ON b.iatacode = r.arraircode    -- b = arrival airport
JOIN airlines al ON al.iatacode = r.airlinecode
-- keep only routes that depart from an airport in the chosen country
WHERE r.depaircode IN (SELECT iatacode FROM airports WHERE country = 'NIGERIA');

This query will give us the departure and destination airports available for Nigeria. The country can be changed to any other by editing the value in the subquery.
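Since the country is the only thing that changes, one way to avoid editing the SQL by hand is to pass it in as a parameter. The sketch below is an assumption about how that might look, not the original workflow: it runs the same query through Python's `sqlite3` module against a tiny in-memory database. The function name `routes_from`, the airline code `XX`, "Example Air" and the sample rows are all made up for illustration.

```python
import sqlite3

# Same query as above, with the country filter turned into placeholders.
QUERY = """
SELECT r.airlinecode, al.airlinename,
       a.longname AS depairport, a.country AS depcountry, a.iatacode AS depcode,
       b.longname AS arrairport, b.iatacode AS arrcode, b.country AS arrcountry
FROM routes r
JOIN airports a ON a.iatacode = r.depaircode
JOIN airports b ON b.iatacode = r.arraircode
JOIN airlines al ON al.iatacode = r.airlinecode
WHERE r.depaircode IN
      (SELECT iatacode FROM airports WHERE country IN ({placeholders}))
"""


def routes_from(conn, countries):
    """Return every route departing from any of the given countries."""
    placeholders = ", ".join("?" for _ in countries)
    sql = QUERY.format(placeholders=placeholders)
    return conn.execute(sql, list(countries)).fetchall()


# Made-up sample data (assumption), just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airports (iatacode TEXT, longname TEXT, country TEXT);
CREATE TABLE airlines (iatacode TEXT, airlinename TEXT);
CREATE TABLE routes (airlinecode TEXT, depaircode TEXT, arraircode TEXT);
INSERT INTO airports VALUES
  ('LOS', 'Lagos', 'NIGERIA'),
  ('FNA', 'Freetown', 'SIERRA LEONE'),
  ('LHR', 'London Heathrow', 'UNITED KINGDOM');
INSERT INTO airlines VALUES ('XX', 'Example Air');
INSERT INTO routes VALUES
  ('XX', 'LOS', 'LHR'),
  ('XX', 'FNA', 'LOS'),
  ('XX', 'LHR', 'LOS');
""")

rows = routes_from(conn, ["NIGERIA", "SIERRA LEONE"])
for row in rows:
    print(row)
```

Passing all four outbreak countries in one call returns their outbound routes together, and using placeholders rather than string concatenation keeps the country names out of the SQL text itself.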

Inspecting The Results

The output we get is the airline information, the departure airport with its country and IATA code, and the destination airport with its IATA code and country.