Mind the stats?

Have you noticed how Google sometimes gives the top page in your search results a little summary box? For example, if you Google “how to plan a honeymoon”, you get this:

Since I didn’t do number two on this list, my job for tonight was to check out trains for the UK leg of our honeymoon. After my first Google search, I got a little distracted and consequently typed up this short post 🙂 I realised part way through that “mind the gap” is more of a London Underground thing than a UK train travel thing, but it’s late so hopefully the reference still makes sense.

My first (and only) search tonight was for a train from London to Cambridge. Before even clicking through to the website listed, I got to read this little “statistical report” 🙂

The first two sentences got me questioning what “fastest journey time” means, since how can the “average journey time” be lower than the shortest journey time? The third sentence made me shake my head at the misuse of our special stats word “average”, and I automatically re-worded that sentence in my head to “on weekdays there are, on average, 96 trains per day…”

This gives some immediate answers to my confusion about the Google search summary – I think. “Slowest route” actually means the minimum time, and “Fastest route” means the maximum time. At least now the average journey time of one hour sits between these two numbers, but did you notice when you scrolled down the page that there were some routes listed with times greater than 63 minutes, the supposed “fastest route”?

Me too, so I went through all routes for the next 24 hours (starting from 8:44am London time) and listed their times:

There are bound to be a few mistakes in there from when I was converting from hours to minutes 🙂 But to finish this short critique, let’s look at the data:

For this particular 24-hour period (from 8:44am on Monday 21st November) there were 76 trains from London to Cambridge, with a mean journey time of around 64 minutes (based on the advertised times). If I wanted to check out the claims about the average number of trains per weekday and the average journey time, I’d need a better sampling method and more “weekdays” of data. But this sample does offer evidence to contradict the claims about “shortest” and “fastest” journey times.
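If you’d rather not trust my late-night mental arithmetic, here is a minimal Python sketch of the same kind of check. It assumes the advertised times are written like “1h 3m” or “48m”, which is my guess at a sensible format rather than what the website necessarily shows. The function names `to_minutes` and `summarise` are made up for illustration, and the four durations are placeholder values, not the 76 times I actually recorded.

```python
import re
from statistics import mean


def to_minutes(text: str) -> int:
    """Convert a duration such as '1h 3m', '1h' or '48m' to whole minutes.

    The 'Xh Ym' format is an assumption, not the website's actual format.
    """
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def summarise(durations):
    """Return the count, mean, shortest and longest journey time in minutes."""
    minutes = [to_minutes(d) for d in durations]
    return {
        "count": len(minutes),
        "mean": mean(minutes),
        "shortest": min(minutes),
        "longest": max(minutes),
    }


# Placeholder values for illustration only, NOT the real timetable data.
made_up_times = ["48m", "1h 3m", "1h 25m", "52m"]
print(summarise(made_up_times))
```

Swapping in the real list of advertised times would give the number of trains and the mean, and comparing `shortest` and `longest` against the summary box’s claims is then a one-line check.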

Unless those terms still don’t mean what I think they mean, even when I reverse them 🙂