R

My company puts on a year end function every year. It’s at some resort or other, and the important thing for this post is that we’re not told the location in advance. We find out when we get there (by bus).

What we are told, about a month ahead of the event, is approximate distances from 3-4 locations. These are where the bus pickup sites are. The locations are:

Head Office

Near Clearwater Mall

Fourways

Centurion

The distances given aren’t correct. And, as a result, there’s usually several attempts by various people to figure out where the year end function will be in advance.

I thought I’d join in this year, using some machine learning on those distances.

Now, I should mention that this is a very poor use for ML. Mainly because of a lack of data. I should have hundreds of data points for a decent prediction. I have 2 or 3 data points, for 4 different locations. Still, it’s what I have to work with.

First, the starting data. The distances for this year are:

Clearwater mall: 63 KM

Centurion: 56 KM

Fourways: 43 KM

HQ: 20 KM

Cape Town: 1447 KM

I’m going to ignore Cape Town for training, as it only had a distance previously specified in 2015, and so I only have one piece of data.

Plotting this on a map makes it clear that the distances have been ‘massaged’ (I’m plotting ‘as the bird flies’, not driving distance for ease of plotting, I’ll use driving distances for the training)

Now to stick those into a linear regression and see if I can predict the error on this year’s measurements.

I need to mention that with so little data, the accuracy of the linear regression is going to be very low. I’m as likely to get the correct results from linear regression as I am to get correct results from rolling a couple of d20s.

That said, onwards to untrustworthy results.

Once the starting values are loaded into R, creating a simple model is as easy as