R: Weather vs attendance at NoSQL meetups

A few weeks ago I came across a tweet by Sean Taylor asking for a weather data set with a few years' worth of recordings, and I was surprised to learn that R already has one: the weatherData package.

Now we can see there's not really much of a correlation between temperature and attendance across the months. In fact, 9 of the months have a very similar average attendance. Only July, December and especially August show a noticeable dip.
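A minimal sketch of the per-month summary behind that comparison, assuming one row per event with hypothetical columns `date`, `attendees` and `Mean_TemperatureC`. The data frame and its values are made up for illustration and are not the meetup data:

```r
# Hypothetical data: one row per event (made-up values, for illustration only)
events <- data.frame(
  date = as.Date(c("2014-01-15", "2014-01-22", "2014-07-09",
                   "2014-07-23", "2014-08-06", "2014-12-10")),
  attendees = c(40, 50, 30, 34, 20, 25),
  Mean_TemperatureC = c(5, 7, 18, 20, 19, 6)
)

# Two-digit month string ("01".."12") so months sort in calendar order
events$month <- format(events$date, "%m")

# Average attendance and average temperature for each month
by_month <- aggregate(cbind(attendees, Mean_TemperatureC) ~ month,
                      data = events, FUN = mean)
print(by_month)
```

Putting both averages in one row per month makes it easy to see whether the months with lower attendance are also the warmer or colder ones.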

This could suggest that a variable other than temperature is influencing attendance in these months. My hypothesis is that we'd see lower attendance during the weeks of school holidays. The main ones fall in July/August, December and March/April (although, interestingly, March and April don't show the dip!)
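One way to test that hypothesis is to flag each event as falling inside a school holiday or not and compare average attendance between the two groups. This is a sketch only: the `holidays` date ranges below are hypothetical placeholders, not official school term dates, and the events are made up:

```r
# Hypothetical events (made-up values, for illustration only)
events <- data.frame(
  date = as.Date(c("2014-03-05", "2014-04-09", "2014-07-30",
                   "2014-08-13", "2014-10-01", "2014-12-22")),
  attendees = c(45, 38, 28, 22, 50, 20)
)

# Hypothetical school-holiday periods: placeholder ranges, not official dates
holidays <- data.frame(
  start = as.Date(c("2014-04-05", "2014-07-25", "2014-12-20")),
  end   = as.Date(c("2014-04-21", "2014-09-02", "2015-01-04"))
)

# Mark an event as a holiday event if its date falls inside any period
events$holiday <- FALSE
for (i in seq_len(nrow(holidays))) {
  events$holiday <- events$holiday |
    (events$date >= holidays$start[i] & events$date <= holidays$end[i])
}

# Mean attendance for non-holiday (FALSE) vs holiday (TRUE) events
att_by_holiday <- tapply(events$attendees, events$holiday, mean)
print(att_by_holiday)
```

With real data you would swap in the actual holiday dates for the relevant region and years.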

Another interesting thing to look into is whether the dip in attendance comes not from a lack of will among attendees but from there simply being fewer events to go to. Let's plot the number of events hosted each month against the temperature:
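A minimal sketch of how the events-per-month vs temperature data could be assembled, assuming a hypothetical `meetups` table with one row per event and a daily `weather` table with `Date` and `Mean_TemperatureC` columns (all values are made up):

```r
# Hypothetical events: one row per event (made-up values)
meetups <- data.frame(
  date = as.Date(c("2014-01-08", "2014-01-20", "2014-02-11", "2014-12-03"))
)
meetups$month <- format(meetups$date, "%m")
meetups$events <- 1  # one row per event, so summing gives the count
counts <- aggregate(events ~ month, data = meetups, FUN = sum)

# Hypothetical daily weather (made-up values)
weather <- data.frame(
  Date = as.Date(c("2014-01-08", "2014-01-20", "2014-02-11",
                   "2014-02-25", "2014-12-03")),
  Mean_TemperatureC = c(4, 6, 8, 10, 3)
)
weather$month <- format(weather$Date, "%m")
temps <- aggregate(Mean_TemperatureC ~ month, data = weather, FUN = mean)

# One row per month: event count alongside average temperature
monthly <- merge(counts, temps, by = "month")

plot(monthly$Mean_TemperatureC, monthly$events,
     xlab = "Mean temperature (C)", ylab = "Events hosted")
```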

Here we notice a big dip in events in December: organisers are hosting fewer events, and we know from our earlier plot that, on average, fewer people attend those events. Lots of events are hosted in the autumn, slightly fewer in the spring, and fewer in January, March and August in particular.

Again, there's no particular correlation between temperature and the number of events hosted on a given day:

There's no obvious correlation from looking at this plot, although I find plots hard to interpret when the values on one axis are bunched around a handful of points (often a factor variable) and spread out (a continuous variable) on the other. Let's confirm our suspicion by calculating the correlation between these two variables:

> cor(merged$events, merged$Mean_TemperatureC)
[1] 0.0251698
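The post doesn't show how `merged` was put together. A minimal sketch of one plausible way, assuming a daily weather table with `Date` and `Mean_TemperatureC` columns and a hypothetical `meetups` table with one row per event (all values are made up): count events per day, join the counts onto the weather by date, then correlate.

```r
# Hypothetical daily weather (made-up values, for illustration only)
weather <- data.frame(
  Date = as.Date("2014-06-02") + 0:4,
  Mean_TemperatureC = c(15, 17, 16, 19, 14)
)

# Hypothetical events: one row per event, so a date can repeat
meetups <- data.frame(
  Date = as.Date(c("2014-06-02", "2014-06-02", "2014-06-04", "2014-06-05"))
)
meetups$events <- 1
per_day <- aggregate(events ~ Date, data = meetups, FUN = sum)

# Keep every day from the weather data; days without meetups get 0 events
merged <- merge(weather, per_day, by = "Date", all.x = TRUE)
merged$events[is.na(merged$events)] <- 0
merged <- merged[order(merged$Date), ]

# Pearson correlation between events per day and mean temperature
r <- cor(merged$events, merged$Mean_TemperatureC)
print(r)
```

Whether days with no meetups are kept (as 0) or dropped is a design choice: dropping them correlates temperature only with days that had at least one event, which can give a different value.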

Back to the drawing board for my attendance prediction model then!

If you have any suggestions for doing this analysis more effectively, or if I've made any mistakes, please let me know in the comments. I'm still learning how to investigate what data is actually telling us.