October 2009

Over time, our online pursuits generate a rich set of data points. The websites we visit, the articles we read, and the things we buy, they all reflect our personality. It can feel creepy when someone else analyzes this information, but when we explore it on our own we get to know ourselves a little better.

Me, I recently uncovered some trends in my e-mail habits. It turns out my mail client stores some message metadata — recipients, dates, and so on — in a local database. By mixing a little SQL and some quality time with R’s charting tools, I learned just how much nonsense I had sent out over the past four years.

Let’s start with the basics. This chart shows the total e-mails I sent out in that period, based on the time of day:

total e-mails sent, per hour

Here we see a ramp-up in the morning, followed by a brief dip around 6PM, then another quick peak around 8PM before it tails off for the night. (Those rare e-mails sent during the wee hours, I chalk those up to my travels through different time zones.) The lack of a midday dip hints at a person who typically works through lunch. Yes, that seems to fit me very well.
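For anyone who wants to try this at home, here is a minimal sketch of an hourly tally like the one behind that chart. It is a Python stand-in, not the original SQL-and-R workflow, and the table name `messages`, the `sent_at` column, and the sample rows are all invented for illustration. A real mail client’s metadata store will look different.

```python
import sqlite3

# Hypothetical schema: one row per sent message, with an ISO-8601 timestamp.
# A real mail client's metadata database will not match this layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sent_at TEXT)")
conn.executemany(
    "INSERT INTO messages (sent_at) VALUES (?)",
    [
        ("2006-01-09 09:15:00",),
        ("2006-01-09 09:48:00",),
        ("2006-01-09 20:05:00",),
        ("2006-01-10 14:30:00",),
    ],  # made-up sample rows
)


def emails_per_hour(connection):
    """Return a 24-element list: messages sent in each hour of the day."""
    counts = [0] * 24
    query = """
        SELECT CAST(strftime('%H', sent_at) AS INTEGER) AS hour, COUNT(*)
        FROM messages
        GROUP BY hour
    """
    for hour, total in connection.execute(query):
        counts[hour] = total
    return counts


hourly = emails_per_hour(conn)
for hour, total in enumerate(hourly):
    if total:
        print(f"{hour:02d}:00  {total}")
```

Starting from a list of 24 zeroes means hours with no mail still show up as zero, which keeps any dips visible when the counts are charted.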

But we’re dealing with a lot of information here, so perhaps these statements are too broad. It may help to slice the data by day to get a clearer picture of my habits:

e-mails sent per hour, broken down by day

The weekdays exhibit the same pattern of one large hump followed by a smaller, late-day peak. We also see some new details that were hidden in the other chart:

The thick magenta line represents Thursdays, and the dip around noon indicates this was my day to step away from my desk for a proper meal.

While I took it easy on the weekends, I sent a relatively large number of messages after Sunday dinner. (Check out the peak at 8PM.) Was I getting a head start on the work week, or raving about some new restaurant I had tried that night? Hmm.
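The by-day breakdown is the same tally with one more grouping key. Here is a rough Python sketch, assuming the sent dates are available as plain timestamp strings. The timestamps below are made up, and the function name is just a placeholder.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Made-up timestamps standing in for the "date sent" metadata.
sent_times = [
    "2006-01-05 12:10:00",  # a Thursday
    "2006-01-08 20:02:00",  # a Sunday
    "2006-01-08 20:40:00",  # a Sunday
    "2006-01-09 09:15:00",  # a Monday
]


def emails_by_day_and_hour(timestamps):
    """Count messages per (weekday name, hour of day) pair."""
    tally = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        tally[(DAYS[moment.weekday()], moment.hour)] += 1
    return tally


by_day_hour = emails_by_day_and_hour(sent_times)
for (day, hour), total in sorted(by_day_hour.items()):
    print(f"{day:<9} {hour:02d}:00  {total}")
```

Each weekday then becomes its own line on the chart, with the hour on the horizontal axis and the count on the vertical.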

Slicing the data yet another way reveals even more details. The real eye-opener was the number of messages I sent per month:

total e-mails sent each month

See that spike there, between the two red lines? The one between December 2005 and January 2006?

That marks when I acquired my first Blackberry.

Fine, it’s time to come clean: I’m hooked. I’m a connectivity addict, and I now see why they’re affectionately called “crackberry” phones.

Granted, this is just a quick skim over a lot of data. I might see other trends if I were to separate professional and personal communication, and look at conversation (mail thread) counts instead of raw message counts. Additionally, the charts alone don’t tell us the whole story behind that spike in January 2006. (I may have seen the need ahead of time and bought the crackberry to keep up. At least that’s what I’ll say until proven otherwise.) Still, were I a sleuth, these charts would give me an idea of where to dig for more details.

Have some interesting data you’d like us to check out? Need our help making sense of your company’s data? Please drop us a line. Thanks for reading.

Me, I always figured they were pretty mellow. It takes a lot of time and patience to knit something, right? Little did I know, hidden behind the handmade scarves and sweaters were killer instincts and nerves of steel. Sock Wars has forever changed my views of knitters.

Sock Wars is a take on the old Assassin game. In Assassin, you and your assigned target play contract killers. You rack up points by killing your target, then killing their target, and so on. The victor is the last one standing. The catch? You are someone else’s target, so you have to act quickly to rack up points.

I’ve seen Assassin games that use waterguns, touch-tag, and cameraphones to “kill” players. In Sock Wars you kill your target by … knitting them a pair of socks. Once your target receives the socks in the mail, they are considered dead and they send you whatever socks they had in-progress, along with their target’s details.

(You have to admit, if you’re going to get whacked, a fresh pair of handmade socks is a pretty nice consolation prize.)

When I first heard about Sock Wars, I marvelled over killer yarn for just a moment. Then I wondered aloud, “I’d love to see the data on that.” As it turns out, the kill data is available on the Sock Wars website. The data includes players’ locations (country or US state) so I knew I’d have a chance to play with R’s mapping toolsets in addition to the standard number-crunching.

Digging In

When exploring a new data set, it helps to run some basic tests to get a feel for what’s going on. It’s like scanning a room before deciding who would make for an interesting chat. To that end, I churned the data into a usable form and fired up R to generate some pretty charts. I mean, descriptive statistics.

(The data shows that most of the active participants — 85% — are from the USA. So our analysis will focus on those members in the US.)

Killer States

Collectively, how deadly are each state’s killers? We can see that California killers did the most damage overall, taking out more than thirty knitters.

Sock Wars IV: Kills by State

That puts California head and shoulders above the rest of the states. Should I fear a west-coast knitter, then? Maybe, maybe not. The data show that California also had the greatest enrollment. Taking the ratio of kills to enrollment gives us a different view:

Sock Wars IV: Ratio of Kills to Enrollment

From this angle, California looks a lot less tough: its assassins took out roughly one knitter each. Texas knitters took out about two people each, while Maine and Missouri tie for the top spot at three kills per assassin.
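The ratio itself is simple arithmetic: kills divided by enrolled knitters, state by state. Here is a small Python sketch of that calculation. Only the Texas figures (thirteen kills, six knitters) are quoted in this post. The other two rows are placeholders invented to show the mechanics, not actual Sock Wars numbers.

```python
# Placeholder tallies: only the Texas row uses figures quoted in the post.
# "State A" and "State B" are invented to illustrate the calculation.
state_stats = {
    "Texas": {"kills": 13, "enrolled": 6},
    "State A": {"kills": 30, "enrolled": 30},
    "State B": {"kills": 3, "enrolled": 1},
}


def kills_per_knitter(stats):
    """Return {state: kills / enrolled}, skipping states with no enrollment."""
    return {
        state: row["kills"] / row["enrolled"]
        for state, row in stats.items()
        if row["enrolled"] > 0
    }


ratios = kills_per_knitter(state_stats)
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
for state, ratio in ranked:
    print(f"{state:<8} {ratio:.2f}")
```

Notice how a state with one very busy knitter jumps to the top of the ranking. A ratio built on a tiny enrollment deserves a skeptical eye.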

Top Killers

Knitters from Maine, Missouri, and Texas have all proven deadly in a collective sense. What about the individual killers?

Here we see Texas had thirteen kills among six enrolled knitters … but a single person was responsible for half of that body count: the needles of Bustapurl sent many a knitter to their maker.

Sock Wars IV: Top Ten Killers

And this, kids, is why you don’t mess with Texas.

Now what?

So far we’ve focused on descriptive measures: counts, averages, and anything else that summarizes a lump of data at face value. Sometimes, though, people want inferential measures from their data: expected trends (forecasting), non-obvious connections, and anything else that will give an extra edge to their decision process:

If I join the next Sock Wars tournament, should I tremble in fear if someone from Missouri gets my name?

If I run a yarn store, should I stock up when the next Sock Wars tournament begins? And if I run a web-based yarn distributor, will an offer of free shipping send me into the poorhouse?

If I’m assembling an all-star team of knitting assassins (“assassiknitters?” “knit-men?”) should I search Texas for a heavy-hitter? Would I be better off with a large team of Californians?

It’s tempting to draw conclusions from the existing data, isn’t it? The charts hint that we should be worried about solo killers in Texas and Seattle, and groups in California. But let’s face it, this is only a single data set. We can’t tell whether it represents future trends or whether there are a bunch of freak incidents lurking among the numbers.

Other than waiting to collect several years’ worth of data, what can we do? We could look for other data, for which we have plenty of history and which correlates with the knitting kill stats. It’s still a long shot, but that may just give us deeper insight into future Sock Wars competitions.

This being a game of killers, I compared the Sock Wars data against murder rates for 2005, 2006, and 2007. (Those are the most recent years for which the data was available. The data seems roughly consistent from year to year, so we should be able to use this as a rough estimate for 2008 and 2009.) More specifically, I took the number of murders in each state, along with the Sock Wars kills by each state, and scaled the numbers so they would all fit nice and pretty on the same chart.

The results?

Shocking.

Sock Wars IV: Knitting Kills vs Crime Stats

The thin lines represent the (scaled) crime data, while the thick red line is the (scaled) Sock Wars data. For the most part, a (relatively) greater number of murders in one state corresponds to a greater number of Sock Wars kills by assassins from that state. To put this in more technical terms, the correlation coefficient ranges from 0.76 to 0.80. The maximum value for a correlation coefficient is 1.0, so we have a reasonably strong match.
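For readers who want to see the moving parts, here is a minimal Python sketch of the two steps: scaling each series so they share a chart, then computing a Pearson correlation coefficient. The post does not say which scaling or which coefficient was used, so min-max scaling and Pearson are assumptions here. The per-state numbers are invented.

```python
from math import sqrt


def min_max_scale(values):
    """Rescale values to the 0..1 range so different series share one axis."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented per-state figures, one entry per state, in the same order.
murders = [120, 45, 300, 80, 210]
sock_kills = [5, 2, 11, 6, 7]

r_raw = pearson(murders, sock_kills)
r_scaled = pearson(min_max_scale(murders), min_max_scale(sock_kills))
print(f"r (raw)    = {r_raw:.3f}")
print(f"r (scaled) = {r_scaled:.3f}")
```

Both printed values agree, because this kind of rescaling changes how the lines sit on the chart but not how closely they move together. The coefficient can also dip as low as -1.0, which would mean the two series move in opposite directions.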

Now before you get too excited, notice that the crime stats correlate with the collective (aggregate) kill count for each state’s killers. From this we can possibly infer that, for example, if we have to kill a lot of knitters we should hire a lot of assassins from California (and hopefully they’re cheap). Along those same lines, we can say that the Sock Wars knitters from Maine and Missouri should, collectively, be able to inflict a lot of damage.

That said, if I’m shopping around for one superstar knitting assassin, the crime data isn’t quite as helpful. We’d be better off trying to correlate against several other data sources as well as multiple views of the data (say, the ratio of kills to enrollment). Were I a betting man and this were all the information I had, though, I’d take it.

With That In Mind…

So the next time you see someone knitting, ask yourself: is this person a mild-mannered citizen, just passing the time? Or are they a cold-blooded killer? Whatever you do, try to avoid eye contact and sudden moves.


Sure, data can be numbers in spreadsheets and college classes you’d rather forget. It can be dry. But it can be more than that.

“Collecting data” is just a formal way of saying, “I noticed something.” You probably collect data without even realizing: The number of passengers on your (very crowded) flight. The time it takes you to get to work, based on which route you take. The number of times you tell your kids “no” before it sinks in.

Formally or informally, consciously or subconsciously, we analyze this data to assess what’s happened or to make an educated guess as to what the future holds.

People, businesses, government agencies: we all use the numbers as our guide. Analyzing data may yield that inside scoop that’s invisible to the naked eye, that little extra to help you get ahead or have a private laugh.

Here at LocalMaxima, we’re all about the data, too.

Welcome to our website. This is our platform to share our adventures in data analysis. We’ll make some pretty charts, draw some conclusions, and occasionally take an irreverent look at the data around us. We’re essentially thinking out loud but you’re welcome to listen in. Feel free to pay us a visit now and then, or subscribe in your preferred RSS reader.

Thanks for stopping by, and please come back soon.

RSS Feed

LocalMaxima updates at random and on occasion. You may prefer to follow the RSS feed so you'll know when there is new content.