Saturday, December 31, 2011

So as I mentioned, I was writing a post on how to look at speed data at Trackleaders when it occurred to me that the situation might actually be somewhat more complicated than I thought. Or maybe not. Basically, I wasn't sure how they were calculating speed. Speed is a function of time and distance, and while "time" is pretty straightforward, "distance" actually isn't.

The simplest way to calculate distance would be to take two GPS readings and calculate the straight-line distance between the two. A somewhat more complicated way to calculate distance would be to take two GPS readings and calculate the trail distance between the two, based on the trail projection shown on the tracking page. I had assumed that they were using the former when I realized that they might actually not be, so we did some poking around to find the actual GPS data they were using (it was buried a few layers down but Chris has awesome web app-fu) and did some arithmetic to see if what we were getting matched what Trackleaders was showing (with a brief diversion caused by not taking into account that the track points were numbered starting at '1' and the data array was 0-based - you're never too old or too experienced for off-by-one errors). Ultimately what we found was that the speed calculations on Trackleaders were based on straight-line distances between two GPS points. Note that because a straight line is, by definition, the shortest path between two points,

The actual distance traveled by the dog teams will always be longer than shown on the tracker.

Therefore, teams are always traveling somewhat faster than is shown. How much faster depends on how straight the trail is - if it's straight the tracker is going to be closer to the actual speed and if it's not it's potentially quite a bit slower than the actual speed. Here's a case in point (no pun intended):

This is Kristy Berington on the Susitna River, with GPS points 118 and 119. Here's a sort-of digression but not really - Trackleaders draws straight lines between GPS points - it helps visualize the track but it absolutely must not be construed as the track. Every year there are questions about this - no, the musher almost certainly hasn't gotten off the trail and hasn't gotten lost, those are just lines between data points. Remember, when in doubt trust the data from the GPS more than the data summary, whether it's the leaderboard or lines between track points. We really don't know where the dog team actually was between track points, although it's a pretty safe bet that they were on the trail. Trackleaders gives the speed between those two points as 5.3 mph, based on the assumption that her team stayed on the straight line between points 118 and 119. We can, however, take a look at the image and understand that she almost certainly stayed on the river, traveled a greater distance than is shown here (they say 3.02 miles), and therefore was traveling faster than was displayed on the tracker. The straighter the trail the more accurate the speed given by the tracker will be, and the curvier the trail the more inaccurate the speed given by the tracker will be.
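For anyone who wants to try the arithmetic themselves, here's a minimal sketch of a straight-line speed calculation. It is not Trackleaders' code; the haversine formula is simply a standard way to get the distance between two latitude/longitude points, and the function names are my own. The 3.02 miles and 5.3 mph come from the example above, while the 3.5-mile "river" distance is a made-up number for illustration only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def speed_mph(distance_miles, elapsed_seconds):
    """Average speed over a segment, given its length and the time it took."""
    return distance_miles / (elapsed_seconds / 3600.0)


# From the example: 3.02 straight-line miles displayed as 5.3 mph,
# which implies roughly 34 minutes between the two points.
elapsed_seconds = 3.02 / 5.3 * 3600.0

# Hypothetical: suppose the bends in the river made the real distance 3.5 miles.
assumed_trail_miles = 3.5
actual = speed_mph(assumed_trail_miles, elapsed_seconds)
print(f"{actual:.1f} mph")  # prints "6.1 mph"
```

Same elapsed time, longer distance, higher speed. The real gap depends entirely on how much the trail wanders between points, which the GPS data alone can't tell us.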

Note that this has implications for projections. Because the tracker is consistently underestimating speed at least a little bit, it's probably the case that mushers will tend to arrive at a checkpoint earlier than projected (taking into account things like camping, rests, etc.). It will be interesting to watch this during the Knik 200 and the Copper Basin 300, two pretty big races coming up in the next few weeks.
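Here's a rough sketch of why that matters for arrival times. I don't know how the projections are actually computed, so this assumes the simplest possible model: remaining trail distance divided by current speed. The 20 miles remaining and the 6.1 mph "actual" speed are hypothetical numbers; 5.3 mph is the tracker speed from the example above.

```python
def eta_hours(remaining_trail_miles, speed_mph):
    """Hours to the checkpoint, assuming the team holds a constant speed."""
    return remaining_trail_miles / speed_mph


remaining = 20.0      # hypothetical trail miles left to the checkpoint
tracker_speed = 5.3   # mph, as displayed by the tracker
assumed_actual = 6.1  # mph, hypothetical speed along the real trail

projected = eta_hours(remaining, tracker_speed)
likely = eta_hours(remaining, assumed_actual)
early_minutes = (projected - likely) * 60.0
print(f"Arrives about {early_minutes:.0f} minutes ahead of the projection")
```

Under those assumptions the team shows up about half an hour sooner than the projection says, before accounting for rests or camping.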

Trackleaders.com is doing a great job with some pretty sparse data. Collecting all that data and figuring out how to aggregate it in a way that makes what's happening on the trail easier to comprehend is a hard, hard problem and I think they've done a good job. But there are limits to what you can do with messy data and I hope that this discussion is not understood as criticism in any way, but as a tool to help fans understand a little better what they're seeing on the trackers.

As mentioned in an earlier post, most races providing online tracking are using Spot devices to collect and uplink the GPS data and Trackleaders to display the data on maps. Trackleaders is actually doing quite a bit more than just showing the tracks, and part of the reason that the tracking shows the quirks it does is because there's only so much one can do with certain kinds of data. This post started out as a discussion of where team speed data come from and how seriously to take it but that turned out to be a much more complicated question than I expected, so I'm moving that discussion to a subsequent post. This one is just focused on what Spot actually does and the data it provides to trackers.

Spot Satellite GPS Messengers are simple GPS units with a satellite uplink. They've got a very simple user interface: no display and a small number of buttons used to turn on tracking, send a check-in message, or send a custom message that can be programmed into the owner's account through the web interface. There are also two covered buttons for emergency use: an "I need help" button that sends a message to the individual(s) of your choosing, and an "I need help" button that sends a message to the GEOS International Emergency Response Center. There's no display. They'd be nearly foolproof if fools weren't so incredibly resourceful.

The tracking function has the Spot upload data about current location to a server owned by Spot (the company) every 10 minutes. The interval is not settable, and really needs to be fairly large because radio operations (i.e. the uplink) chew up battery. Because this process is completely opaque to pretty much everybody except for Spot we can't know what it's actually sending. However, the company makes Spot data available through what's known as an API, or application programming interface, and that's how software developers, hobbyists, and various other sorts of nerds are able to develop applications that use the data. A description of their API is here. They've defined their own XML schema (it's a data format - let's just leave it at that) and they use it to expose the data. So, a developer who wants to use the data sends a request to Spot saying "give me the data for whatever" and gets back a chunk of XML that looks something like this:

In this example I took my Spot, put it into "Track" mode, and set it out on a rock or a car or some other large object with a clear view of the sky for a few minutes. You'll note that the latitude and longitude in the messages remain unchanged - that's why. I just wanted to collect a few messages to show how the data are presented to the programmer.

So basically what we've got is a set of three messages, each set off by a "message" tag. Each message contains identifying information, message type, the time at which the data were collected represented as an ISO 8601 string (programmers love this stuff), the time at which the data were collected represented as the number of seconds since 00:00:00 UTC, January 1 1970 (yes, really - programmers really love this stuff), and the latitude and longitude. All told it's really not that much data, and the astute reader who can tolerate looking at XML or who noticed the post title may have noticed that one key piece of data for race tracking is missing: speed. And that's the topic of our next blog post.
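For the programmers in the audience, here's a minimal sketch of what reading that kind of data might look like. Big caveat: the tag names and values below are placeholders I made up to match the fields described above; they are not Spot's actual schema, so check the API description before relying on any of them. The point is only to show that the two time formats carry the same information and that there's no speed field to read.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical XML: tag names and values are invented for illustration.
SAMPLE = """
<messages>
  <message>
    <id>example-device</id><type>TRACK</type>
    <timestamp>2011-12-31T20:00:00+00:00</timestamp><epoch>1325361600</epoch>
    <latitude>64.8378</latitude><longitude>-147.7164</longitude>
  </message>
  <message>
    <id>example-device</id><type>TRACK</type>
    <timestamp>2011-12-31T20:10:00+00:00</timestamp><epoch>1325362200</epoch>
    <latitude>64.8378</latitude><longitude>-147.7164</longitude>
  </message>
  <message>
    <id>example-device</id><type>TRACK</type>
    <timestamp>2011-12-31T20:20:00+00:00</timestamp><epoch>1325362800</epoch>
    <latitude>64.8378</latitude><longitude>-147.7164</longitude>
  </message>
</messages>
"""


def parse_messages(xml_text):
    """Turn each <message> element into a plain dict of typed values."""
    fixes = []
    for msg in ET.fromstring(xml_text).findall("message"):
        fixes.append({
            "type": msg.findtext("type"),
            "iso_time": datetime.fromisoformat(msg.findtext("timestamp")),
            "epoch": int(msg.findtext("epoch")),
            "lat": float(msg.findtext("latitude")),
            "lon": float(msg.findtext("longitude")),
        })
    return fixes


fixes = parse_messages(SAMPLE)
for fix in fixes:
    # The epoch seconds and the ISO 8601 string describe the same instant.
    from_epoch = datetime.fromtimestamp(fix["epoch"], tz=timezone.utc)
    print(fix["iso_time"] == from_epoch, fix["lat"], fix["lon"])
```

Anything resembling speed has to be derived by whoever consumes the data, from the positions and times of consecutive messages.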

Friday, December 30, 2011

A growing number of distance dogsled races have been using GPS tracking devices to allow fans to follow along from home in near-realtime. I think this has probably been the main driver behind the incredible growth of mushing as a spectator sport, since you can't really be a spectator without something to spectate.

Nearly all races that provide online tracking are using inexpensive Spot trackers combined with a free online service called Trackleaders. Spot trackers are GPS units with a satellite uplink to allow the sending of a small number of pre-programmed messages. They were originally intended to be used as a way for people in remote backcountry who get into trouble to signal for help, but their use has grown tremendously to encompass a huge number of other uses. I hit the "Send message" button on mine when I catch a fish on the Chena, as a way of mapping how many fish I'm catching in which locations. The Spot also has a tracking option, to allow you to uplink location information every 10 minutes. You can easily come up with similar stuff.

Anyway, the Gin Gin 200 is winding down, and as usual a lot of people found the tracking to be confusing, compounded by problems with data not being uploaded and a very, very unusual occurrence: some people appeared to be running the race backwards.

So here's what happened: a portion of the Gin Gin trail was a large clockwise loop, with a checkpoint at the far end of the loop. The mushers were to run west down the Denali Highway to the Maclaren River, turn south on the river, turn north on the Susitna from its confluence with the Maclaren, get back on the Denali Highway and head east a short distance to the checkpoint at the Alpine Creek Lodge, do their mandatory layover, and continue east on the Denali Highway to the finish in Paxson. Unfortunately high winds and deep snow had made trail conditions on the rivers extremely challenging and a few mushers chose to scratch but to continue to run west on the Denali Highway as a training run. This meant that they were running in the "wrong" direction, with their trackers still sending out location data.

The other problem is that there were a lot of missing data - data not being uploaded had to do with the GPS antennas not having a clear view of the sky. The race organizers said that they were experimenting with the placement of the Spots on the mushers and it looks like it didn't work out very well. The result was that data were often hours old, or more.

Here's what the leaderboard at Trackleaders looked like a few hours after the last finisher arrived:

[Note that the times are being shown relative to the start of the race so, for example, Brent is shown leaving Maclaren Lodge six hours and 37 minutes after the start of the race rather than at 6:37. I believe this format is useful for race officials but confusing for spectators.]

This table was generated from the GPS location data provided by the Spot devices. There are several odd things here, the oddest of which shows Paul Gebhardt out of Maclaren Lodge, past the confluence, off the Su, and into Alpine Creek Lodge all at the same time. You see similar problems with the data from people who scratched but ran to Alpine Creek. I believe that while there are different proximate causes in the two cases, the root cause is the same: poor handling of problematic data by the software developer. In Paul's case his GPS was basically not working at all until, I think, someone at Alpine Creek Lodge reset the device or otherwise fixed the problem. In the case of someone running "backwards" (they don't show up in this screen grab), the situation probably looked similar in terms of the data - the dog team should have been through, wasn't, and so they just filled in the timestamp from the last Spot uplink.
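To make that concrete, here's a sketch of one alternative way to handle the gap. It's not how Trackleaders works (I haven't seen their code), and the function names, the half-mile radius, and the coordinates are all assumptions of mine. The idea is simply that if no GPS fix ever landed near a checkpoint, the leaderboard should say "unknown" rather than borrow the timestamp of the last uplink.

```python
import math


def distance_miles(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance in miles, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))


def checkpoint_arrival(fixes, checkpoint, radius_miles=0.5):
    """Time of the first fix within radius of the checkpoint, or None if unknown.

    fixes is a list of (epoch_seconds, lat, lon); checkpoint is (lat, lon).
    """
    for epoch, lat, lon in sorted(fixes):
        if distance_miles(lat, lon, checkpoint[0], checkpoint[1]) <= radius_miles:
            return epoch
    return None  # no evidence of arrival: report "unknown", don't guess


# Hypothetical data: the tracker was silent between the first and second fix.
checkpoint_a = (63.0, -146.0)
checkpoint_b = (63.0, -147.0)
fixes = [(1000, 63.2, -145.5), (40000, 63.0, -147.0)]

arrival_a = checkpoint_arrival(fixes, checkpoint_a)  # no fix ever came near A
arrival_b = checkpoint_arrival(fixes, checkpoint_b)  # second fix is at B
print("A:", arrival_a if arrival_a is not None else "unknown", " B:", arrival_b)
```

A blank cell is less satisfying than a number, but it's honest, and it keeps one late uplink from stamping the same time across a whole row.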

There's always so much confusion about the tracking data, particularly when data are late or missing or someone's doing something wacky (just for fun, a screenshot from last year's Copper Basin. The musher had left his Spot at a checkpoint and someone put it in a car and was driving it to the next checkpoint. This led to some confusion among fans. Me, I wondered what Johannes was feeding his dogs.)

I think in general it's safe to assume that if it looks weird it probably is, and if you're wondering what's going on good rules of thumb are:

Check the time of the last update

Trust the GPS data more than a summary of the GPS data (for example, if you can see where someone is on the map and the data are current, but a leaderboard claims they're somewhere else, trust the GPS data)

Sturgeon's Law says that 90% of everything is crap. When it comes to software we're well north of 95%, so if something looks peculiar consider the possibility of a programming mishap. In this case the programmer developing the leaderboard table did not make particularly good decisions about what to do when data are missing.
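The first rule of thumb is easy to turn into code. This is a minimal sketch, assuming the 10-minute tracking interval mentioned above and an arbitrary tolerance of two missed uplinks before a position is called stale; both the tolerance and the function names are my own choices, not anything a tracker actually uses.

```python
def minutes_since(last_fix_epoch, now_epoch):
    """Minutes elapsed between the last GPS fix and now (both in epoch seconds)."""
    return (now_epoch - last_fix_epoch) / 60.0


def is_stale(last_fix_epoch, now_epoch, interval_minutes=10, missed_allowed=2):
    """True if more uplinks have been missed than we're willing to tolerate."""
    threshold = interval_minutes * (missed_allowed + 1)
    return minutes_since(last_fix_epoch, now_epoch) > threshold


# Hypothetical times: one position 10 minutes old, one three hours old.
now = 1325361600
print(is_stale(now - 10 * 60, now))    # False: the position is current
print(is_stale(now - 3 * 3600, now))   # True: treat the map position with suspicion
```

A tracker page could use a check like this to gray out a musher's marker instead of showing an hours-old position as if it were live.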

Thursday, December 29, 2011

The use of social and other media has turned dog mushing into a spectator sport. People from all over the planet are watching even smaller, local distance races using accessible technologies like Facebook and GPS trackers. For the race committees, getting the data online in a timely way is sometimes challenging because

the races are often held in remote places with poor-to-no infrastructure

the people organizing the races are not necessarily particularly technical, and need to stay focused on race logistics while the race is underway

the software generally blows

Combine this with an audience that isn't necessarily very comfortable with the technology themselves and who typically aren't mushers, and you tend to get a lot of confusion and fans asking kind of clueless questions[*].

My hope is that examining some of the issues around technology, data, and the user experience can help improve the way race data are presented and increase fan participation.

I'm also interested in other technical aspects of dog mushing. Like, Matrax vs. Rex runners - what's up with that?

[* or maybe not - at the moment the 2012 Gin Gin 200 is underway. Bad trail conditions have knocked out about 1/2 the field and the trackleaders.com GPS tracking is working worse than not-at-all, yet fans seem to be taking it pretty much in stride. Could be that they're used to it or it could be that this race isn't drawing naive fans. Dunno]