Thursday, June 16, 2011

testing Strava segment timing reliability

DC Rainmaker recently did an interesting study of the accuracy of GPS units. He mounted eight different GPS computers on the handle of a surveyor's measuring wheel, then walked, ran, or rode various courses, comparing the measured results (Part 1, Part 2). The GPS units tended to disagree on how far he'd gone, although usually the results were consistent with the claimed positional accuracy of the units.

I suggested he use the data to test Strava reproducibility in segment timing. No luck there, but I did find that a friend of mine has been in the habit of riding with a Garmin Edge 500 mounted alongside a Garmin Edge 800 on his rides. We have a mutual friend who works for Garmin, so he's doing this to compare the two.

I asked him for data from three of his rides. I then created two new accounts on Strava and uploaded the data from his Edge 500 into one, and from the Edge 800 into the other. The new accounts are necessary because Strava rejects what appear to be duplicate rides from the same account. I made these rides private to avoid contaminating the historical record, since he'd already uploaded data using his personal account. Then I created a spreadsheet with all of the segments for each of the three rides.

The Edge 800 data yielded 58 matched segments, while the Edge 500 yielded 56 matched segments. The two segments matched to the Edge 800 data but missed in the Edge 500 data were "West Alpine Road Start of Climb" and "West Alpine Road Portola State Park Road to Finish". Obviously there had been an issue with the Edge 500 data on Alpine Road. However, the Edge 500 data did trigger the "West Alpine Road Alpine Creek to Peak" segment. West Alpine Road is a relatively complex climb and has had an extraordinary number of segments defined for it: of the 56 total segments matched to both data sets, nine are on Alpine Road. And these are in addition to the two Alpine Road segments which were assigned only to the Edge 800 data.

For each segment I subtracted the time Strava derived from the Edge 500 data from the time it derived from the Edge 800 data.

Before I show the graph, I should add that I expected some difference. Garmins sample at one- or two-second intervals. Strava interpolates between these data points, but interpolation can only do so well, so I'd expect an error of around 1 second on the start time and around 1 second on the finish time. Even if everything else is perfect, an error of ±2 seconds is about the best I'd anticipate.

There are other errors, of course. The GPS signal is only good to around 10 meters. But these are units with essentially the same electronics and the same algorithms, looking at a signal within 10 cm of each other. So while the general positional error of the GPS signal could add up to around 10 meters of uncertainty to the start and stop positions, this error should affect both computers nearly equally. From ride to ride, though, it matters: since bikes move at around 5 meters per second uphill, a 10 meter error at either the top or the bottom along the direction of travel could create another two seconds or so of variation in the segment timing.
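The conversion from position error to timing error is just distance divided by speed. A minimal sketch in Python, using the round numbers from the paragraph above (the function name is made up for illustration):

```python
def timing_error_seconds(position_error_m: float, speed_m_per_s: float) -> float:
    """Time needed to cover an along-track position error at a given speed."""
    if speed_m_per_s <= 0:
        raise ValueError("speed must be positive")
    return position_error_m / speed_m_per_s


# 10 m of along-track error while climbing at about 5 m/s
climb_error_s = timing_error_seconds(10.0, 5.0)
print(climb_error_s)
```

The same position error costs less time on a fast descent and more on a steep, slow climb, which is one reason climbing segments are the sensitive ones.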

Then there's the problem that the segment was defined with data which was also subject to noise. You'd like to believe there's an imaginary line across the road defining the start and end of a segment, but the reality is that the virtual line, even if your GPS is perfect, is slanted. So if your position in the road varies, or if GPS error shifts your recorded trajectory to the left or right, that will affect the point at which you intersect these virtual start and finish lines. This could be another two seconds or so, similar to the error from longitudinal position error, at the start and at the finish. But again this error should be relatively smaller because we're considering two GPS units on the same handlebars at the same time.

So worst case I have the following error estimates for ride-to-ride variation:

1 second at start due to sampling time

1 second at finish due to sampling time

2 seconds at start due to longitudinal position errors

2 seconds at finish due to longitudinal position errors

2 seconds at start due to transverse position errors

2 seconds at finish due to transverse position errors

I assume these errors are uncorrelated, so I take the root-sum-square (the square root of the sum of the squares) and get around 4 seconds of typical variability for ride-to-ride variations, but less than that for two GPS units mounted on the same handlebars on the same ride... let's say 2 seconds.
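As a quick check of that arithmetic, here is a minimal Python sketch that combines the six estimates listed above, assuming they are independent. The dictionary keys and the function name are just labels chosen for this example.

```python
import math

# Worst-case error estimates from the list above, in seconds.
error_budget_s = {
    "sampling, start": 1.0,
    "sampling, finish": 1.0,
    "longitudinal position, start": 2.0,
    "longitudinal position, finish": 2.0,
    "transverse position, start": 2.0,
    "transverse position, finish": 2.0,
}


def root_sum_square(errors):
    """Combine independent error terms: sqrt(e1^2 + e2^2 + ...)."""
    return math.sqrt(sum(e * e for e in errors))


total_s = root_sum_square(error_budget_s.values())
print(f"{total_s:.2f} s")  # sqrt(18), a bit over 4 seconds
```

The position terms dominate: dropping the two 1-second sampling terms only lowers the total from sqrt(18) to sqrt(16) = 4 seconds.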

So what do the data show? Here are the results:

If I look at the mid-range of the distribution, my estimate was spot-on: errors are typically between ‒2 and +2 seconds, without evident bias between the two data sets. However, the devil here is in the tails. A significant number of segment timings have far worse errors.

These segments, it turns out, are all either on West Alpine Road or on Old La Honda Road. The top of Old La Honda Road, in particular, is notorious for terrible GPS signal quality due to the trees and terrain creating confusion from signal reflection. But Strava's algorithm is relatively forgiving, and so assigns segment times anyway.

Here are the worst offenders where the Edge 800 reported shorter times:

This one's even better -- the West Alpine segment has a whopping 48 second disagreement. It's as if the Edge 500 had dropped the 800 with enough of a gap to get out of sight on those final turns... And curiously Old La Honda data actually appears at both ends of this range, demonstrating what a problem child Old La Honda can be.

So it may be on most segments the Garmin-Strava link does fairly well: within a handful of seconds. But on problematic segments the error can be profound, enough to radically change rankings.

Perhaps Strava should tighten up the criteria by which it considers rides to be a match to segments. This would result in users complaining that they'd ridden a segment but not gotten credit. But on the other hand it would improve the integrity of the KOM rankings for these difficult segments. An alternative would be to flag marginally matching data on the rankings, so it becomes clearer that the results are questionable.

On the sampling size... I was basing that on the default for the Edge 500, which is two seconds without power, or one second with power (IIRC): you might be right it's possible to set it longer, or might be longer with other units, but the 800 is similar to the 500.

Hi Dan, this is Mark Shaw from Strava. Great post! Your data-set is more extensive than the ones we've tested with in the past - I'd love to use it internally.

I believe a sizable factor in the deviations is the start/finish points of the underlying data behind the segment (the segment data is just the data from the original ride from which the segment was defined). The ends could be a bit off the normal path of the riders, so the matching is more irregular.

We also have issues when segment start or finish points are on tight switchbacks, in which case your start or finish could match to a similar point further up or down the road.

The good news is that we're working on both of these issues. Eventually we'll have better tools to define start and finish lines on segments, beyond just finish points, allowing more accurate interpolation of the actual start/finish points on your rides. We also have some ideas on how to address the switchback issues.

Of course, as you point out, given the accuracy of GPS, it will never be perfect. I'm not sure we'll ever be in a position to know who pipped who at the finish line of a segment! The +/- 2 seconds is likely a reasonable target.

Hey Dan, Just imagine the world's troubles you'd resolve if you used that big ol noggin to solve non-white whine problems.

Ultimately, if people get that bent out of shape about a potential data error, you should be getting paid for your result, or you should just ride faster and worry more about when Chris Phipps will take your segment anyway...

Troy: I prefer solvable problems, not banging my head against hard and painful walls. We need to see the positive in the life we have, to satisfy our actual desires and needs as physical (not virtual) beings, and bicycling is a great way to do that as well as to encourage preservation of what's truly important. So don't get hung up over pedagogical notions of what's "important": it goes beyond world hunger and curing disease.

To make things more interesting, there is a significant time difference between the Strava iPhone app and the Garmin Edge 500. One trip up OLH showed a :40 difference between the two for the same ride.

I was wondering if there are any significant differences between a Garmin and the Strava app for the iPhone. Today my friend and I got in an argument: I thought that both GPS connections come from one satellite, and he said that Garmins have stronger satellite receivers, which is why they do not lose connection and have more accurate times.

iPhones have particularly poor antennas since the antenna was added to the design as an after-thought. Other phones where the antenna placement was given priority are better. Phones have the advantage they have two options for position determination: GPS and triangulation off cell towers. The GPS is better, which is why phones have GPS chips and antennas, but the cellular network provides an advantage to the phone by (1) letting the phone know which satellites to check for near the proximate cell tower at that particular time, (2) helping with processing of the GPS signal. This second feature I don't really understand, but it means you should do better with your phone connected to the network than with the phone in airplane mode. My Droid Incredible seems to do fairly well in limited tests in airplane mode, as well as doing well in standard mode.

I think the Android app on a phone with good GPS will produce distances as good as the Garmin, yes. The iPhone app version 1 has had problems with distance. I believe version 2 of the iPhone app will improve this, but since version 2 isn't out yet, can't say for sure. I don't know if the iPhone 4S GPS is better than iPhone 4: that will be interesting to know.

Arriving a bit late to the party here. One problem with tightening the qualifying criteria for matching a segment is that people have a great ride, which doesn't match anything existing, so they create a new one. Thus we have extraordinary duplicate segment proliferation, as seen on both OLH and Kings.

On this morning's ride up Kings, I'd love to believe the Strava report of 25:59, but my manual lap timer shows 26:17. That's a huge difference for timing points that shouldn't be tough to define (Tripp Road, which I assume someone set as the sign itself on the right-hand side heading up Kings, and... what?). Looking at the segment, it appears to end a bit prematurely.

http://www.strava.com/activities/76157048#1521397209

And that explains the shorter Strava time. So do I add to the confusion and create an official Tuesday/Thursday-morning segment?

I have Perl scripts I use for timing certain weeks of the Low-Key Hillclimbs where it's impractical to do organized hand-timed events. There I use a different model, hardly original: I use lines instead of points and interpolate the time the GPS track passes through the line segment. I then hand-place key checkpoints along the way to make sure the rider did the whole climb. Lines provide for timing precision while allowing for GPS to deviate laterally. Points, used by Strava without interpolation, are a much blunter instrument. Of course my model requires more care in segment definition, but interpolation is something Strava could do now -- it would just take a few extra CPU cycles.
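To illustrate the line-based idea, here is a minimal Python sketch (not the Perl scripts mentioned above, and not Strava's algorithm). It assumes the track has already been projected to flat local x/y coordinates in meters, and the function names, timing lines, and sample track are all made up for the example. It finds the first track step that intersects a timing line and linearly interpolates the crossing time; it does not check the direction of travel or use intermediate checkpoints.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]          # (x, y) in meters, local flat coordinates
Sample = Tuple[float, float, float]  # (time_s, x, y)
Line = Tuple[Point, Point]           # the two endpoints of a timing line


def cross(ax: float, ay: float, bx: float, by: float) -> float:
    """z component of the 2D cross product."""
    return ax * by - ay * bx


def first_crossing(track: List[Sample], line: Line,
                   start: int = 0) -> Optional[Tuple[float, int]]:
    """Return (interpolated time, index of the sample before the crossing)
    for the first track step at or after `start` that intersects `line`,
    or None if the track never crosses it."""
    (ax, ay), (bx, by) = line
    sx, sy = bx - ax, by - ay
    for i in range(start, len(track) - 1):
        t0, x0, y0 = track[i]
        t1, x1, y1 = track[i + 1]
        rx, ry = x1 - x0, y1 - y0
        denom = cross(rx, ry, sx, sy)
        if denom == 0:
            continue  # step is parallel to the line, or the rider did not move
        u = cross(ax - x0, ay - y0, sx, sy) / denom  # fraction along the track step
        v = cross(ax - x0, ay - y0, rx, ry) / denom  # fraction along the timing line
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            return t0 + u * (t1 - t0), i
    return None


# Made-up data: one sample per second, riding roughly along the x axis,
# with 10 m wide timing lines across the road at x = 0 and x = 10.
track = [(0.0, -4.0, 0.0), (1.0, 1.0, 0.5), (2.0, 6.0, 0.0), (3.0, 11.0, 0.0)]
start_line = ((0.0, -5.0), (0.0, 5.0))
finish_line = ((10.0, -5.0), (10.0, 5.0))

start_hit = first_crossing(track, start_line)
assert start_hit is not None
finish_hit = first_crossing(track, finish_line, start=start_hit[1])
assert finish_hit is not None
elapsed_s = finish_hit[0] - start_hit[0]
print(round(elapsed_s, 3))
```

Because the crossing time is interpolated within a one-second step, the result is not limited to whole sample times, and a track that drifts sideways still crosses the line as long as it stays within the line's width.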