The problem with King County Metro’s real-time data is a complex one. It involves combining two vehicle location systems (the old odometer-based system and the new GPS-based system) and translating the data from those systems into a format that OneBusAway understands.

OneBusAway doesn’t do its own arrival prediction; instead, we rely upon data from others, who in turn run their own or commercial software. This arrival prediction data comes from the agencies themselves for buses that use GPS, and from MyBus for buses using the older AVL system.

Metro has, a little belatedly, it seems, decided to try to remedy the “garbage out” issue rather than wait for the GPS conversion to finish up. Rose told Oran that “work is underway to address the problem with meetings between OneBusAway and Metro engineers.”

UPDATE: Good news! Here are some results of that “work underway,” per Rose:

1. The schedule data that I just pushed for tonight’s commute seems to be the best we’ve had since the February “shakeup.” In particular, southbound route 60 trips are back after a… holiday… of more than a month. And so are a bunch of “DART” trips.

2. The creator of OBA–Brian Ferris [a saint among men-ed.]–has found and fixed some problems with double-reporting. Also landed today.

3. The schedule data will soon be more reliable, thanks to changes at KCM in how it’s going to be generated–soon, and surely by the next “shakeup.”

UPDATE: I got to wondering how many passengers are affected, and asked Rose at OBA. He says that 100,000 people use OneBusAway on their mobile phones each month. As a general rule, of course, you wouldn’t want to make 100,000 people miss their bus.

Metro spokesperson Linda Thielke, commenting on Seattle Transit Blog, argues that this isn’t all Metro’s fault. “[W]e cannot blame having two streams of data for all the problems users are currently experiencing,” she writes. I emailed to ask what we could blame the problems on, noting that things seemed to take a turn for the worse about six months ago. Thielke replied:

[Y]es, you are right about things getting worse last October. On our end, we had a very big service change with multiple data loads. Not every load was perfect, so updates had to be sent out. On the receiving end of those data streams, the app developers/maintainers need to manipulate the data and update their prediction formulas.

This appears to point to the concerns that Metro chief Kevin Desmond, in 2009, had about “grassroots” bus trackers like OneBusAway: whether they would reliably be able to interface with Metro’s data, whether people would treat them as Metro’s responsibility, and whether widespread adoption of a particular non-Metro bus tracker would make the ownership question moot. If 100,000 of your customers use OneBusAway, you really, really want it to work well even if you don’t own it.

In its January/February 2012 newsletter, Metro announced a 13-month contract with the University of Washington to support OneBusAway. Metro, Pierce Transit, and Sound Transit each put in $50,000 for the project. (Their estimated OBA population was 50,000 bus riders per day.) So we can hope that optimization is already in progress.

*For the truly wonky among you, Rose spells out below exactly what the challenges are:

The big challenge on the real-time data has been integrating data from the trips on vehicles with the legacy odometry-based hardware with data from the trips on vehicles with the newer GPS hardware.

For the legacy trips, OBA depends upon a service called MyBus, which, along with the UI known as BusView, was created about a decade ago by industry pioneer Dan Dailey and is still in use today, though by far the majority of end-users are using the OBA interface.

In order for BusView to report on the location of vehicles with the new hardware, MyBus consumes “legacized” GPS data. That leads to a non-ideal situation where the same trip, if GPS, is reported twice to OBA: once from KCM, and again from MyBus. When OBA tells an outright lie about the position of a bus, or when a bus appears to be in two places at once, it’s because that process has failed.
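[To picture the double-reporting problem, here is a minimal sketch, not OBA's actual code. The record fields (`trip_id`, `source`, `position`), the sample trip names, and the rule of preferring the direct KCM report over the MyBus copy are all assumptions made for illustration.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleReport:
    trip_id: str     # hypothetical trip identifier
    source: str      # "kcm" (direct GPS feed) or "mybus" (legacy feed)
    position: float  # hypothetical distance along the route, in meters


def dedupe_reports(reports):
    """Keep one report per trip, preferring the direct KCM feed (assumed rule)."""
    best = {}
    for report in reports:
        current = best.get(report.trip_id)
        # Take the first report seen, or upgrade a MyBus copy to the KCM original.
        if current is None or (current.source != "kcm" and report.source == "kcm"):
            best[report.trip_id] = report
    return best


reports = [
    VehicleReport("route60-sb-0815", "mybus", 1200.0),
    VehicleReport("route60-sb-0815", "kcm", 1450.0),
    VehicleReport("route7-nb-0820", "mybus", 300.0),
]
deduped = dedupe_reports(reports)
```

[In this sketch the route 60 trip, reported by both feeds, collapses to the single KCM record, while the legacy-only route 7 trip keeps its MyBus record. If the merge step fails, both route 60 records survive and the bus appears to be in two places at once.]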

Now, we need good schedule data, too. If–for example–a trip is in the schedule data but never leaves the base, OBA is still going to report on it. That’s because real-time data is a sometimes thing. We judge it better to report the schedule data to the end user than to ignore a trip that isn’t generating positional records.

In contrast, if we get reports in the real-time stream about a trip we can’t identify in the schedule, we keep silent about it. What could we say?
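[The two rules just described, falling back to the schedule when a trip has no real-time data and staying silent about real-time reports for unknown trips, can be sketched roughly as follows. The function name, the dictionary layout, and the use of minutes after midnight are assumptions for illustration, not OBA's implementation.]

```python
def arrivals_to_report(scheduled_trips, realtime_predictions):
    """Combine schedule and real-time data for display.

    scheduled_trips: dict of trip_id -> scheduled time (minutes after midnight)
    realtime_predictions: dict of trip_id -> predicted time (same units)
    Returns a list of (trip_id, time, label) tuples.
    """
    arrivals = []
    # Only trips present in the schedule are ever reported, so a real-time
    # record for an unknown trip is dropped simply by never being looked up.
    for trip_id, scheduled in sorted(scheduled_trips.items()):
        predicted = realtime_predictions.get(trip_id)
        if predicted is None:
            # No positional records for this trip: fall back to the schedule.
            arrivals.append((trip_id, scheduled, "scheduled departure"))
        else:
            arrivals.append((trip_id, predicted, "real-time"))
    return arrivals


schedule = {"trip-a": 480, "trip-b": 495}
realtime = {"trip-b": 498, "ghost-trip": 500}
arrivals = arrivals_to_report(schedule, realtime)
```

[Here `trip-a` is shown at its scheduled time even though no vehicle is reporting for it, and `ghost-trip` never appears because nothing in the schedule matches it.]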

We can look at the real-time data and see that there are problems with the schedule data: some percentage of the locations that are being reported are for trips that don’t exist. You can see it from the other end, too: some percentage of the arrival predictions are labeled “scheduled departure” instead of “late,” “early,” or “on time.” The more of those, the worse.
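[Both of those percentages are simple ratios. A rough sketch, with hypothetical function names and made-up sample data, might look like this:]

```python
def unmatched_report_share(realtime_trip_ids, scheduled_trip_ids):
    """Fraction of real-time reports whose trip is not in the schedule."""
    if not realtime_trip_ids:
        return 0.0
    scheduled = set(scheduled_trip_ids)
    unmatched = sum(1 for trip_id in realtime_trip_ids if trip_id not in scheduled)
    return unmatched / len(realtime_trip_ids)


def schedule_only_share(labels):
    """Fraction of arrival predictions labeled "scheduled departure"."""
    if not labels:
        return 0.0
    return labels.count("scheduled departure") / len(labels)


# Made-up sample data: four real-time reports, one for a trip not in the schedule.
ghost_share = unmatched_report_share(
    ["trip-a", "trip-b", "ghost-trip", "trip-b"],
    ["trip-a", "trip-b", "trip-c"],
)
# Made-up sample labels: two of four predictions had no real-time data.
fallback_share = schedule_only_share(
    ["late", "scheduled departure", "on time", "scheduled departure"]
)
```

[With this sample data, a quarter of the real-time reports refer to trips that don’t exist in the schedule, and half of the predictions fall back to “scheduled departure.” Lower is better on both counts.]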