I’m being urged to get my act together regarding my master’s thesis. I have a set of datasets I know I want to explore, but I need to find a question that I can answer quite thoroughly with them. I also need to decide what type of person would be good to oversee this project: the ‘committee’ and whatnot. As I so often do, I’ll use you anonymous readers as the spur to set my thoughts to bytes and thereby make rigorous my abstractions.

SO: My dataset is real-time transit data feeds. I don’t care what buses are doing right now unless I’m waiting for them; I care what patterns they’re scratching into our lives. I’ve already demonstrated a Python script that will make random requests from a real-time API and store the results. Comparable APIs exist at other agencies, and the script can easily be adapted to them, so I could squirrel data from as many agencies as have APIs. That’s the dataset, or set thereof.
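For the curious, the collection loop looks roughly like the sketch below. It is a sketch under assumptions, not the actual script: the table layout, the `make_store` and `poll_once` names, the stop IDs, and the `fetch` callable are all made up for illustration, and a real feed would need its own URL and parser.

```python
import random
import sqlite3
import time


def make_store(path=":memory:"):
    """Open (or create) a SQLite table for raw API responses."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "requested_at REAL, stop_id TEXT, body TEXT)"
    )
    return db


def poll_once(db, stop_ids, fetch, now=time.time):
    """Pick one stop at random, request it, and store whatever comes back."""
    stop_id = random.choice(stop_ids)
    body = fetch(stop_id)  # hypothetical: would wrap an agency's real-time API
    db.execute(
        "INSERT INTO responses VALUES (?, ?, ?)", (now(), stop_id, body)
    )
    db.commit()
    return stop_id


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without a network connection.
    def fake_fetch(stop_id):
        return '{"stop": "%s", "predictions": []}' % stop_id

    db = make_store()
    for _ in range(5):
        poll_once(db, ["1001", "1002", "1003"], fake_fetch)
    print(db.execute("SELECT COUNT(*) FROM responses").fetchone()[0])
```

Passing `fetch` in as an argument is just a convenience here: adapting the loop to another agency would then mean swapping in a different fetcher rather than rewriting the storage code.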

My question has been more difficult to discover. I have so many! Here are a few:

1. What is the distribution of delay? How does it vary? Spatially, temporally?

2. What kinds of lines/agencies/times have non-random, systematic delay?

3. How does the delay spread of ‘good’ transit systems compare to that of ‘bad’ transit systems and what might explain this?

4. Good scheduling should minimize systematic delay: what sort of delay remains after that and what might riders learn from it? How should they learn to best accommodate this delay?

5. What relation does frequency have to delay? At what service frequency can we say quantitatively that schedules should be abandoned and headways maintained instead?

6. What is the accuracy of arrival time predictions? What margin of error exists around predictions at various space-time distances?
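To make the question about the distribution of delay a bit more concrete, here is a rough sketch of how delay might be summarized by hour of day. Everything in it is assumed for illustration: the observations are invented numbers, and `delay_by_hour` is a hypothetical helper, not something from my actual script.

```python
import statistics
from collections import defaultdict


def delay_by_hour(observations):
    """Group delays (seconds; negative means early) by scheduled hour.

    `observations` is an iterable of (scheduled_hour, delay_seconds) pairs.
    Returns {hour: (median delay, population standard deviation)}.
    """
    buckets = defaultdict(list)
    for hour, delay in observations:
        buckets[hour].append(delay)
    return {
        hour: (statistics.median(delays), statistics.pstdev(delays))
        for hour, delays in sorted(buckets.items())
    }


# Invented sample data, not real observations.
sample = [(8, 60), (8, 180), (8, 300), (17, -30), (17, 30)]
for hour, (median, spread) in delay_by_hour(sample).items():
    print(hour, median, round(spread, 1))
```

The same grouping could just as well be keyed by stop, line, or agency to get at the spatial side of the question.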

I suppose the first question is probably my best shot. Though #5 is certainly intriguing. Now on to the lit review I suppose? *deep breath*

—

And then the committee! Besides my adviser, who is a regular transit user and quantitative geographer, I want another statistician/data-person, and this shouldn’t be too hard to find. I also want someone really good at graphic communication. For the latter, I want someone from DAAP. But I want to be sure that they don’t think or feel or act as though I’ve invited them to proof my presentation while others address its content; content is inseparable from presentation. Form does not follow function; rather both form and function must mirror each other. If I fail to make that happen, I will have miscommunicated or misunderstood my project.

—

Oh dear readers, what would you want to know if you knew, as I may, where all the buses are all the time?

Huge thanks to John Back of OpenDataCincy for hosting all these gigabytes and Dave Walters of the CTHA for providing the data. If any of you have old schedules or maps that you don’t see in this listing, Dave would like to borrow them! Leave a comment and I’ll put y’all in touch.

For all those anxiously wondering what exactly service will be like after SORTA’s service changes take effect this month, the answer arrived last night in the form of both PDF schedules and a new GTFS feed. Woot!!

SERVICE CHANGES: COMING AUGUST 18th, 2013. Don’t get caught waiting for a bus that isn’t coming. Check your schedules.

—

In related news, I should have some more quantitative analysis of the new service plan here soon, once I get around to it, including an updated frequency map to be available on the website. I was just waiting for the GTFS data for that one. And hopefully…some new analysis of ridership on a restructured system. Will more cross-town routes be reflected in relatively decreased boardings in downtown? I’m counting on SORTA to release that data from their automatic passenger counting system as soon as there’s enough time for the results to be statistically significant, perhaps around early to mid-September.

The night-time map shows transit lines that run after 10pm every single night, Sundays and holidays included. The graphic in the bottom right also indicates last departure times by direction and day of week.
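For anyone wanting to try a similar selection themselves, here is roughly how one could pick out such lines from schedule data. It’s a sketch, not the code behind the map: the rows are invented, I’ve assumed the feed is already flattened to one row per route, day and departure time, and `night_routes` is a hypothetical helper. It also ignores direction and holidays; in a real GTFS feed, holiday exceptions live in calendar_dates.txt. One real GTFS detail worth knowing: times after midnight are written past 24:00:00 (e.g. 25:10:00) and belong to the previous service day.

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def to_seconds(gtfs_time):
    """Convert a GTFS 'HH:MM:SS' string to seconds; hours may exceed 23."""
    h, m, s = (int(part) for part in gtfs_time.split(":"))
    return h * 3600 + m * 60 + s


def night_routes(departures, cutoff="22:00:00"):
    """Find routes with a departure after `cutoff` on every day of the week.

    `departures` is an iterable of (route, day, departure_time) tuples.
    Returns (sorted route list, {(route, day): last departure time}).
    """
    limit = to_seconds(cutoff)
    last = {}
    for route, day, dep in departures:
        key = (route, day)
        if key not in last or to_seconds(dep) > to_seconds(last[key]):
            last[key] = dep
    routes = {route for route, _ in last}
    keep = sorted(
        r for r in routes
        if all(to_seconds(last.get((r, d), "00:00:00")) > limit for d in DAYS)
    )
    return keep, last


# Invented rows: route "A" runs late every day, route "B" stops early on Sunday.
rows = [("A", d, "25:10:00") for d in DAYS]
rows += [("B", d, "23:30:00") for d in DAYS[:6]] + [("B", "sun", "21:15:00")]
late, last = night_routes(rows)
print(late, last[("B", "sun")])
```

A route that has no service at all on some day is treated as failing the test for that day, which matches the "every single night" rule.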

Quick analysis: Downtown is the best place to have a few drinks. Not a big surprise there. Campbell County is not doing so well.

Since there are inevitably a few things to change with any new graphic like this, please leave suggestions in the comments. Road or city with the wrong name? Something not clear? Something that should be added? Let me know!