Searching for calendar information

With the community calendar service now live, I’ve got to do a bit more work to make it fully data-driven. Since I’m already managing the per-community feed lists and metadata on Delicious, I figure I might as well go all the way. So I’m also using Delicious to keep the list of Delicious accounts that control each community’s calendar aggregator. Today there are three. The idea is that when I add the fourth, I won’t touch any code, or even any configuration data, that would require an update to the running service. I’ll just bookmark the fourth Delicious account and tag it with calendarcuration.
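
The point is that the service discovers its curators at runtime instead of reading them from code. Here's a minimal Python 3 sketch of that lookup, assuming the bookmarks have already been fetched into simple dicts. The record shape ("url", "tags") and the account URLs are hypothetical; this is not the Delicious API.

```python
# Sketch: find the accounts that should drive calendar aggregators.
# Assumes bookmarks were already fetched into dicts; the field names and
# sample URLs below are hypothetical placeholders.
CURATION_TAG = "calendarcuration"


def curator_accounts(bookmarks):
    """Return the URLs of bookmarked accounts tagged for calendar curation."""
    return [b["url"] for b in bookmarks if CURATION_TAG in b.get("tags", [])]


bookmarks = [
    {"url": "http://example.com/accounts/keene", "tags": ["calendarcuration"]},
    {"url": "http://example.com/accounts/annarbor", "tags": ["calendarcuration", "ics"]},
    {"url": "http://example.com/unrelated", "tags": ["misc"]},
]

print(curator_accounts(bookmarks))
```

Adding a community then means adding one tagged bookmark; the next time the service reads the list, the new account shows up.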

But that’s merely an administrative convenience. Much more critical, at this point, is to help curators find machine-readable calendars in their communities and, since most of the calendars that could exist don’t yet, to show people how they can easily create them.

I got a running start when I bootstrapped the Ann Arbor instance, thanks to Google Calendar. I searched for Ann Arbor there and found a nice list of iCalendar feeds. But that search feature is, at least for now, gone.

Several curators have tried searching the web for .ICS files (e.g., filetype:ics), but that’s not very productive, for two reasons. Where iCalendar resources do exist, they often aren’t exposed as files with .ICS extensions. But more importantly, relative to the number of iCalendar resources that could exist, very few actually do.

So I thought back on how I bootstrapped the original Keene instance. A number of the events there are recurring events that were advertised on the web, but not in any structured format. I found them one day by doing web queries like:

"first monday" keene
"every thursday" keene
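
Queries like these are easy to generate for any community. A minimal Python 3 sketch that crosses a few recurrence phrases with a community name; the phrase list is illustrative, not the set the search tool actually uses.

```python
# Sketch: combine recurrence phrases with a community name to produce
# quoted web queries like the ones above. The phrase list is illustrative.
from itertools import product

ORDINALS = ["first", "second", "third", "fourth", "last"]
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def recurrence_phrases():
    """Yield phrases such as 'first monday' and 'every thursday'."""
    for prefix, day in product(ORDINALS + ["every"], WEEKDAYS):
        yield "%s %s" % (prefix, day)


def build_queries(community):
    """Quote each phrase and append the community name as a plain term."""
    return ['"%s" %s' % (phrase, community) for phrase in recurrence_phrases()]


queries = build_queries("keene")
print(len(queries))  # 42 (6 prefixes x 7 days)
print(queries[0])    # "first monday" keene
```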

There’s no fully automatic way to convert this stuff into structured calendar data. But it’s pretty straightforward to fire up a calendar program, enter some recurring events, and publish a feed. The advantage of recurring events, of course, is that they keep showing up, which is very helpful if you’re trying to build critical mass.
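
What a calendar program publishes for such events is, at bottom, an iCalendar (RFC 5545) feed in which each event carries an RRULE: "first Monday" becomes FREQ=MONTHLY;BYDAY=1MO and "every Thursday" becomes FREQ=WEEKLY;BYDAY=TH. A minimal Python 3 sketch that writes such a feed by hand; the event names, times, PRODID, and UID domain are hypothetical, and times are emitted as floating local time with no time zone.

```python
# Sketch (Python 3): emit a small iCalendar feed of recurring events.
# Event names, times, PRODID, and the UID domain are hypothetical placeholders.
# Long-line folding (RFC 5545 limits lines to 75 octets) is not handled here.
import uuid
from datetime import datetime, timezone


def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def make_event(summary, dtstart, rrule):
    """Build one VEVENT; dtstart is a naive datetime, emitted as floating local time."""
    return [
        "BEGIN:VEVENT",
        "UID:%s@example.org" % uuid.uuid4(),
        "DTSTAMP:" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
        "DTSTART:" + dtstart.strftime("%Y%m%dT%H%M%S"),
        "RRULE:" + rrule,
        "SUMMARY:" + escape_text(summary),
        "END:VEVENT",
    ]


def make_calendar(events):
    """Wrap VEVENT line lists in a VCALENDAR, using CRLF line endings."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//recurring events sketch//EN"]
    for event in events:
        lines.extend(event)
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


ics = make_calendar([
    # January 5, 2009 was the first Monday of that month.
    make_event("Book group (hypothetical)", datetime(2009, 1, 5, 19, 0),
               "FREQ=MONTHLY;BYDAY=1MO"),
    # January 1, 2009 was a Thursday.
    make_event("Open mic (hypothetical)", datetime(2009, 1, 1, 20, 0),
               "FREQ=WEEKLY;BYDAY=TH"),
])
print(ics)
```

Each DTSTART is chosen to fall on a date the rule itself would produce, which RFC 5545 recommends so the first instance isn't an outlier.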

So I’m now envisioning a pair of tools to help curators do this more easily. First, I’d like to have each community’s aggregator running a scheduled search that helps the curator be aware of calendar-like information that could be upgraded to actual calendar data. Second, I’d like to provide a tool that partly automates the cumbersome data entry.
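
One way the data-entry tool might help is to spot recurrence phrases in a page's text and propose the matching RRULE, leaving the time, place, and title for the curator to fill in. A rough Python 3 sketch of that idea; the function name and the phrase coverage are hypothetical, and it handles only the simple "every X" and "first/second/third/fourth/last X" patterns.

```python
# Sketch: suggest iCalendar RRULEs for recurrence phrases found in free text.
# Hypothetical helper; a curator still supplies times, places, and titles.
import re

DAY_CODES = {"monday": "MO", "tuesday": "TU", "wednesday": "WE",
             "thursday": "TH", "friday": "FR", "saturday": "SA", "sunday": "SU"}
ORDINAL_CODES = {"first": "1", "second": "2", "third": "3",
                 "fourth": "4", "last": "-1"}

PATTERN = re.compile(
    r"\b(every|first|second|third|fourth|last)\s+"
    r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)s?\b",
    re.IGNORECASE)


def suggest_rrules(text):
    """Return (matched phrase, RRULE value) pairs in order of appearance."""
    suggestions = []
    for match in PATTERN.finditer(text):
        which, day = match.group(1).lower(), match.group(2).lower()
        code = DAY_CODES[day]
        if which == "every":
            rule = "FREQ=WEEKLY;BYDAY=%s" % code
        else:
            # A signed ordinal prefix on BYDAY selects the Nth (or last) weekday of the month.
            rule = "FREQ=MONTHLY;BYDAY=%s%s" % (ORDINAL_CODES[which], code)
        suggestions.append((match.group(0), rule))
    return suggestions


text = "The club meets the first Monday of each month; open mic every Thursday."
for phrase, rule in suggest_rrules(text):
    print(phrase, "->", rule)
```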

I’ve done an initial version of the search tool, and an example of its output is here. I’ll attach the code to the end of this item, for those who care, although I expect that if it winds up being useful to curators, most will appropriately not care, and will only want to scan the links now and then.

It may be interesting, over time, to try to evolve this into a robot that makes sense of the calendar information that people actually write, as opposed to the information that calendar programs constrain them to produce. But meanwhile this hybrid approach seems like a way to make progress.

HOW

I built this tool in two parts. The kernel, so to speak, is in C#, because for now that’s the most practical way to write Azure services and applications. But the application is in IronPython, because the search function doesn’t yet need to be hosted on Azure, and IronPython is a really flexible and convenient way to experiment with the kernel.

The C# piece uses James Newton-King’s Json.NET library because JSON interfaces, designed to be called from JavaScript, are now the preferred way to search programmatically. It’s been a while since I’ve done this kind of thing. Used to be, the REST APIs were easy to find. But now, since those interfaces are mainly intended for use by JavaScript objects embedded in web pages, I had to do a bit of spelunking.

One of the interesting things about Json.NET is that it includes an implementation of LINQ for JSON. That’s why you see the “from … select” syntax, which extracts an enumerable list of URLs from the JSON results returned by the search services.
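
For comparison, here's roughly the same extraction step in plain Python 3, with the standard json module and a list comprehension standing in for LINQ's from … select. The response shape (responseData, results, url) is an assumption for illustration; each search service uses its own field names and nesting.

```python
# Sketch: pull result URLs out of a JSON search response.
# The response shape below is hypothetical; real services differ.
import json

SAMPLE_RESPONSE = """
{"responseData": {"results": [
    {"url": "http://example.com/events/book-group", "title": "Book group"},
    {"url": "http://example.com/events/open-mic", "title": "Open mic"}
]}}
"""


def extract_urls(body):
    """Python analog of a 'from r in results select r.url' query."""
    data = json.loads(body)
    # Tolerate a missing or null responseData by falling back to an empty dict.
    results = (data.get("responseData") or {}).get("results", [])
    return [r["url"] for r in results if "url" in r]


print(extract_urls(SAMPLE_RESPONSE))
```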

Published by Jon Udell

7 Comments

The people over at FuseCal have already done all this work for you. http://fusecal.com. You just plop in a URL that may contain some kind of structured event data, and it generates an ICS file from it.

It’s been really slick in my experience.

They said they reached out to you a year ago and you said you were angry that they had to exist :)

Mykel, thanks for mentioning fusecal; I’d not heard of that service before. So, since I’m working on calendar curation for Huntington, WV, I decided to try a bunch of the calendars I had found, and so far I’ve successfully gotten it to parse/scrape five different calendars. I’m going to try some more.