Does Pleiades have an API?

2011-12-01T21:41:10Z in pleiades, data

This is becoming a frequently asked question, and as I work on the definitive
answer for the Pleiades FAQ, I'll think out loud about it here in my blog. Does
Pleiades have an API? In truth, it has a number of APIs, some good and some bad.
Does it have an HTTP + JSON API like all the cool kids do? No. Well, yes, sort of.

Before I get into tl;dr territory, I'll write down one of the guiding principles
of the Pleiades project:

Data is usually better than an API.

It's not that we're uncomfortable with interfaces in Pleiades. Our application
is based on Zope and Plone, so you know it has all kinds of interfaces under
the hood. I'm even a bit of a geek about designing nice APIs (see also Shapely,
Fiona, etc). It's just that data is better ... usually.

By "data" above, I mean a document or file or sequence of bytes containing
related information, in bulk. The entire text of a book, for example, is better
to have than an API for fetching the N-th sentence on page M. All the
coordinates of a simple feature linestring (as GeoJSON, say) are better to have
than an API for getting the N-th coordinate value of the M-th vertex of a line
object. Given all the data, we're not bound to a particular way of indexing and
searching it and can use the tools of our choice. APIs are typically chatty,
slow and pointlessly different from others in the same line of business. Subbu
Allamaraju goes deep into the trouble of working with inconsistent systems in
"APIs are a Pain" and with more hard-earned wisdom than I have, so I won't
pile on here. Data is better ... usually.
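
To make the linestring example concrete, here is a minimal Python sketch. The coordinates are made up for illustration and the helper names (coordinate_value, bounds) are hypothetical. The only point is that with the whole GeoJSON geometry in hand, how we index and search it is up to us.

```python
import json

# A made-up GeoJSON LineString, standing in for data obtained in bulk.
doc = '{"type": "LineString", "coordinates": [[12.5, 41.9], [13.0, 41.6], [14.25, 40.85]]}'
line = json.loads(doc)
coords = line["coordinates"]


def coordinate_value(coords, m, n):
    """The N-th coordinate value of the M-th vertex is plain indexing."""
    return coords[m][n]


def bounds(coords):
    """A derived view of our own choosing: (minx, miny, maxx, maxy)."""
    xs = [x for x, y in coords]
    ys = [y for x, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


print(coordinate_value(coords, 1, 0))
print(bounds(coords))
```

No round trip per vertex, and nothing stops us from handing the same coordinates to Shapely or any other tool.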

An API, and here I mean "web API", can be better in the following situations
(the list is probably not exhaustive):

Sheer mass of data making dissemination practically impossible

Rapidly changing data making dumps and downloads out of date

Desire to control access to individual data records

Desire to monetize data (ads, for example)

Desire to impose a certain point of view

Desire to track use

Tracking use lets us tweak the experience of users. "People who viewed record
M might also be interested in record N" and the like. It doesn't have to be
nefarious tracking, just nudging users into useful and mutually profitable
patterns. Only one of these situations is very relevant to Pleiades and so
we're not designing APIs to sort them all out like other enterprises must. The RDF
and KML serializations of the entire 34,000-place Pleiades dataset are not
large by modern standards and don't change very rapidly. An application (like
the Pelagios Graph Explorer or GapVis) that fetched and cached them once a day
could stay quite up to date. The number of
Pleiades contributors is growing, but they are primarily enriching existing
places; I don't expect Pleiades to ever become so large that those files
couldn't be transferred in less than a minute on a good internet connection.
We control access to data that's in development, yes, but the locations, names
and places that pass through review into a published state are completely open
access and not private to any individual user or group of users. In only one
part of Pleiades are we concerned about controlling a narrative through an API:
the slideshow that plays on the Pleiades home page uses an API that stumbles
through the most recently modified places and progressively mixes in more
randomly selected ones.
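
The fetch-and-cache-once-a-day pattern mentioned above could be sketched roughly like this in Python. This is an assumption-laden sketch, not how the Pelagios Graph Explorer or GapVis actually work: the function names are hypothetical, and the URL and local path are whatever the caller supplies.

```python
import os
import time
import urllib.request

ONE_DAY = 24 * 60 * 60  # seconds


def fetch_url(url):
    """Download a URL and return its bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_dump(url, path, max_age=ONE_DAY, fetch=fetch_url, now=time.time):
    """Return the path to a local copy of a dump, refreshing it if stale.

    The copy counts as fresh when the file exists and was last written
    less than max_age seconds ago. fetch and now are injectable so the
    function can be exercised without touching the network.
    """
    fresh = os.path.exists(path) and (now() - os.path.getmtime(path)) < max_age
    if not fresh:
        data = fetch(url)
        with open(path, "wb") as f:
            f.write(data)
    return path
```

An application would call cached_dump() with the address of one of the downloads whenever it needs the data and work from the returned local file. At most one request a day reaches the server.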

Instead of fancy APIs, then, we have boring CSV, KML, and RDF downloads. The
shapefile format, by the way, is inadequate for our purposes. Information will
be lost in making a shapefile from the Pleiades model (any number of locations
and names per place) and we're going to let people decide for themselves what
to give up if they want this. The downloads are updated daily.
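
Here is a rough Python sketch of what "deciding what to give up" means. The record layout and field names are made up for illustration and are not the actual Pleiades schema. The constraint being modeled is that a shapefile record is flat: one geometry and a fixed set of simple attribute fields.

```python
# A made-up place: any number of names and locations per place.
place = {
    "id": "example-1",
    "names": ["Name A", "Name B"],
    "locations": [
        {"type": "Point", "coordinates": [13.0, 41.6]},
        {"type": "Point", "coordinates": [13.1, 41.5]},
    ],
}


def flatten_first(place):
    """One flat record per place: keep the first name and the first
    location, silently dropping the rest."""
    return {
        "id": place["id"],
        "name": place["names"][0],
        "geometry": place["locations"][0],
    }


def flatten_per_location(place):
    """One flat record per location: every geometry survives, but the
    place is repeated and its names are squeezed into a single string."""
    names = "; ".join(place["names"])
    return [
        {"id": place["id"], "names": names, "geometry": location}
        for location in place["locations"]
    ]


print(flatten_first(place))
print(flatten_per_location(place))
```

Neither flattening is wrong, but each throws away a different part of the model, which is why the choice is better left to the person who wants the shapefile.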

Pleiades also has JSON, KML, and RDF data for any particular place. That data
is current and linked from every page (http://pleiades.stoa.org/places/422987,
for example) with HTML <link> and <a> elements. It's not an API ... or is it?
The map on the page about Norba gets its overlay features from those very same
JSON and KML resources. Looking at it in this way, you could say we do have an
API here: the web is the API. When I finally finish the Pleiades implementation
of OpenSearch (with Geo extension by Andrew Turner), I can replace Plone's
crufty search API with even more consistency and interoperability from The Web
as API.
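
A client that treats the web as the API might discover those per-place representations by reading the <link> elements, along these lines. This is only a sketch using Python's standard library: the sample markup, the rel="alternate" convention, the media types, and the href values are my assumptions for illustration, not a copy of what a Pleiades page actually serves.

```python
from html.parser import HTMLParser


class AlternateLinks(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if "alternate" in rels and attrs.get("href"):
            self.alternates.append((attrs.get("type"), attrs["href"]))


# Made-up markup standing in for the head of a place page.
sample = """
<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/json" href="/places/422987/json">
<link rel="alternate" type="application/vnd.google-earth.kml+xml" href="/places/422987/kml">
</head><body></body></html>
"""

parser = AlternateLinks()
parser.feed(sample)
for media_type, href in parser.alternates:
    print(media_type, href)
```

From there the client follows whichever href matches the format it wants, the same way the map on the page does.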

Pleiades doesn't need the same kind of API that Twitter or Facebook have
(obviously) or that OpenStreetMap has. We simply don't have anywhere near that
much data, that much churn or (in the Twitter/Facebook case) that much need to
control what you access.