Dreaming of an Atom Store: A Database for the Web

After a year of work -- two if you count the work done before the AtomPub WG was formed --
the Atom Publishing Protocol (APP) is moving closer to being done. There are still parts
that are unspecified, and other parts of the protocol are under debate even now. Even so,
some of the protocol is very stable, unchanged from even the earliest drafts. Despite the
incomplete nature of the APP, plenty of people are excited about it and are beginning to
imagine all sorts of uses for it beyond a weblog publishing API, which was its original
target. Over the course of the last couple of months, two of those ideas have collided
often enough that we're starting to see a stable fusion: the APP and Amazon's OpenSearch.
Let's take a moment for a quick recap:

Atom Publishing Protocol

The Atom Publishing Protocol leverages the work done on the Atom Syndication Format and
the basics of HTTP to form a simple, yet powerful, publishing protocol. One of the things
that doesn't get noticed at first about the Atom Syndication Format is that not only are
feeds first-class documents, but so are entries. This, for example, is a valid Atom document:
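
A minimal sketch of such a standalone Entry document, parsed on its own with Python's
standard library, might look like the following. The id, title, author, and timestamp are
made-up values for illustration, and parse_entry is a hypothetical helper, not part of any
Atom library.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# A standalone Atom Entry document. All values are illustrative.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:entry:1</id>
  <title>A first post</title>
  <updated>2005-12-01T12:00:00Z</updated>
  <author><name>Example Author</name></author>
  <content type="text">Entries can stand alone, outside of any feed.</content>
</entry>"""


def parse_entry(xml_text):
    """Parse an Entry document and return a few of its core elements."""
    root = ET.fromstring(xml_text)
    # The root element is an entry, not a feed.
    if root.tag != f"{{{ATOM_NS}}}entry":
        raise ValueError("root element is not an Atom entry")
    return {
        name: root.findtext(f"{{{ATOM_NS}}}{name}")
        for name in ("id", "title", "updated")
    }


entry = parse_entry(ENTRY_XML)
print(entry["title"])
```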

The Atom Publishing Protocol is all about pushing around Atom Entries. For now, we'll
assume that the APP is just used for editing a weblog, and that a weblog is made up of
entries. Note that this is a small "e" entry. For each small "e" entry there is a big "E"
Entry that represents it, and each of those big "E" Entries lives at its own URI. In other
words, each entry in your weblog has a corresponding URI for the Atom Entry that represents
it. Do an HTTP GET on that URI to get the Entry; PUT a new Entry to the URI to update the
Entry, and the corresponding small "e" entry gets updated too. Do an HTTP DELETE on that
URI and the small "e" entry is deleted. The Entries that are used to represent the entries
in the weblog are grouped together in a Collection. That, too, is a resource and has its
own URI. To add a new entry to your weblog you POST an Entry to the Collection, which in
turn creates the small "e" entry.
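
The four operations above can be sketched as plain HTTP requests. This is only an
illustration under stated assumptions: the example.org URIs in the usage lines are
hypothetical, the helper names are mine, and the requests are built but never sent.

```python
from urllib.request import Request

ATOM_TYPE = "application/atom+xml"


def create_request(collection_uri, entry_xml):
    """POST an Entry to the Collection to create a new entry."""
    return Request(collection_uri, data=entry_xml.encode("utf-8"),
                   headers={"Content-Type": ATOM_TYPE}, method="POST")


def read_request(entry_uri):
    """GET the Entry that represents a weblog entry."""
    return Request(entry_uri, method="GET")


def update_request(entry_uri, entry_xml):
    """PUT a new Entry to the Entry's own URI to update it."""
    return Request(entry_uri, data=entry_xml.encode("utf-8"),
                   headers={"Content-Type": ATOM_TYPE}, method="PUT")


def delete_request(entry_uri):
    """DELETE the Entry's URI to remove the weblog entry."""
    return Request(entry_uri, method="DELETE")


# Hypothetical URIs; nothing is sent over the network here.
req = update_request("http://example.org/blog/entries/1", "<entry/>")
print(req.get_method(), req.full_url)
```

Passing any of these to urllib.request.urlopen would perform the actual call against a
server that implements the protocol.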

OpenSearch

Amazon's A9 service launched two years ago to research and build innovative technologies
to improve the search experience for e-commerce applications. One of those technologies is
OpenSearch, a CC-licensed specification for search using Atom and RSS. In other words,
OpenSearch defines a RESTful web service for searching, including a format for advertising
what kind of search your site supports, and specifying how to return your search results in
Atom or RSS.
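
To make the "advertising" part concrete: an OpenSearch description carries a URL template
with placeholders such as {searchTerms} that a client fills in before issuing a GET. The
sketch below assumes that style of template; the example.org URL, the query parameter
names, and the fill_template helper are all hypothetical.

```python
import re
from urllib.parse import quote


def fill_template(template, **params):
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional placeholders may be left empty
        raise KeyError(f"missing required template parameter: {name}")
    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


# Hypothetical template, as it might appear in a site's description document.
TEMPLATE = "http://example.org/search?q={searchTerms}&page={startPage?}"

url = fill_template(TEMPLATE, searchTerms="atom store", startPage=2)
print(url)
```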

Two Great Tastes That Taste Great Together

Now it's these two ideas, the APP and OpenSearch, that have started to show up together.
Imagine enhancing the APP method of editing your weblog with OpenSearch. You could search
across all your entries and find the right ones you want to edit or delete. You already
have the capability to return an Atom Entry for each entry on your site if you implement
the APP, so returning a bunch of them in the form of a feed in response to an OpenSearch
request isn't such a great leap.

Here is where the idea itself starts to break loose from its beginnings and take on a life
of its own. Imagine that there isn't a weblog associated with all those entries. Imagine
that you just have a huge glob of storage that you can store Atom Entries in, and which you
can edit using the APP, and then search over using OpenSearch. That idea, that big blob of
Atom Entries, all editable and searchable, is an Atom Store.
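
As a rough mental model, an Atom Store behaves like the toy class below: entries go in
with POST, live at their own URIs, and can be found again by searching. This is an
in-memory sketch only; the class, its method names, and the dict representation of an
entry are my assumptions, not anything defined by the APP or OpenSearch.

```python
import itertools


class AtomStore:
    """A toy, in-memory stand-in for an editable, searchable store of entries."""

    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self._entries = {}
        self._ids = itertools.count(1)

    def post(self, entry):
        """Create an entry (like POST to a Collection) and return its URI."""
        uri = f"{self.base_uri}/{next(self._ids)}"
        self._entries[uri] = dict(entry)
        return uri

    def get(self, uri):
        return dict(self._entries[uri])

    def put(self, uri, entry):
        """Replace an existing entry; unknown URIs are an error."""
        if uri not in self._entries:
            raise KeyError(uri)
        self._entries[uri] = dict(entry)

    def delete(self, uri):
        del self._entries[uri]

    def search(self, terms):
        """Return URIs of entries whose title or content contains every term."""
        words = terms.lower().split()
        hits = []
        for uri, entry in self._entries.items():
            text = f"{entry.get('title', '')} {entry.get('content', '')}".lower()
            if all(word in text for word in words):
                hits.append(uri)
        return hits


store = AtomStore("http://example.org/store")  # hypothetical base URI
first = store.post({"title": "Hello Atom", "content": "a first entry"})
print(store.search("atom"))
```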

An Atom Store

The idea of an Atom Store has been bouncing around the blogosphere for a bit now, though
not always called by that name. Jesse Andrews points out a few of the sources of
inspiration, and as far as I know he was the first person to use the term "Atom Store":

You can even hear Google's Adam Bosworth request it on IT Conversations,
hoping MySQL folks don't become Oracle as Oracle doesn't scale the way an Atom Store
could scale.

The range of applications that are being talked about here is breathtaking. The monkeydo
and magicline usage of an Atom Store would be a remote persistence mechanism for a
Greasemonkey script. Contrast that to the ideas that Adam Bosworth is talking about,
databases that scale like Google's GFS does today.

It's All About the REST

That's a huge range of applications, but I think such a thing could happen. There are
several forces driving it. First, you and I have lots of data, and it's stored in lots of
places. I have my weblog, my email, my subscriptions to all my syndication feeds, maybe a
del.icio.us and flickr account, and so on, and so on. You are not going to combine those
all into one big, happy service. Ever.

I want my choices, and even if you are a big company and end up being able to provide all
those services under one brand, I doubt I would trust all that data in one place. Instead
of consolidating services, what syndication over the past five years suggests is that I can
aggregate feeds from all those places into a single dashboard that lets me view the status
of my far-flung data empire in a single view. Now if all those sources of data not only
supplied a feed, but also supported the interface of an Atom Store, that passive view
changes into a real dashboard -- not only are those entries viewable, but they're editable
from one spot.

Yes, I know that some aggregators support search, and some even support some of the current
blogging APIs, but that's very different from every source being searchable and editable.
An aggregator is only going to be able to search across entries that have appeared since it
started subscribing to that feed, and not any earlier ones.

The other advantage of an Atom Store is that it's built on top of RESTful services. That
means that we get the advantages of REST -- caching and uniform interfaces and hypermedia
as the engine of application state. For both OpenSearch and the APP there is an XML
document that describes the capabilities of each endpoint; they are self-describing. That
allows another service to come along and wrap several Atom Stores together by reading those
description documents and then presenting itself as an Atom Store, an aggregate of all the
stores it uses. Now that aggregate store could be a melange of your disparate data: your
weblog, your email, etc. On the other hand, it could be a uniform series of servers, each
with a subset of a huge store: now you're building a monster database.
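
The wrapping idea can be sketched in a few lines: an aggregate exposes the same search
interface as the stores it wraps, so aggregates can themselves be wrapped. Everything here
is an assumption for illustration, including the class names, the dict-shaped entries, and
the choice to merge results newest first; a real aggregate would discover its members from
their description documents rather than take them as constructor arguments.

```python
class ListStore:
    """A toy member store holding entries as dicts with 'title' and 'updated'."""

    def __init__(self, entries):
        self.entries = list(entries)

    def search(self, terms):
        needle = terms.lower()
        return [e for e in self.entries if needle in e["title"].lower()]


class AggregateStore:
    """Presents several stores as one by fanning a search out to each of them."""

    def __init__(self, stores):
        self.stores = list(stores)

    def search(self, terms):
        merged = [entry for store in self.stores for entry in store.search(terms)]
        # Timestamps in one consistent UTC format sort correctly as strings.
        return sorted(merged, key=lambda e: e["updated"], reverse=True)


weblog = ListStore([{"title": "Atom notes", "updated": "2005-11-01T00:00:00Z"}])
email = ListStore([{"title": "Re: Atom Store", "updated": "2005-12-01T00:00:00Z"},
                   {"title": "Lunch?", "updated": "2005-12-02T00:00:00Z"}])

everything = AggregateStore([weblog, email])
# An aggregate has the same interface, so it can be a member of another aggregate.
nested = AggregateStore([everything])
print([e["title"] for e in nested.search("atom")])
```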

"Just" Use a Database

Aren't these just the same promises made in the early days of SQL? Sure they are, but I
think an Atom Store has a better chance of living up to the hype, for two reasons. First,
the data model is not wide open like SQL's; the format is pretty restricted as far as the
core elements of Atom are concerned. Second, the query and updating operations are not
nearly as comprehensive as SQL's. If you want to point to SQL as the only reasonable way to
query over gigabytes of data, I'll just point to Google or Yahoo as counterexamples.

It's Not All Puppies and Roses

Now that I've got you all worked into a lather over how great the world will be with Atom
Stores on every street corner, let me splash a little cold water in your direction. I've
kind of glossed over some areas that need work. Some of the open questions are:

How do you know where to POST to for creating new entries vs. annotations?

Creation

If I POST a new Entry to an aggregate of a bunch of Atom Stores, which of those Atom
Stores should it be created in? How should I route that POST?

Foreign Markup

Let's say I wanted to use an Atom Store for storing all the customer transactions in my
e-commerce store. To do that effectively I may have to add some extra information to an
Atom Entry to fully represent a transaction. How and where is that information stored and
indexed? Do I start creating microformats for all of that data or do I stuff it in the
Entry as foreign markup? How much indexing of foreign markup is useful? Do we need
specialized indexing and search terms for that?
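
To make the foreign markup option concrete, here is a sketch of a transaction Entry that
carries extra elements in a namespace of its own, plus a helper that pulls out whatever
isn't core Atom. The shop namespace, its element names, and the foreign_fields helper are
all invented for illustration; nothing here answers how a store should index such data.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SHOP_NS = "http://example.org/ns/shop"  # hypothetical extension namespace

ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:shop="http://example.org/ns/shop">
  <id>urn:example:transaction:1001</id>
  <title>Order 1001</title>
  <updated>2005-12-01T12:00:00Z</updated>
  <author><name>Example Shop</name></author>
  <content type="text">One widget, shipped.</content>
  <shop:customer>C-42</shop:customer>
  <shop:total currency="USD">19.95</shop:total>
</entry>"""


def foreign_fields(xml_text):
    """Return {(namespace, local name): text} for non-Atom children of an entry."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # ElementTree writes qualified tags as "{namespace}local".
        if child.tag.startswith("{"):
            namespace, _, local = child.tag[1:].partition("}")
        else:
            namespace, local = "", child.tag
        if namespace != ATOM_NS:
            fields[(namespace, local)] = (child.text or "").strip()
    return fields


fields = foreign_fields(ENTRY_XML)
print(fields[(SHOP_NS, "total")])
```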

As you can see there's plenty of work to be done. Let's roll up our sleeves and make it
happen.