At the Forge - Aggregating with Atom

Want to give everyone a polite reminder when you have new content on your Web site? Give your site the latest syndication standard and you'll have a new tool to keep visitors coming back.

In the world of organized crime, a syndicate is a collection of
gangsters who work together. In the world of newspapers, a syndicate
distributes information to subscribers, allowing each publication to
tailor the content of information it receives. Comics, news stories
and opinion columns often are distributed by syndicates, providing
greater exposure for the authors and more content for the readers.

In the past few years, Web developers also have begun to use the term
syndicate, as both a verb and a noun. Fortunately for our safety,
syndication on the Web has more in common with newspapers than with the
mob. But as with organized crime, many people have been hurt in
public disputes (albeit with words, not guns), leading to a split and
a fair amount of acrimony in the world of Web syndication.

The result of this split is Atom, a new syndication format that has
much in common with RSS (rich site summary or RDF site summary,
depending on the version and whom you ask). I believe that Atom
offers a number of advantages over any version of RSS, and that the
simplicity with which Atom feeds can be created makes it an obvious
choice over RSS. That said, the fact that most Weblog products
provide RSS feeds means that the two camps happily can coexist for
now. Understanding how both work also means your organization can
decide to adopt one or both standards, depending on your needs.

Some History

As we saw last month, RSS really is two different formats, or more
precisely, two different families of formats. RSS 0.9x and RSS 2.0
are from the same family and demonstrate the evolution, over time, of
syndication on the Web. RSS 2.0 is maintained mainly by Dave Winer of
Userland, scripting.com and (most recently) Harvard University.
Winer has given ownership of the standard to Harvard but also
has declared that version 2.0 will be the final one. Nevertheless, the
combination of RSS 0.9x and RSS 2.0 represents a widespread, stable,
well-understood, if at times ambiguous, format for syndicating Web content.

A separate flavor of RSS, confusingly known as RSS 1.0, uses the
Resource Description Framework (RDF) produced by the World Wide
Web Consortium (W3C). RDF is designed to make it possible for
computers to understand a site's contents, allowing them to make
connections between sites, much as people instinctively do all the
time. RSS 1.0 produces a summary that is incompatible with all other
versions of RSS, using RDF to produce a standardized description of
the site's contents.

The fact that RSS 1.0 used the RSS name caused a great deal of
friction and animosity, with many people variously blaming Dave Winer,
the vagueness of the RSS specification and the proponents of Atom's
predecessor. At the end of the day, a number of prominent
individuals (led by Tim Bray, Mark Pilgrim and Sam Ruby) were backed by such
companies as Six Apart (which publishes Movable Type software for
Weblogs) to produce a specification, initially called PIE and Echo,
which attempts to address the shortcomings of RSS.

The development of Atom took some time, because it involved
understanding and defining exactly what syndication means on today's
World Wide Web. RSS no longer is used only for news sites, its
original target, but also for Weblogs and nontextual content. The
developers decided to make internationalization a top priority,
meaning that it should be possible to produce a syndication feed in
any language. Another priority was the development of extensions: it
should be possible to add new functionality to the Atom feed without
having to redefine the core Atom specification.

As of this writing (mid-August 2004), the Atom specification
exists in version 0.3, along with a standard API for editing content
over the network. Atom has begun the process of becoming standardized by
the IETF (the Internet Engineering Task Force, which produces and
publishes Internet standards), meaning it is on its way to
being a universally accepted standard, much like TCP/IP, SMTP or
HTTP. This undoubtedly will lead to even greater interest in Atom
from organizations that wait for the IETF's stamp of approval.

Atom is still in its initial stages, lacking public specifications for
a number of items, such as its extension mechanism. But its authors
have, to date, produced a standard whose complexity is fairly close to
that of RSS 0.9x and 2.0, that is written in as unambiguous a fashion as
possible, that was developed with the participation of many members of
the Web syndication community and that offers a vision of syndication
going far beyond the Web.

Producing an Atom Feed

Although RSS was designed to summarize a news feed or Weblog, Atom was
created with a more general purpose in mind. For example, factory
machines could produce status reports in Atom, with an aggregator
displaying those that are malfunctioning. Libraries could produce
Atom feeds of the latest additions to their collections, with smart
aggregators looking for books on certain subjects. Fax machines could
be replaced by fax modems, using Atom to distribute fax images to
appropriate groups of people.

You even could use Atom feeds to create a newspaper publishing system,
in which reporters do not send their stories by e-mail but instead publish
drafts on an Atom feed. Each editor would aggregate Atom feeds from
the reporters under his or her control, moving them onto an outgoing
Atom feed when the editing was complete. The final feed would end
up in the production department, where the text would be laid out and
made ready for actual printing. The newspaper's content flow
thus would be a flow of many Atom feeds into a single, final feed
representing the newspaper itself.

Producing an Atom feed is fairly simple, if you use Perl or another
high-level language for which an Atom library exists. Perl, for
example, has the XML::Atom module, available from CPAN
(Comprehensive Perl Archive Network). I had a bit of trouble
installing XML::Atom on my machine running Fedora Core 2 and Perl
5.8.3, but I was able to work around it by ignoring the optional
DateTime module during the installation process. I would not
recommend doing so in a production environment.

Although XML::Atom is the overall package name, programs that create Atom
feeds actually use XML::Atom::Feed and XML::Atom::Entry. Here is a short
Perl program that produces a simple feed, based in part on the sample
program in the perldoc on-line documentation for XML::Atom::Feed:
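
A minimal sketch of such a program, following the synopsis in the XML::Atom::Feed documentation. The feed title, entry title and body text are placeholder values, and the output is only as complete as the few elements set here:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Atom::Feed;
use XML::Atom::Entry;

# The feed object represents the site (or machine) as a whole.
my $feed = XML::Atom::Feed->new;
$feed->title('My Weblog');            # placeholder title

# Each entry becomes one <entry> element inside the feed.
my $entry = XML::Atom::Entry->new;
$entry->title('First Post');          # placeholder title
$entry->content('Post Body');         # placeholder body text
$feed->add_entry($entry);

# Serialize the whole feed, entries included, as Atom XML.
my $xml = $feed->as_xml;
print $xml, "\n";
```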

As you can see, we create a single XML::Atom::Feed object, containing
one or more instances of XML::Atom::Entry. Each entry object
corresponds to a single <entry> tag in the Atom feed, which in turn
represents a single entry in our Weblog or a single message from our
factory floor.
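
The consuming side can be sketched with the same module. The example below assumes the factory-floor scenario: it builds a small feed in memory (standing in for XML that a real aggregator would fetch over the network), parses it back with XML::Atom::Feed->new and keeps only the entries whose summary mentions a malfunction. The machine names and the wording of the summaries are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Atom::Feed;
use XML::Atom::Entry;

# Build a stand-in feed; a real aggregator would fetch this XML instead.
my $source = XML::Atom::Feed->new;
$source->title('Factory floor status');
for my $report (['Press 1', 'OK'], ['Lathe 7', 'malfunction: spindle jam']) {
    my $entry = XML::Atom::Entry->new;
    $entry->title($report->[0]);      # hypothetical machine name
    $entry->summary($report->[1]);    # hypothetical status text
    $source->add_entry($entry);
}
my $xml = $source->as_xml;

# Passing a reference to a scalar asks the constructor to parse it as XML.
my $feed = XML::Atom::Feed->new(\$xml);

# Keep only the entries that report a problem.
my @broken = grep { $_->summary =~ /malfunction/ } $feed->entries;
print $_->title, ': ', $_->summary, "\n" for @broken;
```

Filtering on the summary text is only one possible convention; an aggregator could just as well key on the title or on an extension element.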

The Atom specification indicates that the feed may contain a number of
attributes and sub-elements, including a language, a description of the
Weblog or site, copyright information and other general information
about the originating site. Each entry, in turn, has its own set of
elements, such as a title, an indication of when it was created and a
summary. Content-bearing elements, such as the summary and the content
itself, also carry a MIME type indicating what type of content they
contain, much like HTTP responses and e-mail attachments.
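
A slightly fuller sketch shows some of these feed-level and entry-level elements in use. It relies on the accessors documented for XML::Atom::Feed, XML::Atom::Entry, XML::Atom::Person and XML::Atom::Link; every name, address, URL and date below is a placeholder, and the result is not guaranteed to satisfy a feed validator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Atom::Feed;
use XML::Atom::Entry;
use XML::Atom::Person;
use XML::Atom::Link;

# Feed-level information about the originating site.
my $feed = XML::Atom::Feed->new;
$feed->title('My Weblog');

my $author = XML::Atom::Person->new;
$author->name('A. Blogger');                   # placeholder author
$author->email('blogger@example.com');
$feed->author($author);

# A link carries a MIME type describing what it points to.
my $link = XML::Atom::Link->new;
$link->rel('alternate');
$link->type('text/html');
$link->href('http://www.example.com/blog/');   # placeholder URL
$feed->add_link($link);

# Entry-level elements: title, creation date, summary and content.
my $entry = XML::Atom::Entry->new;
$entry->title('First Post');
$entry->issued('2004-08-15T12:00:00Z');        # placeholder date
$entry->summary('A one-line summary of the post');
$entry->content('Post Body');
$feed->add_entry($entry);

my $xml = $feed->as_xml;
print $xml, "\n";
```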

Of course, creating a feed, as in the above example, is necessary
only if you are writing a new Atom-powered application or if you are
adding Atom capabilities to a Weblog product. Most Weblog products
now provide Atom feeds, either as part of their standard distribution
or through a plugin or other extension mechanism.
For example, an Atom feed plugin for the Blosxom Weblog product makes
it easy to add such a feed to a Weblog; install the plugin (by
placing it in the plugins directory), and anyone interested in
receiving an Atom feed from the Weblog in question will be able to do
so.

It shouldn't come as a surprise that this is so easy to accomplish,
given the fact that Blosxom is written in Perl, that Perl provides
excellent tools for working with XML and that the plugin simply needs
to summarize and rewrite content from the most recent entries in the
Weblog. Because Blosxom makes it so easy for plugins to modify
the main page (so as to advertise the Atom feed) and to retrieve
content (through the plugin API), it might be slightly easier to work
with Atom from Blosxom than from other products. Even so, given that most
Weblog products are written in a high-level language, such as Perl,
Python or PHP, it should be easy to add an Atom feed where none
currently exists.
