An overview of the Atom 1.0 Syndication Format

How this popular Web content syndication format stacks up

Get a technical overview of the popular Atom Syndication Format. This article discusses Atom's technical strengths relative to other syndication formats, and offers several compelling use case examples that illustrate those strengths.

James Snell is a member of IBM's Emerging Technologies Toolkit team. He has spent the past few years focusing on emerging web services technologies and standards, and has been a contributor to the Atom 1.0 specification. He maintains a weblog focused on emerging technologies at http://www.ibm.com/developerworks/blogs/page/jasnell.

Web content syndication is an area of growing importance on the Internet and behind the firewall. What was once the sole domain of bloggers and online news sites is evolving into a platform for next-generation Web-based services and content distribution. While adoption of syndication technologies is growing at a feverish pace, these technologies have a long history of technical issues, ambiguities, and interoperability challenges that have made life difficult for the software developers and consumers who rely on them. To address these issues, members of the syndication technology community came together to pool their experience and define the Atom Syndication Format and the Atom Publishing Protocol standards (see Resources). On July 15th, 2005, the first of these specifications, the Atom Syndication Format, was released to the world for implementation.

This article assumes that you have at least a basic understanding of content syndication and the existing family of specifications. As you read through this overview, I recommend that you keep a copy of the Atom 1.0 format specification handy as a cross-reference for the various elements discussed.

It is important to point out that it is not the intent of this discussion to disparage RSS in any way. Rather, the goal is to illustrate the types of improvements that the Atom format delivers relative to the existing family of syndication formats, and to highlight the strengths inherent in the Atom format.

A simple example

For anyone who has used the RSS family of specifications for content syndication, Atom 1.0 will feel familiar. Atom does, however, differ from RSS in many important respects. Listing 1 illustrates a simple Atom 1.0 feed.

One such difference is that Atom requires every entry to contain the following:

A unique identifier, which can be as simple as the URI of a blog entry or other Web resource represented by the entry, or as complex as a 128-bit Globally Unique Identifier (GUID) expressed as a URI (for example, a urn:uuid: URN)

A title, which expresses a short, human-readable subject line for the entry, and can be a blank string (represented by an empty title element, such as <title />)

A timestamp that indicates when the entry was last updated
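To make the three required elements concrete, the following Python sketch builds a bare entry containing only them, using the standard library's xml.etree.ElementTree. The function names and the choice of a random urn:uuid: identifier are illustrative assumptions, not part of the Atom specification; a real entry would normally carry content, links, and authors as well.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Return an Atom element name in ElementTree's {namespace}local form."""
    return f"{{{ATOM_NS}}}{tag}"


def minimal_entry(title, entry_id=None):
    """Build an entry holding only the id, title and updated elements Atom requires."""
    entry = ET.Element(atom("entry"))
    # atom:id must be a URI; a urn:uuid: URN wraps a random 128-bit UUID.
    ET.SubElement(entry, atom("id")).text = entry_id or uuid.uuid4().urn
    # An empty title is allowed and serializes as <title />.
    ET.SubElement(entry, atom("title")).text = title
    # Time of the last update as an RFC 3339 timestamp in UTC.
    now = datetime.now(timezone.utc).replace(microsecond=0)
    ET.SubElement(entry, atom("updated")).text = now.isoformat().replace("+00:00", "Z")
    return entry


print(ET.tostring(minimal_entry(""), encoding="unicode", default_namespace=ATOM_NS))
```

Passing default_namespace to ET.tostring (Python 3.8 or later) serializes the Atom elements without a prefix, as they usually appear in real feeds.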

Further, Atom carefully describes a robust, flexible, and consistent content model that supports plain text, escaped HTML, well-formed XHTML, arbitrary XML, Base64-encoded binary content, and URI pointers to content not included directly within the feed. In contrast, without resorting to non-standardized and inconsistently implemented namespace extensions, RSS can handle only plain text and escaped HTML content.
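The content model is driven by the type and src attributes of the content element. As a rough sketch of how a consumer might apply the rules in the final specification (RFC 4287, section 4.1.3), the hypothetical content_kind function below classifies a content element; the returned labels are my own shorthand, not terms defined by Atom.

```python
import xml.etree.ElementTree as ET


def content_kind(content):
    """Classify an atom:content element by its src and type attributes."""
    if content.get("src") is not None:
        return "out-of-line"  # content lives at the src URI; the element is empty
    ctype = content.get("type", "text")  # type defaults to "text"
    if ctype in ("text", "html", "xhtml"):
        return ctype  # plain text, escaped HTML, or an inline XHTML div
    media = ctype.split(";")[0].strip().lower()
    if media.endswith("+xml") or media.endswith("/xml"):
        return "xml"  # arbitrary inline XML
    if media.startswith("text/"):
        return "text"  # other textual media types are carried as text
    return "base64"  # any other media type is Base64-encoded


sample = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" type="image/png">iVBORw0KGgo=</content>')
print(content_kind(sample))  # base64
```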

Atom also defines an extensibility model that offers the same kind of decentralized, dynamic mechanisms for adding new metadata and content that RSS supports, but does so in a way that helps protect core interoperability between implementations. For example, Atom clearly articulates where extension elements can and cannot appear within a document, which extensions are language-sensitive (and thereby affected by xml:lang attributes), and how an Atom implementation must react when it encounters an unfamiliar extension element.

Finally, Atom provides rigorous definitions for the various required and optional metadata elements within its core namespace. For instance, Atom defines an author element that is a complex structure including a name, an e-mail address (as defined by RFC 2822), and a resource identifier that's associated with the author in some way (such as the URI of the author's home page).

A feed or entry can have multiple author elements along with zero or more contributor elements. Contributor elements identify individuals who contributed to the production of the feed or entry, but whose level of input does not warrant recognition as an author (for example, audio engineers, editors, software developers, and others). Both the author and contributor elements are fully extensible, allowing content producers to provide as much detail about the author or contributor as they deem appropriate. In comparison, RSS specifies a much more limited author element that can appear only once within an item and can express only an e-mail address.
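A consumer can read these person constructs without any special handling for multiple authors. The sketch below, which assumes a made-up entry with two authors and one contributor, collects the name, email, and uri children of each one; email and uri are optional, so missing values come back as None. The people helper is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical entry, trimmed to its person constructs.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <author><name>Jane Doe</name><email>jane@example.org</email></author>
  <author><name>John Smith</name><uri>http://example.org/~john</uri></author>
  <contributor><name>Pat Editor</name></contributor>
</entry>"""


def people(entry, role):
    """Return each atom:author or atom:contributor as a dict of name, email and uri."""
    result = []
    for person in entry.findall(f"atom:{role}", NS):
        result.append({
            field: person.findtext(f"atom:{field}", default=None, namespaces=NS)
            for field in ("name", "email", "uri")
        })
    return result
```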

Overall, the features built into Atom are geared toward supporting a much broader range of syndication use cases while addressing many of the technical weaknesses that permeate the existing family of syndication standards.

Support for enclosures

Outside of weblog and news content syndication, one of the most popular evolving applications of syndication technology has been in the area of podcasting. A podcast is a data feed that distributes recorded digital audio files that are automatically downloaded and copied to a user's portable media device. Currently, podcasting is enabled through the use of RSS 2.0's enclosure tag as illustrated in Listing 3.

While podcasting is rapidly growing in popularity, the RSS 2.0 enclosure tag has at least one significant limitation that has proven to be an annoyance to podcasters: RSS allows only one enclosure tag per item. This means that podcast producers who wish to make their audio downloads available in multiple ways (such as MP3 or WMA files, or a BitTorrent download) must offer a separate feed for each. Atom, on the other hand, allows a single entry to contain multiple enclosures, each with its own media type attribute, making it possible for podcasters to produce a single feed containing all of the formats they distribute.
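On the subscriber side, multiple enclosures reduce to a simple selection problem. In this sketch the episode, its URLs, and the media type strings are hypothetical; pick_enclosure walks the subscriber's preferred media types in order and returns the first matching enclosure, or None if nothing matches.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical podcast entry: one episode offered in two audio formats.
EPISODE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Episode 42</title>
  <link rel="enclosure" type="audio/mpeg" length="1337" href="http://example.org/ep42.mp3"/>
  <link rel="enclosure" type="audio/x-ms-wma" length="1200" href="http://example.org/ep42.wma"/>
</entry>"""


def pick_enclosure(entry, preferred_types):
    """Return the href of the first enclosure matching the preference order."""
    enclosures = [link for link in entry.findall("atom:link", NS)
                  if link.get("rel") == "enclosure"]
    for media_type in preferred_types:
        for link in enclosures:
            if link.get("type") == media_type:
                return link.get("href")
    return None
```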

To illustrate this by example, consider the list of podcast feeds available from IT Conversations (see Resources).
Because IT Conversations podcasts are offered in multiple formats, potential subscribers must select from at least 73 individual RSS feeds with enclosures (excluding the 37 text-only feeds that are also listed). Using Atom enclosures, IT Conversations could cut the total number of feeds roughly in half simply by including two enclosure links in each Atom entry. Fewer feeds mean less complexity for both the content publisher and content subscribers.

Atom enclosures allow you to do more than just distribute audio content. Enclosure links can reference any type of resource. Listing 5, for instance, uses multiple enclosures within a single entry to reference translated versions of a single PDF document that's accessible through FTP. The hreflang attribute identifies the language that each PDF document has been translated into.
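Producing such an entry takes little more than one link element per translation. The sketch below assumes hypothetical FTP URLs and titles; the translated_enclosures helper is illustrative, and a complete entry would still need its required id, title, and updated elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def translated_enclosures(entry, translations):
    """Append one rel="enclosure" link per (language tag, title, URI) translation."""
    for lang, title, href in translations:
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", {
            "rel": "enclosure",
            "type": "application/pdf",
            "hreflang": lang,   # language of the linked resource
            "title": title,     # optional human-readable label
            "href": href,       # any URI scheme is allowed, here FTP
        })
    return entry
```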

The example in Listing 5 is impossible to support in RSS 2.0 unless you introduce non-standardized namespace extensions into the feed. There are a number of important reasons for this:

RSS does not allow multiple enclosures within an item

RSS does not provide a means of associating a language with the enclosed resource

RSS enclosures are required to use HTTP URLs

RSS does not provide a means of optionally associating a human-readable title with a referenced resource

Another important point is that the Atom link elements that enable enclosures can do far more than associate downloadable files with an entry. Links can also express meaningful relationships to other types of resources:

<link rel="alternate" /> -- Identifies an alternate version of the feed or entry (for example, a weblog home page)

<link rel="related" /> -- Identifies a resource that is related to the feed or entry in some way

<link rel="self" /> -- Identifies a resource that is equivalent to the feed or entry; generally this lets a feed or entry refer to itself, which enables flexible auto-discovery mechanisms

<link rel="via" /> -- Identifies a resource that provided the information contained in the feed or entry; for example, if the entry was distributed through an online aggregation service, the via link identifies the aggregator as an alternative to the currently common practice of having the aggregator override the RSS link element

These built-in link relations are designed to cover the most common and generic types of links expected to be used with feeds. New types of relationships can be dynamically defined using fully-qualified URIs. I'll talk more about the extensibility of link elements, as well as illustrate a simple example, a bit later in this article.
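Consumers typically need to look links up by relation. The sketch below follows two rules from the final specification (RFC 4287): a link without a rel attribute is treated as alternate, and a bare name such as self is equivalent to the IANA link relation registry URI formed by appending the name to http://www.iana.org/assignments/relation/. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"atom": "http://www.w3.org/2005/Atom"}
IANA_PREFIX = "http://www.iana.org/assignments/relation/"


def normalize_rel(rel):
    """Map a rel value to a full URI: missing means alternate; bare names join the IANA registry."""
    rel = rel or "alternate"
    # Bare relation names cannot contain ":", while full URIs always do.
    return rel if ":" in rel else IANA_PREFIX + rel


def links_by_rel(element):
    """Group the href values of an element's atom:link children by relation URI."""
    grouped = defaultdict(list)
    for link in element.findall("atom:link", NS):
        grouped[normalize_rel(link.get("rel"))].append(link.get("href"))
    return dict(grouped)
```

Normalizing every relation to a full URI lets built-in and extension relations be handled by the same lookup.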

Content-by-reference

In addition to support for links and enclosures, Atom introduces the ability to reference entry content by URI. Listing 6, for instance, illustrates how an Atom feed for a photo weblog might appear. The content element references each individual photograph in the blog. The summary element provides a caption for the image.

This content-by-reference mechanism provides a very flexible means of expanding the types of content that one can syndicate through Atom.
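Here is a minimal sketch of a photo-blog entry of the kind described above, assuming hypothetical identifiers and image URLs. The content element carries only a type and a src attribute and is left empty; the final specification also requires an entry whose content uses src to include a summary, which here holds the caption. The photo_entry helper is illustrative.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def photo_entry(entry_id, title, updated, image_uri, caption):
    """Build an entry whose content references a JPEG, with the caption as its summary."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")

    def add(tag, text=None, **attrs):
        child = ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}", attrs)
        child.text = text
        return child

    add("id", entry_id)
    add("title", title)
    add("updated", updated)
    add("summary", caption)
    # Content-by-reference: src points at the photo; the element itself stays empty.
    add("content", type="image/jpeg", src=image_uri)
    return entry
```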

For example, the idea of using the syndication model to distribute software updates is often discussed. Such a feed would ideally link both to the downloadable file that contains the software update and to a Web page that describes the update. Because Atom clearly separates the roles of the link and content elements, creating such a feed is a straightforward exercise that requires no extensions to the core Atom namespace.
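One plausible layout for such a software-update entry, sketched below under my own assumptions, points the content element's src attribute at the package and uses an alternate link for the release notes page. The product name, URLs, and the update_targets helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical update entry: content points at the package, an alternate link at the notes.
UPDATE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2005:update-1.2.1</id>
  <title>Widget 1.2.1 security update</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <summary>Fixes a buffer overflow in the parser.</summary>
  <content type="application/zip" src="http://example.org/dl/widget-1.2.1.zip"/>
  <link rel="alternate" type="text/html" href="http://example.org/notes/1.2.1.html"/>
</entry>"""


def update_targets(entry):
    """Return (package URI, release-notes URI) from an update entry; either may be None."""
    content = entry.find("atom:content", NS)
    package = content.get("src") if content is not None else None
    notes = None
    for link in entry.findall("atom:link", NS):
        if link.get("rel", "alternate") == "alternate":  # missing rel means alternate
            notes = link.get("href")
            break
    return package, notes
```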

Other applications of content-by-reference include the syndication of data not typically suitable for static embedding within a feed. Examples of such content include live audio or video broadcast streams, links to secure account information or transactions, and large data streams.

Extending Atom

An important strength of current syndication technology is the ability for application developers to dynamically extend a feed with new types of metadata. One key goal of the Atom working group was a well-defined extensibility model that preserved the decentralized, dynamic extensibility mechanisms that content publishers and syndication application developers have come to expect, while protecting core interoperability between Atom implementations.

Extensions to Atom come in two flavors, both of which I illustrate here:

New namespace-qualified extension elements and attributes

New link element relation types

Namespace extensions involve mixing new XML elements and attributes with the core Atom elements. For example, Atom defines elements that describe when an entry was published and when it was last updated. However, imagine an application that produces entries whose content must expire at a given point in time (for example, a feed representing special sale offers or a weekly top-ten list). Atom does not provide any core elements that can be used to specify an expiration date. It is possible, however, to declare such an element in a separate namespace and include it in the Atom feed as in Listing 9. Consumers of the feed who are not aware of the expiration extension element can simply ignore it.
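To show both sides of this, the sketch below parses a feed that uses a hypothetical s:expires element in a made-up namespace (http://example.org/ns/expires) and keeps only entries that have not yet expired. The entries are trimmed to the fields the example needs, so they omit required elements such as atom:id. Code that only looks for elements in the Atom namespace never sees the extension, which is how an unaware consumer ends up ignoring it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"atom": "http://www.w3.org/2005/Atom",
      "s": "http://example.org/ns/expires"}  # hypothetical extension namespace

FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://example.org/ns/expires">
  <entry><title>Summer sale</title><s:expires>2005-08-01T00:00:00Z</s:expires></entry>
  <entry><title>Top-ten list</title><s:expires>2005-07-22T00:00:00Z</s:expires></entry>
  <entry><title>Evergreen post</title></entry>
</feed>"""


def parse_date(text):
    """Parse an RFC 3339 timestamp; older Pythons need the trailing Z spelled as +00:00."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def live_titles(feed, now):
    """Titles of entries not yet expired; entries without s:expires never expire."""
    titles = []
    for entry in feed.findall("atom:entry", NS):
        expires = entry.findtext("s:expires", namespaces=NS)
        if expires is None or parse_date(expires) > now:
            titles.append(entry.findtext("atom:title", namespaces=NS))
    return titles
```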

You can include extension elements and attributes throughout an Atom document with a few basic exceptions. For instance, Atom date constructs such as the atom:updated element can contain extension attributes but cannot contain extension elements.
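As an illustration of that rule, the check below accepts extension attributes on a date construct but flags child elements, and also verifies that the text is an RFC 3339 date-time with the uppercase T and Z that Atom requires. The regular expression is a simplified approximation, not a full RFC 3339 validator (it does not check that month or hour values are in range), and check_date_construct is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Simplified RFC 3339 date-time, e.g. 2005-07-15T12:00:00Z or ...12:00:00.25+02:00.
RFC3339 = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def check_date_construct(elem):
    """Return a list of problems with a date construct such as atom:updated."""
    problems = []
    if len(elem):  # extension attributes are fine, child elements are not
        problems.append("date constructs must not contain child elements")
    if not RFC3339.fullmatch(elem.text or ""):
        problems.append("text is not an RFC 3339 date-time")
    return problems
```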

As a quick aside, because Atom was defined through a formalized IETF standardization process, a common misconception is that extensions like the s:expires element in Listing 9 also have to be defined and ratified through an equally formal and centralized process. That is absolutely not the case. Atom extensions can be defined in a completely decentralized, open, and informal manner without any involvement on the part of the IETF, while still preserving interoperability.

Link relation extensions involve the creation of a new rel attribute value that identifies a new type of link relation. The link element associates an external resource with the feed or entry, and its rel attribute identifies the purpose of the link. By creating new link relations, you can extend the types of relationships that the link element is capable of expressing.

For instance, most weblog software packages allow readers to post comments to a blog entry. These comments can themselves appear as entries within a feed. Listing 10 illustrates a link extension that I have proposed to allow a bidirectional link between an entry and an associated comment. The proposal defines three new link relations:

http://purl.org/syndication/thread/1.0/comments -- Links a feed or entry with an Atom feed that contains comments

http://purl.org/syndication/thread/1.0/root -- Links the comments feed with the feed containing the original entries

http://purl.org/syndication/thread/1.0/in-reply-to -- Links a comment entry with the original entry

This proposed extension is still being actively discussed and developed and is expected to evolve over time.
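Because the proposal is still moving, treat the following only as a sketch of how a consumer might use the in-reply-to relation listed above: it scans a hypothetical comments feed and returns the comments whose in-reply-to link points at a given entry. The feed contents and the replies_to helper are invented for illustration, and the relation URIs may not match whatever is eventually standardized.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}
IN_REPLY_TO = "http://purl.org/syndication/thread/1.0/in-reply-to"

# Hypothetical comments feed; each comment links to the entry it answers.
COMMENTS = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Great post</title>
    <link rel="http://purl.org/syndication/thread/1.0/in-reply-to" href="http://example.org/2005/07/atom"/></entry>
  <entry><title>I disagree</title>
    <link rel="http://purl.org/syndication/thread/1.0/in-reply-to" href="http://example.org/2005/07/atom"/></entry>
  <entry><title>Off topic</title>
    <link rel="http://purl.org/syndication/thread/1.0/in-reply-to" href="http://example.org/2005/06/rss"/></entry>
</feed>"""


def replies_to(comments_feed, original_uri):
    """Titles of comment entries whose in-reply-to link points at original_uri."""
    return [
        entry.findtext("atom:title", namespaces=NS)
        for entry in comments_feed.findall("atom:entry", NS)
        if any(link.get("rel") == IN_REPLY_TO and link.get("href") == original_uri
               for link in entry.findall("atom:link", NS))
    ]
```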

Other extensions to express feed history, associate licenses, and provide a list ordering mechanism have been proposed, and more are in the works. Some of these extensions might ultimately become IETF Internet-Drafts or even RFCs; others will not. Many useful extensions are expected to emerge as developers begin to roll out new and interesting applications. A large number of common RSS extensions can also be used with Atom with very little effort.

Wrapping up

In May of 2004, Uche Ogbuji published an article here on developerWorks that provided an early, introductory exploration of the effort to define Atom. In his introduction, Uche wrote that one of the goals of Atom "was to create a more technologically sound design than many of the flavors of RSS, using the practical experience of the many RSS users to make the practical design compromises that would enable the new format to work in harmony with the architecture as well as the culture of the Web." While it has taken some time, a lot of careful discussion, and a significant amount of effort on the part of IETF working group participants, Atom 1.0 now achieves the goal of providing a simple, well-defined, and unambiguous format for content syndication on the Web.
