Dan Diephouse on Atom, AtomPub, REST and Web Services

Bio Dan Diephouse is an enterprise architect and open source developer, founder of XFire, the incubating Apache CXF project (aka XFire 2.0) and a committer on several other open source projects, including Apache Abdera, XmlSchema, and Jettison. He currently works at MuleSource where he is focused on building and helping others build open source web services/SOA solutions.

Currently I work at MuleSource. It's kind of the home of the open source Mule ESB, and I work on a variety of things. One of my main areas of focus is web services and working on some of our support for them. On one side we have the WS-* side, which some of you may know me from. I originally started the XFire SOAP web services project, which is now called Apache CXF, so I'm integrating these things with Mule and making it very simple to do SOAP, WSDL, WS-* integration, but I'm also very focused on enabling RESTful support inside Mule and making REST support as easy as possible. Mule actually has this unique architecture with POJOs and it's really easy to make it map to resources in a variety of protocols like this. So I am pretty enthused about that. One aspect of that is adding Atom Publishing Protocol support. I've been working on an Abdera connector for Mule, exploring various integration scenarios that you can do there. Another aspect of my work at MuleSource is to come around to conferences like the excellent QCon, which I am very thrilled to be at right now, and talk with developers, and we are also working on new projects as well, which we hope to announce in the future.

Yes. CXF, just to back up one second, it started in the Codehaus XFire project and it's kind of grown up now. We have a large community of developers around it. We're at Apache and we've done various 2.x releases and the 2.x releases focused on SOAP 1.1, 1.2, WSDL, WS-Addressing, WS Policy, WS Security and WS Reliable Messaging. Some of the work going on right now is focused on adding support for a couple of new standards which can be pretty important for some integration scenarios like WS Security Policy, WS Trust and WS Security Conversation. And then the other big focus is JAX-WS 2.1 support which adds a few extra features like WS-Addressing support to the JAX-WS APIs which are very handy for the developer. JAX-WS in general has been a great thing for Java developers, as really it's standardized, simplified everything for everybody and so I am excited to see the standard progress and make it even simpler for people in the future.

Atom originally started off as a syndication format for blogs. It was born out of frustration with the RSS feed and the various inconsistencies in the spec and all these different versions of it, so a bunch of great guys, very smart guys, came together and developed this Atom specification. The Atom format is basically this thing called "a feed" which has a collection of entries, and if we go to the typical example, each entry represents a blog entry, but really an entry can kind of represent any resource or any chunk of information. The important thing to know about an Atom feed or an Atom entry is that they each come with an ID, a time and a summary or some type of content, and what this does is make it universally readable by everybody. So I can have this feed and everybody knows some common elements and how to interpret this and extract some information out of it. Of course it's pretty straightforward how you apply these things to blogs, but it's actually quite useful for domain or business applications as well. So what I can do is I can shove any type of micro content in an Atom feed. For instance I can describe an employee in an Atom feed and I can have a feed which is a collection of employees, and each entry is going to represent an employee and they can have an XML snippet with some employee metadata, and the nice thing about it is you have this feed which is universally readable by anything, so you could subscribe to a feed and be notified of new employees. Or you can use some of these various search protocols on top, you know, query strings, to search an employee database simply by HTTP.
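To make the employee example concrete, here is a minimal sketch of such a feed built with Python's standard library. The `emp` namespace, the `urn:example:` identifiers, and the employee fields are all assumptions made up for illustration; only the Atom element names (feed, entry, id, title, updated, summary, content, author) come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
EMP_NS = "http://example.com/ns/employee"  # hypothetical namespace for the payload

# Serialize Atom as the default namespace and the payload with an "emp" prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("emp", EMP_NS)


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def emp(tag):
    """Return the namespace-qualified name of a (hypothetical) employee element."""
    return f"{{{EMP_NS}}}{tag}"


def build_employee_feed(employees):
    """Build an Atom feed in which every entry describes one employee."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("title")).text = "Employees"
    ET.SubElement(feed, atom("id")).text = "urn:example:employees"
    # ISO 8601 timestamps in the same format and zone sort correctly as strings.
    ET.SubElement(feed, atom("updated")).text = max(e["updated"] for e in employees)
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = "HR system"

    for e in employees:
        entry = ET.SubElement(feed, atom("entry"))
        # The common elements any feed reader understands.
        ET.SubElement(entry, atom("id")).text = f"urn:example:employee:{e['number']}"
        ET.SubElement(entry, atom("title")).text = e["name"]
        ET.SubElement(entry, atom("updated")).text = e["updated"]
        ET.SubElement(entry, atom("summary")).text = f"{e['name']}, {e['role']}"
        # The domain-specific XML snippet travels inside the content element.
        content = ET.SubElement(entry, atom("content"), {"type": "application/xml"})
        payload = ET.SubElement(content, emp("employee"))
        ET.SubElement(payload, emp("name")).text = e["name"]
        ET.SubElement(payload, emp("role")).text = e["role"]
    return feed


SAMPLE_EMPLOYEES = [
    {"number": 1, "name": "Ada Example", "role": "Engineer", "updated": "2008-01-10T09:00:00Z"},
    {"number": 2, "name": "Bob Sample", "role": "Analyst", "updated": "2008-02-01T12:30:00Z"},
]

if __name__ == "__main__":
    print(ET.tostring(build_employee_feed(SAMPLE_EMPLOYEES), encoding="unicode"))
```

A generic feed reader would show the title and summary of each entry without knowing anything about the `emp:employee` payload, which is the "universally readable" property described above.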

The Atom Publishing protocol, its primary focus is making it easy to create new resources, edit resources, delete resources, get resources inside a collection. So I can individually grab a single entry via the publishing protocol, I can use the HTTP PUT method to edit an entry, I can just PUT to a URL and I will replace the original entry with my new one. I can POST an entry to the collection, which will create a new entry for me and add it to the feed, and I can DELETE individual entries because each entry is referenced by URL, so it's just a simple HTTP DELETE and that entry is gone.
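The four operations map directly onto HTTP methods. The sketch below only constructs the requests with Python's `urllib.request` and does not send them; the collection URL and the helper function names are hypothetical, and a real client would take each entry's URL from the server's responses rather than hard-coding it.

```python
import urllib.request

COLLECTION_URL = "http://example.com/employees"  # hypothetical collection URL
ENTRY_TYPE = "application/atom+xml;type=entry"   # media type for an Atom entry


def create_entry(entry_xml: bytes) -> urllib.request.Request:
    """POST an entry to the collection to create a new member."""
    return urllib.request.Request(
        COLLECTION_URL,
        data=entry_xml,
        headers={"Content-Type": ENTRY_TYPE},
        method="POST",
    )


def get_entry(entry_url: str) -> urllib.request.Request:
    """GET a single entry by its URL."""
    return urllib.request.Request(entry_url, method="GET")


def replace_entry(entry_url: str, entry_xml: bytes) -> urllib.request.Request:
    """PUT a full entry to its URL, replacing the existing one."""
    return urllib.request.Request(
        entry_url,
        data=entry_xml,
        headers={"Content-Type": ENTRY_TYPE},
        method="PUT",
    )


def delete_entry(entry_url: str) -> urllib.request.Request:
    """DELETE the entry at the given URL."""
    return urllib.request.Request(entry_url, method="DELETE")


if __name__ == "__main__":
    body = b'<entry xmlns="http://www.w3.org/2005/Atom"><title>Ada Example</title></entry>'
    for req in (
        create_entry(body),
        get_entry(COLLECTION_URL + "/1"),
        replace_entry(COLLECTION_URL + "/1", body),
        delete_entry(COLLECTION_URL + "/1"),
    ):
        print(req.get_method(), req.full_url)
```

Passing any of these request objects to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without a server.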

Yes, which is one of the great things about Atom: it is a standard RESTful protocol, so it stops you from building your own to some degree and it stops you from screwing it up, which is nice because it's nicely designed. And then the really great part about it is this Atom format, which is universal, the idea that I can have this content section or the summary section which is going to provide something which anybody can look at without knowing about my microformat or the data that I have behind it.

You could definitely use both; they might be complementary for your scenario. One doesn't prohibit the other. Sometimes you might just need Atom, you might not need something like WS-*. Atom provides all the necessary things you need to do on your model. Going back to the employee example, you could add employees to the collection, you could remove employees, you can update employee information and contact info, but other times Atom might not be sufficient because of some of the limitations there and you might want to go with your own RESTful protocol or even the WS-* approach.

There are a couple of limitations to the Atom Publishing Protocol that I have run into. It's not necessarily saying that the Atom Publishing Protocol is bad all around; it's just that there are some scenarios that it is not great to use for at the moment. One is modeling hierarchies, so it's a little bit tricky to model various hierarchies in AtomPub because I have an entry which might model a customer, but then a customer is associated with purchase orders. How do I model purchase orders inside an entry so I can edit individual ones or create new ones? Some people suggest just sticking this Atom collection element inside an entry; it's not necessarily a natural model but it can be made to work.

Another limitation is there is no designated way to do batch updates that is widely accepted. GData has tried to standardize one but there has been a little bit of butting of heads over whether or not it's the right model to do that. If you are doing some high performance application the Atom Pub model might not be the right one because you are always going to have an Atom entry associated with it and you might have latency requirements, so you might want to be sending very simple messages.

Yes. There are definitely a few: security is one of those areas where I feel it hasn't quite baked itself long enough yet for the "just use HTTP" mantra. I feel like the WS-* folks have done a bit more thinking about security. We have things like WS Trust which makes token exchange possible and there isn't any kind of equivalent for that on the RESTful side, so you have things like CardSpace which actually depends on WS Trust and so I think we may see some of the REST people actually end up depending on WS-*, which I find kind of ironic. Another is that Atom Publishing or the Atom feed model might not yield any benefits to your application. If your application has some type of model that is not time indexed, or you are not going to yield any benefit from being able to look at the summary or the update time or anything that is in the standard Atom model, I just don't think it's really going to get you any benefit and you might as well just use your own RESTful protocol in that case, or your own WS-* thing if that floats your boat. One other is transactions over HTTP: there are some ways to do it but it's definitely not baked into the Atom Pub model, so that may be something that you want to think about as well.

There are a lot of business applications that you can use Atom Pub with. An easy one is directories and information, whether it's customers or employees or a contact database, calendar events. You can basically manipulate any type of information which fits in a collection of some information, whether it's a Java collection or .NET or whatever the Ruby equivalent is. Anything like that can end up as an Atom collection pretty easily. One of the use cases I have been thinking about is using it to store events so that, whether it's a business level event or application level event, you might want to listen for errors on your application and you might want to subscribe to a feed of these things, or you might want to listen to a business event like "so-and-so is denied a loan" and just be able to look at that and get kind of an asynchronous update of what is going on there. As for other work that is going on, I know that James Strachan has done some work modeling queues with Atom Pub that's been pretty interesting, and applying that to messaging systems, and I think that can have some use there as a way to build a RESTful bridge and queue for some systems. So these are some of the main areas where I think it might be very interesting to apply. There is more out there, I am sure, as well.

GData is this "protocol" that Google has been working on, so "GData" as in "Google data", and it's supposed to be a simple standard protocol for applications and it kind of focuses on a couple of different areas. One is that it tries to abstract Atom a little and builds on Atom and RSS and describes a model with which you could work with either one. One of the main things it is useful for to outside people is this query syntax. So they have this way to actually query feeds and page information that they are trying to standardize. When I say standardize, they are actually standardizing on this and using it across all of their services. So you can interact with Google Calendar, Google Spreadsheets, I think Google Base, their search engine, pretty much any Google application you can use GData with, which goes to show some of the power of the Atom Pub model to begin with. Going back to what this GData protocol focuses on, it describes some additional semantics which Atom Pub doesn't describe really well, like how we should handle versioning and what happens if two people write to the same resource at the same time, and so it specifies an optimistic concurrency model, it specifies some stuff for sessions. They also define some common feed elements like data points or geography points, like how to represent those in feeds. There are a couple of other random things as well. It's built on top of Atom Pub and it specifies a bunch of these things that Atom Pub doesn't and can help you build your application or provide some of this additional structure that you might want.
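The optimistic concurrency idea mentioned above can be illustrated with a small in-memory sketch. This is a generic version-check model, not GData's actual wire format: the `VersionedStore` and `ConflictError` names are hypothetical, and over HTTP a server would typically report the failed write with an error status such as 409 Conflict rather than a Python exception.

```python
class ConflictError(Exception):
    """Raised when a writer's copy of a resource is out of date."""


class VersionedStore:
    """In-memory resources that reject writes based on a stale version."""

    def __init__(self):
        self._items = {}  # key -> (version, value)

    def create(self, key, value):
        """Store a new resource at version 1 and return that version."""
        self._items[key] = (1, value)
        return 1

    def read(self, key):
        """Return (version, value); the caller keeps the version for later writes."""
        return self._items[key]

    def update(self, key, expected_version, value):
        """Write only if nobody else has written since expected_version was read."""
        current_version, _ = self._items[key]
        if expected_version != current_version:
            raise ConflictError(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._items[key] = (new_version, value)
        return new_version


if __name__ == "__main__":
    store = VersionedStore()
    store.create("employee/1", "Ada Example, Engineer")

    # Two writers read the same version of the resource.
    version_a, _ = store.read("employee/1")
    version_b, _ = store.read("employee/1")

    # The first write succeeds and bumps the version.
    store.update("employee/1", version_a, "Ada Example, Senior Engineer")

    # The second writer still holds the old version, so the write is refused.
    try:
        store.update("employee/1", version_b, "Ada Example, Manager")
    except ConflictError as err:
        print("conflict:", err)
```

The design choice is that no locks are taken: writers proceed optimistically, and the loser of a race has to re-read the resource and retry.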

One of the things I have been working on is Abdera, which is in the Apache Incubator. Another Java one is called Propono, which the Roller weblogger uses. In Python there's Amplee, and there is a Ruby one which I don't remember the name of. I don't think there is a great .NET one at the moment, although there has been talk of it, but the great thing about Atom is it's just HTTP, or Atom Pub is just HTTP, so you can use your favorite HTTP library and just POST away and PUT away and DELETE away and GET away, so you are good to go.

This is a fun question. I love the REST model actually, I think it's a great model to build services with. People, I find, really like the simplicity, although we are starting to see more standards around REST and more additions, things like OpenSearch on top, so there is more stuff for people to learn now than there was. But I like the simplicity, I like the uniform interface, the fact that I can use the same client to manipulate resources and it's universal across all things, and it just makes things simpler when you are architecting large scale applications, I think. There's also some things … I think we have some work to do on yet; one of these is a kind of a description language. There has been some work in WADL, but there is no kind of universally accepted way to really figure out how to interact with these resources.

Actually I don't have a lot of demands here, I am not trying to turn this into SOAP, but basically when I start working with resources I should know what some of these MIME types are, right? Like I can understand an image/png MIME type or I can understand the text/plain MIME type, but there are these MIME types out there which I don't understand and so I just would like to see a way to figure out what MIME types the service might use and a way to map these MIME types to various description languages if I want. So I can come up with my own customer XML MIME type and then have a schema behind it. Atom Pub actually comes really close to doing this. It describes the MIME types that you can use with a service, but it doesn't provide anything to discover what those MIME types might be, and so I think if we just close that gap we would be able to do a lot more interesting computer to computer interaction.
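As a rough illustration of how far Atom Pub gets, the sketch below reads the accepted media types for each collection out of a service document. The service document itself is an invented sample (the URLs and titles are hypothetical); the `service`, `workspace`, `collection` and `accept` elements and their namespace are those of the Atom Publishing Protocol.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
}

# Invented sample service document; the URLs and titles are hypothetical.
SERVICE_DOC = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example HR</atom:title>
    <collection href="http://example.com/employees">
      <atom:title>Employees</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
    <collection href="http://example.com/photos">
      <atom:title>Photos</atom:title>
      <accept>image/png</accept>
      <accept>image/jpeg</accept>
    </collection>
  </workspace>
</service>
"""


def accepted_types(service_xml: str) -> dict:
    """Map each collection URL to the list of media types it accepts."""
    root = ET.fromstring(service_xml.strip())
    result = {}
    for collection in root.iterfind(".//app:collection", NS):
        result[collection.get("href")] = [
            accept.text.strip() for accept in collection.findall("app:accept", NS)
        ]
    return result


if __name__ == "__main__":
    for url, types in accepted_types(SERVICE_DOC).items():
        print(url, "accepts", ", ".join(types))
```

A client learns here that a collection takes, say, `image/png`, but for a custom XML media type nothing in the document says where its schema lives; that lookup is the gap described above.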

I mentioned security before, which I think we could do some work on; there is definitely some thought on that. There has been some discussion about how to sign GET requests and responses and how we use things like WS Security with REST, and so it will be interesting to see where that goes. But overall, you can just look at the Internet and see that we've already enabled a lot of applications. There are definitely some applications which don't fit the RESTful model, and if it's not the right fit you don't use it. But otherwise I think REST should kind of be your default thing and go from there.

I don't know that it is sufficient. We definitely have some out there. I come from a WS-* background, so I can go and I can do some .NET integration with Java in just a couple of minutes actually, but building a RESTful service in Java, for instance, takes a little bit more time. Mainly there are so many ways to do it and they aren't necessarily simple ways to do it quite yet. So one of the things that I have been watching is the JSR 311 specification, which I am actually pretty pleased with so far, and some of the work they have done there. You can go and you can write a RESTful service really easily, but I think on the client side we probably have some work to do. I mean you look at the Java libraries and we don't even have a decent HTTP client. Thank goodness we have some things from Apache, but even that isn't necessarily the cleanest. I've also looked at some of the Restlet stuff; they've done some good stuff on the client side, and the Atom Pub libraries out there are going to help you build RESTful services as well.