I work for Red Hat, where I lead JBoss technical direction and research/development. Prior to this I was SOA Technical Development Manager and Director of Standards. I was Chief Architect and co-founder at Arjuna Technologies, an HP spin-off (where I was a Distinguished Engineer). I've been working in the area of reliable distributed systems since the mid-80's. My PhD was on fault-tolerant distributed systems, replication and transactions. I'm also a Professor at Newcastle University.

Sunday, December 30, 2007

I started to write this as a comment to Greg's posting, but it got too long.

I think Greg still misunderstood me, though looking back at my posting I can understand why: just enough detail to confuse and not enough to clarify. Oh well, I was rushed.

First, the notion of a root coordinator isn't present in the WS-BP model at all (most certainly NOT OASIS BTP). The WS-BP approach leverages some of the JFDI (REST-based) transaction work we were doing in HP where once again there wasn't a global coordinator. It was much more akin to the weakly consistent replication models that use a "gossip" approach and no single (centralized) consistency manager, rather than the strongly consistent replication protocols that do use algorithms based on a single coordinator. Same reasons: it doesn't scale (number of participants as well as physical locality), it doesn't perform and it isn't working with the application/user (sometimes taking advantage of the application semantics can make it more efficient to implement a good replication protocol, particularly when you look at recovery). That's why I hinted that the transactions crowd can learn from the replication crowd.

As I think I said in the original (original) post (and during my keynote at DOA 2007): there's not necessarily a single coordinator; there will be "domains" that may have coordinators that drive participants within them (but that'll be implementation specific and hidden behind the "service" endpoint), and how these domains are pulled together into a global "transaction" will not necessarily be through a single coordinator at all. There may be a single coordinator to kick start any interactions, but that role could even be taken by the application. Semantic information about the application/service/specific interaction needs to be "injected" into this model.

Global coordination is definitely out. But that doesn't mean an external observer could always tell, from the state the system eventually reaches, whether a global coordinator had been used or not (ignoring timing constraints). As I said in the DOA keynote, it's a bit like Heisenberg's Uncertainty Principle at work: you can tell the state of the participants in the business "transaction" (interaction) but not when that state will appear, or you can look at the participant states at the exact same time but not see the same "values". Yes, the analogy breaks down under closer scrutiny, but it's a nice way to try to illustrate the differences and begin the discussion proper ;-)

If we ever get round to updating our book I can write an entire chapter around this and explain it oh so much better with diagrams. Oh and as usual: one size doesn't fit all (which makes this discussion harder to have in a blog!)

Friday, December 28, 2007

There have been only two occasions when my Mac has let me down badly: the first was last Christmas when the disc died. The second was (is!) yesterday, when the disc died again. I backed up 2 weeks ago, but I'm still not happy. So if you're after responses to emails, blog posts etc. you'll have to get in line and wait until I have a replacement. I think I'm going to go bang my head against a brick wall for a bit!

I agree broadly with Ganesh and have been saying the same things for years. When discussing MEST with Jim and Savas in its early years, we covered the same ground: distributed computing practitioners have been doing this work for years. I believe that's why they eventually clarified that MEST isn't necessarily anything new, but a term to cover an architectural approach that (some) people in the industry (and academia) have been using. I don't actually care what we call it: MEST, message-oriented, message-based, Nirvana, as long as there's something we can point to and agree about, that has many years of good practice use cases behind it.

I've been developing distributed systems (small and large scale [physical remoteness of participants and number of participants]) for over 20 years. I pre-date Sun RPC, for instance, going back to a time when TCP/IP wasn't the default way in which to build systems. (My first main development effort was collaborating on the Rajdoot RPC mechanism.) I still think UDP has much more to offer than TCP, which is a good general protocol for reliable delivery of messages; but if you know the specifics of your application and distributed environment, it's often better (easier, more efficient, faster) to build something on UDP. But I digress.

If you look at distributed computing (it doesn't even have to be the Internet), it's all about message passing at some level: even the dreaded RPC is simply an abstraction of two correlated messages. In the beginning that's all you had: low-level message passing primitives, and you encoded the information you wanted to convey in the message somewhere (since you were probably only talking to endpoints you had developed, it was easy to get agreement on the payload format - they did what you wanted!) But this was a pretty cumbersome and manual process, making large scale distributed systems development slow and error-prone. Then someone had the bright idea to take a high-level programming language abstraction and layer it on to this: RPC was born. The fact that multi-threaded processes and operating systems were at least a decade away meant that most message passing implementations were synchronous anyway, so RPC was an abstraction that fit with best practices. RPC started to constrain the more open (general) interface of send-message(blob)/receive-message(blob), trading this off for ease of use. When object-oriented programming became the standard, distributed object technologies with their own versions of client/server stub generators took off. These didn't constrain the interface any more than RPC did, but they were a logical extension of the paradigm.
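To make the "two correlated messages" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the in-memory queues stand in for a network, and the names send_message, receive_message, server_step and rpc_add are hypothetical rather than taken from any real RPC system.

```python
import json
import queue

# Two in-memory queues stand in for the network (illustrative assumption).
to_server = queue.Queue()
to_client = queue.Queue()

def send_message(channel, blob):
    """Low-level primitive: ship an opaque blob."""
    channel.put(blob)

def receive_message(channel):
    """Low-level primitive: block until a blob arrives."""
    return channel.get()

def server_step():
    """Handle one request: decode it, do the work, send the reply message."""
    request = json.loads(receive_message(to_server))
    result = sum(request["args"])  # the only operation this toy server knows
    reply = {"id": request["id"], "result": result}
    send_message(to_client, json.dumps(reply).encode())

def rpc_add(*args):
    """Client stub: reads like a local call, but is really two messages."""
    request = {"id": 1, "op": "add", "args": list(args)}
    send_message(to_server, json.dumps(request).encode())
    server_step()  # in a real system the server would run elsewhere
    reply = json.loads(receive_message(to_client))
    assert reply["id"] == request["id"]  # the reply is correlated with the request
    return reply["result"]

print(rpc_add(2, 3))
```

The caller of rpc_add never sees a blob: that is the ease of use being bought, and the fixed operation name and argument list are the constraint being paid for it.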

At the low level, messages (Signals) can carry any data, but higher up the stack the application developer is constraining the messages by imposing syntactic and semantic meaning on them (based on the contract that exists between sender and receiver): back to the opcodes and parameters. Therefore, at the developer’s level, changes to the implementation (the contract, the object implementation etc.) do affect the developer again: this can never be avoided, since at some point you need the equivalent of a dispatching stub if you want to do the work. The message-driven pattern simply moves the level affected by change up the stack, closer to the developer: in some cases that may well be the right place for decisions on that change to be made; in others it isn't. If you have the right tools to assist in the development of distributed systems based on this approach, then it's fine and can really help bring flexibility and extensibility to your systems. But without those tools, it can be a problem, particularly as you want to scale your systems beyond your own organisation (or even your own department!)
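As a rough illustration of the "equivalent of a dispatching stub", here is a small Python sketch. The message layout (an "op" field plus "params") and the handlers get_balance and echo are hypothetical, chosen only to show where the contract lives once the transport accepts arbitrary blobs.

```python
import json

def get_balance(account):
    """Hypothetical handler: look up a balance in a toy in-memory table."""
    return {"alice": 100}.get(account, 0)

def echo(text):
    """Hypothetical handler: return the text unchanged."""
    return text

# The contract between sender and receiver: which opcodes exist
# and which function does the work for each one.
DISPATCH_TABLE = {"get_balance": get_balance, "echo": echo}

def dispatch(blob):
    """Application-level dispatching stub: decode the blob and route on its opcode."""
    message = json.loads(blob)
    handler = DISPATCH_TABLE.get(message.get("op"))
    if handler is None:
        # The transport delivered the blob happily; the contract rejects it here.
        return {"error": "unknown op"}
    return {"result": handler(*message.get("params", []))}

print(dispatch(json.dumps({"op": "get_balance", "params": ["alice"]})))
print(dispatch(json.dumps({"op": "transfer", "params": ["alice", "bob"]})))
```

Adding or renaming an operation leaves the transport untouched but changes DISPATCH_TABLE and every sender that builds these messages, which is the sense in which the change has moved up the stack rather than gone away.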

Now we all know that Web Services uses HTTP as a transport protocol. It's fair to say that this is a bastardisation of HTTP. I was at the first OMG meeting where the ideas behind SOAP were introduced and it was pretty evident (and admitted by some) that the reason for using HTTP was to tunnel through firewalls. This fact has probably been instrumental in limiting the bindings of SOAP, but also key to its adoption. Naturally enough RPC was the approach that pervaded Web Services development. That's because the tools were there (from distributed object systems) and it fit the applications and services that were being developed. Sure RPC is limiting as I mentioned before. But in the grand scheme of things it's hardly a great evil as some try to make out. Sometimes there are good reasons why you should use RPC. Don't let anyone dissuade you from that. But sometimes there are good reasons why you shouldn't. You need to look at what you're trying to accomplish and fit the right tool (abstraction in this case) to the right job. If it's RPC, then go for it! If you've done your homework about your needs and the assumptions made about your application, services and infrastructure, don't let someone who hasn't persuade you otherwise just because "the Web doesn't work that way". Let's remember the Million Flies Argument!

In general the way we've been evolving WS-* standards and specifications is away from RPC and back to a more message-oriented approach, with one-way message invocations, to facilitate loose coupling and the kinds of long-duration interactions we see on the Internet (I think one of the first specifications to really push this was WS-CAF). Correlation of these one-way messages is used to achieve request/response interactions (aka RPC). But this whole approach still constrains the interface: changing the backend implementation is only possible in a limited way. Yes, this has all sorts of other effects, such as the inability to utilise HTTP caching, but if I don't need that what's the problem? Maybe I can handle caching within the application anyway? Believe it or not, caching protocols did exist before the Web came on the scene! But this is not a black-or-white argument: the problems that exist because of the way in which Web Services use HTTP are important to some developers and we should not ignore them. But neither should we make them the central reason for not using Web Services.
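Correlating one-way messages into request/response can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the mechanism of any particular WS-* specification: the Correlator class and the message_id and relates_to field names are hypothetical.

```python
import uuid

class Correlator:
    """Pairs one-way reply messages with the one-way requests that caused them."""

    def __init__(self):
        self.pending = {}  # message id -> callback waiting for the reply

    def send_request(self, outbox, body, on_reply):
        """Queue a one-way request and remember who wants the answer."""
        message_id = str(uuid.uuid4())
        self.pending[message_id] = on_reply
        outbox.append({"message_id": message_id, "body": body})
        return message_id

    def on_message(self, message):
        """Route an incoming one-way message; return False if nothing was waiting."""
        callback = self.pending.pop(message.get("relates_to"), None)
        if callback is None:
            return False
        callback(message["body"])
        return True

# Usage: replies may arrive late and in any order.
correlator = Correlator()
outbox = []
received = []
first = correlator.send_request(outbox, "quote?", lambda body: received.append(("first", body)))
second = correlator.send_request(outbox, "stock?", lambda body: received.append(("second", body)))
correlator.on_message({"relates_to": second, "body": "in stock"})
correlator.on_message({"relates_to": first, "body": "42.00"})
print(received)
```

Nothing blocks between sending and receiving, which is what suits long-duration interactions, yet the body of each message still has to mean something to both ends, so the interface is constrained just as before.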

But the REST protagonists (and let's make this clear, most of them are really talking about REST/HTTP) use the uniform interface and resource-oriented approach of the Web to show that it is superior to SOAP/HTTP. Well as I said earlier, I like REST and technically there is no reason we cannot do what is done in WS-* with it. But the Web does have its problems too: for example, broken links and the lack of orphan detection and elimination. Of course you can live with these deficiencies: we do that every day. But they force the developer into a mindset that could otherwise be simplified and improved. Now I'm not suggesting that WS-* would solve these issues either! I'm simply pointing out that it's not a done-deal with REST. But developing using REST does have some significant advantages over SOAP for certain types of application. And this has nothing to do with putting the human in the loop, i.e., the fact that most people interact with the Web through a browser has nothing to do with this: REST/HTTP is just as useful when there are no human tasks involved in the system.

So where does this leave us? I'm a fence sitter because I've never been someone who believes that one size fits all. A good architect or developer needs to be open to all of the possibilities when tackling any problem. Approaches such as REST or Web Services should be seen as tools in your tool belt, to be used as and when necessary (although with enough force you could use a hammer to cut wood, that's not normally the tool you'd use!) I think the debate between REST and Web Services people has become too polarised and there is a lot of Emperor's New Clothes Syndrome going around. No one should be thinking that Web Services or REST are meant as a replacement for (all) pre-existing distributed system infrastructures. And you should definitely not be pressured into one approach or another! Have an open mind and match your requirements with the capabilities offered by each approach (and let's not rule out some of the older technologies like CORBA or DCOM, that still have things to offer). Certainly when I'm developing "Internet scale" applications, I'll look at all possible approaches and choose the right one for the right job. Getting input from others, particularly based on their experiences, is always a good thing as well. But remember: your mileage may vary. What's right for one person/organisation may not be right for you. Don't follow the crowd because they are vocal: the emperor may be naked after all!

Wednesday, December 12, 2007

While reading my friend Greg's response to my recent posting on transactions and SOA (really on transactions and scale), I noticed that his posts were flavoured with Web 2.0 style labels. I didn't even realise our shared blogging system had been updated to support such a thing. D'oh. Yet another feature I'll have to get used to.

Anyway, I also realised that maybe my post wasn't explicit enough with regard to transaction futures, so here goes again. I don't see distributed ACID transactions having much of a future in large scale systems. I do think that something called a transaction coordinator, with an associated transaction model, has an important role to play, though the semantics such models offer to the developer will be different (and not necessarily subtly different either). If you look at some of the extended transaction models that were looked at years ago, they do blur the distinction between what you might class as workflow and "transactions". But there's still a reliable coordinator in there that controls the state transitions and can "do the right thing" on failure and recovery.

It turns out that the same is true for transactions: in fact, it's necessary in Web Services if you want to glue together disparate services and domains, some of which may not be using the same transaction implementation behind the service boundary. I still think the best specification to illustrate this relaxation of the various properties is WS-BusinessProcess, part of WS-TransactionManager (OASIS WS-CAF). Although Eric and I came up with the original concept, we were never able to sell it to our co-authors on WS-TX (so far). I think one of our failings was to not write enough papers, articles or blogs about the benefits it offered and the practicalities it fit. However, every time I explained it to people in the field it was such an easy sell for them to understand how it fit into the Web Services world so much better than other approaches. (The original idea behind WS-BP came from some of the RESTful transactions work we did in HP, where it was code-named the JFDI-transaction implementation.)