BPM in the Cloud: Disruptive Technology

Sometimes, it feels like ZapThink is the lone voice of reason in the wilderness. We fought to cast SOA as a business-driven architectural approach in the face of massive vendor misinformation, hell-bent on turning SOA into an excuse to buy middleware. And while many organizations are successfully implementing SOA, perhaps with our help, in large part we lost that battle.

Today we’re fighting the corresponding battle over the Cloud. The general battle lines are similar to the SOA days—ZapThink espousing business-driven architecture, while the vendors do their best to spin Cloud as an excuse to buy more gear. But just as the players in the two World Wars were largely the same even though the tactics were quite different, so too with SOA and the Cloud.

The Original BPM Battle Lines

Today, a battle is brewing over the new territory of the Cloud, one that promises to reopen some old wounds from the SOA days. That battle is over Business Process Management (BPM). The fundamental idea behind BPM software is that you need some kind of engine to coordinate interactions between actors and disparate applications into business processes. Those actors may be human, or other applications themselves. To specify the process, you must create some kind of process representation the process engine can run.

Vendors loved BPM because process engines were a natural add-on to their middleware stacks. Coordinating multiple applications meant creating multiple integrations, and for that you need middleware. Lots of it. And in spite of paying lip service to cross-platform Service compositions that would implement vendor-independent processes, in large part each vendor rolled out a proprietary toolset.

ZapThink, of course, saw the world of SOA-enabled BPM quite differently. In our view, the Service-oriented way of looking at BPM was to free it from the engines, and focus on Services: composing them and consuming them. But there was a catch: Services are inherently stateless. The challenge with the Service-oriented approach to BPM was how to maintain state for each process instance in an inherently stateless environment.

The ESB vendors largely ignored us (Fiorano Software being a notable exception). Sure, WS-Addressing might have helped, but the vendors chose not to implement this standard consistently, instead relying on threads or other object instances to maintain process instance state. After all, once you fall for the heavyweight, big engine approach to BPM, you’re essentially locked into the platform you have chosen. And vendors love nothing more than customer lock-in.

The New Battle Line in the Cloud

While we have to award victory to the vendors in the SOA-based BPM war, the move to the Cloud offers an entirely new battleground with completely new rules. Today, of course, the vendors (and it seems, everyone else) want to put their software in the Cloud. So it’s a natural consequence that the BPM vendors would seek to move their BPM engines into the Cloud as well, perhaps as part of a PaaS provider strategy. Clearly such a move would be a good bet from the business perspective, as it’s likely that many BPM customers would find value in a Cloud-based offering.

Here’s where the story gets interesting. In order to achieve the elasticity benefit of the Cloud for a distributed application, it’s essential for the application tier to be stateless. The Cloud may need to spawn additional instances to handle the load, and any particular instance may crash. But because the Cloud is supposed to be highly available and partition tolerant, such a crash mustn’t hose the process that the instance was supporting.

As a result, there is simply no way a traditional BPM engine can run properly in the Cloud. After all, BPM engines’ raison d’être is to maintain process state, but you can’t do that on a Cloud instance without sacrificing elasticity! In other words, all the work the big vendors put into building their SOA-platform-centric BPM engines must now be chucked out the door. The Cloud has changed the rules of BPM.

Hypermedia-Oriented Architecture, Anyone?

ZapThink, of course, has the answer. The same answer we had in 2006, only now we must recast the answer for the Cloud: maintain process state information in the messages.

The messages we’re talking about are the interactions between resources on the server and the clients that the process actors are using, what we call representations in the REST context. In other words, it’s essential to transfer application state in representations to the client. This is the reason why REST is “representational state transfer,” by the way.
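To make the idea concrete, here is a minimal Python sketch, assuming a hypothetical expense-approval process with the steps submit, approve, and pay. The handler keeps nothing between calls: the process state arrives in the request message and leaves in the response, so the same message could be handed to any server. The step names, message fields, and `handle` function are illustrative assumptions, not part of any product.

```python
import json

# Hypothetical process: an expense request moves submit -> approve -> pay.
NEXT_STEP = {"submit": "approve", "approve": "pay", "pay": None}

def handle(request_body: str) -> str:
    """Stateless handler: all process state arrives in the request and
    leaves in the response; nothing is stored between calls."""
    state = json.loads(request_body)
    done = state["step"]                      # the step being performed now
    state["history"] = state.get("history", []) + [done]
    state["step"] = NEXT_STEP[done]           # the step the next actor sees
    return json.dumps(state)

msg = json.dumps({"process": "expense-42", "step": "submit"})
msg = handle(msg)
msg = handle(msg)
print(msg)  # {"process": "expense-42", "step": "pay", "history": ["submit", "approve"]}
```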

The vendors, however, are far from shaking in their boots, since ZapThink is once again the lone voice in the wilderness when we talk about REST. This time, though, the vendors aren’t the ones responsible for the misinformation. Instead, it’s the community of developers who have miscast REST as a way to build uniform APIs to resources. But that’s not what REST is about. It’s about building hypermedia applications. In fact, REST isn’t about Resource-Oriented Architecture at all. You could say it’s about Hypermedia-Oriented Architecture.

At the risk of extending an already-tired cliché, let’s use the abbreviation HOA. In HOA, hypermedia are the engine of application state. (RESTafarians recognize this phrase as the dreaded HATEOAS REST constraint.) With HATEOAS, hyperlinks dynamically describe the contract between the client and the server in the form of a workflow at runtime. That’s right. No process engine needed! All you need are hyperlinks.
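The following sketch shows hyperlinks driving a workflow with no process engine, under stated assumptions: the resource URLs, the `status` and `links` fields, and the in-memory `RESOURCES` table standing in for HTTP GETs are all hypothetical. The client knows nothing about the process except how to choose among the links each representation offers.

```python
# Hypothetical resources: each URL maps to a representation whose "links"
# list the transitions currently allowed. In a real system these could live
# on different servers; here they are dictionary entries.
RESOURCES = {
    "/expenses/42": {"status": "draft",
                     "links": [{"rel": "submit", "href": "/expenses/42/submitted"}]},
    "/expenses/42/submitted": {"status": "submitted",
                               "links": [{"rel": "approve", "href": "/expenses/42/approved"},
                                         {"rel": "reject", "href": "/expenses/42/rejected"}]},
    "/expenses/42/approved": {"status": "approved",
                              "links": [{"rel": "pay", "href": "/expenses/42/paid"}]},
    "/expenses/42/rejected": {"status": "rejected", "links": []},
    "/expenses/42/paid": {"status": "paid", "links": []},
}

def get(href):
    return RESOURCES[href]  # stands in for an HTTP GET

def run(start, choose):
    """Follow hyperlinks until a representation offers none. The client
    holds no process definition; it only reads the links it is given."""
    href, visited = start, []
    while True:
        rep = get(href)
        visited.append(rep["status"])
        if not rep["links"]:
            return visited
        href = choose(rep["links"])["href"]

def prefer_approval(links):
    # Take the first link that is not a rejection, if there is one.
    return next((link for link in links if link["rel"] != "reject"), links[0])

print(run("/expenses/42", prefer_approval))
```

Swapping in a chooser that prefers the `reject` link sends the same client down a different path, ending at the rejected state, without any change to the client loop.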

The power of this idea is obvious once you think it through, since the World Wide Web itself is the prototypical example of such a runtime workflow. You can think of any sequence of clicking links and loading Web pages as a workflow, after all, and those pages may be served up by different resources on different servers anywhere in the world. No heavyweight, centralized process engine in sight.

On the other hand, there is a substantial opportunity for the innovators—new entrants to the BPM marketplace who figure out how to build a Cloud-friendly BPM engine. Think you have what it takes? Here are some pointers to architecting a partition-tolerant, RESTful BPM application.

Separate the resources that build process applications from the representations of those resources. Servers are responsible for generating self-descriptive process maps that contain all the context necessary for any actor to work through a process. After all, orchestrations can be resources as well. In other words, the persistence tier doesn’t host a process engine, it hosts a model-driven resource that generates stateless process applications to run on the elastic application tier.
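A minimal sketch of a model-driven resource, assuming a hypothetical process model held in the persistence tier: `process_map` turns the model into a self-descriptive JSON representation listing every step, the actor responsible for it, and the allowed transitions. `Step`, `MODEL`, and the JSON layout are illustrative choices, not a standard format.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    actor: str              # role expected to perform the step
    next_steps: tuple = ()  # names of the steps that may follow

# Hypothetical process model stored in the persistence tier.
MODEL = {
    "submit": Step("submit", "employee", ("approve", "reject")),
    "approve": Step("approve", "manager", ("pay",)),
    "reject": Step("reject", "manager"),
    "pay": Step("pay", "finance"),
}

def process_map(instance_id: str, model=MODEL, start="submit") -> str:
    """Generate a self-descriptive process map for one process instance.
    The map carries every step, actor, and transition, so whoever receives
    it needs no process engine to know what may happen next."""
    return json.dumps({
        "instance": instance_id,
        "current": start,
        "steps": {s.name: {"actor": s.actor, "next": list(s.next_steps)}
                  for s in model.values()},
    })

print(process_map("expense-42"))
```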

The application tier acts as the client for such process representations, and as a server that supports the clients of the process. Keep the application tier stateless by serving application state metadata in representations to the clients of the process. In other words, the application tier processes interactions with clients statelessly. As a result, any application tier instance is interchangeable with any other. If one crashes, you can bootstrap a replacement without affecting processes in progress. This interchangeability is the secret to maintaining elasticity and fault tolerance.
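The interchangeability claim can be illustrated with a short sketch. `AppInstance` is a hypothetical application-tier instance with no per-process fields; it advances whatever representation it is handed. In the usage below, one instance crashes partway through, a replacement is bootstrapped, and the process still completes because its state travels with the representation. Round-robin routing via `itertools.cycle` is an assumption standing in for a real load balancer.

```python
import itertools
import json

class AppInstance:
    """Hypothetical application-tier instance. It holds no per-process
    state; everything it needs arrives in the representation."""

    def __init__(self, name: str):
        self.name = name  # identifies the instance, never a process

    def advance(self, representation: str) -> str:
        rep = json.loads(representation)
        rep["handled_by"] = rep.get("handled_by", []) + [self.name]
        steps = rep["steps"]
        idx = steps.index(rep["current"])
        # Move to the next step, or None once the last step is done.
        rep["current"] = steps[idx + 1] if idx + 1 < len(steps) else None
        return json.dumps(rep)

pool = [AppInstance("a"), AppInstance("b"), AppInstance("c")]
rep = json.dumps({"steps": ["submit", "approve", "pay"], "current": "submit"})

rep = pool[0].advance(rep)     # instance "a" performs "submit"
pool.pop(0)                    # "a" crashes; no process state is lost
pool.append(AppInstance("d"))  # a replacement is bootstrapped

# Round-robin over the surviving instances until the process finishes.
for instance in itertools.cycle(pool):
    if json.loads(rep)["current"] is None:
        break
    rep = instance.advance(rep)

print(json.loads(rep)["handled_by"])  # ['a', 'b', 'c']
```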

Separate UI representations from application state representations. If a client has the state representation, it should be able to fetch the appropriate UI representation from any application tier instance. As a result, the state representations are portable from one client to another. You could begin a process on a laptop and transfer it to a mobile phone, for example, and continue where you left off.
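A minimal sketch of the separation, assuming hypothetical per-device templates deployed identically to every application-tier instance: `render_ui` builds a UI representation from a state representation and nothing else, so the client only needs to keep (or hand off) the state.

```python
import json

# Hypothetical UI templates, deployed identically to every instance.
TEMPLATES = {
    "laptop": "<h1>{step}: expense {id}</h1><p>Amount: {amount}</p>",
    "phone": "{step} #{id} ({amount})",
}

def render_ui(state_json: str, device: str) -> str:
    """Build a UI representation from a state representation. Any instance
    can do this, because the state carries everything the UI needs."""
    state = json.loads(state_json)
    return TEMPLATES[device].format(**state)

# The state representation is the only thing a client must keep.
state = json.dumps({"id": 42, "step": "approve", "amount": "$120"})
laptop_view = render_ui(state, "laptop")
# Hand the same state to a phone and continue where you left off.
phone_view = render_ui(state, "phone")
print(phone_view)  # approve #42 ($120)
```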

Use a lightweight, distributed queuing mechanism to address client uptime issues. If a client (typically a browser) crashes in the middle of a process, you want to be able to relaunch the client and pick up where you left off. But if the client has the only copy of the application state, you have a problem. Instead, allow the client to fetch a cached copy from a queue.
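Here is a sketch of the recovery path. `StateCache` is a hypothetical, in-memory stand-in for a lightweight distributed queue (a real deployment would use an actual queuing service); it keeps the last few state representations per session, and `respond` publishes a copy each time the application tier sends state to a client.

```python
import json
from collections import defaultdict, deque

class StateCache:
    """Hypothetical in-memory stand-in for a lightweight distributed queue.
    Keeps the most recent state representations per session so that a
    relaunched client can pick up where it left off."""

    def __init__(self, depth: int = 3):
        # Each session gets a bounded queue; older copies fall off the end.
        self._queues = defaultdict(lambda: deque(maxlen=depth))

    def publish(self, session: str, state_json: str) -> None:
        self._queues[session].append(state_json)

    def latest(self, session: str):
        queue = self._queues.get(session)
        return queue[-1] if queue else None

cache = StateCache()

def respond(session: str, state: dict) -> str:
    """Application tier: send state to the client and cache a copy."""
    state_json = json.dumps(state)
    cache.publish(session, state_json)
    return state_json

client_copy = respond("s1", {"step": "approve", "id": 42})
client_copy = None                # the browser crashes and loses its copy
recovered = cache.latest("s1")    # the relaunched client fetches the cached copy
print(recovered)
```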

For processes that require heavy interactions among multiple actors, follow a peer-to-peer model. Most processes that involve multiple actors call for infrequent interactions between those actors, for example, processes with approval steps. For such processes, support those interactions via the resource state in the persistence tier. However, when you require heavy interactions among actors (imagine a chat window, for example), enable the actors to share a single application instance that initiates a peer-to-peer interaction.
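The infrequent-interaction case can be sketched as follows, assuming a hypothetical `ApprovalStore` resource in the persistence tier. The employee’s session and the manager’s session never talk to each other directly; each reads or writes the shared resource state on its own schedule, and the lock guards against the two sessions touching it at the same moment. The peer-to-peer case for heavy interactions is not shown, since it depends on the messaging technology chosen.

```python
import threading

class ApprovalStore:
    """Hypothetical persistence-tier resource for infrequent hand-offs
    between actors, such as approval steps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def request(self, item_id: str, requester: str) -> None:
        with self._lock:
            self._items[item_id] = {"requester": requester, "status": "pending"}

    def pending(self) -> list:
        with self._lock:
            return [k for k, v in self._items.items() if v["status"] == "pending"]

    def decide(self, item_id: str, approver: str, approved: bool) -> dict:
        with self._lock:
            item = self._items[item_id]
            if item["status"] != "pending":
                raise ValueError(f"{item_id} has already been decided")
            item.update(status="approved" if approved else "rejected",
                        approver=approver)
            return dict(item)

store = ApprovalStore()
store.request("expense-42", "alice")                        # employee's session
waiting = store.pending()                                   # later, manager's session
result = store.decide("expense-42", "bob", approved=True)
print(result)
```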

Maintain integrity via checksums. Conventional wisdom states that you don’t want to let the client have too much control, or a bad actor could do real damage. To prevent such mischief, have the application tier attach a checksum to every state representation it sends and verify that checksum on every request, so that any tampered or otherwise invalid request triggers an error response. As a result, the worst a hacker can do is screw up their own session. Serves them right!
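One way to implement the checksum, sketched here under assumptions: the application tier seals each state representation with an HMAC-SHA256 keyed hash, using a secret shared by all instances and never sent to clients, then verifies it on every request. A keyed hash is used in place of a plain checksum such as CRC32 because a client could simply recompute an unkeyed checksum after tampering. `SECRET`, `seal`, and `open_sealed` are hypothetical names.

```python
import hashlib
import hmac
import json

# Hypothetical secret shared by all application-tier instances and never
# sent to clients. In practice it would come from a secret store.
SECRET = b"replace-with-a-real-secret"

def seal(state: dict) -> dict:
    """Serialize the state and attach a keyed hash of it."""
    body = json.dumps(state, sort_keys=True)
    mac = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return {"state": body, "mac": mac}

def open_sealed(message: dict) -> dict:
    """Return the state if the hash checks out; otherwise raise, which the
    application tier would turn into an error response."""
    expected = hmac.new(SECRET, message["state"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        raise ValueError("invalid state representation")
    return json.loads(message["state"])

msg = seal({"id": 42, "step": "approve", "amount": 120})
tampered = dict(msg, state=msg["state"].replace("120", "12000"))
try:
    open_sealed(tampered)
    outcome = "accepted"
except ValueError:
    outcome = "rejected"  # only the tamperer's own session is affected
print(outcome)
```

This detects tampering but does not, by itself, stop a client from replaying an older state that was validly sealed; guarding against that would take something extra, such as a version number checked against the persistence tier.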

Not much to it, is there? On the one hand, it’s surprising no vendor has stepped up to the plate to build a fully partition tolerant, RESTful BPM app. But then again, it’s not the technical complexity that’s the problem—it’s the paradigm shift in thinking about the nature of stateful applications in today’s Cloud-ready world. That shift is what makes Cloud-friendly BPM a disruptive technology.

The ZapThink Take

Perhaps I should have named this ZapFlash “RESTful BPM,” because, of course, that’s what it’s about. But that would have introduced unfortunate confusion, since that phrase has already been co-opted by vendors who use it to mean “traditional BPM engines with RESTful APIs.” But that’s the old way of thinking. What this ZapFlash proposes is a different way of thinking about BPM applications altogether.

This paradigm shift, as with all such shifts, is part of a larger trend from old ways of thinking to entirely different approaches to building and consuming applications. The table below illustrates some aspects of this shift:

It may seem the new way is at the bleeding edge, but in reality, many of today’s Cloud-based companies are already living the new paradigm. Only enterprises and incumbent vendors find themselves struggling with the old way. But that is changing. Stay tuned: ZapThink will continue to lead the way!
