BPEL as a source of application management metadata

Let’s put aside for now all the discussions about whether BPEL is an appropriate tool to capture a “true” business process, i.e. to implement the business logic understood by a business analyst (a topic that has been discussed at length already, including here, here, here, here, here, here and around the 5 minute mark of this podcast). Today, let’s look at it as simply another resource in a developer’s toolbox, alongside things like servlets and XML parsers. It’s a tool that can simplify the invocation of remote services (especially asynchronously), the parallelization of tasks, the definition of scoped compensation handlers, the transformation of XML, the encapsulation of key business logic and, most importantly, the reliable implementation of long-lived processes. If you need a few of these features, you might find BPEL a suitable programming tool. Plus, it refreshingly encourages handling of XML as XML (e.g. via XPath) rather than mindless code generation.

In addition to whatever developer productivity benefit you see in BPEL, there are other potential benefits form using it. They are the topic of this post and they relate to application management.

We all know that in an ideal world, no developer would release an application without providing a set of management capabilities that are carefully crafted to reflect the business logic of the application. Such that IT administrators can monitor, configure, optimize and troubleshoot the application in ways that are related to what the application really does (as opposed to generic metrics like memory, CPU and I/O metrics…).

Back in the real world, this is of course rarely the case. Enters BPEL. Just by virtue of using it in a reasonable way, and without any “just for the ops guys” metadata, BPEL provides a management model for the application. Sure it’s not as good as a hand-crafted management model, but at least it’s there. And it has some pretty compelling properties:

It feeds directly from the metadata used by the runtime, so it is guaranteed to be accurate (unlike metadata that is created specifically for management but has no role in the actual runtime).

It shows what external services the application depends on. Of course there is no guarantee that all remote invocation will be represented in the BPEL process, but since that’s a strength of BPEL it is reasonable to expect that it provides a good view of application dependencies (to be complemented, of course, by the application infrastructure dependencies like the database and the BPEL engine itself…). Remote invocations are a common point of failure and/or performance problems so they are a first class citizen of an application management model.

It explicitly captures process instances. No more jumping from one database table to another (assuming you even know where to look) to try to get a sense of the current overall status. The BPEL instances show the number of in-flight transactions in the application. It is also easy to compare the initialization and termination rates to see the trend.

It provides a horizontal segmentation of the processing tasks (via the BPEL activities) that is a good complement to the vertical segmentation often offered by application management tools (e.g. time spent in the database, time spent waiting on I/O, etc…).

It makes explicit certain exception conditions.

All these only make use of very basic aspects of BPEL: the enumeration of PartnerLinks, the notion of a process instance, the existence of activities, the fault/compensation/termination handlers. A fair amount of visibility into the health of the application can be derived form this alone. I am not making fancy assumptions about the management tool being able to make sense of the routing logic in the process or of the correlation rules. I am not assuming that the BPEL engine provides ways to control individual process instances. I am not assuming that the name attributes of certain elements (e.g. PartnerLink, variable) convey semantics that could help the administrator understand some of the semantics of the application.

At the end, it’s not about managing BPEL, it’s about managing an application that uses BPEL.

My point is not to push everyone to write any application as a BPEL process (or a set of them) as a way to get a great management infrastructure for free. But if BPEL is a potential choice for the application, then it’s worth considering those extra benefits in the “pros and cons” analysis. And if you have already decided to use BPEL, it may be worth looking into what management dividends you can harvest from this choice. Of course your mileage may vary depending on how manageable your BPEL infrastructure is. Hinthint…

A few related links. Todd Biske has also written about the management value of BPEL, here and here. A similar analysis can be applied to SCA, but at this point in time there are many more applications out there that use BPEL than SCA, making the former more relevant. I briefly described the SCA side of the equation in an earlier exchange with David Chappell. That discussion is summarized here (including a pointer to David’s original piece). In an earlier post, I touched on the manageability potenial of other sources of application metadata, like OGSi and Spring (in addition to SCA and BPEL). Jean-Jacques Dubray provided additional context at InfoQ.

3 Responses to BPEL as a source of application management metadata

What does “in ways that are related to what the application really does (as opposed to generic metrics like memory, CPU and I/O metrics…)” mean? Do you mean to troubleshoot the applications according to its behavior?

Yes. For example, today you often don’t realize that you have a connection leak (connections that are not used and yet not returned to the pool either) until you run into stuck threads or equivalent lack of service. You shouldn’t have to go down to these such low-level concepts for this, it should appear through application-level metrics.

In another example, you often see people having issues (response time too high, connections refused…) and it is not clear at first whether something is going wrong (there is technical problem) or you simply are getting a lot more traffic than usual traffic (nothing broken, just high demand). Because through the low-level metrics (e.g. CPU usage, memory utilization) both scenarios may look the same. A monitoring system that is centered on application-level metrics would make it instantly clear which situation you’re in.

Thanks for your explanation.
This is very interesting,especially the 2 scenarios (which low-level metrics would look the same.) I wonder how people are doing today for big web sites, such as Amazon, twitter, flick, youtube? Do they monitoring the application-level metrics? That would be a lot to look at.
Secondly, is web service widely used today for database interfaces, especially in multi-tiered web application systems? Thanks!