Friday, November 28, 2008

An old friend of mine started talking about Guerilla SOA a few years back and got into some interesting debates around it. Ignoring the fact that some of G-SOA is flak about the need (or not) for vendor-specific SOIs (always a good thing to push if you're working for a consultancy firm), one of the important aspects that I take from this is the incremental approach to SOA. Whatever you call it, the concept of taking things one step at a time when you approach SOA rather than trying to do a big bang approach is fairly obvious and intuitive. We're used to working in that way for many different things: you don't have to know all of Java in order to write HelloWorld; you don't have to understand how your car engine works in order to drive it, etc.

This incremental approach to SOA is something we've been pushing since the very early days of JBossESB and it's embodied in the latest release of our SOA Platform. The feedback we've been getting is overwhelmingly positive, so we've also been doing this within Overlord. I'm not sure if we'll call this Guerilla Governance (G-Governance?) but the principles are the same: start small and build up as you gain experience and as your needs evolve. A flexible SOA Governance infrastructure should adapt to you while at the same time encouraging good governance practices so that you can adapt to it. This is why we've started with things like DNA/Guvnor and Process Governance, as they satisfy well defined needs on an independent basis but also within the whole Overlord infrastructure as and when we get there. SAMM is a little more invasive as far as governance is concerned, as is the forthcoming work on Policy Enforcement Points and SLAs. But more on them in the future.

So if you're looking to deploy your favourite SOI and are concerned about governance, don't think you necessarily have to buy in a complete solution immediately. (I'm not even sure there is such a thing as a complete solution!) Look for something that can grow with you and even better, something that you can influence with your own policies and approaches to governance.

Monday, September 1, 2008

For instance, how does SAM relate to Business Service Management (BSM)? Well, according to the common definitions of BSM, it is aimed at helping users (e.g., administrators, project managers) inform their management software which services, tasks etc. are the most important for the business. BSM then enables them to correlate the performance and availability of those systems with their business goals, identifying when an application, service etc. is not behaving as expected. Through some magic, the BSM identifies the cause(s) of the breach in contract and how to fix it (them).

So is SAM a solution to BSM? Not quite. But it should be easy to see how SAM can be a critical component in the development of BSM.

What about infrastructure monitoring? Well, we did cover this in the very first posting about SAM. In essence the answer is the same as for BSM: SAM should be at the core of all infrastructure monitoring, receiving and collating information (data streams) from components, routers, processors, etc. Are these data streams real-time? Well, let's ignore the fundamental problems with the limitation of the speed of light and simultaneity. (Everyone should be made to read Lamport's classic paper on the subject as it relates to computing.) Let's also ignore the differences between hard real-time and soft real-time. The answer is yes AND no: of course most of the information streaming into the SAM implementation will be coming "as it happens", but we're allowed to provide archival data that may have been taken hours, days or weeks previously. In fact, in order to support the right kind of correlation, archival data, whether provided by the SAM user or taken by SAM itself (since it is based on CEP principles), is critically important. This is also where the Bayesian Inference Network aspect of SAM comes in. No longer will strict binary triggers be sufficient for the kinds of networks we see today and in the future.
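One way to picture the live/archival point is that both kinds of streams feed the same correlation step, so they first get merged into a single time-ordered stream. The record shape and names below are my own illustration, not SAM's API:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: live ("as it happens") readings and archival readings are merged
// into one stream ordered by timestamp before any correlation rule fires.
class StreamMerge {
    record Reading(Instant at, String source, double value) {}

    static List<Reading> merge(List<Reading> live, List<Reading> archive) {
        List<Reading> all = new ArrayList<>(live);
        all.addAll(archive);
        // order by the time the reading was taken, not the time it arrived
        all.sort(Comparator.comparing(Reading::at));
        return all;
    }
}
```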

So what constitutes the data of an event? In fact an event message contains multi-dimensional data that we need in order to correlate and visualize it. Of course there has to be a way of distinguishing "when" something (event) happened. That could be explicitly mentioned in the data stream or implied by the local time at which the message is received. Then we need to figure out "what" is being mentioned, i.e., what data is being analyzed. Which brings us to the actual data. Of course one message may contain multiple readings, e.g., the temperature at a sensor as recorded over 5 different intervals. Given all of this information (which is common to most other monitoring techniques), we can start to build up a map of what is going on in the system and trigger on any desired event, even if triggering requires correlating across a multitude of input streams. (In order to make the architecture symmetrical, we'll actually consider time as an input stream to SAM as well.)
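As a concrete sketch of that multi-dimensional event data, the "when", the "what" and the readings themselves might be modelled like this (the names are purely illustrative, not an actual SAM API):

```java
import java.time.Instant;
import java.util.List;

// Illustrative event record: a timestamp (explicit in the stream, or implied
// by the local receipt time), the source being reported on, and one or more
// readings carried in a single message.
class MonitoringEvent {
    private final Instant when;          // "when" it happened
    private final String source;         // "what" is being reported, e.g. a sensor id
    private final List<Double> readings; // one message may carry several readings

    MonitoringEvent(Instant when, String source, List<Double> readings) {
        // fall back to the local clock when the producer supplied no timestamp
        this.when = (when != null) ? when : Instant.now();
        this.source = source;
        this.readings = readings;
    }

    Instant when() { return when; }
    String source() { return source; }
    List<Double> readings() { return readings; }
}
```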

But what about visualization? Well we've already seen the start of what we want to do with SAM. The BPM console is just one graphical view on to the data that we are accumulating and correlating. But there will be the equivalent of a BAM console: the SAM console. How you would view this information to obtain the best and most intuitive representation is an ongoing effort in its own right. For example, customizable home pages, displaying the most important graphs, data etc. per user. The notion of mimic diagrams will also be interesting to explore. In all likelihood, because of the inherent flexibility of the SAM infrastructure it'll be impossible to cater for all of the different ways in which the information may be displayed, so there'll need to be a combination of common out-of-the-box views as well as a toolkit of components that allow for the easy construction of other views.

In a later blog posting we'll look at how SAM can be important in the support of cloud computing.

GWT is already quite modular and it will allow for integration of consoles across projects. Another side effect is that you can easily take a GWT application, or pieces of it, and hook it up with existing web applications. For instance, this would allow users to embed the task management functionality of the BPM console into their own intranet.

Process instance details

Improve on BAM and BI functionality

Probably the biggest drawback of the current console is the lack of BAM and BI features.

Workload overview

Improving on BAM and BI is not going to happen within a day, but you can expect to see the first metrics and stats in early releases, and we'll try to add more bits and pieces as we move towards a full-fledged BAM console. Because this is going to overlap in both functionality and technology with the Service Activity Monitoring project, interested readers should keep an eye on SAM as well.

Performance metrics

How to move forward

To begin with, we are going to provide a replacement for the existing jBPM console based on GWT. It will retain the current features and provide additional BI functionality. Initially we are going to leverage the existing jBPM3 backend and then gradually enrich it with SAM components, or even replace it altogether.

Process Graph View

Stay tuned. Next time we dive into implementation details: gchart, gwt and gwt-ext.

Friday, August 8, 2008

Peter: "SAM means service activity monitoring."
Paul: "So, you are going to monitor service activities."
Peter: "Right. That's what I said."
Paul: "But it's two different things, isn't it?"

A typical piece of hallway chatter. You just want to grab some coffee and then you bump into somebody who completely turns your world upside down. It's a small question, but in this case an important one.

I spent last week thinking about the SAM infrastructure, basically getting an idea of how SAM might fit into different application/service landscapes. So you have SAM "monitoring" what your services do, and you put in instrumentation code pushing "events" to SAM. You push anything that might potentially be interesting to SAM, but it should figure out for itself what's considered relevant. Hence the CEP backing behind it, right?

Fortunately I got distracted by some console work (moving the jBPM console to GWT), which made me think more thoroughly about what information will be relevant to people using the BPM console. There are process definitions, from which you create process instances, which in turn consist of work items (aka nodes). You can't tell upfront what path of execution will be taken, nor do you know what particular node types do. At least from a monitoring perspective, you are forced to keep a more general view of things. Coming from the technical side, this quite naturally leads to a static view of things: process start and end dates, overall number of executions, average execution time. Those sorts of things.
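Those static metrics can be sketched in a few lines. This is illustrative only; the record shape and method names are mine, not jBPM's or SAM's:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Sketch of the "static view": overall number of executions and average
// execution time, computed from a flat log of finished process instances.
class ProcessStats {
    record Instance(String definitionId, Instant start, Instant end) {}

    // overall number of executions of one process definition
    static long executions(List<Instance> log, String definitionId) {
        return log.stream()
                  .filter(i -> i.definitionId().equals(definitionId))
                  .count();
    }

    // average execution time in milliseconds for one process definition
    static double averageMillis(List<Instance> log, String definitionId) {
        return log.stream()
                  .filter(i -> i.definitionId().equals(definitionId))
                  .mapToLong(i -> Duration.between(i.start(), i.end()).toMillis())
                  .average()
                  .orElse(0.0);
    }
}
```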

But is it really what console users are interested in? If I put on my "chief of sales operations EMEA" hat, I would probably like to see the key performance factors relevant to my business. This might be anything from missed opportunities to SLA breaches by certain partners. But what are key performance factors? Especially without knowing the business domain beforehand, how should I (the console developer) enable the "chief of sales" to get the information relevant to him?

Let's look at the catchy title again: if I had known upfront what you were going to ask me, I could have prepared myself. But I don't, and I never will. I am just "Service Activity Monitoring" and you are monitoring service activities.

Now that we know "chief of sales" actually exists, we can start wondering about what actually enables him (or her) to do a successful job. IT is just the tool set here. If we do that, we will probably see that any business is goal driven (did we achieve it or not?), has a number of participants (partners, systems, colleagues) and suffers from deviations (exceptions, compensation). To "chief of sales" the underlying IT infrastructure is irrelevant. It exists and enables business, that's it.

Back to SAM. To SAM the infrastructure exists but is irrelevant, too. Things that happen inside or outside your company can be relevant to the business. SAM just makes it visible to you and raises your attention.

Paul: "Great, sounds like a match."
Peter: "Yes, it is."
Paul: "But what about the BPM console then?"

Well, the work on the BPM console made me realize that "chief of sales" questions are not restricted to BPM. They cover anything that enables particular business aspects. IT is just an implementation detail here. And so is BPM. Or ESB, or web services. But still, all of these SOA components will provide information relevant to the business. And so does the BPM engine.

For the BPM console it means that it doesn't need to be tied to the BPM domain. I believe that to a large degree the general service information (goals, participants, deviations) which SAM can deliver would be sufficient. However, we may still add pluggable console components that are specific to BPM or even proprietary to jBPM. But to begin with, we should focus on a general monitoring concept.

Thursday, June 26, 2008

We started integrating WS-CDL into our design and runtime processes a while back. This work became one of the defining (and differentiating) factors behind our governance efforts (and therefore Overlord). Some people (users and analysts) just "get it" and understand the need behind CDL (let's drop the WS component of the name, because CDL is not limited to SOAP/HTTP by any means). However, others don't, and still others ignore it entirely. Best case, this is a shame. Worst case, it compromises the integrity of the systems they develop.

Steve Ross-Talbot recently gave a presentation on CDL at the Cognizant Community Europe workshop, and used the analogy of a house architect to explain where CDL fits in. This is a good analogy, because CDL should be in any good Enterprise Architect's repertoire. Just as you don't throw together a straw-built house from a pencil drawing on the back of a napkin and expect it to withstand a hurricane, neither should you just cobble together components or services into a distributed system (irrespective of the scale) and expect it to be correct (and provably correct at that). In the housing example you would pull an architect into the solution, and that architect would use best practices that have been developed over centuries of collective experience to design a building that can withstand 100 mph winds. Software engineering should be no different. Some sectors of our industry have been able to get by with computing as an art rather than a science, and house designers did pretty much the same thing thousands of years ago. But we don't live in caves any more for good reasons (although there's still something to be said for using caves in a hurricane!)

Of course it means that there are more layers in between the act of deciding what needs to be done and actually realising that in an implementation, but those layers are pretty important. The days of just throwing something together and assuming it'll work as planned are well and truly over. Asynchronous systems, which really began life several decades ago but were muzzled by layers of synchronous abstractions, are back to stay. Yes, synchronous is easier to understand and reason about, but it's an unfortunate reality that if you want scale, real-time, loose coupling etc. we have to break through the synchronous barrier. That has a knock-on effect on how you design your systems and individual components (services) and ultimately how they are managed (by a person or by some autonomic mechanism). "Design for testability" was a buzz-phrase from many years ago. What we need now (and what CDL integration gives us) is "design for correctness".

Monday, June 23, 2008

The term Business Activity Monitoring (BAM) is used to describe the real-time access to critical business performance metrics in order to improve the efficiency and effectiveness of business processes. Real-time process/service monitoring is a common capability supported in many distributed infrastructures. However, BAM differs in that it draws information from multiple sources to enable a broader and richer view of business activities. BAM also encompasses Business Intelligence (BI) as well as network and systems management. Plus BAM is often weighted toward the business side of the enterprise.

Within a distributed environment (and many local environments) services are monitored by the infrastructure for a number of reasons, including performance and fault tolerance, e.g., detecting when services fail so that new instances can be automatically started elsewhere. Over the years distributed system implementations have typically provided different solutions to specific monitoring requirements, e.g., failure detection (or suspicion) would be implemented differently from that used to detect performance bottlenecks. For some types of event monitoring this leads to overlap and possible inefficiencies. For instance, some approaches to detecting (or suspecting) failures may also be used to detect services that are simply slow, indicating problems with the network or an overloaded machine on which the service resides. But where these ad hoc approaches have differed from BAM/BI is in their intended target audience: other software components (e.g., a load balancer) rather than humans.
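As a sketch of that overlap, the same heartbeat stream can drive both failure suspicion and plain slowness detection simply by applying two different silence thresholds. All names here are illustrative, not an Overlord or JBossESB API:

```java
import java.time.Duration;
import java.time.Instant;

// One monitoring mechanism, two consumers: how long a service has been
// silent tells us whether it is merely slow or should be suspected failed.
class HeartbeatMonitor {
    enum Status { OK, SLOW, SUSPECTED_FAILED }

    private final Duration slowAfter;     // silence longer than this => slow
    private final Duration suspectAfter;  // silence longer than this => suspect failure

    HeartbeatMonitor(Duration slowAfter, Duration suspectAfter) {
        this.slowAfter = slowAfter;
        this.suspectAfter = suspectAfter;
    }

    // classify a service given the time of its last heartbeat
    Status classify(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(suspectAfter) > 0) return Status.SUSPECTED_FAILED;
        if (silence.compareTo(slowAfter) > 0) return Status.SLOW;
        return Status.OK;
    }
}
```

Note that this only ever *suspects* failure: in an asynchronous system a silent service and a failed one are indistinguishable, which is exactly why the two audiences (load balancers versus humans) have historically been served by separate ad hoc mechanisms.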

This separation of audience is useful from a high-level perspective: business analysts shouldn't have to be concerned about low-level infrastructural details. But in many cases this ad hoc (bolt-on) approach to BAM and BI can lead to less information being delivered to the entities that need it at the time they need it. Therefore, within the Overlord project we are working on Service Activity Monitoring (SAM) and associated Service Intelligence (SI), which will provide an architecture (and corresponding infrastructure) that brings together many different approaches to entity monitoring within distributed systems (where an entity could be a service, a machine, a network link or something else entirely) and particularly SOIs. The emergence of event processing has also seen an impact on this general entity monitoring, where some implementations treat failure, slowness to respond etc. as particular events. This uniform monitoring includes the following:

• Message throughput (the number of messages a service can process within a unit of time). This might also include the time taken to process specific types of messages (e.g., how long to do transformations).
• Service availability (whether or not the service is active).
• Service Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR).
• Information about where messages are sent.
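To make the first metric concrete, message throughput is simply messages processed per unit of time. A minimal sketch (the class and method names are mine, purely for illustration):

```java
// Throughput as messages per second, given a message count observed
// over a measurement window expressed in milliseconds.
class Throughput {
    static double perSecond(long messages, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return messages * 1000.0 / windowMillis;
    }
}
```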

The information is made available to the infrastructure so that it may be able to take advantage of it for improved QoS, fault tolerance etc. The streams may be pulled from existing infrastructure, such as availability probing messages that are typically used to detect machine or service failures, or may be created specifically for the SAM environment. Furthermore, streams may be dynamically generated in real-time (and perhaps persisted over time) or static, pre-defined information, where the SAM can be used to mine the data over time and based on explicit queries.

With the advent of SAM we will see BAM implementations that are built on it, narrowing the types of events of interest for the business analyst. The SAM approach offers more flexibility and power to monitoring and management over the traditional BAM approaches. As BPM and SOA move steadily towards each other, this kind of infrastructure will become more important to maintaining agility and flexibility.