The Process Virtual Machine

Introduction

This article will show how both business analysts and developers can benefit from workflow, Business Process Management (BPM), and orchestration. We'll explain the essence of workflow engines in simple terms and show how they can be leveraged in a Java environment. While every self-respecting developer knows the relational model behind databases, such a model is absent for workflow engines. The Process Virtual Machine will provide that missing piece.

The Process Virtual Machine is the conceptual model that should be in every developer's repertoire, because it helps to explain all the diverse workflow engines that are out there. Furthermore, because Microsoft has come to the same approach independently (see the upcoming section The Other Side), we are convinced that it will be the foundation for all next-generation process engines. This article will clearly outline the goal and value of the Process Virtual Machine, and guide you through the most important parts of the Process Virtual Machine paper that is referenced below. You'll also learn what workflow technology is and when it makes sense in a software project.

For this article, the differences between workflow, BPM, and orchestration are not relevant, so we will refer to this collection as workflow for short. The common goal of most workflow languages is that long-running processes can be expressed in a graphical way and executed on a workflow engine. The big advantage of the graphical diagram is that developers and business analysts get to speak the same language. Typical workflow examples are:

Insurance claims

Expense notes

Hiring a new employee

But workflow technology can be applied to any aspect of software development that has state machine characteristics, as indicated by these additional examples of workflow:

Defining a coarse-grained web service as a function of other web services

Pageflow describing the navigation between pages in a web application

Job scheduling

Message queue orchestration

Traditional workflow engines have a number of flaws that the Process Virtual Machine will resolve:

Sole focus on the business analyst. Traditional workflow systems focus heavily on graphical tooling. The idea is that a nontechnical person can draw a business process, and the workflow engine will then automagically distill the software support for that process. If you think about it, do you want to put software created by a nontechnical person into production? The Process Virtual Machine shows how to establish a productive collaboration between the business analyst and the developer. The number of processes that can be implemented with only point-and-click editing of a process graph is very limited. Apart from those, the Process Virtual Machine also supports the large portion of processes that require a combination of process logic and coding.

Process engines are monolithic systems. This creates a lot of complexity for deployment, testing, and coupling workflow transactions with application transactions. The Process Virtual Machine allows workflow engines to be embedded in Java applications or to be deployed in standalone mode.

One single, fixed process language. A process language is in fact a collection of graphical constructs, each with a specific, deterministic runtime behavior. In traditional engines, only one process language is supported natively, and it is impossible to plug in new process constructs. With the Process Virtual Machine, the process constructs themselves are made pluggable. This way, it can support multiple process languages. Also it can serve as a basis for building limited custom workflow languages, e.g., in a document management system.

One single environment. Each process language is typically targeted at one specific environment. Aspects of the environment could be, for example, standard Java, enterprise Java, Enterprise Service Bus (ESB), or JMS queues, with or without persistence. Most process languages, and hence most process engines, are only designed for one specific environment. We recognize that different environments require different process languages. While one process language is insufficient, we can still build the Process Virtual Machine as a single foundation on top of which these process languages can be built.

No easy binding with programming logic. In practice, it turns out that processes can provide a great backbone for the implementation of real-life business processes. But most often, they must be combined with other technologies, such as general-purpose programming. Would you consider developing your next project only with Java coding? No XML configuration files? No ORM mappings in Java? No Ant build scripts? The same thing is true for automating business processes: process languages must be used in combination with other technologies. A workflow engine must take the whole integration with a Java programming environment into account. That is exactly what the Process Virtual Machine offers.

Lack of mindshare. The workflow market is completely fragmented today, and each engine has its own process language concepts and runtime execution concepts, so there is no shared model to reason with. The Process Virtual Machine aims to fill that gap. A common understanding of the basics of workflow, BPM, and orchestration should be part of every developer's repertoire.

Many aspects of software development are long-running, graph-based executions. For all of those use cases, the Process Virtual Machine can be leveraged as a base library. By using that library, we can significantly reduce the cost of building process languages. It also makes customization of process languages much more feasible.

This article is in fact the result of a collaboration between the leading open source communities, and it will take BPM, workflow, and orchestration to the next level. Red Hat (with JBoss jBPM) and Bull (with Bonita and Orchestra) have years of experience with very diverse process languages and engines. The following process languages are currently in the works on top of this single model: jPDL, XPDL, BPEL, pageflow, and threadflow.

The Process Virtual Machine combines the best ideas of finite state machines, Petri nets, and other models used for workflow.

Embeddability

Current BPM, workflow, and orchestration systems are built as monolithic engines that don't integrate well with Java software development. And yet, software projects that can be realized with only process technology are rare. In most cases, processes need to be combined with other technologies. For those use cases, it becomes vital that the runtime process engine integrates with Java on various levels: the deployment model (the whole engine as a library), transactions, persistence in a relational database, and the user interface. For all of these aspects, the workflow engine should fit naturally inside the application. The Process Virtual Machine has proven to provide that kind of embeddability in both the standard and enterprise Java platforms.

Pluggability

The most important pluggability point is the node implementations. The runtime behavior of process constructs is implemented in Java. The Process Virtual Machine will provide an API for implementing node behaviors. If you look at it from that angle, the Process Virtual Machine is a component model for building process constructs.

To make that component model effective, the services used by the engine have to be pluggable, too, e.g., persistence, transactions, identity components, and logging. To discover how the Process Virtual Machine can be extended to support this pluggability, please refer to the full paper on the Process Virtual Machine.

The Other Side

The workflow engine market is completely fragmented, both in products and standards. Up to now, the search was mostly for the best process language. But there are so many different environments and different features that one process language will never be enough.

In Java land, the Process Virtual Machine approach is unique. But on "the other side," in Microsoft land, the approach taken by the Windows Workflow Foundation is very similar to what is being proposed in the Process Virtual Machine. Both are in fact component models. A process construct is considered a component, and an API and packaging technique are offered to code process constructs as components.

What these two technologies have in common is that they take the need for multiple process languages as a given. So they both support a component model for process constructs, which means that many process languages can be built on top of the base technology. This is a whole new shift for process engine products. Instead of creating bridges to leverage one single process language in many environments, the question is reversed: which use cases justify the creation of their own process language?

Some process languages might be general purpose, e.g., jPDL in a Java environment and BPEL in an ESB environment. But there are many opportunities for very limited and simple process languages: for example, a language to describe approval flows for an enterprise content management system, a language to describe multithreaded concurrency, a language to describe pageflow navigation in a web application, and so on.

Basics

That is enough background to get started with the real meat. The basic principles of the Process Virtual Machine follow. The full paper also describes many extensions needed to cope with all possible scenarios.

A process is a graphical description of an execution flow. For example, the procedure on processing expense notes is a process. It can be deployed in a process engine. One process can have many executions, e.g., my expense note of last Monday could have been handled by one execution of the expense note process. Figure 1 demonstrates an example process for an insurance claim.

Figure 1. An example process for an insurance claim

The basic structure of a process is made up of nodes and transitions. Transitions have a sense of direction, and hence a process forms a directed graph. Nodes can also have a set of nested nodes. Figure 2 shows how transitions and nodes can be modeled in a UML class diagram.

Figure 2. UML class diagram of nodes, transitions, and their behavior

Each node in the process has a piece of Java code associated as its behavior. Here is the interface to associate Java code with a node.
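A minimal sketch of such an interface might look like the following. The name Executable, the single execute method, and the stand-in Execution and RecordingBehavior classes are assumptions made for illustration; they are not taken from the Process Virtual Machine paper or from any released library.

```java
import java.util.ArrayList;
import java.util.List;

/** Behavior attached to a node. Name and signature are assumptions. */
interface Executable {
    /** Called when an execution arrives in the node. */
    void execute(Execution execution);
}

/** Bare-bones stand-in for the execution, just enough for this example. */
class Execution {
    private final List<String> history = new ArrayList<>();

    void record(String entry) {
        history.add(entry);
    }

    List<String> getHistory() {
        return history;
    }
}

/** Hypothetical behavior that only notes that it ran. */
class RecordingBehavior implements Executable {
    private final String label;

    RecordingBehavior(String label) {
        this.label = label;
    }

    @Override
    public void execute(Execution execution) {
        execution.record(label);
    }
}
```

The point of the sketch is the shape of the contract: the engine hands the execution to the node's behavior, and the behavior decides what to do with it.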

Now, let's look at the execution. An execution is a pointer that keeps track of the current position in the process graph, as indicated in Figure 3.

Figure 3. An execution points to the current position in the process graph

When a new execution is started for a given process, the execution will be positioned in the initial node of the process. After that, the execution waits for an external trigger.

An external trigger can be given with the proceed(String transitionName) method on the execution. Such an external trigger is very similar to the signal operation in finite state machines. The execution knows how to interpret the process graph. By calling the proceed method, the execution will take the specified (or the default) transition, and it arrives in the destination node of the transition. Then, the execution will update its node pointer and invoke the node's behavior.
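The steps just described (take a transition, update the node pointer, invoke the node's behavior) can be sketched as follows. Only proceed(String transitionName) is named in the text; the Node class, the map of leaving transitions, and the rule that the first declared transition is the default are simplifying assumptions for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Behavior attached to a node (assumed interface). */
interface Executable {
    void execute(Execution execution);
}

/** Simplified node: a name, a behavior, and named leaving transitions. */
class Node {
    final String name;
    final Executable behavior;
    // Insertion order is kept so the first transition can act as the default.
    final Map<String, Node> leavingTransitions = new LinkedHashMap<>();

    Node(String name, Executable behavior) {
        this.name = name;
        this.behavior = behavior;
    }

    void addTransition(String transitionName, Node destination) {
        leavingTransitions.put(transitionName, destination);
    }
}

/** Pointer to the current position in the process graph. */
class Execution {
    private Node node;

    Execution(Node initialNode) {
        this.node = initialNode;
    }

    Node getNode() {
        return node;
    }

    /** External trigger that takes the default transition. */
    void proceed() {
        proceed(null);
    }

    /** External trigger that takes the named transition, or the default if null. */
    void proceed(String transitionName) {
        Node destination;
        if (transitionName == null) {
            if (node.leavingTransitions.isEmpty()) {
                throw new IllegalStateException("no leaving transition from " + node.name);
            }
            destination = node.leavingTransitions.values().iterator().next();
        } else {
            destination = node.leavingTransitions.get(transitionName);
            if (destination == null) {
                throw new IllegalArgumentException("unknown transition: " + transitionName);
            }
        }
        node = destination;           // update the node pointer
        node.behavior.execute(this);  // then invoke the node's behavior
    }
}
```

A behavior that does nothing acts as a wait state: proceed returns with the execution positioned in that node until the next trigger arrives.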

The node's behavior has access to the current state of the process through the execution that is passed in as a parameter. The extensions that are described in detail in the full paper show how, for example, variables or external services will be available through the execution.

On the other hand, the node's behavior has control over the propagation of execution. This means that the executable implementation can just behave as a wait state, continue execution, create concurrent executions, or update any information in the execution.

Let's look at two example node behavior implementations.

A Task Node

Task management and workflow are closely related because tasks for humans often translate to wait states for a software system. Processes can easily combine software operations with human tasks in the following way.

The first thing that is needed outside of the process execution engine is a task repository, a place where tasks for people are kept. On top of this component, there is a user interface that allows people to see their task list and complete their tasks.

Then you can imagine the following behavior implementation of a task node. First, some external trigger has to be given (with the proceed method) so that the process starts executing and arrives in the task node. The node behavior implementation will create a new task for a given person in the task list component. That task also includes a reference back to the execution. Then, the node behavior returns without propagating the execution. This means that the execution will be positioned in the task node when the proceed invocation returns.
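A sketch of such a task node is shown here. The taskName field comes from the article; the assignee field, the Task and TaskRepository classes, and the stub Execution that merely remembers whether it was resumed are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Behavior attached to a node (assumed interface). */
interface Executable {
    void execute(Execution execution);
}

/** Stub execution: only tracks whether an external trigger has resumed it. */
class Execution {
    private boolean resumed = false;

    void proceed() {
        resumed = true;
    }

    boolean isResumed() {
        return resumed;
    }
}

/** A task for a person, with a reference back to the waiting execution. */
class Task {
    final String name;
    final String assignee;
    final Execution execution;

    Task(String name, String assignee, Execution execution) {
        this.name = name;
        this.assignee = assignee;
        this.execution = execution;
    }
}

/** Hypothetical task repository that sits outside the process engine. */
class TaskRepository {
    private final List<Task> openTasks = new ArrayList<>();

    void add(Task task) {
        openTasks.add(task);
    }

    List<Task> getOpenTasks() {
        return openTasks;
    }

    /** Called by the user interface when the person completes the task. */
    void complete(Task task) {
        openTasks.remove(task);
        task.execution.proceed(); // trigger the waiting execution
    }
}

/** Node behavior that creates a task and then waits. */
class TaskNode implements Executable {
    // Configuration; assumed to be injected from the process definition.
    String taskName;
    String assignee;
    TaskRepository taskRepository;

    @Override
    public void execute(Execution execution) {
        taskRepository.add(new Task(taskName, assignee, execution));
        // No call to proceed here: returning without propagating the
        // execution is what turns this node into a wait state.
    }
}
```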

The taskName member field shows how configuration information that is specified in the process definition file should be injected into the behavior object.

So the execution can then be persisted while the system is waiting for the user to complete the task. After some time, when the user completes the task, the task management component will use the reference to the execution to provide a trigger. This is done with the proceed method. Then the execution resumes, leaves the task node, and continues.

A Decision Node

A decision node is different from a task node in the sense that it is automatic. A condition has to be evaluated, and based on the outcome, the execution should immediately be propagated over one of the leaving transitions. That propagation of execution is done by invoking the proceed method at the end of DecisionNode's behavior implementation.
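A sketch of a DecisionNode along those lines follows. The class name and the closing call to proceed come from the text; the Predicate-based condition, the two configured transition names, and the variable map on the stub Execution are assumptions chosen to keep the example small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Behavior attached to a node (assumed interface). */
interface Executable {
    void execute(Execution execution);
}

/** Stub execution: holds variables and remembers the transition taken. */
class Execution {
    private final Map<String, Object> variables = new HashMap<>();
    private String lastTransition;

    void setVariable(String key, Object value) {
        variables.put(key, value);
    }

    Object getVariable(String key) {
        return variables.get(key);
    }

    /** A real engine would follow the transition; the stub only records it. */
    void proceed(String transitionName) {
        lastTransition = transitionName;
    }

    String getLastTransition() {
        return lastTransition;
    }
}

/** Automatic node: evaluates a condition and immediately moves on. */
class DecisionNode implements Executable {
    // Hypothetical configuration of the decision.
    Predicate<Execution> condition;
    String transitionIfTrue;
    String transitionIfFalse;

    @Override
    public void execute(Execution execution) {
        String transitionName =
            condition.test(execution) ? transitionIfTrue : transitionIfFalse;
        // Propagate the execution right away instead of waiting.
        execution.proceed(transitionName);
    }
}
```

In contrast to the task node, the behavior here never returns with the execution parked in the decision node; the caller's trigger carries straight through to whatever node follows.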

Concurrent Paths of Execution

The full paper on the Process Virtual Machine describes a whole set of extensions to the basic model presented above, e.g., process variables, actions, process composition, asynchronous continuations, and more. But the most important extension is probably concurrent paths of execution.

In some situations, a single execution pointer is not enough to keep track of the current state of a process. For example, the billing part of a process might involve a number of steps and the shipping part might involve a number of steps. But shipping and billing can be done in parallel. Therefore, the execution can be extended with a parent-child relation. This means that an execution can have many child executions, each pointing to its own node in the process graph.
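The parent-child relation can be pictured with a small sketch. All names here (createChild, endChild, and the use of plain strings for node names) are hypothetical and only illustrate the data structure; the full paper defines how fork and join behaviors actually use it.

```java
import java.util.ArrayList;
import java.util.List;

/** Execution extended with a parent-child relation (illustrative only). */
class Execution {
    private String node;                       // current position, by node name
    private final Execution parent;            // null for the root execution
    private final List<Execution> children = new ArrayList<>();

    Execution(String node) {
        this(node, null);
    }

    private Execution(String node, Execution parent) {
        this.node = node;
        this.parent = parent;
    }

    /** Creates a concurrent path of execution positioned in the given node. */
    Execution createChild(String childNode) {
        Execution child = new Execution(childNode, this);
        children.add(child);
        return child;
    }

    /** Ends a child; returns true when no concurrent paths remain. */
    boolean endChild(Execution child) {
        children.remove(child);
        return children.isEmpty();
    }

    String getNode() {
        return node;
    }

    void setNode(String node) {
        this.node = node;
    }

    Execution getParent() {
        return parent;
    }

    List<Execution> getChildren() {
        return children;
    }
}
```

With this structure, a fork behavior could create one child for billing and one for shipping, and a join behavior could resume the parent once endChild reports that the last child has finished.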

Asynchronous Architecture

Basically, architectures based on message queues or web services are always asynchronous architectures. Asynchronous communication means that the sender and receiver do not share the same thread. Receiving a message is done independently of sending it.

In the case of web services, the result or fault of a web service invocation is usually returned synchronously. The asynchronicity arises when you expect the partner to invoke your web service in response to your invocation of the partner's web service.

Now focus on one of these software systems in an architecture where many systems are communicating in an asynchronous fashion. When a message comes in, the system has to respond by sending one or more messages to other systems. Later, the system might expect related messages back.

A process can express the overall flow of related messages for one single system.

Conclusion

There are multiple process languages. Each language has its own environment and target use cases. Some languages are general purpose, and other languages might be limited and very specialized.

The Process Virtual Machine is a simple but powerful model that has proven to support all kinds of workflow, BPM, and orchestration languages. On top of that, it leads to a pluggable and embeddable design of process engines.

Current workflow technologies are focused on the business analyst only. The collaboration between business analysts and developers is largely ignored. The Process Virtual Machine gives more modeling freedom to the business analyst. Additionally, it enables the developer to leverage process technology embedded in a Java application.