2014-08-25

1 Proposed definition of the concept "capability"

English: capability, noun
[short] measure of the ability of a component to achieve a particular result
[long] measure of the proven possession of the characteristics and/or means and/or power and/or skills to achieve a particular result

Note 1: Is it necessary to include "... and/or power ..."?

Note 2: Capability is an attribute of a component - not a component in its own right.

component, noun
constituent part of a system, considered as a functional whole

Note: component may be another system; in this case one can say subsystem

architecture, noun
fundamental orderliness of a system (embodied in its components, their relationships to each other and to the environment), and the principles governing its design, implementation and evolution

capacity, noun
specific feature of a component or an asset, measured in quantity and level of quality, over an extended period

characteristic of a component, noun
a distinguishing trait, quality, feature or property

asset, noun
something valuable that a component owns, benefits from, or has use of, in achieving a particular result

Adapted from: http://www.businessdictionary.com/definition/asset.html

business process, noun
explicitly-defined coordination for guiding the purposeful enactment of business activity flows

Note: A simple business process is an agreed plan to follow; the plan is a directed graph of (both parallel and sequential) business activities; the plan may include some variants and allow some changes.

business activity, noun
a unit of work

performance, noun
measurement that expresses how well something or somebody is achieving a particular result

key performance indicator, noun
quantifiable performance

throughput, noun
the rate at which a system achieves its goal

4 Groups

The meaning of the concept “capability” is discussed in the following LinkedIn groups:

5 Several variations in opinions about capability

Q1: Capability – a characteristic of a system (Ravi) or an element of a system (Stephen, Louise) or possibility to reconfigure a system (Lalen)? My current understanding – characteristic of a component

Q2: Capability – an internal characteristic of system or an external characteristic of a system (Christian)? My current understanding – internal characteristic (because I would like to have an opportunity to architect it)

Q3: Capability – a potential only, i.e. to-be, (Ravi, Ben) or can be also demonstrated during operations, i.e. as-is? My current understanding – both

Q5: Capability – dimensionless quantity or what is its dimension? My current understanding – mainly dimensionless but sometimes a capacity of the system (i.e. its limiting/design parameter) is used as its capability

Q6: Capability – binary value (only "yes" or "no", "capable" or "not capable") or a continuous value, e.g. in the range between 0 and 1? My current understanding – continuous value.

Q7: Capability – it can be applied to components; can it be applied to assets? My current understanding – capability can be applied to both components and assets.

2014-08-19

Context

In the digital age, the focus of enterprise/business/application/etc. architects is not the thing (strategy, policy, service, rule, application, process, etc.) – the focus is how the thing changes and how things change together.

In addition to being cheaper, faster and better, it is mandatory to become more agile, more synergetic (i.e. IoT), more comprehensive.

The goal of IT in the digital age is to be able to provide software-intensive solutions which are easy to evolve instead of classic monolithic applications which are difficult to evolve. This blogpost shows how to design and build process-centric solutions which are easy to evolve. Such solutions are explicit and executable aggregates of components. Aggregates are organised around business processes and components are microservices which wrap various process-related artefacts.

Note: Considering that microservices are autonomous units of functionality, one monolithic application as a big unit of deployment may become a few hundred microservices as small units of deployment (although the size does not matter in this case).

Note: Evolution is related to impact analysis, dependency management and optimisation.

It is considered that all artefacts are versionable and several versions of the same artefact may co-exist in the company’s computing environment. Traceability considerations are at maximum – everything (including changes and work done) is logged as records.

To achieve the versioning of artefacts it is necessary to understand how to treat relationships between artefacts (see 2.4.4 of the book).

We recommend that a system be evolved via some kind of transformation cycle as shown in Figure 1. Start with a stable configuration of approved artefacts. Then introduce a new version of the artefact B3 which is available only for one consumer (i.e. artefact A2), which also has to be versioned. After achieving higher confidence with these new versions, switch all other consumers (i.e. artefact A1) to the new version of the artefact B3. When it is considered that all new artefacts are functioning correctly, their old versions can be removed. The transformation is over and a stable configuration of approved artefacts is once again reached.

Figure 1 Figure 10.3 “Transformation cycle” from the book

In a properly architected system, you may carry out several transformation cycles at the same time.

Process-template and process-instance

A process-centric solution has several processes (actually process-templates – formal descriptions of the processes) and some stand-alone services (e.g. a stand-alone service may generate an event which launches one of the processes, actually a process-instance, i.e. an enactment, of a process-template).

The distinction between process-template and process-instance is very important. The life-cycle of a process-template is controlled at design-time. The life-cycle of a process-instance is controlled at run-time. A process-instance is created, may be suspended and resumed, and is finally terminated. Many process-instances may co-exist at the same time as shown in Figure 2.

Figure 2 Templates and instances

Process-centric artefacts

Process-centric artefacts and relationships between them are the following:

The business is driven by events

For each event there is a process to be executed

Process coordinates execution of activities (automated activities, human activities and sub-processes)

The execution is carried out in accordance with business rules

Each activity operates with some business objects (data structures and documents)

A group of staff members (business role) is responsible for the execution of each human activity

The execution of business processes produces audit trails

Audit trails (which are very detailed) are also used for the calculation of Key Performance Indicators (KPIs)

Also, one can read more about artefacts in chapters 7 and 11 of the book.

Events

Evolution of an event is very straightforward – just a new version for any change. Usually, there is a mapping (or decision) table (implemented as a "dispatch" service – see 2.6 of the base blogpost) to provide the correspondence between events and processes. In the simplest policy, a particular event is linked to a particular process template (or a particular version of a particular process template). More sophisticated policies are possible, e.g. usage of the most recent version, time-based selection, etc.
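The mapping policies above can be sketched as a small "dispatch" table; all event names, template names and versions here are invented for illustration:

```python
# Registry of process templates available per event type (illustrative data):
# event type -> list of (template name, version).
REGISTRY = {
    "new-order": [("order-handling", 1), ("order-handling", 2)],
    "complaint": [("complaint-handling", 1)],
}

def dispatch_fixed(event_type, version=1):
    """Simplest policy: a particular event launches a particular version."""
    for name, v in REGISTRY.get(event_type, []):
        if v == version:
            return name, v
    return None  # explicitly ignore events that launch no process

def dispatch_latest(event_type):
    """More sophisticated policy: use the most recent version available."""
    candidates = REGISTRY.get(event_type, [])
    return max(candidates, key=lambda nv: nv[1]) if candidates else None
```

A time-based policy would be a third function over the same registry, selecting by a validity date attached to each version.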

Potential side-effects (evolving together): None at the moment (just explicitly ignore an event if it does not launch any process).

Process-template

Evolution of a process template is evolution of a composite object. The simplest policy is a very strict binding (also called "early binding") – a particular version of the process template refers to a particular version of each component (actually a microservice).

Figure 3 Early binding

More sophisticated policies are possible, e.g. the process-template uses the most recent version of each component available at the process-instance launching moment (also called "late binding"). Because the process-template is actually a description, its versioning is not a big problem.
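The two binding policies can be contrasted in a minimal sketch; component names and version numbers are assumptions:

```python
# Versions of each component (microservice) currently available (invented data).
AVAILABLE = {"check-credit": [1, 2, 3], "send-invoice": [1]}

def resolve_early(template_refs):
    """Early binding: the template pins an exact version of each component."""
    return dict(template_refs)  # e.g. {"check-credit": 2} is used as-is

def resolve_late(component_names):
    """Late binding: take the most recent version at the launching moment."""
    return {name: max(AVAILABLE[name]) for name in component_names}
```

A process-instance launched with late binding should record the versions it resolved, so that the choice remains reproducible in the audit trail.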

Potential side-effects (evolving together): As the life-cycles of a particular process-template and its process-instances do not match, it is necessary to understand what should be done with running process-instances when the process-template changes, although the process-template and its process-instances are different objects (similar to a mother and her children).

Process-instance

A process-instance is a composite object and it is better to avoid its evolution (like changing a running car). Evolution of a process-instance may be necessary for some legal purposes, if a long-running process-instance should be modified in accordance with the evolution of the related process-template. The related technique is described in http://improving-bpm-systems.blogspot.ch/2010/03/practical-process-patterns-mint.html . Of course, it is better to avoid evolution of process-instances altogether, but small changes should be possible.

In practice, the main reason to evolve a process-instance is to correct various errors and exceptions, e.g. in data, in rules or in automation. If some of the components are expected to evolve quickly or be "shaky", then the relationships between the composite and these components should be indirect and thus manageable externally.

Figure 4 Indirect binding

Sometimes, it is necessary to create a version of an external component which must be used only by a particular process-instance. In general, all external components are re-usable from various aggregates.

Roles

Roles should be defined in a suitable DSL, externally from the process template, and changes for a particular process instance should be possible. The usual technique is to have a set of dedicated functional roles (Responsible, Accountable, Consulted, Informed) for each human activity within a process, and to be able to provision these roles by various organisational and other roles externally from the process template.
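This provisioning can be sketched as two externally-managed tables; activity names, role names and staff groups are all invented for illustration:

```python
# Functional (RACI) roles per human activity, kept outside the process template.
FUNCTIONAL_ROLES = {
    "approve-order": {"responsible": "sales-clerk",
                      "accountable": "sales-manager",
                      "consulted": "legal",
                      "informed": "warehouse"},
}

# Mapping of organisational roles to concrete groups of staff members.
ORG_DIRECTORY = {"sales-clerk": ["alice", "bob"], "sales-manager": ["carol"],
                 "legal": ["dave"], "warehouse": ["erin"]}

def candidates(activity, functional_role, overrides=None):
    """Resolve who may perform an activity; per-instance overrides are allowed."""
    roles = dict(FUNCTIONAL_ROLES[activity])
    if overrides:
        roles.update(overrides)  # a change for a particular process instance
    return ORG_DIRECTORY[roles[functional_role]]
```

Because both tables live outside the process template, re-provisioning a role touches neither the template nor the running instances.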

Rules

Rules are a typical service, implemented in a DSL (a decision management notation). This service is stateless and easy to evolve.

Audit trails

Audit trails are easy to evolve. It is important to define them explicitly in processes, for example as measurement points. Audit trails must be kept outside the process engine, for example in an enterprise data warehouse, and thus be independent of the evolution of the BPM suite itself. Typical process execution data (start/finish time for each activity, etc.) must be merged with some business data to associate separate process-instances which treated the same business objects.
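A minimal sketch of such a record, assuming invented field names; the business object identifier is what later lets separate process-instances be associated:

```python
import datetime

AUDIT_STORE = []  # stands in for a table in an enterprise data warehouse

def record(process_instance_id, activity, event, business_object_id):
    """Append one audit record at an explicit measurement point."""
    AUDIT_STORE.append({
        "instance": process_instance_id,
        "activity": activity,
        "event": event,                      # e.g. "start" or "finish"
        "object": business_object_id,        # key for merging with business data
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

def instances_for_object(business_object_id):
    """Merged view: all process-instances that treated a business object."""
    return sorted({r["instance"] for r in AUDIT_STORE
                   if r["object"] == business_object_id})
```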

KPIs

If audit trails are done correctly then KPIs are easy to evolve.

Human activity

Human activity is implemented as an interactive service. Sometimes such a service is a generic tool (external to the process-template), and such a tool should receive from the process-instance a reference to the human activity to be treated. This is an example of the indirect relationship mentioned above.

Sub-processes

Typically, early or late binding is applied for selecting the version of a sub-process to be used (although this depends on the capabilities of the business process engine). In the majority of situations, late binding works fine – just remember to record the version of the sub-process template used in each invocation.

Data structures

As a good practice, business data structures are kept in a generic format (e.g. SDO) and transferred along the process as a black box. To implement some routing logic, an additional technical or process-template-specific data structure is created. Bridging between business and technical data structures is done by automated activities.

Documents

Documents are kept in external repositories, e.g. a document management system or an ECM tool. They are referred to via URLs and some metadata.

Automated activities

An automated activity is the most "shaky" component of the process (as an aggregate). The indirect binding used for automated activities is done through a "robot" (see 2.3 in the base blogpost). A robot is a very stable service; the process-instance passes to it the name of the automation script to be executed, as well as input and output parameters. The name of the automation script is a process parameter (thus changeable by the process-template administrator and the process-instance administrator) and the input/output parameters are SDOs.

The typical error recovery practice is discussed below. Figure 5 shows a "container" in which an automated activity "A" operates within the process. The normal execution sequence is "E1-A-E2". Because the automated activity may fail, the container contains an intermediate exception event "E3" and an activity for the Error Recovery Procedure (ERP).

In case of failure, the recovery execution sequence will be "E1-A-E3-ERP-E1-A-E2". The ERP may be very trivial (just try again) or more intelligent (try three times and then ask a person to have a look at it).

In addition to exceptions, it is necessary to define a time-out to prevent endless automated activities, as shown in Figure 6.

An automated activity is an automation script which is executed by a robot. A typical automation script is an aggregate (usually in an interpreted language) of several micro-services, and this aggregate should be executed as one transaction (see Figure 7).

Figure 7 Execution of an automation script by the robot

Again, the normal execution sequence is "E1-A1-A2-A3-E2". In case of failure of "A2", the sequence will be "E1-A1-A2-E3-ERP1-E1-A1-A2-A3-E2". The double execution of "A1" is possible because all micro-services are idempotent (see 2.10 in the base blogpost). If "ERP1" is a human activity then the correction of the automation script may be carried out within this human activity.
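The recovery sequence above can be simulated with a small sketch: on failure the whole script is re-run from "E1", which is safe only because every step is idempotent (names and the retry limit are illustrative):

```python
def run_script(steps, log, max_attempts=3):
    """Execute an aggregate of idempotent steps; restart from E1 on failure."""
    for _ in range(max_attempts):
        log.append("E1")
        try:
            for name, step in steps:
                log.append(name)
                step()
            log.append("E2")
            return True
        except RuntimeError:
            log.append("E3")
            log.append("ERP1")   # trivial recovery here: just try again
    return False

# Demo: "A2" fails exactly once, then the whole script succeeds on retry.
_failures = {"A2": 1}
def _step(name):
    def run():
        if _failures.get(name, 0):
            _failures[name] -= 1
            raise RuntimeError(name + " failed")
    return run

log = []
ok = run_script([("A1", _step("A1")), ("A2", _step("A2")), ("A3", _step("A3"))], log)
```

Running the demo reproduces exactly the "E1-A1-A2-E3-ERP1-E1-A1-A2-A3-E2" trail from the text.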

Note: Processes with only automated activities must be idempotent.

Of course, there is not one robot per automated activity, because a robot must be able to handle several automation scripts concurrently (as several process-instances of the same process-template may be executed at the same time). Instead, there is a queue of jobs for a group of similar robots. An automated activity of a process-instance puts an automation script into a queue and waits for a robot to execute this script and inform the process-instance that this automated activity is completed (see Figure 8).

Figure 8 Queuing of jobs for robots

The queue is shared between various process-instances and it is possible to have several specialised queues. The queue size and robots are monitored.

In some sense, robots work like humans – they wait for jobs from process-instances, execute jobs when they can, and inform a particular process-instance that a particular job is completed.
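The queue-of-jobs arrangement can be sketched with standard Python threading; the robot count, instance identifiers and jobs are all assumptions for illustration:

```python
import queue
import threading

jobs = queue.Queue()   # shared queue; several specialised queues are possible
results = {}           # how a robot informs a process-instance of completion

def robot(stop):
    """A robot: wait for a job, execute the script, report completion."""
    while not stop.is_set():
        try:
            instance_id, script = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        results[instance_id] = script()   # execute the automation script
        jobs.task_done()

stop = threading.Event()
workers = [threading.Thread(target=robot, args=(stop,)) for _ in range(2)]
for w in workers:
    w.start()

# Two process-instances of the same process-template submit jobs concurrently.
jobs.put(("pi-1", lambda: "done-1"))
jobs.put(("pi-2", lambda: "done-2"))
jobs.join()            # wait until the robots report both jobs as completed
stop.set()
for w in workers:
    w.join()
```

Monitoring the queue size, as the text recommends, would amount to watching `jobs.qsize()` alongside the robots' liveness.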

Conclusion

The described approach was used (since the year 2000) for a production system handling about 3 000 complex products per year, with 50 persons, about 50 different activities, 3 production chains, 6 repositories and 40 IT services (actually, a couple of hundred micro-services). The system was in place for several years. The maintenance and evolution of this production system required several times fewer resources. Also, several successful (and easy) migrations of its big components were undertaken.

Owns its own data storage (AS: a microservice may have its own data storage)

2 Implementation techniques for process-centric solutions

Note, you may want to glance at chapters 9, 10 and 11 (which provide some information about BPM) before reading this chapter.

2.1 Guiding principles

Speed of developing automation is the primary factor in the agility of a process-centric solution.

Automation and the process template change at different speeds – keep automation outside the process template.

Automation may be long-running and resource-consuming.

Automation may and will fail.

Failures may be due to technical reasons (no access to a web service) or business reasons (missing important data).

Recovery after failure should be easy.

Automation’s problems (failures, resource consumption) must not undermine the performance of the process engine.

2.2 Interpreted languages

Business routines are usually built on existing APIs to access different enterprise systems and repositories. They look like scripting fragments which manipulate some services and libraries. Thus, a combination of interpreted and compiled (static) programming languages brings extra flexibility – an interpreted language for “fluid” services (business routines) and a compiled language for “stable” services (libraries, business objects, data). Examples of such combinations are Jython and Java, Groovy and Java, etc. In combining them, it is important to use strong typing to secure interfaces, enjoy introspection, and avoid exotic features.

2.3 Robot as a generic microservice

Keeping microservices for “business routines” outside the process description allows some quick modifications even within a running process instance. The execution of such microservices can be carried out by a universal service which receives a reference to a text fragment to be interpreted, fetches this text fragment and interprets it. We call this service a “robot”; universal robots and specialised robots may co-exist. Robots must be clonable (for scalability, load-balancing and fault-tolerance).
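The fetch-and-interpret behaviour can be sketched with Python's own interpreter standing in for the scripting engine; the script store and the script name are assumptions:

```python
# A store of automation-script text fragments, referenced by name+version
# (in a real system this would be an external, versioned repository).
SCRIPT_STORE = {
    "register-order-v1": "result = inputs['amount'] * 2",
}

def robot_execute(script_ref, inputs):
    """Universal robot: fetch the referenced text fragment and interpret it."""
    source = SCRIPT_STORE[script_ref]          # fetch the text fragment
    scope = {"inputs": inputs, "result": None} # input/output parameters
    exec(source, scope)                        # interpret it
    return scope["result"]
```

Because the robot itself never changes, modifying a business routine means only publishing a new text fragment under a new name or version.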

A crash of a robot will not disturb the process engine, except that the activity which caused the crash will be marked in the process instance as “late” or “overdue”.

2.4 Monitoring

Ruthless monitoring of all services (including robots, other systems and repositories) is required.

Not just checking that a port is bound, but asking the service to do real work; for example, an echo-test.

Services should be developed in a way that facilitates such monitoring.

The system should be developed in a way that facilitates such monitoring.

Also, robots must proactively (before executing automation scripts) check, via monitoring, the availability of the services to be used in a particular automation script.

It is better to wait a little than to recover from an error.
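An echo-test and the robot-side pre-check can be sketched as follows; the service names and the in-memory service table are illustrative stand-ins for real endpoints:

```python
# Stand-ins for deployed services: a healthy one echoes the probe back,
# a bound-but-broken one does not (both are invented for illustration).
SERVICES = {
    "credit-check": lambda payload: payload,   # healthy: does real work
    "invoice": lambda payload: None,           # port bound, but not working
}

def echo_test(service_name, probe="ping"):
    """Ask the service to do real work (echo), not just check the port."""
    service = SERVICES.get(service_name)
    return service is not None and service(probe) == probe

def ready_to_run(required_services):
    """Robot's proactive check before executing an automation script."""
    return all(echo_test(name) for name in required_services)
```

If `ready_to_run` returns False, the robot waits a little and re-checks, rather than launching the script and recovering from its failure.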

2.5 Explicit versioning of everything

The intrinsic separation between process template and individual process instance in process-centric solutions allows the use of the full power of microservice versioning. Many variants are possible:

Process instance may use the “current” version of a particular microservice.

Process instance may use a particular version of a particular microservice.

In case of some compliance requirement:

Since the 1st of April, all new process instances will use process template v2

Already running process instances must remain at process template v1

Some already running process instances will remain at process template v1 (if those instances are close to completion)

Some already running process instances may be migrated to process template v2 (if those instances are far from completion)

Thus everything (process templates, XSDs, WSDLs, services, namespaces, documents, etc.) must be explicitly versioned, and many versions of the “same” artefact should easily co-exist. We also recommend using the simplest version schema – just sequential numbering: 1, 2, 3, etc.
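A minimal sketch of such a versioned store, with sequential numbering and co-existing versions (the artefact name and contents are invented):

```python
# (artefact name, version) -> artefact content; many versions co-exist.
STORE = {}

def publish(name, content):
    """Publish a new version; the schema is just sequential: 1, 2, 3, ..."""
    version = max((v for n, v in STORE if n == name), default=0) + 1
    STORE[(name, version)] = content
    return version

def fetch(name, version=None):
    """Fetch a pinned version, or the current one when no pin is given."""
    if version is None:
        version = max(v for n, v in STORE if n == name)
    return STORE[(name, version)]
```

With this shape, "already running instances remain at v1" is simply `fetch(name, 1)`, while "new instances use the current version" is `fetch(name)`.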

2.6 Use of other types of coordination in addition to classic process templates

Business rules are another DSL which is very popular in BPM. We recommend following the TDM approach (see http://www.kpiusa.com/ ).

We recommend centralising the treatment of important business events (all external ones and some internal ones) in one service called “dispatch”. The “dispatch” service analyses business events and decides which business process should be initiated. Each process should send an internal business event to this service when its work has been completed (see Figure 1).

2.7 Pattern PDP (Pre-processing, Doing, Post-processing)

pre-processing or preparation, e.g. receipt of information from various sources in different formats, or from different repositories, and conversion into a standard presentation;

doing or processing, i.e. data or information processing in accordance with a standard presentation;

post-processing or finalisation, e.g. conversion from a standard presentation into a particular presentation.

Figure 2 The PDP pattern

Note, the PDP pattern may be used at the scale of a whole process.
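The three PDP stages can be sketched as a tiny pipeline; the formats, field names and the VAT rate are all assumptions for illustration:

```python
def pre_process(raw, source_format):
    """Preparation: convert a source-specific format to a standard presentation."""
    if source_format == "csv":
        amount, currency = raw.split(",")
        return {"amount": float(amount), "currency": currency}
    raise ValueError("unknown source format: " + source_format)

def do(standard):
    """Processing: works only with the standard presentation."""
    return {**standard, "amount_with_vat": round(standard["amount"] * 1.08, 2)}

def post_process(standard, target_format):
    """Finalisation: convert into the required particular presentation."""
    if target_format == "text":
        return "{} {}".format(standard["amount_with_vat"], standard["currency"])
    raise ValueError("unknown target format: " + target_format)
```

Adding a new input or output format touches only the pre- or post-processing stage; the "doing" stage stays untouched, which is the point of the pattern.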

2.8 Pattern AHA (Automated, Human and Automated)

The AHA pattern is a variant of the PDP pattern aimed at facilitating human work, e.g. collection of data and maybe documents for a human activity (in the same way as a good assistant prepares documents for his/her boss), followed by automation of the follow-up activities. We recommend using this pattern to model all intellectual and verification human activities (see Figure 3).

Figure 3 The AHA pattern

Although in some cases the analysis may determine that the pre- or post-processing activity is empty, we recommend that these activities always be inserted – in this way the later addition of some automation will be easy, because no changes to the process will be required.

2.9 Pattern ERL (Error Recovery Loop)

Any service invoked within a process may fail. The error must be acted upon in some way, e.g. by re-invoking the service, or by suspending or terminating the process. Figure 4 shows a possible approach to treating a service failure – here we ask a human to do something to correct the service and then re-invoke it. In this diagram we consider that the activity Service returns an error flag which is analysed in the gateway G01.

Figure 4 The ERL pattern (with error return)

If the activity Service raises an exception then the diagram should be as shown in Figure 5.

Figure 5 The ERL pattern (with exception). Note, after “Error recovery” activity the execution continues from the end of respective sub-process, i.e. just before the gateway “G01”.

The “Error recovery” activity may be a human activity for a person who is responsible for carrying out the necessary corrective actions. Depending on the kind of error, this activity may be assigned to different people.
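The error-flag variant of the ERL pattern can be sketched as a loop; the simulated service and the loop limit are assumptions made for illustration:

```python
def erl(service, error_recovery, max_loops=10):
    """Invoke the service; on the error flag, recover and re-invoke (gateway G01)."""
    for _ in range(max_loops):
        result, error = service()
        if not error:              # G01: no error flag, the process continues
            return result
        error_recovery()           # e.g. a human corrects the service
    raise RuntimeError("recovery did not help")

# Simulated service that fails twice before succeeding (invented behaviour).
attempts = {"left": 2}

def flaky_service():
    if attempts["left"]:
        return None, True          # error flag raised
    return "ok", False

def fix_it():
    attempts["left"] -= 1          # the responsible person corrects something
```

The exception variant in Figure 5 differs only in that the loop would catch a raised exception instead of testing a returned flag.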

2.10 Pattern IRIS (Integrity Reached via Idempotency of Services)

To achieve integrity within a process, shall we use the ERL pattern “around” each invocation of a service or not? In general yes, but idempotent services can be grouped (as shown in Figure 6). Idempotency of a service means that it can be invoked many times with the same effect. Any stateless service is idempotent. Some stateful services can have this quality as well, e.g. a service to add a new version to a document may ignore the request if the most recent version of this document is exactly the same as the requested one.

The process in Figure 6 may have the following audit trail:

Activity01 – finished

Activity02 – failed and raised an exception

Error Recovery – did something

Activity01 – finished again thanks to idempotency

Activity02 – finished

Activity03 – finished

Figure 6 The IRIS pattern. Note, after “Error recovery” activity the execution continues from the end of respective sub-process, i.e. just before the gateway “G01”.

Note, idempotence is the property of certain operations that can be applied multiple times without changing the result beyond the initial application.
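The stateful-but-idempotent example from the text (adding a document version) can be sketched directly; the store and return convention are assumptions:

```python
VERSIONS = {}  # document id -> list of version contents (illustrative store)

def add_version(doc_id, content):
    """Idempotent stateful service: repeating the same request has no
    further effect, so the IRIS pattern may safely re-run it after recovery."""
    history = VERSIONS.setdefault(doc_id, [])
    if history and history[-1] == content:
        return len(history)   # ignore: the latest version is already identical
    history.append(content)
    return len(history)       # sequential version number of the new version
```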

Certainly, I can see a lot of similarities between the microservices architecture and the process-centric solutions in Figure 8, which is from my book about BPM ( www.samarin.biz/book ), published in 2009.

The question is how to coordinate separate microservices. The obvious choice is an ESB (as shown in Figure 9).

Figure 9 Flow of data

This means that all microservices should be on this picture, with potential everyone-to-everyone connectivity, which has N*(N-1)/2 complexity, where N is the number of microservices – resulting in an “explosion” of an application. We estimate this number at about 100 per application (or 300, following http://www.infoq.com/interviews/goldberg-microservices).

Where to keep the state of this composite service (i.e. the ex-application)? If in the ESB, then this makes the ESB too complicated.

Is an ESB cloud-friendly? Just imagine a restart of the VM with the ESB.

It seems that an ESB is necessary but not sufficient. What is missing? We believe that the flow of control is more important than the flow of data (as shown in Figure 10).

Figure 10 Flow of control

In the former, the primary concern is the exchange of data. In the latter, the primary concern is the result of working together, not the individual exchanges of data (as in football). Of course, both are necessary, but an ESB alone is not enough. Considering that more than one coordination technique may be used by a solution, Figure 11 is more realistic.

Figure 11 Several coordination techniques

The issues (complexity, state and cloud) are addressed as follows:

Complexity is much lower because only “business routine” services (which interact with the process) are depicted.

This classification helps to understand which microservices may be provisioned from clouds.

4 Easy recovery from errors (by design)

We all know that the main difference between monolithic applications and distributed solutions is in the error recovery practices. We need distributed solutions because of scalability, fault-tolerance and cloud-based provisioning. At the same time, we have to architect the recovery from loss of connectivity between nodes and from service failures (VM reloading or node failure).

If a subordinated service (relative to the coordination service) has failed, then the coordination service will recover via an error recovery loop (see 2.8 and 2.9).

If the coordination service has failed, then some of its running subordinated services cannot complete their associated activities; after the restart of the coordination service, those activities will fail by timeout (because each activity has its SLA).

If a resource may change its state outside the control of the process, then the process must interrogate the state of such a resource before using it.

Because processes provide a clear and detailed context, the identification of problems is very quick.

5 Defining microservices

BPM helps to provide context for, define, and coordinate microservices. It helps to eliminate endless discussions about the necessary “granularity” of services:

“If we select a top-down style then we will create coarse-grained business-related services, but we are not sure whether such services are implementable or reusable. If we follow a bottom-up style then we will implement too many fine-grained services for which the business value is not obvious.”

Actually, the native flexibility of business processes and explicit versioning allow the rapid and painless adaptation of services to increase or decrease their granularity. Any wrong decisions are easily corrected; services are quickly adapted to the required granularity.

8.4 Smart endpoints and dumb pipes

Sure – an ESB is just a reliable communication mechanism without any business intelligence. Everything happens in services, even process-centric coordination.

8.5 Decentralized Governance

Sure.

8.6 Decentralized Data Management

Sure again.

8.7 Infrastructure Automation

Yes; also, a process provides the context for services and thus for test cases. The process itself is an integration test for its services.

8.8 Design for failure

Sure.

8.9 Evolutionary Design

There are several tempos of design: processes, process-specific microservices, common microservices, the common operating environment (testing, deployment, monitoring, etc.) and the overall architecture.

Processes make it easier to use the power of total versioning. Thus process-specific microservices should mature very quickly.

9 Briefly about Business Process Management (BPM)

BPM (see Figure 12) is a trio: 1) a discipline for better managing an enterprise, 2) COTS and FOSS tools known as BPM suites, and 3) an enterprise portfolio of business processes, as well as the practices and tools for governing the design, execution and evolution of this portfolio.

Figure 12 BPM as a trio

The key concept of BPM is the business process, which is an explicitly-defined coordination for guiding the purposeful enactment of business activity flows. In other words, a business process is an agreed plan which is followed each time a defined sequence of activities is carried out; the plan may include some variants and will possibly allow for some unplanned (i.e. unanticipated) changes (see other BPM-related definitions at http://improving-bpm-systems.blogspot.ch/2014/01/definition-of-bpm-and-related-terms.html ).

For software architects, it is important to know that BPM considers business processes to be explicit (i.e. formally defined, to be understandable by different participants) and executable (conceptually, the process instance executes itself, following the BPM practitioner’s model, but unfolding independently of the BPM practitioner; process instances are performed or enacted, which may include automated aspects).

10 Structuring executable processes and services

An executable process coordinates the execution of some services. Such a process is expressed in a particular language (e.g. BPMN) and it invokes some services. In Figure 13, the process is in the pool “COOR”, interactive services are in the two pools above it and automated services are in the two pools below it. Note, BPMN is a typical DSL.

Figure 13 Process coordinates some services

This is a classic picture, but how do we bring microservices into it?

Each enterprise is a complex, dynamic, unique and recursive (i.e. like a “Russian doll”) relationship (see Figure 14) between services and processes:

All processes are services

Some operations of a service can be implemented as a process

A process includes services in its implementation

Figure 14 Recursive nature of relationship between processes and services

Thus, some “big” services are implemented as explicit and executable processes, recursively, until only microservices are used.

The relationship does not force a “pure” structure, but brings the flexibility of converting processes to services and vice versa as necessary, e.g. to use services provisioned from the cloud (as shown in Figure 15).

11 Multi-layered structuring of process-centric solutions

Because a process coordinates various business artefacts, e.g. “Who (roles) is doing What (business objects), When (coordination of activities), Why (business rules), How (business activities) and with Which Results (performance indicators)”, these artefacts can be structured around processes.

This structure arranges different artefacts on separate layers as shown in Figure 16. Each layer is a level of abstraction of the business and addresses some particular concerns.

Each layer has two roles: it exploits the functionalities of the lower layer, and it serves the higher layer. Each layer has a well-defined interface and its implementation is independent of that of the others. Each layer comprises many services that can be used independently – it is not necessary that all layers be fully implemented at the same time or even be provided in a single project.

Another practical observation is that different layers have lifecycles of different time scales: typical repositories have a 5- to 10-year life-span while the business requires continuous improvement. Because of the implementation independence of the different layers, each layer may evolve at its own pace without being hampered by the others.