Monday, October 17, 2016

Microservices are loosely coupled, independently deployable services. Although a well-designed service will not operate directly on shared data, it may still need to ensure that that data ultimately remains consistent. For example, the requirement to debit an account to pay for an online purchase creates a dependency between the customer and supplier account balances and the stock database. Historically, a distributed transaction has been used to maintain this consistency, which in turn employs some flavour of distributed locking during the data update phase. This introduces tight coupling, higher latencies and greater lock contention, especially when failures occur and the locks cannot be released until all services involved in the transaction become available again. Whilst the user of the system may be satisfied with this state of affairs, it should not be the only possible interaction pattern. A more common approach employs the notion of eventual consistency, where the data may sometimes be in an inconsistent state but will eventually come back into the desired state: in our example the stock level is reduced, the payment is processed and the item delivered.

I have, from time to time, seen blogs and articles that recognise this problem and suggest solutions, but they seem to mandate either that service calls naturally map onto a single data update, or that the service writer picks one of the services to do the coordination, taking on the responsibility of ensuring that all services involved in the interaction eventually reach their target consistent state (see, for example, this quote from the article Distributed Transactions: The Icebergs of Microservices: "you have to pick one of the services to be the primary handler for the event. It will handle the original event with a single commit, and then take responsibility for asynchronously communicating the secondary effects to other services"). This sounds feasible, but now you have to start thinking about how to provide reliability guarantees in the presence of failures, how to orchestrate services, and how to store extra state with every persistent update so that the activity coordinator can continue the interaction after failures have been resolved. In other words, whilst this is a workable approach, it hides much of the complexity involved in reliably recovering from the system and network failures which, at scale, will surely happen. A more robust design for microservice architectures is to delegate the coordination component of the workflow to a specialised service explicitly designed for this kind of task.

We have been working in this area for many years, and one set of ideas and protocols that we believe is particularly suited to microservice architectures is the use of compensatable units of work to achieve eventual consistency guarantees in this kind of loosely coupled, service-based environment. I produced a write-up of the approach and an accompanying protocol for use in REST-based systems back in 2009 (Compensating RESTful Transactions), based on earlier work done by Mark Little et al. Mark also wrote some interesting blogs in 2011 (When ACID is too strong and Slightly alkaline transactions if you please ...) about alternatives to ACID when various constraints are loosened, and his summary is relevant to the problems facing microservice architects.

The use of compensations, coordinated by a dedicated service, gives all the benefits suggested in Graham Lea's article referred to earlier, but with additional guarantees of consistency, reliability, manageability, reduced complexity and so on in the presence of failures. The essence of the idea is that the prepare step is skipped and instead the services involved in the interaction register compensation actions with a dedicated coordinator:

- The client makes service invocations, passing the coordinator URL by some (unspecified) mechanism.

- Each service registers its compensation logic with the coordinator and performs the service request as normal.

- When the client is done, it tells the coordinator to complete or cancel the interaction:

  - In the complete case the coordinator has nothing to do (apart from clean-up actions).

  - In the cancel case the coordinator initiates the undo logic. Services are not allowed to fail this step: if a service is unavailable or cannot compensate immediately, the coordinator keeps retrying until all services have compensated (and only then does it clean up).
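The protocol above can be sketched in a few lines of code. Note that this is a minimal in-memory illustration of the idea, not the actual JDI or RTS protocol: the names Compensation and CompensationCoordinator are invented for this sketch, and a real coordinator would persist its state and back off between retries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the undo action a service registers with the coordinator.
interface Compensation {
    // Undo the work done by the service; returns true on success, and may be retried.
    boolean compensate();
}

class CompensationCoordinator {
    private final List<Compensation> registered = new ArrayList<>();

    // Called by each participating service after it performs its work.
    public void register(Compensation c) {
        registered.add(c);
    }

    // Client finished successfully: nothing to undo, just clean up.
    public void complete() {
        registered.clear();
    }

    // Client cancelled: drive every compensation, retrying until all succeed.
    public void cancel() {
        List<Compensation> pending = new ArrayList<>(registered);
        while (!pending.isEmpty()) {
            pending.removeIf(Compensation::compensate);
            // A real coordinator would persist its state and back off between retries.
        }
        registered.clear();
    }
}
```

The important property is in cancel(): the coordinator, not the client or any one service, owns the responsibility of retrying until every compensation has run.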

We do not have an implementation of this (JDI) protocol, but we do have an implementation of an ACID variant of it (called RTS) which has had extensive exposure in the field, and this can serve as the basis for an implementation of the JDI protocol. The documentation for RTS is available at our project web site. The nice thing about this work is that it integrates seamlessly into Java EE environments and is additionally available as a WildFly subsystem. This latter feature means that it can be packaged as a WildFly Swarm microservice using the WildFly Swarm Project Generator. In this way, if your microservices are using REST for API calls then they can make immediate use of this feature.

Finally, we have a solution that allows the compensation data to be stored at the same time as the data updates in a single (one-phase) transaction, thus ensuring that the coordinator will have access to the compensation data. This technique works particularly well with document-oriented databases such as MongoDB.
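As a rough illustration of that idea (using an in-memory map in place of a real document store such as MongoDB; the class and field names here are invented for the sketch), the business update and the compensation data can be written as a single document, so one write makes both durable together:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a real implementation would write one document per update
// to a document store such as MongoDB.
class OrderStore {
    // documentId -> document fields
    private final Map<String, Map<String, Object>> documents = new HashMap<>();

    // The business state and the data needed to undo it are stored in the
    // same document, so a single (one-phase) write covers both.
    public void debit(String accountId, int amount, String coordinatorRef) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("accountId", accountId);
        doc.put("debitedAmount", amount);
        // Compensation data lives alongside the update itself.
        doc.put("compensation", Map.of("action", "credit",
                                       "amount", amount,
                                       "coordinatorRef", coordinatorRef));
        documents.put(accountId, doc);
    }

    public Map<String, Object> find(String accountId) {
        return documents.get(accountId);
    }
}
```

Because the compensation fields are part of the same document as the update, the coordinator can always find them after a crash: there is no window in which the update is durable but the compensation data is not.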

Friday, June 3, 2016

It’s been available for over a month now, so some of you might have used it already. But I’m writing this post in order to give a better explanation of how to use the Narayana transaction manager in your Spring Boot application.

First of all, Narayana integration was introduced in Spring Boot 1.4.0.M2, so make sure you’re up to date. At the time of writing, the most recent available version is 1.4.0.M3.

Once you have versions sorted out, it’s a good idea to try it out. And in the rest of this post I’ll explain the quickstart application and what it does. After that you should be good to go with incorporating it in your code. The source code of this quickstart can be found in our GitHub repository [1].

After that, Narayana will become the default transaction manager in your Spring Boot application. From then on, simply use JTA or Spring annotations to manage the transactions.

Narayana configuration

A subset of Narayana configuration options is available via Spring’s application.properties file. This is the most convenient way to configure Narayana if you don’t need to change many of its settings. For the list of possible options, see the properties prefixed with spring.jta.narayana in [2].
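For example, a few commonly adjusted settings might look like this (the exact property keys below are illustrative; [2] is the authoritative list):

```properties
# Unique node identifier; important if multiple transaction managers share a store
spring.jta.narayana.transaction-manager-id=tm-node-1
# Directory where the transaction logs are written
spring.jta.narayana.log-dir=target/tx-object-store
# Default transaction timeout, in seconds
spring.jta.narayana.default-timeout=60
```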

In addition, all traditional Narayana configuration options are also available. You can place a jbossts-properties.xml file in your application’s jar, or use our configuration beans.

Quickstart explanation

Our Spring Boot quickstart [1] is a simple Spring Boot application. By exploring its code you can see how to set up Narayana for Spring Boot, as well as how to configure it via the application.properties file.

We have implemented three scenarios to demonstrate: commit, rollback, and crash recovery. They can be executed using the Spring Boot Maven plugin. Please see the README.md for the exact steps to execute each example.

The commit and rollback examples are very straightforward and almost identical. They both start a transaction, save an entry containing the string you passed to the database, send a JMS message, and then commit or roll back the transaction.
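In outline, the two flows behave like this (a toy, in-memory illustration of commit versus rollback semantics, not the quickstart code itself; the real example uses JTA-managed JDBC and JMS resources):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration: buffered writes become visible only on commit,
// mimicking what the transaction manager does for the real database and JMS queue.
class ToyTransaction {
    private final List<String> dbBuffer = new ArrayList<>();
    private final List<String> jmsBuffer = new ArrayList<>();
    private final List<String> database;
    private final List<String> queue;

    ToyTransaction(List<String> database, List<String> queue) {
        this.database = database;
        this.queue = queue;
    }

    void saveEntry(String value) { dbBuffer.add(value); }

    void sendMessage(String msg) { jmsBuffer.add(msg); }

    void commit() {
        database.addAll(dbBuffer);   // both resources see the writes
        queue.addAll(jmsBuffer);
    }

    void rollback() {
        dbBuffer.clear();            // neither resource sees anything
        jmsBuffer.clear();
    }
}
```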

The commit example outcome should look like this:

Entries at the start: []
Creating entry 'Test Value'
Message received: Created entry 'Test Value'
Entries at the end: [Entry{id=1, value='Test Value'}]

And the rollback example outcome should look like this:

Entries at the start: []
Creating entry 'Test Value'
Entries at the end: []

The crash recovery scenario starts off the same as the other two, but then crashes the application between the prepare and commit stages. Later, once you restart the application, the unfinished transaction is recovered. Note that in this example we’ve added a DummyXAResource to allow us to crash the application at the right time. Feel free to ignore it; it is there only for the purposes of this example.

After the application has crashed, your console output should look like this:

Entries at the start: []
Creating entry 'Test Value'
Preparing DummyXAResource
Committing DummyXAResource
Crashing the system

Performance

Alongside the usual selection of enhancements and bug fixes, we have been working on sharing performance figures comparing ourselves against a selection of other projects available in the open source community, with a view to checking that the release remains competitive. We haven't particularly been working on performance enhancements, but rather on the development of a microbenchmark of 2PC that is fair and consistent in our environment - you will almost certainly see different numbers in your particular environment based on the spec of your machine and so on, but we would expect the general ranking to be consistent. The tool we have found works for us is JMH (a microbenchmark harness created by the OpenJDK project team, available from http://openjdk.java.net/projects/code-tools/jmh/).

We have attempted to configure each product on an equal footing by choosing sensible defaults for each tunable parameter and by ensuring that recovery is enabled, although we do configure Narayana with the journal store, which is our best performing transaction log storage mechanism. If you have any recommendations for other transaction managers or how to tune the configuration then please let us know so that we can update our test job.

The benchmark runs a transaction containing two dummy resources.

We will let the figures speak for themselves; suffice it to say that as more and more threads are thrown at the workload we scale better, showing that we have excellent control over parallelism.

Sure, when you're looking at using Swarm it's likely that at least initially you'll be coming at a problem from the perspective of Java EE, but the more you look to decompose your application into constituent (micro) services the more chances there are that you'll also start to look at functionality and frameworks that aren't necessarily just about Java EE. As we've mentioned before, STM is compatible with JTA and JTS transactions as well, as long as you understand what it means to mix-and-match them. Therefore, we've added an example of STM usage within WildFly-Swarm, which hopefully will become part of the mainline Swarm examples eventually. Take a look and give us any feedback, either in the Swarm group/IRC/Twitter or the usual Narayana routes.

We use Docker linking to make the interaction between the JTS and JacORB containers smoother. Thus the JTS container expects certain environment variables to be available. Not to worry though: they are created automatically by Docker.

And that is it. Now you can connect to the name server from your application and retrieve the IOR of the transaction manager. Pretty easy.

Of course, since a Docker container's storage is removed once the container is removed, this alone is not the best way to run a transaction manager: you will want to make sure your transactions complete even in the case of a system failure. To avoid such scenarios and make the transaction log reliable you have two options: mount a host directory or use the JDBC object store.

Thursday, September 10, 2015

I was recently forwarded a link to an article regarding the use of Spring's chained-transaction-manager facility, wherein the author had used it to coordinate updates to multiple one-phase resources. This gave me the opportunity to showcase a Narayana facility which has existed for many years and allows you to build something with a similar purpose and possibly richer properties.
What we will create is an application that uses multiple one-phase resources (for example, some hypothetical non-XA database and message queue). We will use Narayana's AbstractRecord extension mechanism to order the commit messages to the resource managers in any way appropriate to the application. We will then take a look at some options for failure recovery.

Notes:

Applications of this style (i.e. multiple 1PC) are only suited to certain classes of application. Where possible, it is almost always preferable to use 2PC resources to provide spec-compliant transactional outcomes.

The code I am going to use to demonstrate some of this is derived from a unit test in our repository, but I will extract some of the code below. I won't use real resource managers in this example, to illustrate the pattern as clearly as possible.

Transactional business logic

The general layout of the application follows the same pattern of any other transactional application:

// Get references to resource managers
ResourceManager1 rm1 = ...;
ResourceManager2 rm2 = ...;
// Create a transaction
AtomicAction A = new AtomicAction();
A.begin();
// Enlist resource manager in transaction
A.add(new OrderedAbstractRecord(rm1));
A.add(new OrderedAbstractRecord(rm2));
// Do business logic
// rm1.sendMessage()
// rm2.doSQL()
// Commit the transaction, the order will be defined in OrderedAbstractRecord rather than
// the business logic or AtomicAction::add() order
A.commit();

Guaranteeing the order of commit messages

The ordering of the list of transaction completion events is dictated by the RecordList class. At the most fundamental level, for AbstractRecords of the same type it is determined by the Uid returned from the AbstractRecord's order() method. As Uids are, at some level, sequentially numbered, this basically means that if your record returns a lower Uid than a peer, your record instance will be committed before that peer.

So for example, the order of Uid you allocate to the following class will determine the order AbstractRecord::topLevelCommit() is called:
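The ordering principle can be sketched independently of Narayana (this is plain Java mimicking the behaviour described above; UidLike, OrderedRecord and CommitOrderDemo are invented names, not Narayana classes):

```java
import java.util.ArrayList;
import java.util.List;

// Invented stand-in for Narayana's Uid: sequentially numbered, so earlier
// allocation means earlier ordering.
class UidLike implements Comparable<UidLike> {
    private static long counter = 0;
    final long sequence = counter++;

    public int compareTo(UidLike other) {
        return Long.compare(sequence, other.sequence);
    }
}

class OrderedRecord implements Comparable<OrderedRecord> {
    final UidLike uid = new UidLike();
    final String name;

    OrderedRecord(String name) { this.name = name; }

    // Mirrors AbstractRecord::order(): completion ordering is driven by the Uid.
    UidLike order() { return uid; }

    public int compareTo(OrderedRecord other) {
        return order().compareTo(other.order());
    }

    String topLevelCommit() { return name; }
}

class CommitOrderDemo {
    // Commit the records lowest-Uid first, as RecordList would.
    static List<String> commitAll(List<OrderedRecord> records) {
        List<OrderedRecord> sorted = new ArrayList<>(records);
        sorted.sort(null);
        List<String> committed = new ArrayList<>();
        for (OrderedRecord r : sorted) {
            committed.add(r.topLevelCommit());
        }
        return committed;
    }
}
```

Regardless of the order in which the records are added to the transaction, the record allocated the lower Uid commits first, which is exactly the hook that OrderedAbstractRecord exploits.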

Failure tolerance properties

A final observation: by using the Narayana AbstractRecord facility, you know that in the presence of a failure you will receive a callback during crash recovery, where it may even be possible to redo some of the work in the later resources.

For example, in the AbstractRecord's save_state you could save some of the content of the JMS message, which could then be used in a recovery scenario to resend a similar message.
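The shape of that idea, in isolation (this sketch uses plain Java streams rather than Narayana's actual save_state/restore_state signatures and OutputObjectState machinery, and the MessageState class is invented for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch: persist just enough of the JMS message to resend it during recovery.
class MessageState {
    final String destination;
    final String body;

    MessageState(String destination, String body) {
        this.destination = destination;
        this.body = body;
    }

    // Analogous to save_state: write the recoverable fields to the transaction log.
    byte[] saveState() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(destination);
                out.writeUTF(body);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Analogous to restore_state: rebuild the state during crash recovery,
    // after which a similar message could be resent.
    static MessageState restoreState(byte[] saved) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(saved))) {
            return new MessageState(in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```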