Monday, February 12, 2018

Deploying systems with circular dependencies using Disnix

Some time ago, during my PhD thesis defence, one of my committee members asked me how I would deploy systems with Disnix in which services have circular dependencies.

It was an interesting question because Disnix defines dependencies between services (that typically involve network connections) as inter-dependencies that have two properties:

They allow services to find services they depend on by providing their connection properties

They ensure that any inter-dependency is activated before the service itself, so that no failures will occur because of missing dependencies -- in Disnix, a service is either available or unavailable, but never in a broken state due to missing inter-dependencies at runtime.

In a system with circular dependencies, the ordering property is problematic -- it is impossible to activate one dependency before another without having broken connections between them.

During the defence, I had to admit that I have never deployed such systems with Disnix before, but that there were a couple of possible solutions to cope with such constraints. For example, you can propagate properties of the distribution model directly to a service, as opposed to declaring circular inter-dependencies. Then the ordering requirement is not enforced.

I also explained that systems should not have any hard cyclic requirements on other services, but instead compose their (potential bidirectional) communication channels at runtime. Furthermore, I explained that circular dependencies are bad from a reuse perspective -- when two services mutually depend on each other, then they should ideally be one service.

Although the answer sufficed (e.g. it provided the answer that it was possible), the solution basically relies on unconventional usage of the deployment tool. Recently, as a personal exercise, I have decided to dig up this question again and explore the possibilities of deploying systems with circular dependencies.

Chord: a peer-to-peer distributed hash table

When thinking of an example system that has a circular dependency structure, the first thing that came up in my mind is Chord: a peer-to-peer distributed hash table (a copy of the research paper written by Stoica et al can be found here). Interesting fact is that I had to implement it many years ago in the lab course of the distributed algorithms course taught by another member of my PhD thesis committee.

A Chord network has circular runtime dependencies because it has a a ring structure -- in a network that has more than one node, each node has a successor and predecessor link, in which no node has the same predecessor or successor and the last successor link refers to the first node:

The Chord nodes (shown in the figure above) constitute a distributed peer-to-peer hash table. In addition to the fact that it can store key and value pairs (all kinds of objects), it also distributes the data over the nodes in the network.

Moreover, its operations are decentralized -- for example, when it is desired to search for an object or to store new objects in the hash table, it is possible to consult any node in the network. The system will redirect the caller to the appropriate node that should host the data.

Various kinds of implementations exist of the Chord protocol. The official reference implementation is a filesystem abstraction layer built on top of it. I experimented with the Java-based OpenChord implementation that is capable of storing arbitrary serializable Java objects.

More details about the implementation details of Chord operations can be found in the research paper.

Deploying a Chord network

One of the challenges I faced during the lab course is that I had deploy a test Chord network with a small collection of nodes.
At that time, I had no proper deployment automation. I ended up writing a bash shell script that spawned a collection of processes in parallel.

Because deployment was complicated, I never tried more complex scenarios than running a small collection of processes on a single machine. Because it was not required for the lab course to do more than just that I, for example, never tried any real network communication deployments in which I had to distribute Chord nodes over multiple computer systems. The latter would have introduced even more complexity to the deployment process.

Deploying a Chord network basically works as follows:

First, we must deploy an initial node that has no connection to a predecessor or successor node.

Then for each additional node, we call the join operation to attach it to the network. As explained earlier, a Chord hash-table is decentralized and as a result, we can consult any node we want in the network for the join process. The join and stabilization procedures decide which predecessor and successor a new node actually gets.

There are various strategies to join additional nodes to the network, but I what I ended up doing is using the initial node as a bootstrap node -- all successive nodes, simply join to the bootstrap node and the network stabilizes to become a ring.

(As a sidenote: you could argue whether this is a good process, since the introduction of a central bootstrap node during the deployment process violates the peer-to-peer contraint, but that is a different story. Obviously, you could also think of other bootstrap strategies but that is beyond the scope of this blog post).

Automating a Chord network deployment with Disnix

To experiment with a Chord network, I have decided to create a simple server process (using the OpenChord API) whose only responsibility is to store data. It can optionally join another node in the network and it has a command-line interface allowing me to conveniently specify the connection parameters.

The deployment strategy using the initial node as a bootstrap node can be easily automated with Disnix. In the Disnix services model, we can define the bootstrap node as follows:

As can be seen in the above Nix expression, the dependsOn attribute specifies that the node has an inter-dependency on the bootstrap node. The inter-dependency declaration provides the connection settings of the bootstrap node to the command-line utility that spawns the service and ensures that the bootstrap node is deployed first.

By providing an infrastructure model containing a number of machines and writing a distribution model that maps the node to the machine, such as:

Defining services with circular dependencies in Disnix

As shown in the previous section, the ring structure of a Chord hash table is constructed at runtime. As a result, Disnix does not need to manage any circular dependencies. Instead, it only has to know the dependencies of the bootstrap phase which are not cyclic at all.

I was also curious whether I could modify Disnix to properly define circular-dependencies, without any workarounds such as directly propagating properties from the distribution model. As explained in the introduction, inter-dependencies have two properties in which the second property is problematic: the ordering constraint.

To cope with the problematic ordering property, I have introduced a new property in the services model called: connectsTo allowing users to specify inter-dependencies for which the ordering does not matter. The connectsTo property makes it possible for services to define mutual dependencies on each other.

As an example case, I have extended the Disnix composition examples (a set of trivial examples implementing "Hello world" testcases) with a cyclic example case. In this new sub example, I have created a web application that both contains a server returning the "Hello world!" string and a client displaying the string. The result would be the following screen:

(Does it look cool? :p)

A web application instance is capable of connecting to another web service to obtain the "Hello world!" message to display. We can compose two web application instances that refer to each other to accomplish this.

As may be observed in the above code fragment, the first service has a dependency on the second, while the second also has a dependency on the first. They are allowed to refer to each other because the connectsTo property disregards ordering.

By mapping the services to a network of machines that have Apache Tomcat hosted:

We end-up with a deployment architecture of two services having cyclic dependencies:

To produce the above visualization, I have extended the disnix-visualize tool with support for the connectsTo property that displays inter-dependencies as dashed arrows (as opposed to solid arrows that denote ordinary inter-dependencies).

In addition to the option to specify circular dependencies, the connectsTo property has another interesting use case -- when services have inter-dependencies that may be broken, we can optimize the duration of an upgrade processes.

Normally, when a service gets upgraded, all its inter-dependent services will be reactivated. This is an implication of Disnix's strictness -- a service is either available or unavailable, but never broken because of missing inter-dependencies.

However, all the extra reactivations in the upgrade phase can be quite expensive as a result. If a link is non-critical and it is permitted to be down for a short while, then redeployments can be made faster.

Conclusion

In this blog post, I have described two deployment experiments with Disnix involving systems that have circular dependencies -- a Chord-based distributed hash table (that constructs a ring structure at runtime) and a trivial toy example system in which two services have mutual dependencies on each other.

Availability

The newly introduced connectsTo property is part of the development version of Disnix and will become available in the next release.