Now that Kubernetes is on Azure, what is Service Fabric for?

Now that Kubernetes has gained traction in Azure there is some confusion over what Service Fabric is really for. Is it an orchestrator for microservices? A means of lifting and shifting legacy applications into the cloud? An application development framework?

Large areas of the Azure infrastructure have been built out using Service Fabric, but that doesn’t necessarily make it an ideal candidate for building modern, cloud-native applications. Microsoft may have eaten their own dogfood internally, but the recent popularity of Azure Kubernetes Service implies that Service Fabric may not be an ideal choice for running container-based applications.

One of the sticking points of Service Fabric has always been the difficulty involved in managing a cluster with patchy documentation and a tendency towards unhelpful error messages. The recent public preview of Service Fabric Mesh offers a PaaS-based implementation of Service Fabric which may eliminate much of this operational baggage. However, it doesn’t change the experience of building applications using the underlying SDKs.

Building applications for Service Fabric

The main problem with targeting applications for Service Fabric is that they will lack portability. The Service Fabric SDK is incredibly opinionated. If you commit to Service Fabric, you will be tied into a specific SDK and application server for good. This is some way from the kind of cloud-native, twelve factor applications typically associated with container-based development.

Service Fabric isn’t directly comparable to container orchestrators such as Kubernetes as it is more of an application server that supports a specific style of distributed system. This is based on applications, which serve as the boundary for upgrades in that different application versions can be run on the same cluster. The Service Fabric runtime manages the individual nodes in the cluster, providing capabilities to deploy, scale and monitor individual applications.

These applications can be composed of any number of services. The SDK encourages application code to be combined with deployment details in a single Visual Studio solution which is deployed to the cluster. The overhead of setting up a new solution tends to encourage larger applications made up of several services, rather than more numerous, isolated microservices.

Native Service Fabric services are based on very specific styles of implementation. Reliable services can be either stateless where state is managed externally, or stateful where state is managed by the Service Fabric runtime. Both types of service require base classes such as StatelessService to define their entry points, coupling them to the underlying SDK.
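As an illustration, a minimal stateless service might look something like the sketch below (the names are illustrative and the detail will vary with the SDK version):

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

// The StatelessService base class couples the implementation to the
// Service Fabric SDK and runtime.
internal sealed class WidgetService : StatelessService
{
    public WidgetService(StatelessServiceContext context)
        : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // The service's work goes here
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}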

Service Fabric also offers reliable actors, a pattern where state and behaviour are combined into small, isolated units. The runtime looks after the life-cycle and persistence of these actors. The approach can be a good fit for scenarios that require large numbers (i.e. thousands or more) of these units, such as shopping carts or user sessions.

The catch is that reliable actors tie services irrevocably to a very specific application server. They can also be abused in much the same way as any shared database, often being used to maintain global state, provide a cache or even act as a queue. There are usually other ways to solve the problem that don’t involve such a strong platform lock-in.

Using Service Fabric as an orchestrator

Service Fabric also supports two other types of application: guest executables and containers. Guest executables provide a mechanism for running legacy Windows applications in the context of a service orchestrator. Hey presto – instant microservices. Kind of.

Service Fabric’s container support provides a lifeline for those who have made a commitment to Service Fabric yet are looking to get into container-based services. The catch is that containers in Service Fabric do feel like second class citizens. The process of configuring and running containers in Service Fabric does not compare well with a “pure” container orchestrator like Kubernetes.

At the time of writing, it seems that Service Fabric’s external application support is geared more around lifting, shifting and modernising .Net applications using Windows containers. If you want to build container-based applications, Azure seems to be evolving several complementary services targeted at different processing needs.

The Azure Kubernetes Service is evolving towards providing a PaaS-based implementation of Kubernetes, which is ideal if you want to orchestrate applications without operational overhead. You can run a single application based on one or more container images using the Azure App Service. Container Instances can be used to spin up single instances and can be a useful means of spinning up occasional jobs or providing burst capacity. Azure Batch is optimised more towards repetitive compute jobs.

Each of these services provides a more “container native” approach to development than Service Fabric. There really doesn’t seem to be much reason to adopt Service Fabric unless you have a bunch of legacy services kicking around that you want to run in the context of an orchestrator.

A shrinking set of use cases

You can see why Service Fabric happened. Microsoft didn’t have a story for running .Net applications as autonomous services in a flexible cluster-style arrangement. Genuine cloud native services in the .Net ecosystem were a long way off. It was difficult to foresee how the Docker ecosystem was going to mature and that Kubernetes would become so popular.

Service Fabric does seem to be getting squeezed out of a growing Azure container ecosystem. If you’ve already made the investment in building applications for Service Fabric there’s no compelling reason to abandon it immediately. However, the use cases that justify its adoption now are rapidly shrinking.

Finding service boundaries: more than just the bounded context

Domain Driven Design (DDD) can provide a good starting point for discovering service boundaries. It assumes that you can’t define a large and complex business domain in a single model, so you should break it down into a series of smaller, self-contained models, or “bounded contexts”.

Each of these bounded contexts is a cohesive collection of data and behaviour that could represent a service boundary. The emphasis on capabilities is important, as services should be more than collections of data entities and CRUD methods. There is also a recognition that different services will have different views of the same concept. A classic example is a customer, where a billing service associates them with payment details, while a shipping service will only be interested in their delivery address.

Although DDD provides a useful theoretical framework to identify ideal service boundaries, many service implementations are driven by more practical concerns. Pragmatism should be a big part of service design and it makes sense to factor in a range of organisational and technical concerns into your boundaries.

Style of service

The desired characteristics of a service can help to determine its boundaries. Bounded contexts tend to describe the largest grouping of capabilities that still maintain internal cohesion, so they may not be useful if you want your services to be relatively small. On the other hand, if you want services to be fully autonomous and look after their own data then this may imply a service that is large enough to encapsulate a self-contained business process. If you want services to be owned by a single team and use independent deployment pipelines, then this may also place practical limits on their optimal size.

Data processing

Services can also be defined according to how they process data. You may want to distinguish reporting from operational data, which tend to require very different styles of solution. Similarly, patterns such as Command Query Responsibility Segregation imply separate service implementations to accommodate different processing contexts.

The volume of data is also significant. If you need to provide an interface that supports batch processes, then you may want to isolate this larger-scale processing in a different implementation. This can often be preferable to hardening the entire service for high volumes.

Resilience

Genuine high availability requires a greater investment than services that can tolerate a degree of failure. You may want to consider isolating those interfaces that really must not fail into separate implementations to reduce the overall burden of complexity.

Organisation

Developing services across teams tends to be much slower than keeping development within a single organisational unit. This is largely a matter of communication, which tends to be more immediate within teams than between them. It is also one of ownership, as it is easier to maintain clarity over the design of a service when it is controlled by a single team.

This is where Conway’s law comes into play, which suggests that the interfaces in a system will tend to reflect the organisational boundaries within it. There is nothing necessarily wrong in this, as drawing boundaries along team lines can often be the most efficient means of delivering a service.

Scope of change

Dependencies between services can be very hard to manage, quickly giving rise to “death by co-ordination meeting”. The conceptual neatness of a bounded context should also be considered alongside the likely scope of future changes. If these are going to regularly sweep across multiple services, then it may make sense to aggregate services to make change easier to deliver.

Overheads

Some design decisions can be driven by a desire to reduce the incremental overhead associated with each new service.

Services have operational cost. Each one involves some effort in terms of development, deployment, versioning, monitoring, tracing and resilience. This can suddenly overwhelm your infrastructure and processes if you have not prepared for scale. There is also the cognitive overhead to consider as a large, sprawling estate can be difficult to comprehend and navigate.

Security

Each new service increases the attack surface area for the system. As the number of moving parts increases it can become more difficult to automate security updates, assert best practice and ensure consistent security scanning. You need to ensure that your security strategies, practices and policies can scale in step with your service infrastructure or you will fall foul of unexpected vulnerabilities and insecure communication.

Concurrency and consistency

Although services should be autonomous, they inevitably duplicate data to some degree. This is where CAP theorem could apply: distributed processing forces you to choose between consistency and availability. If you want to avoid direct, real-time coupling between services then you will have to become accustomed to eventual consistency.

This can have implications for service design, i.e. if data needs to be synchronised between processes then it makes sense to keep it within the service.

Third party products

Third party products don’t always support a clear or consistent boundary, particularly if they have been inherited from the distant past. Their scope tends to change over time and can be defined as much by company politics as any coherent understanding of the domain. Third-party platforms such as ERPs can creep ominously across the architecture, gobbling up responsibilities as they go.

One way of protecting the wider architecture against this amorphous boundary is to use an anti-corruption layer. This can force a clear definition of a third-party platform’s responsibilities as well as acting as a barrier to scope creep.

Time and understanding

Your understanding of a domain tends to improve over time, so it can be difficult to get service boundaries right first time. The more services you have the more difficult it gets to re-draw these boundaries.

It makes sense to limit the number of separate services to begin with. It tends to be easier to decompose a larger service than aggregate several smaller services. This implies much larger services but with fewer mutual dependencies. You will still reap the benefits of scalability, resilience and flexibility, without being hamstrung by premature decomposition.

Available resources

In the real world of deadlines and limited resources, the way you organise data and behaviour can be defined by who is available to do the work. This is particularly the case if you assert clear code ownership by teams. If a team that looks after a specific service does not have the bandwidth to develop a certain feature, then the pragmatic choice may be to have it implemented by another team – and in a different service.

Decomposition strategy

If you are moving from a monolith to something a little more distributed there is always a clear and present danger of over-decomposition. Where bounded contexts can give rise to quite large service implementations, there can be a temptation to apply the single responsibility principle to service design and produce lots of small, specialised services.

This can be a mistake that is difficult to correct. If you over-decompose your domain then the cost of change can start to increase. You start to be weighed down by increased overheads and growing dependencies.

Custom token authentication in Azure Functions using bindings

Azure Functions only provides direct support for OAuth access tokens that have been issued by a small number of providers, such as Azure Active Directory, Google, Facebook and Twitter. If you want to validate tokens issued by an external OAuth server or integrate with a custom solution, you’ll need to create the plumbing yourself.

In the .Net world the ideal mechanism would be to find some way of injecting a ClaimsPrincipal instance into the running function. Validating access tokens based on JSON Web Tokens (JWTs) is relatively straightforward, but there’s no middleware in Azure Functions into which you can inject the result.

You could add some boilerplate at the beginning of every function, but this is a little messy and difficult to test. Ideally you need to separate function definitions from the authentication mechanism they are using, so they can just consume a ClaimsPrincipal that has been created elsewhere.

This can be done by creating a custom input binding that can be used to make the ClaimsPrincipal part of the function definition. The method signature below shows what this looks like – the principal argument has been decorated with a custom binding argument called AccessToken:
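A reconstruction of that signature might look something like this (the function name, trigger and return value are illustrative):

[FunctionName("ProtectedFunction")]
public static IActionResult Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "get")] HttpRequest req,
    [AccessToken] ClaimsPrincipal principal)
{
    // The custom binding has already validated the token and built the principal
    return new OkObjectResult($"Hello, {principal.Identity.Name}");
}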

This approach eliminates the need for boilerplate and makes the validation of access tokens an external concern. It also makes the function more testable as you can inject security principals into the function from test code.

The full code for this example is posted on GitHub, but the idea was taken from Boris Wilhelm’s ongoing work around implementing dependency injection in Azure Functions. He uses a similar approach to allow you to define dependencies in start-up code that are injected into methods at run-time.

Creating the custom input binding

The implementation involves creating half a dozen small classes to wire everything into the Functions SDK:

An attribute that is used to annotate the ClaimsPrincipal argument in the function definition.

A custom binding made up of three classes that reads the access token in the incoming request and creates a ClaimsPrincipal to be returned to the function.

An extension configuration provider that wires the attribute and the custom binding together.

An extension method that lets you register the binding when the Azure Functions host starts up.

The attribute definition can be a simple, empty attribute class definition that is decorated with a Binding attribute.
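A minimal sketch of such an attribute:

using System;
using Microsoft.Azure.WebJobs.Description;

[Binding]
[AttributeUsage(AttributeTargets.Parameter)]
public sealed class AccessTokenAttribute : Attribute
{
}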

Custom bindings can be straightforward, though this implementation is complicated by the need to access the underlying HTTP request for the access token. This requires three classes:

A custom binding provider that implements IBindingProvider – this will be associated directly with the attribute definition.

The provider will be expected to return a binding that implements IBinding. The binding class is where you have access to the underlying request context, so it can obtain application settings and the HTTP request header.

The binding will be responsible for returning a value provider that implements IValueProvider. This is where you do the work to crack open the JWT and create the ClaimsPrincipal.

To wire attribute and binding together an extension configuration provider is required that implements IExtensionConfigProvider. All this class does is define a rule for the attribute definition that will be picked up by the Azure Functions runtime. This rule can associate the attribute with a custom binding as shown below:
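A sketch of this wiring, assuming a binding provider class called AccessTokenBindingProvider:

public class AccessTokenExtensionProvider : IExtensionConfigProvider
{
    public void Initialize(ExtensionConfigContext context)
    {
        // Associate the AccessToken attribute with the custom binding provider
        var rule = context.AddBindingRule<AccessTokenAttribute>();
        rule.Bind(new AccessTokenBindingProvider());
    }
}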

Finally, you’ll need to tell the Azure Functions host about the binding when it starts up. Firstly, you create an extension method that lets you add the binding to the host’s IWebJobsBuilder context as shown below:
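Something along these lines (the method name is illustrative):

public static class AccessTokenWebJobsBuilderExtensions
{
    public static IWebJobsBuilder AddAccessTokenBinding(this IWebJobsBuilder builder)
    {
        // Register the extension configuration provider with the host
        builder.AddExtension<AccessTokenExtensionProvider>();
        return builder;
    }
}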

This code is executed in a custom Startup method that you’ll need to add to your project. The code below demonstrates this – note the use of the assembly attribute that tells the Azure Functions runtime to use the Startup class when the host initializes.
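A sketch of that start-up class (the namespace is illustrative):

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Hosting;

[assembly: WebJobsStartup(typeof(TokenAuth.Startup))]

namespace TokenAuth
{
    public class Startup : IWebJobsStartup
    {
        public void Configure(IWebJobsBuilder builder)
        {
            builder.AddAccessTokenBinding();
        }
    }
}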

Validating the token

All the work around token validation happens in the value provider class – AccessTokenValueProvider. This should receive all the configuration and context information it needs from the binding class, allowing for a clean and testable implementation that generates a ClaimsPrincipal from the incoming token.

The token’s signature will be verified using the key specified in the IssuerSigningKey property.

It will also validate the token’s issuer and intended audience against the values in the ValidIssuer and ValidAudience properties.

The token’s lifetime will be checked to ensure that it hasn’t expired.

Assuming that the token is being supplied as a “bearer token”, you’ll need to take it from the “Authorization” header and strip off the leading “Bearer ” text. The actual token validation only requires a few lines of code:
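A minimal sketch using the System.IdentityModel.Tokens.Jwt library (the configuration values are assumed to have been passed in by the binding):

// Strip the "Bearer " prefix from the Authorization header value
var token = authorizationHeader.Substring("Bearer ".Length);

var parameters = new TokenValidationParameters
{
    IssuerSigningKey = new SymmetricSecurityKey(Encoding.UTF8.GetBytes(signingKey)),
    ValidIssuer = issuer,
    ValidAudience = audience,
    ValidateLifetime = true
};

// ValidateToken throws a SecurityTokenException if any of the checks fail
var handler = new JwtSecurityTokenHandler();
var principal = handler.ValidateToken(token, parameters, out var validatedToken);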

A caveat – returning an appropriate HTTP status

Note that this implementation is incomplete in one important respect. Any validation failures will throw an exception to the Azure Functions run-time and a 500 Internal Server Error status code will be returned to the user with no message. There is no way of interrupting this response and returning something more useful to the client, e.g. a 403 Forbidden response.

To do this you’ll need to wrap the principal up along with a status and any exceptions that occurred while validating the token. This wrapped result should be returned to the Azure function, which will then be responsible for interpreting the result and returning the appropriate HTTP response.

Writing ArchUnit style tests for .Net and C# to enforce architecture rules

It can be quite a challenge to preserve architectural design patterns in code bases over the long term. Many patterns can only be enforced by convention, which relies on a rigorous and consistent process of code review.

This discipline often breaks down as projects grow, use cases become more complex and developers come and go. Inconsistencies creep in as new features are added in any way that fits, giving rise to pattern rot that undermines the coherence of the code base.

ArchUnit attempts to address these problems for Java-based code bases by providing tests that encapsulate conventions in areas such as class design, naming and dependency. These can be added to any unit test framework and incorporated into a build pipeline. It uses a fluent API that allows you to string together readable rules that can be used in test assertions.

This library doesn’t have any direct, open source equivalent in the .Net world. There are plenty of excellent static analysis tools that evaluate application structure, but they are aimed more at enforcing generic best practice rather than application-specific conventions. Many tools can be press-ganged into creating custom rules for a specific architecture, but it’s more difficult to integrate them into a test suite and create a self-testing architecture.

For example, you can define project-specific rules in SonarQube if you’re prepared to brave the SDK, but these are installed onto a separate application server. The LINQ implementation in nDepend lets you interrogate code structure to a very detailed level, but the catch here is that it’s a proprietary toolset that comes with per seat license fees.

The .Net ecosystem does not appear to have any low friction means of creating architectural rules that act on compiled components. This is less about the generic static analysis of code to assess code quality and more to do with checking compiled assemblies against application-specific constraints.

Determining type-level dependencies in .Net

NetArchTest is a basic implementation, inspired by parts of the ArchUnit API, that was created to fill this gap. The intention was to write a simple .Net Standard library that can be used to assert self-testing architectures on both .Net Framework and .Net Core projects.

The main challenge for implementing this is in determining dependencies between different classes. This is where the real architectural rot sets in over time. Layers start to bypass each other, interfaces are only implemented selectively, naming conventions start to fall by the wayside.

You can determine dependencies at an assembly level easily enough using reflection’s GetReferencedAssemblies() method. The real challenge comes in determining dependencies between different types. This level of dependency analysis requires you to analyse the underlying code line-by-line. A dependency might be lurking in a variable declaration buried within a method.

The Roslyn project exposes code processing functionality used in the .Net compiler, but it works by parsing code files rather than reading compiled assets. This makes it a little trickier to incorporate within unit tests as part of a build pipeline as you need access to all the source code. Also, at the time of writing Roslyn does not directly support loading .Net Standard projects.

A more straightforward alternative is the Mono.Cecil library. This was written to read the Common Intermediate Language format which both .Net Framework and .Net Core projects are compiled down into. As well as providing an API that lets you walk through most of the constructs in a type (e.g. properties, fields, events, etc.) you can also inspect the instructions in each individual method. This is where you can pick up those dependencies that are buried within methods.

Building a fluent API for .Net

Building a fluent API for NetArchTest was quite straightforward, allowing users to create coherent sentences made up of predicates, conjunctions and conditions.

The starting point for any rule is the static Types class, where you load a set of types from a path, assembly or namespace.

var types = Types.InCurrentDomain();

Once you have selected the types you can filter them using one or more predicates. These can be chained together using And() or Or() conjunctions:
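For example, combining two of the predicates that feature in the examples later in this post:

var classes = Types.InCurrentDomain()
    .That().ResideInNamespace("NetArchTest.SampleLibrary.Data")
    .And().AreClasses();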

Once the set of classes has been filtered you can apply a set of conditions using the Should() or ShouldNot() methods, e.g.

types.That().ArePublic().Should().BeSealed();

Finally, you obtain a result from the rule by using an executor, i.e. use GetTypes() to return the types that match the rule or GetResult() to determine whether the rule has been met.

Enforcing layered architecture

One interesting aspect of the ArchUnit API is the ability to define slices or layers. This can be used to enforce dependency checking between different parts of an architecture. Although I have not explicitly added layer definitions to the API, they can be identified by namespace as shown below:
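For example, a rule that stops a presentation layer from talking directly to the data access namespace might look like this (the namespaces are illustrative):

var result = Types.InCurrentDomain()
    .That().ResideInNamespace("NetArchTest.SampleLibrary.Presentation")
    .ShouldNot().HaveDependencyOn("NetArchTest.SampleLibrary.Data")
    .GetResult();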

Application specific rules

The following are some examples of the kind of application-specific rules that can be created with the API. I have shied away from asserting the kind of best practice conventions or type design guidance asserted by static analysis tools. The objective is to create rules that express specific system conventions rather than encapsulating generic best practice.

// Only classes in the data namespace can have a dependency on System.Data
var result = Types.InCurrentDomain()
    .That().HaveDependencyOn("System.Data")
    .And().ResideInNamespace("ArchTest")
    .Should().ResideInNamespace("NetArchTest.SampleLibrary.Data")
    .GetResult();

// All the classes in the data namespace should implement IRepository
var result = Types.InCurrentDomain()
    .That().ResideInNamespace("NetArchTest.SampleLibrary.Data")
    .And().AreClasses()
    .Should().ImplementInterface(typeof(IRepository<>))
    .GetResult();

// Classes that implement IRepository should have the suffix "Repository"
var result = Types.InCurrentDomain()
    .That().ResideInNamespace("NetArchTest.SampleLibrary.Data")
    .And().AreClasses()
    .Should().HaveNameEndingWith("Repository")
    .GetResult();

// All the service classes should be sealed
var result = Types.InCurrentDomain()
    .That().ImplementInterface(typeof(IWidgetService))
    .Should().BeSealed()
    .GetResult();

Message design anti-patterns for event-driven architecture

Event-driven architecture allows services to collaborate by publishing and consuming events. In this context an event describes a simple change in state. A service can broadcast events to one or more consumers without needing to know who might be listening and how they might be responding.

This approach encourages loose coupling between services by enforcing an indirect style of collaboration where services don’t need to know about each other. The sender doesn’t care who is receiving the event, while the consumer doesn’t necessarily care who sent it.

Services that are integrated via asynchronous event streams tend to scale better than those that use direct API calls or shared data stores. Resilience is also improved as a service outage is less likely to give rise to cascading failure. Events are also highly versatile so can be used to represent pretty much any business process.

The catch is that these advantages of scalability, resilience and flexibility are very much dependent upon the design of events. There are plenty of traps for the unwary that can undermine the potential benefits of event-based integration.

Dependencies between events

Events should be autonomous and atomic. They should not have any dependencies on any other events. A consumer should be able to process an event in its entirety without having to wait for another event to turn up (the chances are that it never will!).

This implies that an event should contain all the information that a consumer needs to process the event. This doesn’t necessarily mean the event needs to be enormous and contain every piece of related data. The design challenge is to include “just enough” data to allow a downstream service to make sense of the event. Consumers only need to be aware of the information that directly relates to the change of state and can be tolerant of missing information if needs be.
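For example, a hypothetical “order placed” event might carry just the key facts that a downstream service needs in order to act:

public class OrderPlaced
{
    public Guid OrderId { get; set; }
    public Guid CustomerId { get; set; }
    public DateTime PlacedAt { get; set; }
    public decimal OrderTotal { get; set; }
}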

Leaky events

An event should be an abstraction that represents a business process. It shouldn’t include the internal implementation details of a service. This kind of information leakage can cause knowledge coupling between services as they become a little too aware of each other’s internal workings.

Entity-based events

Designing events that reflect an underlying relational database model is another type of leaky event. The events are sent in response to changes in database entities and they represent CRUD actions rather than business processes.

This type of entity-based event lacks clarity. It’s not immediately obvious what business process is being represented by an event like “order updated” – has the order been placed, adjusted, picked or shipped? Events should not be used to replicate databases as this tends to leak implementation detail and couple services to a shared data model.

Entity-based events also tend to be very inefficient. Events are sent for every inconsequential change to the entity, creating very “chatty” integrations that put unreasonable pressure on consumers.

Generic events

Events should be specific in that they model a single business process. It should be possible to understand what an event means from the title alone – e.g. order placed. This helps with clarity and makes it easier for a downstream service to decide whether it needs to process the event.

This clarity is undermined if you create more generic events that use switches or flags to clarify the intent. This is a similar problem to entity-based events where events are based more on an internal data model than an external business process. It tends to give rise to an unclear and inefficient integration.

Implementing a sequence

Given that messaging is asynchronous you cannot guarantee the order in which events will be sent, received and processed. Even “first-in-first-out” guarantees cannot necessarily ensure that messages will be processed in the order in which they were originally sent.

Trying to assert ordering on any event-based architecture tends to add complexity with very little benefit. It will also undermine many of the benefits of event-based messaging, i.e. decoupling, buffering and scalability.

The best solution for ordering is to design it out of your events. This isn’t as difficult as it sounds so long as you are modelling genuine business processes rather than monitoring changes to entities. If you are really stuck then the sender can implement a sequence number, though this is not trivial to implement particularly if you have multiple senders.

Assumed knowledge

Events should be self-contained. Publishers should not make any assumptions around how events will be processed by a consumer. This kind of assumed knowledge couples services together via events.

For example, a publisher may expect to receive a specific type of event back from a consumer to acknowledge that the original event has been processed. This is using asynchronous messaging to model a request/response exchange. The two services are coupled into a process and you may want to consider re-drawing service boundaries or refactoring your events.

Commands in disguise

Many event-based architectures also implement commands. These are requests from one service to another to do something. Commands often use the same messaging infrastructure as events, except with different semantics and delivery to a single consumer.

Commands can be useful, but they can undermine service autonomy so should be used sparingly. A service needs to have intimate knowledge of what another service can do before it issues a command. You are also allowing services to dictate to each other, which inevitably increases coupling.

My own preference is to do away with commands altogether as you can model pretty much any interaction as an event. Most commands have an equivalent event – e.g. instead of a “place order” command you can have an “order placed” event. The risk here is that you end up with events that are just commands in disguise, i.e. events with a single recipient where the sender expects a response through a related event (e.g. “order accepted“).

Events as method calls

Once you have an event infrastructure in place it can be easy to get carried away and publish too many events. Events should have some cost associated with them or the boundaries between services will start to feel meaningless.

Too many events can give rise to “chatty” integrations between services where events are used as commonly as a method call. With this style of event you will find over time that services are having to send and receive an escalating number of messages. This can place quite a burden on downstream systems as they need to work harder to keep up with the pace of the message flow.

Too few messages

The opposite problem to using events as method calls is environments where there aren’t enough events. This often happens in legacy environments when a shared database is lurking somewhere within the architecture. Services are accustomed to being able to read and write from a shared store so adding a new event seems like an overhead.

It can take some time for event-driven integration to take hold in a legacy environment, but it’s absolutely worth the investment. You will eventually build up a critical mass of events that allows you to break free from shared databases. This requires that you keep the discipline of implementing event messages in place of more immediate database calls.

Infrastructure mismatch

The choice of underlying messaging platform will have a significant impact on the way that messages are sent and consumed. A message broker such as RabbitMQ will track the status of individual messages, allowing features such as transactional semantics (i.e. retries) and duplicate detection to be handled by the infrastructure.

On the other hand, a high-volume streaming platform like Kafka exposes event logs, leaving it up to the client to track their position and implement any retry logic. The advantage is that it’s easier to scale delivery and provide an “event firehose” where services can consume very high volumes of events and replay streams more easily. The downside is that client implementations become significantly more involved.

It’s important that any messaging technology is “right sized” against likely throughput. Message brokers can become extremely expensive once you are processing hundreds of millions of messages per month. However, it doesn’t make sense to sacrifice a broker’s messaging features unless you really are going to scale beyond that point.

Implementing a Docker HEALTHCHECK using ASP.Net Core 2.2

It’s easy enough to tell if a container is running, but more difficult to tell whether the container is ready to accept instructions. Docker supports a health check mechanism that allows you to check that a container is doing its job correctly.

There are several ways of setting this up, but the HEALTHCHECK instruction allows you to bake this kind of readiness checking into an image definition. This directive specifies a shell command that returns a zero if all is well and one if the container is unhealthy. It can be picked up by an orchestrator, such as a readiness probe in Kubernetes or the system health reporting in Service Fabric.

A typical health check declaration uses curl’s –fail option to call an HTTP endpoint and regard any error response status as unhealthy. The example below augments the shell command with some switches to set an interval along with retries and a timeout:
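A sketch of such a declaration, assuming the application exposes a health endpoint at /health on port 80 inside the container:

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl --fail http://localhost:80/health || exit 1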

Ideally a readiness check like this should be lightweight. This is not an advanced diagnostic function but a simple status check that is repeated by the orchestrator to check that the lights are on.

Implementing health checks in ASP.Net Core

Given that a Docker HEALTHCHECK allows you to define a shell command you are free to use any mechanism for returning the result. An HTTP end-point is the most obvious approach for an ASP.Net Core application, and curl is included in the aspnetcore-runtime Docker images.

Although you are free to implement your own end-point, a health check implementation is being added into version 2.2 of Asp.Net Core via some new extension methods.

You can set up health-checking in your StartUp class by invoking extension methods for the services collection and application builder as shown below:
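A minimal sketch of this wiring (the endpoint path and check name are assumptions):

public void ConfigureServices(IServiceCollection services)
{
    // Register the health check services along with a simple custom check
    services.AddHealthChecks()
        .AddCheck("example", () => HealthCheckResult.Healthy("OK"));
}

public void Configure(IApplicationBuilder app)
{
    // Expose the default health check endpoint
    app.UseHealthChecks("/health");
}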

This check is executed every time the health check endpoint is called and the result added to a HealthReport object. If the check fails, the default health check end-point will return an “Unhealthy” response with a 503 Service Unavailable HTTP status.

You can modify this output using the extension methods on the application builder. The example below dumps out the contents of the HealthReport object to provide more verbose output.
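A sketch of this using the ResponseWriter hook on HealthCheckOptions (the JSON shape is an assumption):

app.UseHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = async (context, report) =>
    {
        context.Response.ContentType = "application/json";
        // Dump the overall status plus each individual check result
        var json = JsonConvert.SerializeObject(new
        {
            status = report.Status.ToString(),
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description
            })
        });
        await context.Response.WriteAsync(json);
    }
});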

Note that a Docker HEALTHCHECK does not care about any of this extra information. The basic result is unchanged, i.e. a zero is returned for a healthy service while a one is returned in response to any HTTP error code. Other health check mechanisms might find this scope for verbose output more useful.

Checking the output

You can see a health check in action by inspecting the running image. If you execute a docker ps command immediately after running a container then you’ll see the health status set to “starting”. Once the check has been executed successfully it will switch to “healthy”.

Hosting .Net Core containers in Service Fabric

You can host containers in Service Fabric, but it is first and foremost an application server. Service Fabric projects typically use services hosted by the Service Fabric runtime and are dependent on APIs defined in the Service Fabric SDK.

This can give rise to a very specific style of application. Base classes such as StatelessService and StatefulServiceBase tend to undermine portability, while patterns such as reliable actors tie services irrevocably to a run-time application server. The SDK also encourages application code to be combined with deployment details in a single Visual Studio solution. This is some way from the kind of cloud-native, twelve factor applications typically associated with container services.

Given the recent rise of services such as Azure Kubernetes Service, the container support in Service Fabric seems to be targeted more towards lifting and shifting existing .Net applications. You can use it as an orchestrator for cloud-native services, but you are inevitably made to feel like a second-class citizen in doing so. The process of configuring and deploying container-based applications to Service Fabric does not compare well with a “pure” orchestrator like Kubernetes.

Using Visual Studio tooling. Or rather not.

Visual Studio provides a tool for adding container orchestration support for Service Fabric, though it may do more harm than good. If you right-click on the project and select Add -> Container Orchestrator Support the following changes will be made to your project:

A “PackageRoot” directory will be added containing the service manifest and configuration file

A (pretty basic) Dockerfile will be added that attempts to push the build artefacts into a nanoserver-based ASP.Net Core image

An IsServiceFabricServiceProject element is added to the project file.

This has the effect of embedding configuration detail about the orchestrator into your service code. Ideally you should avoid adding this kind of run-time implementation detail as it undermines the portability of the service.

The good news is that you don’t need this guff and can develop containerised services in .Net Core that have absolutely no knowledge of Service Fabric. You will need to take care of the runtime images that you use though, as you may find that your cluster only supports nanoserver based images as opposed to the more generic core ones (i.e. dotnet:2.1-aspnetcore-runtime-nanoserver-sac2016 rather than dotnet:2.1-aspnetcore-runtime).

Creating Service Fabric applications for containers

This is where things start to get hair-raising. A Service Fabric application is analogous to the Kubernetes pod in that it is the main unit of scaling that can host one or more containers. You use the SDK templates to create a project that deploys one or more containers to a cluster.

This template boasts nine XML files, a PowerShell script and a configuration file. This configuration system has been adapted to support containers, so details will be scattered across the two main XML-based manifest files: a separate ServiceManifest.xml for each individual running image and an ApplicationManifest.xml for the application.

Setting the container image and registry

The CodePackage element in the ServiceManifest.xml file defines the actual service executable. This contains a ContainerHost element where you specify an image that is hosted in a container registry, i.e.
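A sketch of this section (the image name and registry are illustrative):

<CodePackage Name="Code" Version="1.0.0">
  <EntryPoint>
    <ContainerHost>
      <ImageName>myregistry.azurecr.io/widget-service:1.0.0</ImageName>
    </ContainerHost>
  </EntryPoint>
</CodePackage>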

To define the credentials for a custom container registry you’ll need to add them in a RepositoryCredentials element in the ApplicationManifest.xml file. You should encrypt any passwords using the Service Fabric SDK, though this is dependent on the specific certificate that you install on the target cluster. The example below shows an unencrypted password which is not a recommended approach outside of any development environment:
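A sketch of the relevant section (the account details are illustrative):

<ContainerHostPolicies CodePackageRef="Code">
  <RepositoryCredentials AccountName="myregistry" Password="[password]" PasswordEncrypted="false" />
</ContainerHostPolicies>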

Ports and binding

The external port that a running application uses is defined in an EndPoint section in the ServiceManifest.xml file. The example below creates an endpoint on port 8081 that the Service Fabric runtime will treat as an entry point for requests to the application:
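A sketch of the endpoint declaration in ServiceManifest.xml:

<Resources>
  <Endpoints>
    <Endpoint Name="WidgetServiceEndpoint" Protocol="http" Port="8081" UriScheme="http" />
  </Endpoints>
</Resources>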

Isolation mode

Containers running on the same host will share the kernel with the host by default on Windows Server hosts. You can choose to run them in isolated kernels by specifying the isolation mode in the ContainerHostPolicies section in ApplicationManifest.xml as shown below:

<ContainerHostPolicies CodePackageRef="Code" Isolation="hyperv">

Resource governance

Resource governance in Service Fabric is used to control the “noisy neighbour” problem of services that consume too many resources and starve other services. You can set policies that restrict the resources that a container can use, mainly around CPU and memory usage. This is similar to the resource limits that you can request for containers in Kubernetes.

This is defined in the ApplicationManifest.xml file using a ResourceGovernancePolicy element:

<ResourceGovernancePolicy MemoryInMB="512" CpuPercent="25" />

Supporting Docker HEALTHCHECK

The Docker HEALTHCHECK directive allows you to bake health check monitoring into your container image definition. You can wire this up to Service Fabric’s health check reporting by adding a HealthConfig section to the ApplicationManifest.xml file:
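A sketch of this section, assuming the Docker health status should be surfaced in the system health report:

<Policies>
  <ContainerHostPolicies CodePackageRef="Code">
    <HealthConfig IncludeDockerHealthStatusInSystemHealthReport="true" />
  </ContainerHostPolicies>
</Policies>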

Running the container application

The most difficult aspect of getting a container application to work is getting a Service Fabric cluster up and running in the first place. Setting up a local cluster can be infuriating. The tooling does not feel particularly mature and troubleshooting it can be a frustrating experience.

One option is using the Azure-based “party clusters” that Microsoft maintain so you can experiment with Service Fabric. These are free clusters that are provisioned to you for an hour. Alas, this is unlikely to be enough time to provision anything meaningful without prior experience of managing a cluster.

You can provision a Service Fabric cluster in Azure but be aware that you will be charged by the hour for all the VMs, storage and network resources that you use. The cheapest test cluster will still require three VMs to be running in a virtual machine scale set. Cost management is tricky. De-allocating the set of VMs stops the clock ticking on VM billing but it effectively resets the cluster (and the public IP address), forcing you to redeploy everything when it comes back up.

The act of publishing your application is relatively simple due to the magic of PowerShell. You will be expected to have a certificate installed locally and this credential needs to be specified in the Cloud.xml settings file in the Service Fabric project.

When you first deploy an application, be prepared for Service Fabric to report that the service is unhealthy while it downloads and installs the container image. Once the service is running you can use rolling upgrades to push new versions without downtime, though be aware that upgrading requires you to change the version number on your container image tag. If you tag your images with “latest” then Service Fabric does not download them when you run an update.

Be warned that Service Fabric can be mysterious to the uninitiated and it takes time to decipher some of the error messages. For instance, the frequently encountered “Partition is below target replica or instance count” error can mean an unhandled exception in service start-up, a configuration error, or an environment problem such as a lack of disc space. You need to play detective to find out what might be wrong.

Is it worth the effort?

Let’s face it – you wouldn’t choose Service Fabric as a container orchestrator unless you had to. It feels like you are running applications in an environment that was not explicitly designed to host containers. Sure, it works, but getting there takes more effort than it ought to.

Perhaps Service Fabric’s support for containers could be seen in the context of supporting a longer-term migration strategy. If you’ve already made a significant investment in Service Fabric, then you can start to migrate towards a more “cloud native” style of service without having to replace your runtime infrastructure.

How I learned to love the “Agile Industrial Complex”

Now that agile adoption has gone mainstream, there is a growing sense of unease around how larger organisations have implemented it. In particular, there is a tendency towards centralised control that can be at odds with the agile preference for individuals over process.

Some of the more influential early practitioners of agile bemoan what it has become. Ron Jeffries is particularly critical of enterprise agile “roll-outs” that impose “correct” process on teams. He refers to “Dark Scrum” as the result of impatient and poorly-supported Scrum implementations that don’t allow self-organisation to emerge.

Jeffries has suggested that developers should “abandon agile” by detaching their thinking from any framework or method. Instead, they should focus on those technical practices that empower developers to overcome the impediments to building great software. This means working software, incremental releases and clean design.

Martin Fowler has suggested that many organisations use the word “agile” without adopting the values and practices it is supposed to represent. He referred to the “Agile Industrial Complex” as the collection of consultants, experts and managers who try to assert “best practice” rather than letting teams decide how best to do their work.

Problems of vague definition

Fowler’s argument is that the principles matter most, yet the principles outlined in the agile manifesto are vague value statements. The ideas on the left are “favoured” but the ideas on the right are not excluded. What is the correct balance between working software and documentation? How do you weigh up interactions and process in a shop with hundreds of developers and millions of lines of code?

The obvious retort is that “the team should decide”, but this can feel a little impractical. If the team chooses its own path, how can we be certain that this dovetails with the wider organisation’s needs? How do you know when a team is working effectively and how do you intervene when something has gone wrong?

The notion of “self-organising” teams can be taken a little too literally. An organisation also has needs that are not addressed by the more team-centric principles of the agile manifesto. It will need to achieve some economies of scale, facilitate large-scale integrations, avoid repetition, attempt to share best practice and monitor progress across dozens of different self-organising teams.

Allowing teams to be self-organising assumes they have the skills, experience and perspective to make the right choices. They also need the maturity and self-awareness to recognise when change and improvement is required (i.e. all the time). This is not always the case, hence the growth industry of “agile coaches” to mentor teams in their journey towards agile nirvana.

The need for guidance rather than control

The word “agile” has become a morally loaded term. It has inspired more than its fair share of unhelpful zealots and obstructive cant. We should not be framing the problem in terms of the struggle of humble development teams against the machinations of ruthless management. It has more to do with the need to balance the interests of the organisation with the desire to create a productive environment for development teams.

The “Agile Industrial Complex” does have an important role to play in ensuring that the needs of the organisation are catered for. It just needs to resist the urge to assert common process in favour of guiding teams towards realising approaches that suit their unique technical and commercial context.

For example, many teams are not developing the kind of small to medium-sized applications that lend themselves to agile. It is not always possible to identify discrete features that can be delivered in the context of a regular cadence. The meat and drink of larger corporate organisations is more likely to be bug fixing on a code base old enough to drink alcohol or slogging through a complex yet lucrative data integration for a corporate ERP system.

Development process can also be shaped by the wider commercial context. Iterative delivery can be a hard sell to customers who are reluctant to accept what they perceive to be greater uncertainty. Organisations with a small number of highly influential customers can be put under intolerable pressure to provide guarantees around scope and delivery timelines.

Agile purists often interpret the agile preference for “customer collaboration” as never having to agree a deadline. A more liberal reading of the agile manifesto allows for flexibility so teams can accommodate the wider commercial reality. This can be difficult for inward-facing, technically-orientated development teams to grasp without some guidance.

Attempts at providing guidance are often interpreted as the actions of a management class who are unwilling and unable to relinquish control. This rhetoric can be unhelpful. Sure, there are occasional control freaks out there, but more often than not, this is an attempt to bring a wider organisational perspective to bear on insular development teams.

Improving the lot of developers

Making it easier for developers to produce working software is the central concern of agile. Perhaps what’s missing from large-scale agile implementations is a focus on the developer experience.

Developers have come to associate agile with excessive bureaucracy and time wasted in too many meetings. This was never the intention of agile, which always sought to remove the impediments to doing great work.

I have seen teams energised by agile processes where they seize control of their work and find solutions for long-standing problems that always eluded their managers. It really does work on a team level, though does not offer any solutions to the problems of organising development at scale.

This is where the “Agile Industrial Complex” comes in. It is trying to solve some very important problems, it’s just that these don’t have much to do with the team-centric concerns of agile.

Building Twelve Factor Apps with .Net Core

Twelve factor apps provide a methodology for building apps that are optimised for modern cloud environments. The intention is to provide autonomous, stateless components that can be easily deployed and freely scaled up. This means designing for easier automation and maximising portability between different environments by removing any dependencies on external frameworks or tools.

The methodology is based on twelve tenets that provide a mixture of process guidelines and implementation detail. Although the methodology has been around for years, it’s only really been achievable in the Microsoft world since the advent of .Net Core.

I. Codebase

One codebase tracked in revision control, many deploys.

This defines the basic unit of organisation for an app, i.e. a single code base that can be deployed to multiple environments. This should mean hosting the app in a single repository, though this is not enough on its own. An app needs to be a cohesive unit of code rather than several different applications hosted in the same repository.

This might seem rather basic, but it does help to define the scale and scope of an application. It encourages smaller, more manageable units that have clearer responsibilities and are easier to automate.

II. Dependencies

Explicitly declare and isolate dependencies

You should be able to define all the dependencies that an application uses in a manifest that lives within the application. There’s no room for system-wide frameworks or external tools. This makes it easier to set up new environments, particularly for developers as any dependencies can be automatically acquired through a packaging system such as NuGet.

This used to be all but impossible with .Net as you relied on the existence of a system-wide framework. This created a prerequisite for an environment to be configured with a specific set of framework dependencies. Applications written in .Net Core, on the other hand, can be packaged with all their dependencies, including any underlying framework.

III. Config

Store config in the environment

In this context, configuration is defined as anything that is likely to vary between deployments and environments. There should be explicit separation between any configuration settings and code. A good litmus test for this is whether you can open source the code base at any moment without compromising any credentials.

Ideally, configuration should be factored into environment variables where they are easier for operational teams to manage. Above all you need to avoid relying on JSON based configuration files that ship with code or storing multiple configurations for different environments.

This approach is supported by the configuration APIs in .Net Core as they provide a hierarchy of configuration sources. This allows you to define a baseline configuration in an appsettings.json file and override any setting with environment variables and an external secrets store. This is demonstrated in the bootstrap code below:
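A minimal sketch of that hierarchy using the standard configuration builder:

var configuration = new ConfigurationBuilder()
    // Baseline settings that ship with the code
    .AddJsonFile("appsettings.json", optional: true)
    // Environment variables override anything in the JSON file
    .AddEnvironmentVariables()
    .Build();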

IV. Backing services

Treat backing services as attached resources

A backing service is anything that is consumed over a network, such as databases or external APIs. There is no distinction between services you manage locally and those that belong to third parties – they should all be treated as resources that can be attached and detached without requiring code changes.

V. Build, release, run

Strictly separate build and run stages

This requires an explicit separation between the processes that build, release and run software.

A build process should create an executable bundle and execute any tests you have along the way. A release is an immutable combination of the built artefact and configuration. The process of running an application should be handled by a process orchestrator and have as few moving parts as possible. Suffice to say, each of these stages should be automated rather than relying on human intervention.

VI. Processes

Execute the app as one or more stateless processes

A “process” is defined as the single execution of an application. An application should run in a separate process that does not share information with other running applications. This means not using techniques such as sticky sessions or caches to share state between processes.

VII. Port binding

Export services via port binding

A twelve-factor app should be completely self-contained and not rely on the run-time injection of an external host or server. If you want to expose an API over HTTP then an app should export the service and bind it to a specific port rather than having it implemented via a web server.

In .Net Core you expose an HTTP service using Kestrel, which is a self-contained HTTP server embedded within the application. You set up the service and bind it to a port using the bootstrap code below:

WebHost.CreateDefaultBuilder(args)
    .UseKestrel()
    .UseUrls("http://0.0.0.0:5000")   // bind directly to a port rather than relying on a host web server
    .UseStartup<Startup>()
    .Build()
    .Run();

It then becomes the responsibility of the environment to route requests to the port that the application is bound to. This is a very different approach to using a web server such as IIS to create an HTTP-based service.

VIII. Concurrency

Scale out via the process model

This follows on from the notion of treating running applications as stateless processes. It gives you the freedom to increase throughput by adding more of these processes in response to load. There is some nuance in here, as ultimately your scaling will be affected by other parts of your infrastructure, particularly any data storage infrastructure. That said, the running process should provide a basic unit of scale that enables more elastic scaling.

IX. Disposability

Maximize robustness with fast startup and graceful shutdown

Genuinely responsive elastic scaling requires that you can spin up instances quickly and kill them off without much ceremony.

You will need to take the time to ensure that applications shut down gracefully. Once they receive a command to close – e.g. a SIGTERM signal from a process manager – they should stop accepting new requests and return any unprocessed items back to wherever they came from.

In .Net Core you can use an IApplicationLifetime instance to implement a graceful shutdown by hooking into events such as ApplicationStopping and ApplicationStopped. Note that these will not necessarily fire in a debug environment but they will fire when being managed by an orchestrator. You can also use CancellationToken in your processing to check that a shutdown has not been signalled before doing any work.
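A rough sketch of these hooks, assuming the ASP.Net Core 2.x hosting model (the handler bodies are placeholders):

public void Configure(IApplicationBuilder app, IApplicationLifetime lifetime)
{
    lifetime.ApplicationStopping.Register(() =>
    {
        // SIGTERM received: stop accepting new requests and drain in-flight work
    });
    lifetime.ApplicationStopped.Register(() =>
    {
        // host has fully stopped: release any remaining resources
    });
}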

Consideration should also be given to “sudden death” scenarios where there’s no opportunity for graceful shutdown. This is more to do with the way that workloads are organised, such as using transactional messaging that will return messages to a queue if they are not explicitly terminated.

X. Dev/prod parity

Keep development, staging, and production as similar as possible

There are several different ways in which environments diverge. Code can get stuck in “testing hell” so the version in test environments starts to diverge from production. It can be difficult to keep tools and frameworks in sync. There’s also a difference in personnel between each environment which can give rise to subtle differences in configuration.

Twelve factor applications help to reduce this divergence by producing stateless, self-contained applications that depend less on external frameworks. Containerisation can help further by creating more certainty over run-time environments. Ultimately, you need to ensure that environments have a similar topology and only really vary in size.

XI. Logs

Treat logs as event streams

A twelve-factor app should treat logs as a time-ordered sequence of events and shouldn’t worry about how they are routed or stored. Log entries should be written directly to stdout. The aggregation, storage and processing of these entries is a separate concern that should be taken care of by the environment. This approach tends to be associated with tools such as the ELK stack or Splunk that capture and process log streams.

This can be difficult for application developers who are used to being able to configure the shape and destination of log files, including rotation and rollover policies. This decoupling gives you more freedom to change the way that logs are processed without having to make corresponding changes in application implementations. It also helps with elastic scalability, as manual log aggregation can quickly become a real pain when you scale up to dozens of instances.

The .Net Core logging extensions support this abstraction by providing a standard interface for writing log statements and a hierarchy of providers that can route these statements to the console output stream. The example below demonstrates setting up a log instance that writes to stdout using the .Net Core console logging provider:
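A minimal sketch (the ILoggerFactory.AddConsole extension shown here is the older .Net Core form; later versions configure providers through a logging builder):

// using Microsoft.Extensions.Logging; using Microsoft.Extensions.Logging.Console;
var loggerFactory = new LoggerFactory().AddConsole();   // route log entries to stdout
var logger = loggerFactory.CreateLogger<Program>();
logger.LogInformation("Service started");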

Many implementations of the .Net Core logging abstraction involve more explicit routing to a log store. For example, libraries such as Serilog have implementations that explicitly route output to Azure’s AppInsights. This is not in keeping with a twelve-factor approach as it couples applications more explicitly to an external service.

XII. Admin processes

Run admin/management tasks as one-off processes

This refers to upgrades or one-off configuration tasks and it implies they should be bundled with a release. A good example is using code migrations with Entity Framework to make changes to a target database. One thing that is missing from .Net Core in this respect is a REPL shell that allows you to run arbitrary code, such as the rails console command or irb in Ruby.
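For example, pending Entity Framework Core migrations might be applied as a one-off task when a release is deployed. A hedged sketch, where AppDbContext is hypothetical:

// apply any code migrations bundled with this release before serving traffic
using (var scope = host.Services.CreateScope())
{
    var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
    db.Database.Migrate();
}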

]]>Autonomous bubbles and event streams: Pragmatic approaches to working with legacy systemshttps://www.ben-morris.com/autonomous-bubbles-and-event-streams-pragmatic-approaches-to-working-with-legacy-systems/
Sun, 29 Jul 2018 19:55:14 +0000http://www.ben-morris.com/?p=2699There’s no universal definition of what constitutes “legacy” software. Some developers seem to regard it as anything that they did not personally write in the last six months. More seasoned executives will flatly deny use of the term even when gazing out across their estate of mainframe servers.

If legacy means anything at all it’s a system that is sufficiently outdated to undermine development velocity. The system is inherently unstable and difficult to change safely. Some features cannot be delivered at all, while others become unnecessarily expensive due to the amount of effort required to work around the inadequacies of the application and its environment.

Legacy doesn’t necessarily mean “old”. The accelerating march of frameworks and languages means that it’s easy for code bases to be left behind. Web applications based on earlier versions of JavaScript frameworks such as Angular and Ember can suffer from legacy difficulties when they are only a couple of years old.

Legacy doesn’t have to involve obsolete technology either. Obsolete architecture can have a similar impact on development velocity no matter what the underlying technology. Long-lived code bases tend to suffer from entropy over time, even if the underlying frameworks are kept up to date. Lifting and shifting a tangled code base to a more recent technology won’t suddenly make it more malleable.

Why does this matter?

Legacy systems don’t collapse overnight. There’s no sudden, insurmountable crisis, but more of a long, slow decline in development velocity.

A large, long-lived system inevitably loses its shape over time and becomes more difficult to work with. This accumulation of quick fixes, technical debt and evolving complexity is not restricted to older code bases. It’s surprising how quickly a badly-managed code base can sink into disrepair.

Bigger problems caused by technical obsolescence start to creep in if a system is based on platforms that are not under active development. Vendors may claim on-going support for a platform, but that’s not the same as actively maintaining it. The underlying run-times are not kept up to date in response to evolving security threats or emerging protocols.

This also means that there is no wider ecosystem of components and tooling. The techniques and frameworks that developers take for granted on modern ecosystems are not available. Agile technical practices such as test-driven development or continuous integration are effectively closed off, trapping a platform in more uncertain and manually-intensive delivery.

Added to this are growing problems of recruitment and retention. You might get lucky and find great developers who are prepared to work on legacy systems, but they are few and far between. It becomes very difficult to maintain a stable team of developers who understand the system in any depth.

Dealing with legacy systems. Or not.

Michael Feathers has written extensively around how you can surround legacy code with tests to enable safe changes. Advocates of microservices suggest that you can gradually decompose an application into smaller service implementations. A similar idea involves developing “strangler applications” that slowly grow around the edges of legacy systems, eventually over-powering them.

The problem with these re-write approaches is that they are very difficult to complete for larger, longer-lived platforms. They often suffer from a fatal loss of momentum, attention and support. This can leave successive failed re-writes lingering in legacy code bases, much like geological layers that provide evidence of multiple cataclysms in the distant past.

A bigger obstacle can normally be found in the internal organisation of the code. At the heart of many legacy systems is a Gordian knot of data and behaviour that is all but impossible to separate out. It’s just not realistic to imagine that you can gently decompose a fifteen-year old system based on hundreds of tables and thousands of stored procedures.

A hard truth is that in many cases a legacy platform remains the commercially optimal means of delivering functionality. The benefits of migrating to a more modern architecture cannot overcome the astronomical costs of a re-write.

In this case, running a system indefinitely under a form of palliative care starts to appear viable. A small team of seasoned developers are made comfortable with large salaries and generous pension plans. Bugs get fixed, but any new feature development happens elsewhere. This is what a lot of organisations are doing with their legacy systems, even if they won’t admit it.

This is often difficult for development teams to swallow. They need a means to augment legacy platforms with new features. They want to reduce the support burden by modernising common trouble spots. They also want the opportunity to work with more modern technology. How can you address these difficulties without falling into the re-write trap?

Using “bubbles” to separate the new from the old

Like any Gordian knot, a more creative approach is required to solve the problem. A pragmatic solution involves putting in place a structure that enables development alongside the legacy platform. You won’t transform the core application, but you can at least allow some new feature development, introduce some new technologies and even support some gentle decomposition. All without falling for the folly of a rewrite.

Eric Evans described establishing a “bubble” for new development that can sit alongside the legacy platform without being directly dependent on it. This is a small, self-contained part of the domain that can be “fenced off” from the main system using an anti-corruption layer. This bubble provides sufficient isolation to allow a small team to develop a solution without being constrained by the legacy platform.

The anti-corruption layer can be a simple API interface that allows data exchange with the legacy platform without directly coupling to it. The idea is to protect the bubble from being controlled by the legacy platform, so it can implement its own model and architecture.
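A sketch of what this might look like in practice, with all names hypothetical: the bubble codes against its own model and a narrow gateway interface, and a single adapter does the translation.

// the adapter that implements this interface is the only code that knows
// about the legacy schema; the bubble depends on the interface alone
public interface ILegacyCustomerGateway
{
    Customer FindCustomer(string accountNumber);   // returns the bubble's own Customer model
}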

Bubbles can be a useful technique for getting new development up and running, but they tend to be fragile. It’s hard to maintain the discipline of the anti-corruption layer over time and easy to compromise the bubble’s autonomy.

The “autonomous bubble” makes this bubble approach more permanent by separating it completely from the legacy platforms. It should be able to run completely under its own steam without needing to refer to any legacy systems. This implies that it will run its own data persistence and take responsibility for synchronising with legacy platforms.

This synchronisation could use something as humble as a batch file export, but a more robust approach would involve broadcasting events on a messaging technology. This event-based approach to integration allows for more timely updates but also lends far greater autonomy to the new context.

Exposing legacy data as event streams

A weakness of the bubble pattern is that it can be difficult to defend the boundary with the legacy system over time. Whatever your choice of anti-corruption layer or synchronisation mechanism there is an interface that needs careful curation. Any changes need to be synchronised between the bubble and the legacy system, requiring modelling work and tortuous parallel planning between development teams.

One technique for avoiding the overhead of an anti-corruption layer is to broadcast all the changes in a legacy system using an event streaming technology like Kafka. Tools such as Striim and Attunity can capture and transform database updates by processing database transaction logs. This data will be in a very raw state, but you can build downstream processes that consume these events and translate them into a meaningful format for other services to consume.
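One of those downstream translators might be little more than a consumer loop. This sketch uses the Confluent.Kafka client; the topic names and the translation step are hypothetical:

var config = new ConsumerConfig { BootstrapServers = "kafka:9092", GroupId = "legacy-translator" };
using (var consumer = new ConsumerBuilder<string, string>(config).Build())
{
    consumer.Subscribe("legacy.db.changes");   // raw change events captured from transaction logs
    while (true)
    {
        var result = consumer.Consume();
        // translate the raw database change in result.Message.Value into a
        // meaningful domain event and publish it for other services to consume
    }
}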

This “all in” streaming model allows you to de-couple a new service architecture completely from legacy platforms without having to model the exchange or synchronise development. These legacy systems are still responsible for data collection and legacy processing, but their data become freely available to new services.

This approach is particularly useful if you need to implement data-centric features such as cross-platform reporting, forecasts and analytics. It implements data as a “tap” where you are free to draw off data as and when you need to without direct dependencies on legacy platforms.

Does this really solve the problem?

Bubble patterns and event streams require some work to establish but they can facilitate a path towards a more modernised architecture. That said, this is merely side-stepping the problem rather than confronting it directly.

Legacy systems are often an unsolvable problem. Hence the fact that they tend to hang around for so long – it’s rarely for lack of will or inspiration. Every legacy platform usually has some evidence of a botched re-write lurking somewhere in its bowels.

You can at least make progress towards new development without the false promise of decomposition and replacement. Most importantly, any new services can be fully de-coupled from the inner workings of the legacy system. This provides a more pragmatic and sustainable approach to dealing with legacy platforms than the delusion of a re-write.

]]>Kafka on Azure Event Hub – does it miss too many of the good bits?https://www.ben-morris.com/kafka-on-azure-event-hub-does-it-miss-too-many-of-the-good-bits/
Tue, 10 Jul 2018 09:55:35 +0000http://www.ben-morris.com/?p=2724Microsoft have added a Kafka façade to its Azure Event Hubs service, presumably in the hope of luring Kafka users onto its platform. This makes sense as the platforms have a lot in common.

Both platforms were designed to handle large-scale event streams involving multiple clients and producers. They do this by providing a distributed, partitioned and replicated commit log. Messages are distributed onto topics with separately configurable retention periods. Partitioning across consumers is achieved through consumer groups. Clients are expected to manage their own cursor implementations to track their position in a stream.

As with all distributed systems, Kafka is decidedly non-trivial to maintain at scale. It takes quite a commitment to set up and manage a cluster in your own infrastructure. At the time of writing, hosted versions of Kafka are not shy about charging a premium to manage a cluster for you. The going rate appears to be in the region of $15-20 per day for an entry-level rig that can handle 1MB/sec throughput.

By contrast, Azure’s Event Hubs can provide similar throughput for just over a fiver. It also provides a pure PaaS solution that abstracts away all the detail of cluster management, providing global resilience, disaster recovery and scalability without any management or configuration burden. Throughput is provisioned by reserving resource units and you can control basic aspects such as message retention and the number of partitions for each topic.

Adding API façades to encourage migrations

The Kafka façade should allow applications to switch to Azure Event Hubs with minimal code changes. Kafka concepts such as partitions, consumer groups and offsets have direct equivalents in Azure. The provisioning of resource units to scale the instance throughput works in the same way no matter which API you are using.
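In practice the switch is largely a matter of configuration. A Kafka client can be pointed at the Kafka-compatible endpoint of an Event Hubs namespace along these lines (the namespace and connection string are placeholders; sketched with the Confluent.Kafka client):

var config = new ProducerConfig
{
    BootstrapServers = "mynamespace.servicebus.windows.net:9093",   // Event Hubs Kafka endpoint
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.Plain,
    SaslUsername = "$ConnectionString",
    SaslPassword = "<Event Hubs connection string>"
};
using (var producer = new ProducerBuilder<Null, string>(config).Build())
{
    producer.Produce("my-topic", new Message<Null, string> { Value = "hello" });
    producer.Flush(TimeSpan.FromSeconds(10));
}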

Presumably, the intention is to provide a migration path to Azure for anybody who has already made an investment in Kafka. A similar approach has been taken with CosmosDB, which has library support for MongoDB and Cassandra clients. They are not providing MongoDB and Cassandra instances, but merely providing a migration path to minimise the code changes required to move.

The problem with these implementations is that they are only partially realised. The client APIs may be the same, but under the hood all the provisioning and partitioning is still very much based on CosmosDB. In practical terms this means that your MongoDB sharding keys will work very differently, you lose control over how partitions are created and there are some limits on query capability.

The façades can also deny you access to native CosmosDB features. For example, the native SQL interface for CosmosDB lets you track resource unit usage by returning the units used in each operation. This is particularly useful for sizing loads and figuring out provisioning strategy, but it’s not available to the MongoDB and Cassandra façades.

Kafka limitations in Azure Event Hubs

Limitations also apply to Kafka through Azure Event Hubs. The APIs allow you to connect, send and receive, though some Kafka-specific features are missing. The omissions here feel a little more severe.

The main missing area is in Kafka’s support for “exactly once” delivery semantics. What this really means can be a little nuanced, but it supports a pipeline between producer and consumer where each message will only be processed once. This allows the same result to be obtained from a stream even if there are network failures or consumer crashes during processing.

This relies on two key features, neither of which are supported in Azure Event Hub. For event producers, message sending can be made idempotent to avoid sending duplicate messages. This is achieved by assigning a unique identifier to the producer and enforcing a sequence number for new messages. Note that if a producer fails completely then it is given a new unique number on restart, so the guarantee only exists for a single producer session.

When writing data Kafka supports atomic writes across topics and partitions, providing transactional semantics for groups of messages. On the consumer side, you can change the isolation level of a client to respect transactions or ignore them for better performance.

Idempotent producers and transactions aren’t the only omissions. You won’t be able to leverage Kafka Streams either, an API that allows you to transform data and build queryable projections from event streams. You miss out on Kafka Connect, the connectivity framework that makes it much easier to transfer data between systems. There are also a few missing management features such as being able to add a partition to an existing topic, setting retention based on event size and using the HTTP-based Kafka REST API.

Taken together, this amounts to a neutered platform compared to a native Kafka implementation. Azure Event Hubs is just a streaming transport, lacking the more sophisticated delivery and processing features that are found in Kafka. It’s a great choice for simpler message-streaming scenarios, but may not be so useful if you have already made a significant investment in Kafka’s more advanced features.

]]>Using architectural “fitness functions” as a guide to system designhttps://www.ben-morris.com/using-architectural-fitness-functions-as-a-guide-to-system-design/
Mon, 18 Jun 2018 19:43:55 +0000http://www.ben-morris.com/?p=2696Fitness functions are used in genetic programming to assess how close a design solution is to meeting a set of aims. They are applied to iterative simulations so badly performing solutions can be removed and the output can be guided towards an optimal solution.

This is an idea that can be applied to the design of software systems. An evolutionary approach to architecture suggests that you can assess the suitability of a technical solution using objective and repeatable tests. These tests balance the need for rigor around architectural design with the need to support rapid change. You can evolve an architecture through iterative experimentation, using fitness functions to guide the way.

Identifying these tests as “fitness functions” may be stretching the evolutionary metaphor a little. You are not using functions to choose between different candidate solutions in a simulation of natural selection. You are assessing current state. The application of fitness functions has less to do with iteratively rejecting solutions and more to do with using metrics to guide future development effort.

Another problem is that it can be difficult to come up with objective and repeatable tests that measure the right things. In genetic programming a lot of investment is put into the definitions of fitness functions to make sure they correlate with design goals. Measuring the wrong thing will cause the process to converge on an inappropriate solution.

The same applies to fitness functions in an architectural setting. How do you measure architectural qualities such as coupling and cohesion? Are these even the right things to measure? In the search for meaningful metrics there is a risk that you end up groping for whatever comes conveniently to hand.

The problem with observing code

Given that code is a tangible artefact it can be tempting to define architectural fitness functions that focus on the structure of code. Test frameworks also lend themselves to creating clear, atomic tests that can be incorporated into the build process.

You can use static analysis tools such as SonarQube to provide some indication of the general complexity of code. ArchUnit goes further by allowing you to enforce code structure rules as JUnit test assertions that can be incorporated into a continuous integration pipeline. This lets you define package structure, control package and class dependencies, regulate the flow between layers of an application and identify circular dependencies.
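The same style of assertion is available to .Net through libraries such as NetArchTest, where structure rules become ordinary test assertions. A hedged sketch, with the namespaces and types being hypothetical:

// using NetArchTest.Rules;
// fail the build if domain code acquires a dependency on the web layer
var result = Types.InAssembly(typeof(Order).Assembly)
    .That().ResideInNamespace("MyApp.Domain")
    .ShouldNot().HaveDependencyOn("MyApp.Web")
    .GetResult();

Assert.True(result.IsSuccessful);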

This approach is quite limited as it takes a very narrow view of the system. A full picture of architectural health needs to take different dimensions into account beyond the technical implementation. You also need to consider aspects such as performance, security, data, operability and integration, all of which can be difficult to assess through an automated test.

Another problem is that you tend to get what you measure. Fitness functions based on metrics can very quickly become perceived as targets. This encourages engineering teams to “game” them to create an impression of architectural health. For example, setting a target for unit test coverage encourages developers to create meaningless wrapper tests that do little more than bump up the coverage statistics.

Focusing on what matters

Perhaps architectural fitness isn’t something that can be measured by directly observing code or design. The real proof of architectural fitness comes with derived measures that describe how the system is being used and whether it is meeting expectations.

This can include more commercially-orientated tests that reflect business priorities as much as technical implementation, i.e.

How long does it take to deliver a feature, from conception to release?

How long does it take to on-board a new customer once a sale has been made?

How many new support incidents are being received?

How much unscheduled down-time is going on?

There can be a disconnect between the more tangible technical tests and the messier real world of commercial priorities and users. It’s worth noting that discussions around fitness functions rarely mention areas such as user experience or customer satisfaction. This could be because these are often intangible concepts that are difficult to measure. They certainly cannot be implemented as an automated test or put on a dashboard. This tends to encourage an overly internal, technical focus on architectural assessment which risks losing sight of the things that really matter.

Avoiding metric-driven development

Are architectural fitness functions anything more than glorified metrics? That’s not really the point. It’s the way they are used that defines them. They can provide a base for a more iterative approach to architecture, helping to direct an evolving design towards a desired set of outcomes. It’s this support for iterative experimentation that sets them apart.

This does all rather hinge on coming up with a set of functions that accurately describe the desired outcome. The risk is that you tip over into a kind of metric-driven development where priorities are distorted by a narrow set of measurable criteria.

You also need to be pragmatic in the way they are applied. There will be trade-offs and conflicts between different dimensions of a system. You also need to review functions for continued relevance. After all, as with any set of requirements, your understanding of what is important to a system will evolve as you develop it.

]]>Layers, onions, hexagons and the folly of application-wide abstractionshttps://www.ben-morris.com/onions-hexagons-layers-and-folly-of-application-wide-abstractions/
Sun, 03 Jun 2018 08:32:35 +0000http://www.ben-morris.com/?p=2678There are a well-established set of enterprise patterns that encourage us to think in terms of shared abstractions for data processing such as domain models and service layers. These tend to give rise to applications that are organised in layers where data and behaviour are arranged together according to these abstractions.

There are numerous variations on this theme. It is often described as an onion, presumably in response to the tears that are shed a few years down the line. “Clean” architecture extends this metaphor with a rule stating that dependencies can only point inwards, i.e. from the UI towards a central layer of data entities. Hexagonal architecture prefers a clean separation between the “inside” and “outside” parts of an application.

The central concern of these architectures is the separation of concerns. The architecture is not dependent upon any specific framework or technology. There is a separation between data processing and the UI. You should be able to test any significant functionality in isolation.

This kind of separation is a noble pursuit but establishing a common set of abstractions across an application can be very dangerous. Not only is it difficult to maintain over time, but it tends to give rise to inflexible applications that have serious scalability issues. It plays to the myth that a single universal architectural framework can be devised that will solve every problem.

An inflexible approach

Generic design approaches tend to be optimised for only a narrow subset of the requests that a system has to serve.

For example, establishing a common domain model to encapsulate all the business rules in one place sounds like a convenient way of organising processing logic. Over time, any single implementation will struggle to meet the scale requirements of large-scale data ingestion, tactical reporting and responsive interfaces all at once.

Layered applications tend to be based on a very rigid set of common abstractions, e.g. a “controller” must talk to a “service” that must talk to a “repository”. This creates a mock-heavy code base that slows down development by insisting on applying the same solution to every problem.

These abstractions tend not to be aligned with the business-orientated features that are implemented in systems. Feature implementations tend to get scattered between numerous layers of code, making them hard to understand, slow to implement and difficult to change.

Worse still, it can be difficult to protect and maintain these abstractions over time. Developers tend to implement features in the part of the code base that is most familiar to them. They don’t always understand the nuances of the abstractions that they are supposed to protect. Without rigorous policing you can find feature logic scattered incoherently around various layers as the abstractions are gradually eroded.

We need to solve a different problem now

There’s also something old fashioned about these layers and system-wide abstractions. Perhaps they belong in a different era where we were trying to scale client server applications for thousands of users. Layering a load-balanced presentation layer on top of data processing logic made a lot of sense as it helped to make applications more scalable and resilient.

Alas, this is not enough for the demands of more modern, high-volume applications. Layers tend to give rise to inefficient processing that is heavily dependent on a centralised database server. Most of the work is more about moving and translating data between layers rather than useful business processing.

The idea that design can be separated from deployment is a fallacy. Layers say nothing about how processing should be distributed. You need to consider how to handle exponential data growth, service peak loads and provide genuine resilience. None of these concerns are addressed by layered architecture.

How do autonomous services solve the problem?

Breaking an application down into smaller, autonomous service implementations addresses these challenges in two ways.

Firstly, it gives you much greater freedom in terms of the technology and design decisions you make when implementing features. You can adapt a solution that is tailored to the processing and scaling needs of each individual use case.

Secondly, it contains mess. This is one of the more darkly pragmatic benefits of service-based development. If you struggle to maintain the abstractions in any one implementation at least the impact is contained within a relatively small boundary. It does not create an enterprise-scale jumble of code.

It’s worth stressing that these patterns are still useful within individual service implementations. Repositories are a great abstraction that separates data processing code from the underlying data access technology. A service layer can provide clarity over the available operations and the way they are co-ordinated.
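Within a single service boundary, that repository abstraction can be as simple as the sketch below (the names are hypothetical):

// the data access technology is hidden behind the interface, so it can be
// swapped or mocked within this service without touching callers
public interface IOrderRepository
{
    Task<Order> GetById(Guid id);
    Task Save(Order order);
}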

The point is that you should have the freedom to pick and choose your abstractions, frameworks and technologies to suit each use case. A “clean” architecture based on layers tends to preclude this.

]]>How to decompose that monolith into microservices. Gently does it…https://www.ben-morris.com/how-to-decompose-that-monolith-into-microservices-gently-does-it/
Tue, 08 May 2018 09:52:39 +0000http://www.ben-morris.com/?p=2670It’s pretty straightforward to win buy-in for microservices. After all, who doesn’t want to reduce the cost of change while improving resilience and scalability? It sounds like an obvious solution for the problems that typically bedevil legacy monoliths.

The catch is that decomposition is a slow and complex process. It can be difficult to know where to begin. Microservices can help to simplify change over the long term, but they don’t necessarily make change simple. You are at risk of losing momentum and getting stuck in brand new, distributed quagmire.

Making peace with your monolith

You are unlikely to decouple a large and long-lived monolithic system in its entirety. These systems are often based on a Gordian knot of database tables, stored procedures and spaghetti code that has been built up over ten years or more. This is not going to be re-organised very quickly, particularly when there is on-going demand for new features or support incidents to deal with.

You rarely get given the opportunity to focus on transitioning an architecture to the exclusion of all else. You may have to get used to the idea that decomposing a monolith is a direction of travel rather than a clear destination. Your monolith is unlikely to ever disappear entirely. What you are really doing here is reducing the size of the problem, providing a means of delivering new features and solving old problems as you go.

It’s worth bearing in mind that there are other ways to tackle the problems of a monolith. You can take a horizontal approach by removing concerns such as public-facing APIs or reporting into separate implementations. You can make a system more malleable by focusing on build, test and deployment automation, no matter how dire the legacy stack. Investing in the internal organisation of monolithic code can also pay dividends without the overhead of a microservice infrastructure. Microservices are not always the most sensible place to start.

If you do commit to microservices then, from a technical point of view, this means more than just ensuring you have functional deployment pipelines in place. You’ll also need to make decisions about concerns that include orchestration, instrumentation, service discovery, routing, failure management, load balancing and security. You’ll also need to consider how you will manage support, cross-team dependencies and governance in a distributed environment.

It’s surprising how quickly you can be overwhelmed by naïve service implementations. It only takes a few services to be in place before problems associated with monitoring and debugging a distributed environment can become unmanageable.

Starting small and safe

You should use the first few service implementations to validate your infrastructure. This is an opportunity to check that you are equipped to deliver services at pace, monitor them in production and deal with problems efficiently. It’s also a chance to negotiate any learning curves and get engineers used to any new ideas or technologies.

With this in mind you should avoid jumping into the most difficult logic to begin with. The first few services should be the easiest. They will be small, their coupling to the rest of the system should be minimal and they may not even store any data. It doesn’t matter if they are scarcely used features, as the point is to establish a beachhead and get people used to delivering services.

Note that you can’t put off the more difficult services forever. The disadvantage with starting on easier areas is that you aren’t really delivering meaningful value. You cannot nibble away at the fringes of a problem forever in the hope that the core problem will somehow open itself up.

Managing dependencies with the monolith

You want to slowly pick off areas of functionality and reduce the size of the core application. This won’t happen if your new services have any lingering dependencies back to the monolith.

You should establish some hard and fast rules for the kinds of dependencies that you are prepared to support. If you really cannot avoid referring back to a feature in the monolith do it through a service façade. This can provide an architectural placeholder for a future service implementation or at the very least act as an anti-corruption layer.

Splitting up the data

Many monolithic systems have a gigantic shared database at their core. I have seen systems that contain hundreds of tables and thousands of stored procedures, not to mention the triggers and functions that help to bind them all together. The application database is generally the single biggest part of any decomposition challenge.

Getting round the problem by developing services on top of this shared database will not give rise to the resilience and flexibility that you would expect. You’re just re-arranging processing into modules rather than decomposing it. It’s not even a valid interim solution while you ease into decoupling. Everything is still coupled together via the monolithic data store in the middle of your application.

You have to bite the bullet from the start and decompose the data. Services should be responsible for persisting and managing their own data, even if this requires a painful and drawn out migration process.

Take a tactical approach

Bear in mind that extracting capabilities from a monolith is difficult. It’s time-consuming, complex work that is laden with risk. It can also be difficult to sell to commercial stakeholders who tend not be sympathetic to work that does not yield any visible functional improvements.

You can overcome this to an extent by taking a tactical approach and wrapping decomposition into the delivery of new features. The order in which you decompose services can largely reflect the functional roadmap agreed with stakeholders. If you maintain a strategic view of how your service landscape should play out then you will be in a better position to seize decomposition opportunities when the roadmap allows.

Understand the domain

You will need a strategic plan that maps out the services you want to build. This will always be a work in progress, but taking an overly tactical and reactive approach will give rise to an incoherent set of services. You need to invest time in understanding your overall problem domain. Some areas will present themselves as obvious, immediate candidates for decomposition; others will need more time to come into focus.

Adopting a common understanding of the domain is essential, preferably one that is allowed to evolve and mature. Some of the more strategic concepts in Domain Driven Design (DDD) can be useful here, particularly an understanding of sub-domains and bounded contexts. These provide a mechanism for identifying your service boundaries and building a map of how decomposition could play out.

Another advantage of DDD is that it can force you to consider decomposition in terms of data and behaviour rather than code. It’s the capabilities that you are trying to extract and implement here, not the old code. This can help to protect you from “lift and shift” style service implementations where old mistakes are ported directly over to new implementations.

Big can be beautiful, particularly to start with

The DDD concept of bounded contexts is a good tool for considering service boundaries. It also tends to imply that services are going to be reasonably chunky. After all, they describe clusters of data and behaviour that have an internal consistency to them.

The single most important feature of a service is autonomy. A service should be completely independent in terms of release and execution. Too many small services tend to undermine this by creating a system that looks more like a set of distributed data objects. If you’re not sure where to draw a service boundary then make it big. You can always decompose it later.

Always finish what you start

One advantage of service-based development is that it allows for incremental delivery. You don’t have to complete an entire programme of work to demonstrate benefit.

This does depend on making sure that each service fully replaces the old code. It’s very common for service implementations to run side-by-side with older functionality. Newer customers use the new service, while legacy customers somehow never quite make it off the monolith. This isn’t really decomposition – it’s just a re-write.

]]>What makes a REST API mobile-friendly?https://www.ben-morris.com/what-makes-a-mobile-friendly-rest-api/
Wed, 25 Apr 2018 15:24:51 +0000http://www.ben-morris.com/?p=2662REST allows for considerable variation in terms of end-point design and implementation detail. Many design nuances are dependent on the clients that will be consuming the resources. APIs that are designed for server-based data integrations tend to look quite different from those that are designed to support mobile applications.

Mobile applications do face a particular set of challenges that should influence API design. Networks are always slow. Devices are even slower. Connections are unpredictable and users expect to be able to swap between devices seamlessly. Requirements are prone to sudden change. Developers tend to be in a hurry.

Added to this is the fact that APIs and mobile applications are often built and managed by separate teams. This can give rise to numerous integration problems if the APIs are not explicitly targeted at mobile applications.

Consistency and convention

API usability is about predictability as much as anything else. One of the advantages of REST is that it provides a set of sensible and widely understood defaults for API behaviour. You should use them.

It’s easy to get consumed in unhelpful debates about what is and is not “RESTful”, but there are some basic expectations here. You should use GET to fetch data, DELETE to remove it, POST to create new resources and PUT for idempotent updates. Support content negotiation to let the client choose the content type. Use proper HTTP status codes so the client can tell when something has gone wrong (or right).
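Sketched as an ASP.Net Core controller, those defaults look something like the following (the model, repository and route names are all illustrative):

[Route("api/orders")]
public class OrdersController : Controller
{
    private readonly IOrderRepository _repository;   // injected via the constructor

    public OrdersController(IOrderRepository repository) => _repository = repository;

    [HttpGet("{id}")]
    public IActionResult Get(Guid id)
    {
        var order = _repository.GetById(id);
        return order == null ? (IActionResult)NotFound() : Ok(order);       // 404 or 200
    }

    [HttpPost]
    public IActionResult Create([FromBody] OrderRequest request)
    {
        if (!ModelState.IsValid) return BadRequest(ModelState);             // 400 for bad input
        var order = _repository.Create(request);
        return CreatedAtAction(nameof(Get), new { id = order.Id }, order);  // 201 with a Location header
    }
}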

That said, consistency across an API is probably more important than conforming to widely-held convention. There’s nothing worse than an unpredictable API that implements different behaviours across different resources. If you’re going to be wrong, then at least be consistently wrong.

Division of labour

Any API design requires decisions around where processing should take place. You do not want your API to burden mobile devices with expensive data operations. You might also have several different client applications in development, each requiring similar functionality. Ideally, all a mobile application should have to do is render output and accept input – the server should do all the heavy lifting where possible.

There are nuances to observe here, of course. Over-reliance on a server for basic functionality can render an app useless if there’s no network coverage. Server-based processing can also give rise to an overly “chatty” relationship between application and API which can undermine performance.

Minimising calls

API calls are expensive. You should try and minimise the number that any application needs to make. A client should not have to make multiple, recursive calls to get the data it needs. It should also be able to pass updates in a single batched transaction rather than as a set of separate updates.

From a strict performance point of view the ideal is to populate each screen with a single call and allow data updates to be passed with a single call. There is a trade-off here, as coupling the API design to the application UX may leave it unusable for any other type of client.

Managing change

Managing change, particularly breaking change, is an acute issue for mobile applications as you will inevitably have numerous versions of clients in use and little direct control over when they are upgraded.

API versioning can be a minefield. If you adopt a strict versioning strategy you will have to maintain multiple API versions in production. Anticipating future changes by adding optional parameters and wildcard properties tends to give rise to a confusing and messy design. The same goes for an evolutionary approach, where explicit versioning is rejected in favour of augmenting APIs without ever changing them.

The challenge here is to manage inevitable change in an API whilst providing a stable contract to clients. This is a two-way street as your API consumers also have responsibilities here. Ideally, they should seek to be tolerant readers that only consume the parts of any response that they really need. Consumer-driven contracts can help to clarify the expectations that consumers have of their APIs so developers are more aware of breaking change.

Implementing HATEOAS can reduce the scope of breaking change by giving the server control over feature availability and de-coupling clients from the format of URIs. This does rather rely on clients to do their part and use the API “properly” by following links provided in responses. The catch with HATEOAS is that the “correct” style of client integration is purely voluntary and there’s no way to enforce it.

Flexible resources

Mobile API development tends to be at the mercy of numerous UX-driven interface tweaks that require corresponding changes to data. It’s easy to get locked into a repeating bottleneck where small feature requests get bounced back and forth between mobile and API teams.

Ideally an API should offer a degree of flexibility, allowing clients to control the shape of a response via query string arguments. There are query formats such as OData, GraphQL and ORDS that provide a standard for this kind of flexibility. They allow clients to filter data rows, implement paging or modify ordering without requiring any code changes on the API.

These kinds of query formats do help to support more flexibility in UI design, though this often comes at the cost of increased complexity, both in the technical implementation and the API design. These formats also tend to be opinionated about the underlying API architecture. OData in particular couples underlying data stores to public-facing end-points in a way that is not to everybody’s taste.

These query formats, though convenient, also have the potential to weaponise API requests. In exposing a flexible API you may be writing cheques that you cannot really cash if consumers start hitting you with expensive custom queries.

That said, you should try to meet your clients half way by at least providing some flexibility over the format of responses. At the very least, allowing for pagination, ordering and basic filtering through query string arguments can give clients a lot more room to make UI adjustments without requiring corresponding code changes in APIs.
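A hedged sketch of that middle ground, with the parameter names and repository call being illustrative:

[HttpGet]
public IActionResult GetOrders(int page = 1, int pageSize = 20,
    string sortBy = "createdAt", string status = null)
{
    // pagination, ordering and basic filtering arrive as query string
    // arguments, e.g. GET /api/orders?page=2&pageSize=50&status=shipped
    var results = _repository.Find(status, sortBy, page, pageSize);
    return Ok(results);
}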

Documentation

No matter how intuitive or RESTful your API may be, it will need relevant and accurate documentation to be at all useful. Alas, this is often left to the last minute and executed by hurried development teams who are not accustomed to producing consistent and accurate documentation.

Tools such as Swagger and API Blueprint can help to provide some level of automated documentation. They may have their detractors (including me) but they can at least reflect whatever is in production.

Automated documentation can provide a basic description of end-points, but it can never explain context such as explaining features or advising on best practice. This richer information is just as important as lists of resources and it can only really be maintained manually, preferably by a technical writer.

Considering offline operation

You cannot assume that a single device will have a consistent and unbroken connection to an API. At the very least you will need to consider the effects of offline activity and how you might reconcile any data that it generates.

You may also want to ensure fluid operation across multiple devices as users switch between mobile, tablet and PC. They will expect a seamless experience with data being shared freely and securely between different clients. This is more than just an application concern. For example, an API may need to implement a system of timestamps to marshal concurrent updates sent from multiple devices.

Encryption

Use SSL all the time, even in development. It makes life easier in the long term as you won’t be catching encryption issues as you switch between environments. There’s really no excuse for not using signed certificates, particularly since the advent of Let’s Encrypt.

Error handling

When something goes wrong you need to provide meaningful information so that a client can respond appropriately. This means making use of appropriate HTTP status codes for common conditions such as poorly formatted requests (400 Bad Request) and referring to records that do not exist (404 Not Found).

It should also mean providing helpful responses for unexpected errors. Verbose error messages tend not to be much use to a mobile application, other than to confuse the user with technical detail. A better strategy is to log detailed error conditions on the server and return error codes that an application will be able to handle sensibly.

Caching strategies

Mobile APIs need to be fast. Anything you can do to push data closer to client applications or reduce the amount of data being processed is worth considering. Caching static resources in a CDN can help to make an API more responsive, or if that isn’t possible your API could leverage a fast memory-based data store to speed up responses for more common requests.

HTTP also provides a few different strategies for response caching, such as using ETags, the If-Modified-Since header or the Cache-Control header. All these options help to reduce the amount of data being transferred, though the most appropriate option will largely depend on your data and how it has been persisted.
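In ASP.Net Core, for instance, the Cache-Control option can be set with a single attribute (the duration and repository are illustrative):

// clients and intermediaries may reuse this response for five minutes
[HttpGet("{id}")]
[ResponseCache(Duration = 300)]   // emits a Cache-Control max-age header
public IActionResult Get(Guid id) => Ok(_repository.GetById(id));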

Using compression

HTTP compression is supported by many clients, so it makes sense to enable it on servers. Anything you can gain in terms of bandwidth usage is worth having, particularly when it is so easy to implement.
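In ASP.Net Core this amounts to a couple of lines of middleware configuration, sketched below:

// using Microsoft.AspNetCore.ResponseCompression;
public void ConfigureServices(IServiceCollection services) =>
    services.AddResponseCompression();   // gzip by default

public void Configure(IApplicationBuilder app) =>
    app.UseResponseCompression();        // register early in the pipeline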