In this tutorial, we will take a look at how Kafka can help us handle distributed messaging by using the Event Sourcing pattern, which is inherently atomic. Then, by using a pattern called Command-Query Responsibility Segregation (CQRS), we can maintain a materialized view that acts as the gate for data retrieval. Finally, we'll learn how to make our consumer redundant by using a consumer group. The whole application is written in Go.

Why We Need Microservices

The most common argument for microservices is, first and foremost, scalability. As an application grows, it becomes hard to maintain all the code and make changes to it easily.

This is why people turn to microservices. By decomposing a big system into various microservices that each handle a specific function (e.g. a microservice to handle user management, a microservice to handle purchases, etc.), we can add new features to our application more easily.

The Challenges of Building Microservices

However, building microservices can be challenging. One of the challenges is atomicity: with data distributed across services, which is inherent to a microservice architecture, an update can no longer rely on a single local transaction.

Querying is also a challenge. When customers and orders live in two different services, it can be quite difficult to run a query that joins them, for example, listing a customer together with all of their orders.

The two architectural patterns that are key for creating a microservice-based solution are Command-Query Responsibility Segregation (CQRS) and Event Sourcing, when it makes sense.

Not all systems require event sourcing. Event sourcing is a good fit for a system that needs an audit trail and time travel. If the system in question only needs basic decoupling from a larger system, an event-driven design is probably a better option.

Kafka in a Nutshell

If we compare Kafka to a database, a topic in Kafka is what a table is in a database. Where a table holds data as rows, Kafka holds data as commit logs, which are simply strings. The order of commit logs matters in Kafka, so each one carries an ever-increasing index number used as an offset.

However, unlike a table in a SQL database, a topic should normally have more than one partition. Since Kafka's performance is guaranteed to stay constant at O(1), each partition can hold thousands, millions, or even more commit logs and still perform well. Each partition then holds different logs.

Partitioning is the process through which Kafka enables parallel processing. Thanks to partitioning, each consumer in a consumer group can be assigned to process an entirely different partition. In other words, this is how Kafka handles load balancing.

Each message is produced somewhere outside of Kafka. The system responsible for sending a commit log to a Kafka broker is called a producer. The commit log is then received by a single Kafka broker, acting as the leader of the partition to which the message is sent. Upon writing the data, the leader replicates the message to other Kafka brokers, either synchronously or asynchronously, as desired by the producer. The coordination of brokers and partition leaders is handled by an instance of Apache ZooKeeper, which runs outside of Kafka.

Kafka is usually compared to a queuing system such as RabbitMQ. The key difference is that Kafka doesn't delete a log after it has been consumed. That way, messages stay in Kafka longer and can be replayed.

Setting Up Kafka

In this section, we will see how to create a topic in Kafka.

Kafka can be downloaded from either Confluent's or Apache's website. The version that you need to download is in the 0.10 family. We’ll be using 0.10.1.0 in this tutorial.
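The exact topic setup isn't shown here, but with the stock 0.10 distribution you would start ZooKeeper and a broker (in separate terminals), then create a topic with the bundled scripts. This is only a sketch: the topic name bank_transactions and the partition count are assumptions for illustration, not values mandated by the tutorial.

$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin/kafka-server-start.sh config/server.properties
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic bank_transactions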

An event does not contain all of the data for an account, e.g. the account holder's name, balance, registration date, and so on. An event contains only the name of the event, and the necessary fields such as the ID and the changing attribute.

The whole snapshot exists only as a mere reflection of past events. That way, by using events, we can recreate the data up to the point we desire.

Let's start by creating a new folder, banku, and then use govendor to initialize the directory and handle the project's dependency management (yes, Go has a lot of dependency tools, and no de facto standard yet).

$ govendor init
$ govendor add +external

We will be using govendor fetch instead of go get to add a vendor or dependency for Banku. However, we need to go get ginkgo and gomega for BDD-style testing.
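Assuming the usual onsi import paths, the ginkgo CLI and the gomega matcher library can be installed with:

$ go get github.com/onsi/ginkgo/ginkgo
$ go get github.com/onsi/gomega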

Note that we need to install and import the go.uuid library, since we are using a UUID in NewCreateAccountEvent. We may also use another package or technique for generating the ID.
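The full struct isn't reproduced here, but a minimal sketch of what the event and its constructor might look like, assuming the satori/go.uuid package and hypothetical field names, is:

import (
	uuid "github.com/satori/go.uuid"
)

// CreateAccountEvent carries only the event name and the fields needed
// to create the account; the snapshot is derived later from such events.
type CreateAccountEvent struct {
	Type      string
	AccountID string
	Name      string
	Balance   int
}

func NewCreateAccountEvent(name string, balance int) CreateAccountEvent {
	return CreateAccountEvent{
		Type:      "CreateAccount",
		AccountID: uuid.NewV4().String(), // assumes the satori/go.uuid v1 API, where NewV4 returns a UUID directly
		Name:      name,
		Balance:   balance,
	}
}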

We need to define 3 other structs and functions for the deposit, withdrawal, and transfer events. You may write them yourself, or look up the implementation here.
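For illustration, a deposit event might look like the following sketch; the struct and field names are assumptions rather than the exact implementation referenced above:

// AccountDepositedEvent records that an amount was added to an account.
type AccountDepositedEvent struct {
	Type      string
	AccountID string
	Amount    int
}

func NewAccountDepositedEvent(accountID string, amount int) AccountDepositedEvent {
	return AccountDepositedEvent{
		Type:      "AccountDeposited",
		AccountID: accountID,
		Amount:    amount,
	}
}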

Nothing will happen, since we haven't created the consumer that will process all those messages.

Consuming Events

Whenever an event comes in, the consumer must honor a clear contract: is the event meant for event sourcing or for command sourcing? While both can be replayed, only event sourcing is side-effect free.

Carefully designing the contract allows us to avoid executing unnecessary commands. For instance, upon receiving a UserCreated event, the user should receive a welcome email. Then, replaying the same event must never send the email again by contract. In our case, all three events are purely event sourcing.

Since event sourcing derives the current state from a series of events, it would be time-consuming to look up the current state by always replaying the events. This is where CQRS comes in handy, as it allows us to maintain a materialized view as a result of the events we are receiving.

First, we need to create a Process() function for each event in a file named processor.go:
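The exact processor.go isn't reproduced here. A minimal sketch, assuming the event structs sketched earlier, the redigo Redis client, and hypothetical Redis key names, might look like this:

import (
	"github.com/garyburd/redigo/redis"
)

// Process applies a CreateAccountEvent to the materialized view kept in Redis.
func (e CreateAccountEvent) Process(conn redis.Conn) error {
	// Store the current state of the account as a Redis hash.
	_, err := conn.Do("HMSET", "account:"+e.AccountID,
		"name", e.Name,
		"balance", e.Balance)
	return err
}

// Process applies an AccountDepositedEvent by incrementing the stored balance.
func (e AccountDepositedEvent) Process(conn redis.Conn) error {
	_, err := conn.Do("HINCRBY", "account:"+e.AccountID, "balance", e.Amount)
	return err
}

Each Process() call only mutates the materialized view, so replaying the same stream of events simply rebuilds the view, keeping the handlers side-effect free.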

Note that we are using sarama.OffsetOldest, which means that Kafka will send us logs all the way from the first message ever created. This is handy in development, since we don't need to produce message after message to test out features. In production, we would definitely want to change it to sarama.OffsetNewest, which asks only for the newest messages that haven't been sent to us yet.

The new newKafkaConsumer function is then defined in kafka.go as follows:
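The exact implementation isn't shown here; a minimal sketch using the Shopify/sarama package, with the broker address and topic name as assumptions, could be:

import (
	"fmt"

	"github.com/Shopify/sarama"
)

func newKafkaConsumer() sarama.Consumer {
	// Passing nil for the config means sarama's defaults are used.
	consumer, err := sarama.NewConsumer([]string{"localhost:9092"}, nil)
	if err != nil {
		panic(err)
	}
	return consumer
}

func mainConsumer(partition int32) {
	consumer := newKafkaConsumer()
	defer consumer.Close()

	// Consume a single partition of the topic, starting from the oldest offset.
	pc, err := consumer.ConsumePartition("bank_transactions", partition, sarama.OffsetOldest)
	if err != nil {
		panic(err)
	}
	defer pc.Close()

	for msg := range pc.Messages() {
		// Decode msg.Value into one of our events and call its Process() method.
		fmt.Printf("Received: %s\n", string(msg.Value))
	}
}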

Ideally, the consumer and the producer would reside in altogether different source code repositories. However, to keep things short and convenient, we combine them into a single repository.

Therefore, our main() function must be able to tell whether the user intends to start the program as a producer or as a consumer. We will be using the flag package to help us with that.

import("flag"...)funcmain(){act:=flag.String("act","producer","Either: producer or consumer")partition:=flag.String("partition","0","Partition which the consumer program will be subscribing")flag.Parse()fmt.Printf("Welcome to Banku service: %s\n\n",*act)switch*act{case"producer":mainProducer()case"consumer":ifpart32int,err:=strconv.ParseInt(*partition,10,32);err==nil{mainConsumer(int32(part32int))}}}

To start the application as a consumer, invoke the application and pass the "act" flag with the value "consumer":

go build && ./banku --act=consumer

As soon as it runs, it will fetch messages out of Kafka, and process them one by one by invoking the Process() method on each event we have previously defined.

Clustering Consumer Instances

Consider the following question: what if a Banku consumer dies? The program may suddenly crash, or the network may go down. That's why we need to cluster the consumers, in other words, group them.

What happens is that we have multiple consumer instances running, all labeled with the same group ID. When there's a new log to deliver, Kafka will send it to just one instance in the group. When that instance is unable to receive the log, Kafka will deliver it to another subscriber within the same group.

This mechanism has been available since Kafka 0.9. However, the Sarama library we are using doesn't support it. That's why we will use the Sarama Cluster library instead. It makes building a grouped consumer really simple.

First, govendor needs to fetch it:

govendor fetch github.com/bsm/sarama-cluster

We need to change our mainConsumer method so that the consumer is instantiated from the cluster library:
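The exact change isn't reproduced here; a minimal sketch using bsm/sarama-cluster, with the broker address, group ID, and topic name as assumptions, might be:

import (
	"fmt"

	"github.com/Shopify/sarama"
	cluster "github.com/bsm/sarama-cluster"
)

func mainConsumer(partition int32) {
	config := cluster.NewConfig()
	config.Consumer.Offsets.Initial = sarama.OffsetOldest

	// Every instance that joins with the same group ID ("banku-group" is an
	// assumed name) shares the topic's partitions with the other instances.
	consumer, err := cluster.NewConsumer(
		[]string{"localhost:9092"},    // brokers
		"banku-group",                 // consumer group ID
		[]string{"bank_transactions"}, // topics
		config,
	)
	if err != nil {
		panic(err)
	}
	defer consumer.Close()

	for msg := range consumer.Messages() {
		// Decode msg.Value into one of our events and call its Process() method.
		fmt.Printf("Received: %s\n", string(msg.Value))
		consumer.MarkOffset(msg, "") // commit the offset once the event is handled
	}
}

With a consumer group, the partition flag is no longer needed, since Kafka assigns partitions to instances automatically; the signature is kept here only so main() still compiles.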

Dockerizing the Application

We will dockerize Banku by using a Dockerfile to specify the environment our Banku application will run in, as well as docker-compose to link our Dockerfile together with its other dependencies, which in our case means Redis.

Create a Dockerfile at the root folder of Banku, specifying FROM and MAINTAINER to indicate the base image and the maintainer info, respectively:

FROM golang:1.8.0
MAINTAINER Adam Pahlevi

After that, let's run the go get commands to install our dependencies: govendor, ginkgo and gomega:
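Assuming the usual import paths for these three tools, the corresponding Dockerfile lines would be along these lines:

RUN go get -u github.com/kardianos/govendor
RUN go get -u github.com/onsi/ginkgo/ginkgo
RUN go get -u github.com/onsi/gomega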

Next, we need to copy our Banku folder from the host machine into the container:

ADD . /go/src/banku

Then, let's make sure the working directory is set to our folder:

WORKDIR /go/src/banku

Lastly, we need to install any other dependencies needed by our Banku application:

RUN govendor sync

To test our Dockerfile locally, we can run the following commands in a shell:

$ docker build -t banku .
$ docker run -i -t banku /bin/bash

You will then be connected to the container's bash shell, and if you run ginkgo, our tests will fail, since there is no Redis instance running locally.

Why don't we just include Redis as an image in the Dockerfile, so we can RUN any command we want?

Well, Docker views things layer by layer. Our application is just one layer, and Redis is a completely different layer. The docker-compose.yml file and its docker-compose command are where we "connect" those layers to work together.

Let's create a docker-compose.yml in the same folder with our Dockerfile, specifying the file version:

version: "2.0"

We want to have two services running when docker-compose runs this file, namely:

app, our Banku application

redis, the layer on which redis-server will run

To do that, we specify the services as follows:

version: "2.0"
services:
app:
redis:

Each service can use either image or build. Specifying image makes docker-compose pull the image from the Docker registry. On the other hand, when build is specified, the Dockerfile at the given path will be built instead.
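Putting it together, a minimal docker-compose.yml for our two services, using build for the app and the stock redis image for Redis, might look like this (the tutorial's exact file may differ):

version: "2.0"
services:
  app:
    build: .      # build the image from the Dockerfile in this folder
    links:
      - redis     # the app container can reach Redis at hostname "redis"
  redis:
    image: redis  # pull the official Redis image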

Our local build is successful. A great thing about Docker is that, by using just these two files, Semaphore is able to pick everything up and do what's necessary to run our tests. In short, this is what people mean when they talk about infrastructure as code.

The enabling idea of infrastructure as code is that the systems and devices which are used to run software can be treated as if they, themselves, are software. — Kief Morris

Then, choose the master branch so that Semaphore analyzes the code when a pull request is made on this branch.

When Semaphore finds out that the repository contains both a Dockerfile and a docker-compose file, it will allow us to choose between platforms. Let's just choose the recommended platform: Docker.

We can then set the setup script to:

docker-compose build

And the test script to:

docker-compose run app ginkgo

Those settings can later be found in Project Settings > Build Settings. Semaphore will run our test with the specified commands, and all of our tests should pass.

If you want to continuously deliver your applications made with Docker, check out Semaphore’s Docker platform with full layer caching for tagged Docker images.

Conclusion

In this tutorial, we have learned how to create and dockerize an event sourcing microservice in Go, by using Kafka as a message broker. We used Semaphore to perform continuous testing in the cloud.

If you have any questions or comments, feel free to leave them in the section below.

Adam Pahlevi Baihaqi

Adam Pahlevi takes pride in solving problems using clear and efficient code. In addition to writing, he enjoys giving talks, as well as receiving non-spam "Hi, Adam!" emails. He is an engineer at Wego.
