Author: Luciano Molinari

I remember a few years ago the adventure developers would face in order to run integration tests on their machines, as they had to install and configure external services (usually some RDBMS and application servers), use some shared testing environment, or something similar. Nowadays, docker has simplified a lot that situation, once external dependencies are available as images that can be used in any environment.

However, today it’s rather common to see applications that rely on many more external services than before, like different databases. Docker compose is another tool in the docker universe that helps applications to define all of their external dependencies, making much easier and cleaner to run (mainly) integration tests. But sometimes, only these tools are not enough. One still needs to start the containers, run some initialisation steps (like creating database, messaging topics, etc) and only then run their tests. For that, one can still rely on a quite old but popular tool: Makefiles.

In this article we’ll use a very simple application that has a repository class responsible for adding and retrieving entities in/from a MySql database. We also want to create a few integration test cases for this class using a real MySql server. The application is written in Scala and managed using SBT, but don’t get distracted by these details. The ideas shown here can be applied to any other programming language/building tool, as they are more related to the infrastructure than to the application itself.

I was recently doing the Cloud Computing Specialization on Coursera and its capstone project is about processing a set of datasets with batch and then with streaming. During the streaming part, I wanted to use Spark and then I saw this new streaming project called Spark Structured Streaming, which I ended up using for this project, along with other technologies, like Kafka and Cassandra.

Spark Structured Streaming is a new streaming engine built on top the Spark SQL engine and Datasets. With it you can create streaming applications using a higher level API, without really having to care about some of the nuances required with the previous Spark Streaming based on RDDs, like writing intermediate results.

The Spark Structured Streaming Programming Guide already does a great job in covering a lot about this new Engine, so the idea of this article is more towards writing a simple application. Currently, Kafka is pretty much a no-brainer choice for most streaming applications, so we’ll be seeing a use case integrating both Spark Structured Streaming and Kafka.

I recently had to write some Spark jobs and deploy them on AWS/EMR. The jobs also had external dependencies, like Cassandra and Kafka. As the datasets were not that big, I decided to run this full stack locally during development using Docker and Docker-Compose.

Cassandra is a fantastic database system that provides a lot of cool and important features for systems that need to handle large amounts of data, like horizontal scalability, elasticity, high availability, distributability, flexible consistency model among others. However, as you can probably guess, everything is a trade off and there’s no free lunch when it comes to distributed systems, so Cassandra does impose some restrictions in order to provide all of those nice features.

One of these restrictions, and the one we’ll be talking about today, is about its query/data model. One common requirement in databases is to be able to query records of a given table by different fields. When it comes to this type of operation, I consider it can be split into 2 different scenarios:

Queries that you run to explore data, do some batch processing, generate some reports, etc. In this type of scenario, you don’t need the queries to be executed in real time and, thus, you can afford some extra latency. Spark might be a good fit to handle this scenario, and that’ll probably be subject of a different post.

Queries that are part of your application and that you rely on to be able to process requests. For this type of scenario, the ideal is that you evolve your data model to support these queries, as you need to process them in an efficient way. This is the scenario that will be covered in this post.

Scala provides nice and clean ways of dealing with Future objects, like Functional Composition and For Comprehensions. Future objects are a great way of writing concurrent and parallel programs, as they allow you to execute asynchronous code and to extract the result of it at some point in the future.

However, this type of software requires a different mindset in terms of reasoning about it, be it while writing the main code or while thinking about testing it. This article will focus primarily in testing this type of code and, for this, we’ll be looking into ScalaTest, one of the best and most popular testing frameworks in the Scala ecosystem (If you want to take a look at running integration tests with Docker/Docker-compose and Makefiles, take a look at this post).

When starting with Apache Kafka, you’re overwhelmed with a lot of new concepts: topics, partitions, groups, replicas, etc. Although Kafka documentation does a great job in explaining all of these concepts, sometimes it’s good to see them in practical scenarios. That’s what this post will try to achieve, by pointing a few findings/scenarios and solutions/explanations/links for each of them.

In this second and last part, we’ll see how to use Akka HTTP to create RESTful Web Services. The idea is to expose the operations defined in the first part of this article, using HTTP and JSON.

Akka HTTP implements a full server/client-side HTTP based on top of Akka Actors. It describes itself as “a more general toolkit for providing and consuming HTTP-based services instead of a web-framework“. One interesting aspect is that it provides different abstraction levels for doing the same thing in most of the scenarios, i.e., provides high and low level APIs. Akka HTTP is pretty much the continuation of the Spray Project and Spray’s creators are part of Akka HTTP team. In our example, we’ll be using the high level Routing DSL to expose services, which allows routes to be defined and composed by directives. We’ll also use the Spray Json project to provide the infrastructure needed to use JSON with Akka HTTP.