Building on the previous installment of this series, we now take the first steps toward creating our own custom adaptor for the Compose Transporter.

In the first installment of this series, we dissected Transporter’s simple File adaptor to gain some understanding of how an adaptor is put together. Building on our findings, the aim here is to write the simplest possible adaptor for the IBM Cloudant database. It will consist of an “all in one” source and a sink that writes one message at a time, making no attempt at buffering.

Setup

To make the most of this, it helps if you are reasonably familiar with Go development. I’m using:

$ go version
go version go1.9.2 darwin/amd64

When you come to write your own adaptor, the first thing you need to do is to fork the Compose Transporter GitHub repository.

The go-cloudant library

Talking to a Cloudant database over HTTP is quite simple, so we could rely on Go’s own HTTP libraries and talk directly to the database API. However, if you have a poke around the adaptors already shipping with Transporter, you’ll notice that they tend to rely on existing database access client libraries for the heavy lifting. This is a good idea for several reasons, but mostly because it keeps adaptors simple.

Cloudant provides officially supported libraries for several languages, but unfortunately Go isn’t one of them, so we need to look elsewhere. For this exercise, we’ll be using go-cloudant, which is open source and under active development in the community.

Let’s install the go-cloudant library:

go get github.com/cloudant-labs/go-cloudant

Client

Our first step is to implement the Client. Recall from the previous article that the Client implements the Client interface, and its job is to represent the underlying database and to be able to provide a Session: an authenticated, active connection. When we looked at the File adaptor, there wasn’t much to do for this, but our new adaptor will need to do a bit more. Here are the main parts of the file adaptor/cloudant/client.go:
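The original code listing didn’t survive; the sketch below is an abridged reconstruction based on the description that follows. Field names are invented for illustration, and the real Transporter `client.Client` interface returns a `client.Session` rather than the concrete type used here.

```go
// adaptor/cloudant/client.go (abridged sketch; names and signatures simplified)
package cloudant

import (
	cdt "github.com/cloudant-labs/go-cloudant"
)

// Client represents the underlying Cloudant database.
type Client struct {
	uri      string
	username string
	password string
	dbName   string
	client   *cdt.Client
	database *cdt.Database
}

// Connect provides the session: it opens an authenticated connection with
// CreateClient() from the go-cloudant library, then selects (or creates)
// the database we're working with via GetOrCreate().
func (c *Client) Connect() (*Client, error) {
	cl, err := cdt.CreateClient(c.username, c.password, c.uri, 1)
	if err != nil {
		return nil, err
	}
	db, err := cl.GetOrCreate(c.dbName)
	if err != nil {
		return nil, err
	}
	c.client, c.database = cl, db
	return c, nil
}
```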

This code calls CreateClient() from the go-cloudant library, which opens the connection, and then GetOrCreate() to select the database we’re working with.

It needs to be able to create a new database as this is the expected behavior for a sink. The method GetOrCreate() handles this, but note that due to Cloudant’s permissions model, this requires account-level credentials, and can’t be achieved using an API key.

Adaptor

The next step is the adaptor entry point. Again, recall from the previous installment that we need to hook into the Transporter machinery that lets it create one of our new adaptors from a JavaScript object representation holding any configuration parameters required for us to create a Client .
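As a hedged sketch of what this boilerplate looks like, the fragment below follows the registration pattern used by Transporter’s built-in adaptors; the description text, sample config, and the `Database` field are placeholders for illustration.

```go
// adaptor/cloudant/cloudant.go (abridged sketch)
package cloudant

import adaptor "github.com/compose/transporter/adaptor"

const (
	description  = "a Cloudant adaptor that functions as both source and sink"
	sampleConfig = `{
  "uri": "${CLOUDANT_URI}"
}`
)

func init() {
	// Register the adaptor so pipelines can create one under the name
	// "cloudant" from a JavaScript object holding its configuration.
	adaptor.Add("cloudant", func() adaptor.Adaptor { return &cloudant{} })
}

// cloudant holds the configuration parsed from the pipeline's JavaScript.
type cloudant struct {
	adaptor.BaseConfig
	Database string `json:"database"`
}

// Description and SampleConfig complete the Adaptor API.
func (c *cloudant) Description() string  { return description }
func (c *cloudant) SampleConfig() string { return sampleConfig }
```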

These complete the Adaptor API, allowing the Transporter to access that description and sample config we set up earlier. This bit of code will be pretty much boilerplate for any adaptor we write. The difference comes in the implementation of the Source and the Sink.

Source

The source needs to implement the client.Reader interface. Our opening code mostly sets up that implementation. The only difference is we’re importing the go-cloudant package:
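The opening of the source file might look like the following sketch (the struct is deliberately empty at this stage; the real file may carry configuration):

```go
// adaptor/cloudant/reader.go (abridged sketch)
package cloudant

import (
	// The go-cloudant package is the one addition compared to the File adaptor.
	cdt "github.com/cloudant-labs/go-cloudant"

	"github.com/compose/transporter/client"
)

// Reader implements client.Reader by consuming the database's changes feed.
type Reader struct{}

func newReader() client.Reader {
	return &Reader{}
}

var _ = cdt.CreateClient // referenced by the Read implementation below
```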

In terms of implementing the Read function, it should read all the records and, on request, provide them in sequence, converted to Transporter Messages. For Cloudant, we’ll read the database’s changes feed.

We can rely on the changes feed reader that’s provided by the go-cloudant library. It has lots of tricks up its sleeve, but in this instance, we’re using it in the most direct possible way: it’s reading the complete changes feed in one batch, but trickling back the documents over the changes channel. For our purposes, this is just fine, but for production use we’d want to change this to use a continuous feed instead, allowing us to stream documents and follow an evolving source.
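A rough sketch of what Read looks like follows. Transporter’s real `client.Reader` signature also takes resume state, and the go-cloudant changes-feed call is paraphrased from memory; treat the query-builder calls as assumptions and check the library’s documentation.

```go
// Abridged sketch of Read; signatures simplified for illustration.
func (r *Reader) Read(filterFn client.NsFilterFunc) client.MessageChanFunc {
	return func(s client.Session, done chan struct{}) (chan client.MessageSet, error) {
		out := make(chan client.MessageSet)
		session := s.(*Session)
		go func() {
			defer close(out)
			// One-shot read of the complete changes feed; go-cloudant
			// trickles the documents back over the returned channel.
			changes, err := session.database.Changes(
				cdt.NewChangesQuery().IncludeDocs().Build())
			if err != nil {
				return
			}
			for change := range changes {
				select {
				case <-done:
					return
				case out <- client.MessageSet{
					Msg: makeMessage(change, session.dbName),
				}:
				}
			}
		}()
		return out, nil
	}
}
```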

The makeMessage() function tries to classify each changes feed event into inserts, deletes and updates. In Cloudant, this isn’t always obvious, as it treats all document modifications (including deletions) as inserts. We’re saying that if a revision id has the prefix 1- it’s an insert. If you’re interested in learning more about how this hangs together, check out the Cloudant documentation.
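The classification rule itself can be shown in isolation. This is a self-contained sketch of the decision makeMessage() makes, with the operation names invented here rather than taken from Transporter’s `ops` package:

```go
// Sketch of the rule used by makeMessage(): a change flagged as deleted is a
// delete, a winning revision with prefix "1-" is an insert, and anything
// else is treated as an update.
package main

import (
	"fmt"
	"strings"
)

type op int

const (
	insert op = iota
	update
	remove
)

func classify(rev string, deleted bool) op {
	if deleted {
		return remove
	}
	if strings.HasPrefix(rev, "1-") {
		return insert
	}
	return update
}

func main() {
	fmt.Println(classify("1-abc123", false)) // 0 (insert)
	fmt.Println(classify("2-def456", false)) // 1 (update)
	fmt.Println(classify("3-9ab12c", true))  // 2 (delete)
}
```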

The Cloudant changes feed is only partially ordered: there is no guarantee that the order in which you see events on the changes feed corresponds to any defined ordering of how the documents were modified on the cluster.

In our Write function we switch on the message operation type, and call dedicated helper functions insertDoc() , updateDoc() and deleteDoc() accordingly. Those helper functions need to do a little bit of house-keeping. In order to update or delete a Cloudant document, the document must contain both the _id and _rev fields. This is core to Cloudant’s MVCC (Multi-Version Concurrency Control) system. If we get a delete or update operation where the data does not contain either of these fields, the sink will limp on with an error.
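The shape of that switch is sketched below; the `Writer`, `Session` and helper functions are the ones described above, with their signatures simplified for illustration.

```go
// Abridged sketch of the sink's Write function.
func (w *Writer) Write(msg message.Msg) func(client.Session) (message.Msg, error) {
	return func(s client.Session) (message.Msg, error) {
		db := s.(*Session).database
		var err error
		switch msg.OP() {
		case ops.Insert:
			err = insertDoc(db, msg.Data())
		case ops.Update:
			// Requires both _id and _rev in the data (Cloudant MVCC).
			err = updateDoc(db, msg.Data())
		case ops.Delete:
			// Also requires both _id and _rev.
			err = deleteDoc(db, msg.Data())
		}
		return msg, err
	}
}
```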

Unit tests

Tests are only required if your code contains bugs, right? Wrong. Go’s approach to unit testing is blissfully lightweight. Let’s create some tests so that we can refactor with impunity. We’ll thank ourselves later when we come to extend this work for part 3 of this article series.

We have three components through which data will flow:

1. The Cloudant database itself

2. The go-cloudant client library

3. Our new Transporter adaptor

Whilst we don’t need to test the functionality of 1 and 2, we do need to test the end-to-end connectivity. Ultimately, we want to test that our source adaptor can read data from the underlying database and that the sink adaptor can write data to it. It may be possible to mock out the database itself, but for tutorial purposes, mocked tests are harder to understand. Instead, we’re going to run our unit tests against a local, single-node CouchDB instance in a docker container.

CouchDB has the same API as Cloudant, so for our testing purposes they’re close enough, even with the old version we’ll be using. To run CouchDB in Docker, you obviously need Docker installed.
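The original command listing didn’t survive, but the description below is detailed enough to reconstruct it. This is a hedged sketch: the alias in the second line is a guess at the “keystroke saver”, and the exact form of each command may have differed.

```shell
# 1. Run CouchDB 1.6 in Docker, exposing the standard port 5984.
docker run -d -p 5984:5984 --name couchdb couchdb:1.6
# 2. Save some keystrokes for the commands that follow.
alias couch='http://127.0.0.1:5984'
# 3. End the "admin party" by creating an admin user (CouchDB 1.x style).
curl -X PUT http://127.0.0.1:5984/_config/admins/admin -d '"xyzzy"'
# 4. Create a database called testdb in the admin account.
curl -X PUT http://admin:xyzzy@127.0.0.1:5984/testdb
```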

The first line runs CouchDB 1.6 in Docker, exposing port 5984, which is the standard CouchDB port. The second command saves us some keystrokes. The third command disables CouchDB’s admin party mode by creating an admin user, and the last command creates a database called testdb in the admin account. We should now be able to let our tests interact with the local CouchDB as user admin with the password xyzzy. The go-cloudant library does not allow non-authenticated (admin party) connections.

The tests contain more lines of code than the adaptor itself, which isn’t unusual. We won’t examine them all, but instead, let’s look at the principles they share. The source tests are found in adaptor/cloudant/reader_test.go . They do the following:

Set up a back channel connection to the database — this is just a go-cloudant CreateClient() that we can use to poke test data into the database without using Transporter.

Create a Cloudant adaptor and Connect() it to the database.

Use the back channel to store test data in the database.

Call the adaptor’s Read() method, which gives us a reader function.

Drain the message channel we get back from calling the reader function with the Session.

If the number of messages on the channel equals the number of test documents, the test is successful.
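The steps above can be condensed into a sketch like the following. The helpers `seedTestData` and `testClient` are hypothetical names standing in for the test’s setup code, and the reader signatures are the simplified ones used earlier in this article.

```go
// Condensed sketch of the source test; names are approximations.
func TestReadAll(t *testing.T) {
	docCount := 10
	// 1 & 3. Use the back channel to store test data in the database.
	seedTestData(t, docCount)

	// 2 & 4. Connect the adaptor and obtain the reader function.
	session, err := testClient.Connect()
	if err != nil {
		t.Fatalf("connect failed: %s", err)
	}
	readFunc := newReader().Read(func(ns string) bool { return true })

	// 5. Drain the message channel returned by the reader function.
	msgChan, err := readFunc(session, make(chan struct{}))
	if err != nil {
		t.Fatalf("read failed: %s", err)
	}
	var got int
	for range msgChan {
		got++
	}

	// 6. Success if we saw as many messages as seeded documents.
	if got != docCount {
		t.Errorf("expected %d messages, got %d", docCount, got)
	}
}
```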

The sink test shares a lot of the setup, but here we generate test data in the test itself and use the Write() function that the adaptor provides. We rely on a feature of Transporter’s messages that you can request a confirmation that they’ve been written. Here are the salient parts:
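A hedged sketch of those salient parts follows; it assumes Transporter’s `message.From` and `message.WithConfirms` helpers and the simplified Write signature used earlier, with the test document and timeout chosen for illustration.

```go
// Sketch of the sink test's core: write one message and wait for its
// confirmation channel to signal that it was persisted.
confirms := make(chan struct{})
msg := message.WithConfirms(confirms,
	message.From(ops.Insert, "testdb", map[string]interface{}{"foo": "bar"}))

if _, err := newWriter().Write(msg)(session); err != nil {
	t.Fatalf("write failed: %s", err)
}

// A receive on the confirms channel means the message reached the database.
select {
case <-confirms:
case <-time.After(5 * time.Second):
	t.Error("no write confirmation received")
}
```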

If you’re the paranoid type, you could instead use the backchannel approach and verify independently that the messages made it to the backend. If you’re the sensible type, you should also test the ops.Update and ops.Delete cases, for which pull requests are most welcome.

In order to run the unit tests for this adaptor, run the following command in the root of the transporter repo:
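The command itself didn’t survive extraction; the standard Go invocation for a single package tree, with the adaptor’s assumed path, would be:

```shell
# From the root of the transporter repo:
go test -v ./adaptor/cloudant/...
```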

Running the adaptor

Unit tests are great, but let’s try to use the adaptor in anger, as Compose intended. We did this in the previous article when we tested the File adaptor. Before we dive into that, we need to do a full build:

go install ./cmd/transporter/...

Let’s see if we now can use a File source and a Cloudant sink. You know the drill:
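The pipeline definition was lost in extraction; a sketch in Transporter’s pipeline JavaScript might look like the following, where the file path, credentials and namespace filter are placeholders:

```javascript
// pipeline.js: a File source feeding our new Cloudant sink.
var source = file({
  uri: "file:///tmp/data.json"
})

var sink = cloudant({
  uri: "https://USER:PASSWORD@ACCOUNT.cloudant.com",
  database: "testdb"
})

t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")
```

Then run it with `transporter run pipeline.js`.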

Our work here is done. The full source code for the simple Cloudant adaptor can be found here.

Conclusion

Building on our investigations in the first installment, we’ve managed to implement a simple adaptor for IBM Cloudant, functioning both as a source and as a sink. Mission accomplished.

However, our adaptor has a number of obvious shortcomings, and a few non-obvious ones, too:

It reads the changes feed in one big JSON gulp. This means that it can be a resource hog, even if the go-cloudant library tries its best to stream the documents.

It’s not able to continuously tail an evolving source.

The sink writes one document per HTTP request, which is a very inefficient way of loading large numbers of documents.

It does not implement the resume buffer functionality that Transporter has available.

In some cases, we can’t safely use a Cloudant sink with a Cloudant source!

The last point isn’t at all obvious and requires some deep Cloudant-fu to understand. The way our sink writes documents means that the sink database is responsible for generating revision ids. This isn’t what we want: we want the sink to simply take the documents as they are from the source (if the source is a Cloudant database) and retain their revision ids.

As it stands, when the sink is passed documents that already contain revision ids, it assumes that these represent document updates. Since those revision ids won’t already be present in the sink database, the updates will be rejected as conflicts. As a result, a Cloudant-Cloudant Transporter pipeline in its current state won’t function correctly.

The Cloudant replicator has to contend with the same issue for the same reasons, and in order to resolve this we’d need to do what the replicator does in this situation.

In the final installment, we’ll address some of these issues to make our Cloudant adaptor more useful for real ETL workloads.

Read more articles about Compose databases – use our Curated Collections Guide for articles on each database type. If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at [email protected]. We’re happy to hear from you.