In the quickstart guide, we showed you how to get up and running with a simple file connector using Kafka Connect. In this section, we provide a somewhat more advanced tutorial in which we’ll use Avro as the data format and a JDBC Source Connector to read from a MySQL database. If you’re coming from the quickstart and already have all the other services running, that’s great. Otherwise, you’ll need to first start up Zookeeper, Kafka, and the Schema Registry.
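As a rough sketch, the three services can be started with `docker run`. The container names, ports, and image tags below are illustrative; adjust them to match your environment.

```shell
# Start Zookeeper (container name, ports, and image tag are illustrative)
docker run -d --name zookeeper --net host \
  -e ZOOKEEPER_CLIENT_PORT=32181 \
  confluentinc/cp-zookeeper:latest

# Start Kafka, pointing it at the Zookeeper instance above.
# With a single broker, the offsets topic replication factor must be 1.
docker run -d --name kafka --net host \
  -e KAFKA_ZOOKEEPER_CONNECT=localhost:32181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:29092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka:latest

# Start the Schema Registry, which Connect's Avro converters depend on
docker run -d --name schema-registry --net host \
  -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=localhost:32181 \
  -e SCHEMA_REGISTRY_HOST_NAME=localhost \
  -e SCHEMA_REGISTRY_LISTENERS=http://localhost:8081 \
  confluentinc/cp-schema-registry:latest
```

Using `--net host` keeps the networking simple for a single-machine tutorial; in other setups you would publish ports with `-p` instead.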

Note

Schema Registry is a dependency for Connect in this tutorial because we will need it for the Avro serializer functionality.

It is worth noting that we will be configuring Kafka and Zookeeper to store data locally in the Docker containers. For production deployments (or generally whenever you care about not losing data), you should use mounted volumes for persisting data in the event that a container stops running or is restarted. This is important when running a system like Kafka on Docker, as it relies heavily on the filesystem for storing and caching messages. Refer to our documentation on Docker external volumes for an example of how to add mounted volumes to the host machine.
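For reference, persisting Kafka's data directory with a mounted volume looks roughly like the following. The host path is hypothetical; choose a location on durable storage.

```shell
# Mount a hypothetical host directory over Kafka's data directory so that
# messages survive the container being stopped or removed
docker run -d --name kafka --net host \
  -v /vol/kafka-data:/var/lib/kafka/data \
  -e KAFKA_ZOOKEEPER_CONNECT=localhost:32181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:29092 \
  confluentinc/cp-kafka:latest
```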

For this tutorial, we’ll run Docker using the Docker client. If you are interested in information on using Docker Compose to run the images, skip to the bottom of this guide.

To get started, you’ll need to first install Docker and get it running. The CP Docker Images require Docker version 1.11 or greater. If you’re running on Windows or Mac OS X, you’ll need to use Docker Machine to start the Docker host. Docker runs natively on Linux, so the Docker host will be your local machine if you go that route. If you are running on Mac or Windows, be sure to allocate at least 4 GB of RAM to the Docker Machine.

Now that we have all of the Docker dependencies installed, we can create a Docker machine and begin starting up Confluent Platform.
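On Mac or Windows, creating the machine and pointing your Docker client at it can be sketched as follows; the machine name and driver are illustrative.

```shell
# Create a Docker machine with 4 GB of RAM (name "confluent" is illustrative)
docker-machine create --driver virtualbox --virtualbox-memory 4096 confluent

# Configure the Docker client in this shell to talk to the new machine
eval $(docker-machine env confluent)
```

On Linux, you can skip this step entirely and run the containers against your local Docker daemon.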

Note

In the following steps we’ll be running each Docker container in detached mode. However, we’ll also demonstrate how to access the logs for a running container. If you prefer to run the containers in the foreground, you can do so by replacing the -d flags with -it.

You can confirm that each of the services is up by checking the logs using the following command: docker logs &lt;container_name&gt;. For example, if we run docker logs kafka, we should see the following at the end of the log output:

{"id":1,"name":{"string":"alice"},"email":{"string":"alice@abc.com"},"department":{"string":"engineering"},"modified":1472153437000}
{"id":2,"name":{"string":"bob"},"email":{"string":"bob@abc.com"},"department":{"string":"sales"},"modified":1472153437000}
....
{"id":10,"name":{"string":"bob"},"email":{"string":"bob@abc.com"},"department":{"string":"sales"},"modified":1472153439000}
Processed a total of 10 messages

We will now launch a File Sink to read from this topic and write to an output file.
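One way to create the sink is through the Kafka Connect REST API. The connector name, topic, output path, and port below are illustrative assumptions; substitute the topic your JDBC source is writing to and wherever your Connect worker is listening.

```shell
# Register a FileStreamSink connector via the Connect REST API
# (connector name, topic, file path, and port 8083 are illustrative)
curl -X POST -H "Content-Type: application/json" \
  --data '{
    "name": "quickstart-file-sink",
    "config": {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
      "tasks.max": "1",
      "topics": "quickstart-jdbc-test",
      "file": "/tmp/quickstart/file-sink.txt"
    }
  }' \
  http://localhost:8083/connectors
```

Once the connector is running, each record consumed from the topic is appended as a line to the configured file.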

Once you’re done, cleaning up is simple. You can simply run docker rm -f $(docker ps -a -q) to delete all the containers we created in the steps above. Because we allowed Kafka and Zookeeper to store data on their respective containers, there are no additional volumes to clean up. If you also want to remove the Docker machine you used, you can do so using docker-machine rm &lt;machine-name&gt;.