The sudden growth in traffic has been both amazing and a real scaling challenge, not for our production stack but for our analytics pipeline, as our app started to generate half a billion events a day and counting. Our analytics stack was starting to fail under the load, costs were rising with the way we were running the cluster in the cloud, and our six-person team was hard-pressed to support it while also building, scaling, and supporting the Zenly app.

What we wanted was to build a new analytics pipeline on bare metal, as it offers a better cost-to-performance ratio, but orchestrating and managing a bare metal cluster, and handling its potential failures, was not something we had the time or resources to do.

It was around this time that the Docker engineering team reached out to us and walked us through the features of Docker 1.12. So we decided to give it a shot and started deploying our analytics stack on bare metal with Docker 1.12.

Our stack involves several clusters of Kafka, Zookeeper, and Spark nodes. On this cluster we run regular Spark batch jobs as well as several long-lived Spark Streaming jobs that need extra reliability. For instance, one of them consumes from Kafka, performs a real-time breakdown of currently active users, and exposes the results through a websocket and a REST API for real-time dashboarding.
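The extra reliability for a long-lived streaming job is exactly what swarm's restart policy provides. A minimal sketch of running such a driver as a service, assuming Docker 1.12 swarm mode; the image, network, class, and service names here are illustrative, not our actual deployment:

```shell
# Hypothetical sketch: run a long-lived Spark Streaming driver as a swarm
# service. With --restart-condition any, swarm restarts the container
# whenever it exits, keeping the streaming job alive across failures.
docker service create \
  --name active-users-stream \
  --network analytics \
  --restart-condition any \
  --restart-delay 10s \
  our-registry/spark-streaming:latest \
  spark-submit --class com.example.ActiveUsersBreakdown app.jar
```

Batch jobs, by contrast, are fine with the default `on-failure` restart condition, since they are expected to terminate.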

Part of this deployment required mirroring a subset of our production Kafka cluster to a dedicated analytics cluster that would, among other things, have a longer retention time. The mirroring is done by a custom Kafka gRPC proxy that we wrote internally.

Docker 1.12 made the setup quite simple. Once each bare metal machine was configured, we set up the swarm. In just two minutes we had a fully working Spark cluster, roughly the time it took to learn the new service command and launch it twice! With Spark out of the way, we continued with every system in our data pipeline, from Zookeeper and Kafka to Elasticsearch. Within the first half hour we had the complete stack running, discovering some nice options along the way. As we would have done on other setups, we were able to disable the VIPs per service and rely on DNS discovery alone, yet re-enable them whenever we needed to. Docker 1.12 fit immediately into our setup.
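The per-service discovery option works through the service's endpoint mode: by default swarm assigns each service a virtual IP (VIP), while `dnsrr` makes the service name resolve directly to the individual task IPs via DNS round-robin. A sketch under Docker 1.12, with illustrative service and image names:

```shell
# Initialize swarm mode on the first bare metal node (Docker 1.12+).
docker swarm init

# Default endpoint mode: the service gets a load-balancing virtual IP.
docker service create --name kafka --network analytics our-registry/kafka

# DNS round-robin instead of a VIP: "zookeeper" resolves to task IPs,
# which suits clients that want to see every member of the cluster.
docker service create --name zookeeper --network analytics \
  --endpoint-mode dnsrr our-registry/zookeeper
```

Switching a service between the two modes is just a matter of recreating it with the other `--endpoint-mode` value.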

We had already been running Docker containers in our production systems, so we happened to have a lot of Docker images. After getting a Docker 1.12 swarm up and running, we wanted to know how much work we would have to do to adapt our images to work with 1.12. It turned out to be none.

In the process of testing an early version of 1.12, we did encounter some bugs, but that is where we need to congratulate and thank the Docker team for jumping in, helping us work through the implementation, and taking our feedback. We are excited for 1.12 to reach GA and to see how many more data points we can add to our new analytics stack.