Kolo Rahl's Blog

The Path to Containerized Services

Published 2017 Dec 21, 03:35

This is designed to be a multi-part article. This first post, the introduction,
talks about what the plan was and how it came to be, along with some of the
concerns and questions we had prior to starting implementation. The following
parts discuss the details behind each of the particular goals we were trying to
achieve.

It began with a con…

While I was the Software Architect at Loot Crate, the company began gearing up
for a spin-off of its core product(s), which would later be known as Sports
Crate. I was told that I would head up the technical design and architecture
for that project, and also serve as the engineering lead in the beginning while
I got the team ramped up on everything.

This was shortly before KubeCon 2016, and I knew that I wanted to use Docker
and Kubernetes if possible. The vision I had seemed like it would be most
easily realized through the use of Docker containers and the up-and-coming
Kubernetes container orchestration tool. To learn as much as possible about
Kubernetes and how it was being used in production, I asked the company to send
me and some of my colleagues to KubeCon 2016, and they gladly obliged.

The event was wonderful. We met a lot of interesting people working on
interesting projects in the container technology space. Monitoring, deployment,
development flows, image verification, hosting solutions, and more were present.
The most intriguing part of it all, however, was that no one really knew the
“right” way of doing anything in this space yet. Sure, people knew how to
monitor a giant cluster of services, but deploying those monitoring tools as
either a separate service container or bundling it into existing containers was
new territory. Automating a lot of the boring stuff was also really new, and
different companies had different ideas on how to tackle the issue. Overall it
was a great font of knowledge that we eagerly chugged down.

I used the remainder of November to consolidate the notes from the convention
and test out a few ideas to see if they would even be possible in this budding
architecture design I had growing in my head. It seemed more and more feasible
as I did more tests. I compared my results with some testing that our Director
of DevOps was doing in tandem with my own work, and we agreed that this path
was possible.

Initial Architecture

In December I began to hammer out the official design. I’ll post other articles
to talk about some of these design points in more detail, but here was the basic
architecture:

Use Docker for all deployable services. The application code, databases,
anything we wanted to use in our new system would have to be deployed from a
container image, and we would use Docker to build/fetch those images.

Use docker-compose for local development. One of the major pain-points with
local development is that the environment you’re developing in tends to be
significantly different from the one you’re deploying to. The use of
docker-compose significantly reduced that disparity.

Use Jenkins for automation. Not just kicking off tests, but general
automation. Specifically, we used it to build images, run tests, upload images,
and kick off deployments.

Use PagerDuty to automate the on-call rotation and escalation rules, with
alerts typically coming in from Datadog.

Use Cloud Pub/Sub to build an asynchronous and decentralized message bus. This
was primarily to allow our legacy software and our new prototype work to
communicate effectively.
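
To make the message bus idea concrete, here is a minimal sketch of the
publish/subscribe pattern it relies on. It is an illustration only: the
InMemoryBus class, the "subscription.created" topic, and the message fields are
all hypothetical, and it does not use the Cloud Pub/Sub client library.
Delivery here is synchronous and in-process, whereas a hosted service queues
messages and delivers them asynchronously. The point is only that the publisher
and the subscriber never reference each other, just a topic name.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class InMemoryBus:
    """Toy stand-in for a hosted message bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to that topic.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver the message to each subscriber; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = InMemoryBus()
received: List[Message] = []

# Assumed consumer on the prototype side: it only knows the topic name.
bus.subscribe("subscription.created", received.append)

# Assumed publisher on the legacy side: it does not know who is listening.
delivered = bus.publish(
    "subscription.created", {"user_id": "42", "plan": "monthly"}
)
```

Because neither side imports the other, either system can be rewritten or
redeployed without touching its counterpart, as long as the topic name and
message shape stay stable.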

The “legacy” platform at Loot Crate was written as a monolithic Ruby on Rails
application. Our front-end developers were writing and managing ERB and
CoffeeScript files, which we already knew we didn’t want in the prototype.
Beyond improving the technology we were using, I also wanted to improve our
process, so I wanted to separate the development and deployment needs of
front-end and back-end code. For the prototype we therefore built a Grape API
(still using Ruby) for the back-end and a React project for the front-end. I
will speak about this - why we did it and what it accomplished - in another
article, but the TL;DR is that this method allowed back-end and front-end teams
to develop and deploy asynchronously, greatly speeding up time-to-deployment.

Concerns and the Unknown

There was a lot of new stuff in this design that most people in the company
hadn’t even heard of before. Kubernetes was still new, I was the only person
who had used Docker before, no one had used Google Cloud, and many people
weren’t sure how/why to use a message queue. We were also using Heroku for deployments
and some third-party CI service for automated testing, so moving away from both
of those things onto a completely custom solution (Kubernetes + Jenkins) was
quite polarizing for some. Overall, though, people were excited… assuming we
could bring it all together.

The biggest concern was: if we started implementing all of this in January,
could we have a functioning website/service deployed by mid-March? Well, some of
the design decisions were made specifically to accommodate such a timeline. For
example, we ultimately went with Google Cloud because they supported a lot of
what we wanted to do, like container image hosting and Kubernetes-as-a-service,
at a time when AWS did not yet offer managed Kubernetes. I also knew my team
well and had no doubt that they would pick up the work I started and see it
through to completion. And again, we made some concessions to ensure developers would be
as effective as possible, such as using Ruby as the back-end language since all
the back-end engineers knew it well. One of the reasons we were comfortable
going with React was because our front-end engineers had been learning React up
until that point in preparation for a switch in the legacy system, so they were
all trained up and ready to go with that.

I made an estimated timeline and worked closely with our Director of DevOps and
Product Manager to ensure that all of our requirements were within reason and
that any corners we were cutting were known about ahead of time. I didn’t mind
having a less-than-perfect prototype because that’s why you build prototypes:
to test things without worrying about getting them exactly right.
Everyone was on board with that and by the end of December we had pinned down
our largest tasks according to estimated effort and time-to-completion. Getting
docker-compose to work was a large developer task since no one had used it
before. Getting Kubernetes configured and working was a large DevOps task.
Setting up Jenkins was a medium task but crucial since it automated all of our
builds and deployments. And finally there was the task to get the back-end and
front-end work completed so that we had something customers could interact with,
and that was actually the least risky part of all.

Coming Up

The following articles will provide more detail about parts of the design,
ending with a final article that discusses the successes and failures of the
architecture.