Tools That Made Our Microservices Easier

by
Paul Hallett
on Tuesday, 24 Nov 2015

Over the past year the codebase powering lyst.com has grown exponentially. (Coincidentally, so has the number of occupied desks in the office.) We started to experience issues getting new features and services built quickly, even with our nice development pipeline.

Some of the problems we started to face included:

Slow tests because our entire test suite is run for each pull request. We have 3274 Python tests and 622 JavaScript tests that take a combined 31 minutes to run.

Merge conflict issues when people were working on similar parts of the architecture.

Django migration conflicts when two or more people were building new migration files at the same time.

It was killing our productivity. The biggest complaint we had at the monthly team retrospectives was the inability to move fast. The team structure had started shifting from skill-focussed (backend, frontend) to feature-focussed (Search team, Shopping team) to support this. We needed to change how we wrote software to match this shift too. Following one of our values (Best Idea Wins), we decided to do some research and figure out the best way to solve this problem.

This eventually brought us to where we are now:

Tooling that allows any developer to build and deploy their own service.

Templating that gives each service logging, performance monitoring, and continuous integration support.

We have been able to deploy twelve services to production in the last six months using this new infrastructure. Another big advantage is that our development environment is now a much better representation of the production environment, instead of a setup that was slightly different from production. Let me share with you how we did this.

Buzzwords And New Tools

Of all the buzzwords in the software industry right now, microservices might be one of the few that actually means something.

Microservices vs a Monolith. AKA many little applications vs a single big application.

When we looked at solutions to our problem, microservices, or “service-oriented architecture”, was the obvious choice.

We knew about the argument for only using microservices if you really need to, and we decided early on to make sure services were isolated applications that weren’t dependent on our main application database. Instead they would provide specific functionality, such as computing related products or detecting duplicate images. Building Microservices by Sam Newman was really useful when making these decisions.

Around the same time we began our experimentation, a new tool called Empire was released. Empire provides a PaaS on top of Amazon EC2 Container Service, with an API similar to Heroku’s. This was exactly what we were looking for as we already ran everything on EC2. Merges into the master branch automatically trigger a new container build and a push of the container.

Empire helped us solve the deployment and platform problem. Now we needed to make it easy for our engineers to start new projects and format them correctly.

Cookiecutter All The Things!

After deploying one or two services we quickly realised we needed to make sure the projects were consistent. This would make it easier for engineers to be productive without having to worry about setting up things like performance monitoring or error logging. It also meant engineers could switch teams and instantly know how a project was organised. The best way to do this is to provide a project template with everything set up already.

Cookiecutter is a tool for creating reusable project templates. A cookiecutter project directory looks like this:
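As a rough illustration rather than our actual template, the sketch below writes out a minimal Cookiecutter template skeleton. The `service_name` variable, the file names and the `build_template_skeleton` helper are assumptions made for this example; the only Cookiecutter conventions it relies on are the `cookiecutter.json` file at the root and a directory whose name is a `{{cookiecutter.…}}` expression.

```python
import json
import tempfile
from pathlib import Path


def build_template_skeleton(root):
    """Write a minimal, hypothetical Cookiecutter template under `root`.

    Returns the sorted list of created paths, relative to `root`.
    """
    root = Path(root)

    # cookiecutter.json lists the variables Cookiecutter asks for,
    # together with their default values.
    (root / "cookiecutter.json").write_text(
        json.dumps({"service_name": "my_service"}, indent=2)
    )

    # The directory name is itself a template expression, so the rendered
    # project ends up named after whatever `service_name` is set to.
    project = root / "{{cookiecutter.service_name}}"
    package = project / "{{cookiecutter.service_name}}"
    package.mkdir(parents=True)

    # Hypothetical files a service template might ship with.
    (project / "README.md").write_text("# {{cookiecutter.service_name}}\n")
    (project / "requirements.txt").write_text("djangorestframework\n")
    (package / "settings.py").write_text("# monitoring and logging settings\n")

    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in build_template_skeleton(tmp):
            print(path)
```

Pointing the `cookiecutter` command at a directory like this prompts for `service_name` and renders a new project with the placeholders filled in.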

Our services expose HTTP APIs built with Django REST Framework and follow our best practices. So that other engineers can continue to write Python and not worry about the HTTP layer, we provide a Client template that integrates with the Service template. The Client template knows how to build itself and distribute a new wheel to our internal PyPI. It also has mocking set up by default so we don’t have to make calls to the real service when we’re testing.
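To give a feel for the mocking side, here is a minimal sketch using the standard library's `unittest.mock`. The `MyServiceClient` class, its `get_related` method and the `related_product_count` function are hypothetical names invented for this example, not the generated client's real API.

```python
from unittest import mock


class MyServiceClient:
    """Hypothetical stand-in for a generated client."""

    def get_related(self, product_id):
        # The real client would make an HTTP request to the service here.
        raise RuntimeError("would call the real service")


def related_product_count(client, product_id):
    """Application code: plain Python, no knowledge of the HTTP layer."""
    return len(client.get_related(product_id))


def run_with_mocked_client():
    client = MyServiceClient()
    # Swap the network-bound method for a canned response during the test.
    with mock.patch.object(client, "get_related", return_value=[11, 12, 13]) as fake:
        count = related_product_count(client, 42)
        fake.assert_called_once_with(42)
    return count


if __name__ == "__main__":
    print(run_with_mocked_client())
```

The application code only ever sees a Python object, so tests can replace it without a running service.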

Once the Client wheel has been added to PyPI, we just need to add it to the requirements file of the project that wants to use the service:

# requirements.txt
...
my_service_client==1.1
...

A big advantage of the client template is that it allows us to handle graceful failure in the client instead of in the application using the client. If the client can’t communicate with the service for some reason, we can handle it and send back a default response.
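A minimal sketch of that idea, using only the standard library, might look like the following. The `RelatedProductsClient` name, the URL scheme, the timeout and the empty-list default are all assumptions for illustration; our generated clients are not shown here.

```python
import json
import urllib.error
import urllib.request


class RelatedProductsClient:
    """Hypothetical client that degrades gracefully when the service is down."""

    def __init__(self, base_url, timeout=0.5, opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        # The opener is injectable so tests never touch the network.
        self._open = opener

    def related(self, product_id):
        url = "{}/products/{}/related/".format(self.base_url, product_id)
        try:
            with self._open(url, timeout=self.timeout) as response:
                return json.loads(response.read().decode("utf-8"))
        except (OSError, ValueError):
            # OSError covers connection errors, timeouts and urllib's URLError;
            # ValueError covers a response body that is not valid JSON.
            # Either way the caller gets a safe default instead of an exception.
            return []
```

The application calling `related()` never needs its own error handling for an unavailable service; it simply renders a page with no related products.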

Performance Monitoring and Error Logging

The service template already has the settings configured to handle performance monitoring with New Relic and error logging with Sentry. Once a service has been deployed we just have to set some environment variables and tell Sentry and New Relic to listen for the new service.
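As a sketch of what environment-driven settings could look like (not our actual settings file), the helpers below read the monitoring configuration from the environment. The variable names `SENTRY_DSN`, `NEW_RELIC_LICENSE_KEY` and `NEW_RELIC_APP_NAME` are the conventional ones and are assumed here, as is the use of Sentry's raven Django integration, which reads a `RAVEN_CONFIG` setting. Both helper functions are hypothetical.

```python
import os

# Environment variables assumed to be set for each deployed service.
MONITORING_VARS = ("SENTRY_DSN", "NEW_RELIC_LICENSE_KEY", "NEW_RELIC_APP_NAME")


def missing_monitoring_vars(environ=None):
    """Return the names of monitoring variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in MONITORING_VARS if not environ.get(name)]


def sentry_settings(environ=None):
    """Build the Sentry part of the Django settings from the environment.

    Assumes the raven Django integration; returns an empty dict when no DSN
    is configured, so local development runs without error reporting.
    """
    environ = os.environ if environ is None else environ
    dsn = environ.get("SENTRY_DSN", "")
    return {"RAVEN_CONFIG": {"dsn": dsn}} if dsn else {}
```

A settings module could merge the result of `sentry_settings()` into its globals, and a deploy check could fail early if `missing_monitoring_vars()` returns anything.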

How We Work Now

Empire and Cookiecutter have helped us to give each engineer in our team the tools to be productive and efficient at building new services. We are still learning a lot about how best to manage these services but we’ve seen positive things from it so far.

Search on Lyst runs on three separate services. Using the new workflow explained above, they were deployed to production in just a few months. Another service we’re building at the moment has gone live in the space of a few weeks.

The speed with which we can build new features and fix bugs is also staggering. The old monolith project took upwards of 30 minutes to run its entire test suite. Our new services can be shipped in about 10 minutes. That includes a full build on the pull request, and a full build on the master branch after it has been merged.

What We Want To Do Next

As I said before, we’re still learning a lot. Our process isn’t perfect yet but we’ve got to the point where we’re productive.

We’re now looking at ways to make it easier to provision tools like task queues, databases, and Elasticsearch for our services. Right now that’s still a manual process.

We’ve also found that our original templates went out of date quickly. This meant that we had some inconsistency between the original services. Whilst that’s not always a problem, we have had to fix the same bug across each service a few times. Solution: always make sure your templates are up to date!

On the product side, that is, teams building customer-facing components of our architecture, we’ve considered moving our data from the main codebase into its own service. This has always been seen as a hard task due to technical debt. However, someone has recently suggested something different: moving the Web UI components into their own service, keeping the core infrastructure as the original codebase. This would be much easier for us, but still troublesome. These types of decisions are some that you’re likely to encounter. For now, we’ll wait to cross that bridge until we come to it.