Decision at Dubsmash about Docker Compose, Docker, ContainerTools

On the backend side we started using Docker almost 2 years ago. Looking back, this was absolutely the right decision, as running things manually with so many services and so few engineers wouldn’t have been possible at all.

While in the beginning we used it mostly to ease-up local development, we have since started using it quickly to also run all of our CI & CD pipeline on top of it. This not only enabled us to speed things up drastically locally by using Docker Compose to spin up different services & dependencies and making sure they can talk to each other, but also made sure that we had reliable builds on our build infrastructure and could easily debug problems using the baked images in case anything should go wrong. Using Docker was a slight change in the beginning but we ultimately found that it forces you to think through how your services are composed and structured and thus improves the way you structure your systems.

For the unit-integration layer that tests transactional emails, we leverage Docker. Our incoming edge is when the upstream service is finished processing a message and hands it to us for delivery, and then our outgoing edge is actually communicating with someone's inbox. We don't actually want to set up a bunch of receiving MTAs and such, but we still need to test behavior at that layer. Our solution is still a work in progress, but it gets the lion's share of use cases covered so we can confidently refactor and push new features and know we did not break anything.

This Docker setup leverages DNSMasq for setting up MX and A records and ensures they point to running mock inbox sinks. These inboxes are configured from a base image with multiple options. We can specify that the sink's TLS certificate is expired or improperly set up, we can have them respond slowly or with given errors at different SMTP conversation parts. We can ensure that we are backing off and deferring email if the inbox provider says to do so. This detailed faking of the outside world allows us to automate all kinds of outside behavior and ensure that our services behave as expected.

We develop locally in Docker, as we just went into. Our docker-compose file spins up containers with fancy DNS settings and all our dependencies, allowing us to test the MTA against a variety of MX and TLS settings, alongside a variety of potential inbox responses and behaviors. Everyone uses their editor of choice and we often pair up on more complex tasks to prevent siloed system understanding.

We use Docker, Kubernetes, and Google Kubernetes Engine to make it easy to bootstrap resources for new Shopify pods.

Over the years, we moved from shards to the concept of "pods". A pod is a fully isolated instance of Shopify with its own datastores like MySQL, Redis, memcached. As we grew into hundreds of shards and pods, it became clear that we needed a solution to orchestrate those deployments.

CodeFactor being a #SAAS product, our goal was to run on a cloud-native infrastructure since day one. We wanted to stay product focused, rather than having to work on the infrastructure that supports the application. We needed a cloud-hosting provider that would be reliable, economical and most efficient for our product.

CodeFactor.io aims to provide an automated and frictionless code review service for software developers. That requires agility, instant provisioning, autoscaling, security, availability and compliance management features. We looked at the top three #IAAS providers that take up the majority of market share: Amazon's Amazon EC2 , Microsoft's Microsoft Azure, and Google Compute Engine.

AWS has been available since 2006 and has developed the most extensive services ant tools variety at a massive scale. Azure and GCP are about half the AWS age, but also satisfied our technical requirements.

It is worth noting that even though all three providers support Dockercontainerization services, GCP has the most robust offering due to their investments in Kubernetes. Also, if you are a Microsoft shop, and develop in .NET - Visual Studio Azure shines at integration there and all your existing .NET code works seamlessly on Azure. All three providers have serverless computing offerings (AWS Lambda, Azure Functions, and Google Cloud Functions). Additionally, all three providers have machine learning tools, but GCP appears to be the most developer-friendly, intuitive and complete when it comes to #Machinelearning and #AI.

The prices between providers are competitive across the board. For our requirements, AWS would have been the most expensive, GCP the least expensive and Azure was in the middle. Plus, if you #Autoscale frequently with large deltas, note that Azure and GCP have per minute billing, where AWS bills you per hour. We also applied for the #Startup programs with all three providers, and this is where Azure shined. While AWS and GCP for startups would have covered us for about one year of infrastructure costs, Azure Sponsorship would cover about two years of CodeFactor's hosting costs. Moreover, Azure Team was terrific - I felt that they wanted to work with us where for AWS and GCP we were just another startup.

In summary, we were leaning towards GCP. GCP's advantages in containerization, automation toolset, #Devops mindset, and pricing were the driving factors there. Nevertheless, we could not say no to Azure's financial incentives and a strong sense of partnership and support throughout the process.

Bottom line is, IAAS offerings with AWS, Azure, and GCP are evolving fast. At CodeFactor, we aim to be platform agnostic where it is practical and retain the flexibility to cherry-pick the best products across providers.

We used CircleCI in conjunction with GitHub to achieve an integrated version control system continuous integration setup. CircleCI automatically runs our builds in a clean Docker container or virtual machine on every commit allowing us to stay on stop of any regressions as they arise. Additionally the notification system keeps our team up to date when issues do arise so we can get them fixed quickly. It even integrates with Slack to further reduce the friction in staying up to date with the status of our builds. With the automated deployment system once a build passes we can have it automatically deployed to our production environment so we can make sure our users always have the latest and greatest features.

Decision at Portainer about Docker, Go

Go was a natural choice for the backend of the Portainer web application. It makes the creation of HTTP API/services a breeze with a lot of standard features available in the ecosystem.

One of the main thing we like with Go is its synergy with Docker and how easy it is to leverage this synergy to easily distribute an efficient software:

Go allows to compile a program for multiple platforms and OSes easily (it's just a matter of options when starting the compilation process, no matter the execution context)

Go binaries are lightweight, fast and can have a low memory footprint

Combining these points with the empty scratch Docker image and multi-platform images, we can distribute Portainer for any environment that is running Docker. It allows our users to get started using the software in a matter of seconds.

Go is also heavily geared toward the creation of HTTP/API services and is a language that is easy to read and also quite easy to learn, making it a first choice in the context of Portainer.

In addition to our fancy Docker setup, we have captured and sanitized production logs for the behavior of our legacy Perl MTA, and we can test that the log output from the new Go version behaves the same way as the old version. These tests are set up to allow us to switch between the legacy and new version of the MTA and ensure that both systems behave in a legacy-compatible way. Not only can we ensure that we operate against a variety of issues we've seen over time from inboxes, but we know that the newest version of our MTA continues to cover all the same expected behaviors of the legacy version.
#CodeCollaborationVersionControl#ContinuousIntegration

We use CircleCI for #ContinuousIntegration. Workflows are configured via a simple yaml file and run inside isolated Docker containers. CircleCI runs ESLint, Brakeman, and RuboCop to enforce code quality and security best practices. It integrates with GitHub and Slack to notify us of build progress and pass/failure statuses.

Often enough I have to explain my way of going about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of yapping the same every single time, I've decided to write it up and share with the world this way, and send people to read it instead ;). I will explain it on "live-example" of how the Rome got built, basing that current methodology exists only of readme.md and wishes of good luck (as it usually is ;)).

It always starts with an app, whatever it may be and reading the readmes available while Vagrant and VirtualBox is installing and updating. Following that is the first hurdle to go over - convert all the instruction/scripts into Ansible playbook(s), and only stopping when doing a clear vagrant up or vagrant reload we will have a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too lose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing dev environment to produce a proper, production-grade product.

I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy develop, shy of few debugging-friendly setting. This way you avoid the discrepancy between how production work vs how development works, which almost always causes major pains in the back of the neck, and with use of proper tools should mean no more work for the developers. That's why we start with Vagrant as developer boxes should be as easy as vagrant up, but the meat of our product lies in Ansible which will do meat of the work and can be applied to almost anything: AWS, bare metal, docker, LXC, in open net, behind vpn - you name it.

We must also give proper consideration to monitoring and logging hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which as I've mentioned earlier, are at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.

If we are happy with the state of the Ansible it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and unlike most of the light-weight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do most the same with Jenkins, but it has a quite dated look and feel to it, while also missing some key functionality that must be brought in via plugins (like quality REST API which comes built-in with TeamCity). It also comes with all the common-handy plugins like Slack or Apache Maven integration.

The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it:
1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around.
2. All security credentials besides development environment must be sources from individual Vault instances. Keys to those containers should exist only on the CI/CD box and accessible by a few people (the less the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that appropriate security must be present. TeamCity shines in this department with excellent secrets-management.
3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way if any issue shows up with any environment or version, all developer has to do it is grab appropriate artifacts to reproduce the issue locally.
4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automated identifying and tagging the author (nothing like automated regression testing!).

Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but also constantly peeking at the loads and do we get the value of what we are paying for. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and into bare metal boxes. That is another part where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied in to use cloud providers and getting out is expensive. Here to embrace bare-metal hosting all you need is a help of some container-based self-hosting software, my personal preference is with Proxmox and LXC. Following that all you must write are ansible scripts to manage hardware of Proxmox, similar way as you do for Amazon EC2 (ansible supports both greatly) and you are good to go. One does not exclude another, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.