Center stage: Best practices for staging environments

We’re talking about staging because no one talks about it. It’s mentioned in passing as the annoying sidekick to production. It’s the expected and completely necessary part of the deployment cycle barely touched by schools or internships. It’s considered such an obvious part of architecture that no one mentions it, no one details it, many people do it wrong—and some don’t do it at all.

When you have an idea burning a hole in your keyboard, you want to push to master and demo nonstop—but that can quickly get away from you. First you’re running off Janice’s laptop, then Sarah pushes it to AWS, and suddenly people are paying you to deploy buggy code with a Capistrano script. You want the pain to stop but you’re scared of the cure.

It’s all too easy to deprioritize investing in staging when there are features to ship and your rollbacks work, you know, most of the time. But if you want to run code on a mature and respected platform, you need a staging environment. It’s not about eating your tech vegetables; it’s about showing respect for your users’ money and time.

People tend to define staging in relation to production. “Staging is where you deploy code before you deploy to prod.” “Staging is like prod but without customers.” “Staging is prod lite.” Staging can be all of these things, but let’s clarify its intent.

Staging is where you validate the known-unknowns of your systems. These known-unknowns are the dependencies, interactions, and edge cases foreseeable by the humans in your company and the machines they tend. Staging is where you gain confidence in your systems by consensus.

Known-unknowns have been cited by security and intelligence professionals for decades as part of threat analysis processes. Known-unknowns are expected consequences that you can anticipate but not prove with past experience. An unknown-unknown, like how many users will hit your site today, is something that can only be validated in production.

You can’t replicate unknown-unknowns, such as user behavior and traffic loads, in staging. But you can replicate everything else. And that matters.

Why have a staging environment? It’s easy to brush this off by saying “best practices,” but I think it’s good to examine best practices from time to time and make sure they actually fulfill your needs.

Can we fulfill these needs in other ways? Perhaps! To dig into this, let’s address the main argument against using a staging environment: tests.

“You don’t need staging when you have good tests.” I’ve heard this from small startups and from companies that are nearly household names. These companies have two or two and a half environments: local development, an elaborate testing framework, and production. The testing framework is impressively robust and is fixed fairly quickly if it breaks. Passing builds are deployed to production with all the confidence green Jenkins jobs can buy. This process balances the edginess of the Cowboy Coder with the warm fuzzies of We Did All We Could.

But there’s an inherent problem with this model: It depends on the forethought of individual humans. Tests are good, but tests must be written. And who is writing them? Often, the person who wants to ship that section of code as soon as possible. Even if you write tests in pairs, that’s only two humans trying to account for every probable interaction of code in the wild. That’s not setting anyone up for success.

This model breaks down further when your product has a UI. People don’t write unit tests to ensure a sidebar is the proper shade of millennial pink. Mockups only go so far, and not everyone can make design meetings or stay focused in them. Running your UI in an environment where employees have to look at and interact with it smokes out issues from color mismatches to weird button behaviors.

Even without a UI, tests don’t account for all possibilities. Asking one or two humans to imagine the innumerable interactions of machines isn’t likely to produce good coverage. But an environment where those interactions exist and you can let the code run organically likely will. Enter staging environments.

This isn’t to say you should stop writing tests. Tests catch known-knowns, the step before the known-unknowns of staging. Tests save you time when building, reviewing, and deploying your code to staging. But tests are not a replacement for staging.

So, when should you build a staging environment? Ideally, before you have customers using your product, while it’s still a small, lightweight, and easily portable application. Realistically, however, you’ll probably build one about two months after the latest outage that made it to your board.

Start keeping track of your outages and what caused them. The number of outages linked to buggy code and thwarted deploys will likely both depress and inspire you. These are the numbers to bring to meetings where people with notebooks argue against your dreams. These numbers are how you’ll garner support to build a staging environment.

Once your organization recognizes the value of building a staging environment, the question becomes: Who should build it? Often the first instinct is to assign it to QA because it “makes sense,” but staging is a kissing cousin to production and should be constructed the same way. Have your infrastructure team create the platform and your application engineers fill it with their services.

This process is much more difficult when you’re dealing with a service-oriented architecture where teams have been given license to use whatever technologies they wish. Be prepared to discover that Team A uses DynamoDB against company best practices, and Team B uses custom Capistrano scripts because they think Jenkins is boring. You’ll want to take an inventory of the different technologies teams use, identify dominant players, phase out outliers, and make transitioning into a more homogeneous pipeline a primary roadmap goal. This will take support from the top down, and you’ll want a cross-functional team to help you drive it. Though there will be some challenges, investing in a proper staging rollout will pay off in the end.

While you’re building staging, hash out a consistent deployment pipeline and runtime platform for your engineering organization. It might seem best to simply forklift production into a smaller space, but this means staging will be limited from the outset. Some teams’ work won’t transfer easily and they might get discouraged or complain about added friction in shipping. Take the time (and it will be a lot of time) to ferret out custom scripts and one-off tools and replace them with something consistent. Containers are a great solution, but there are other options as well. Capistrano is fine, as long as everyone is using it and knows how it works.

Another reason to force consistency with staging is that it eases disaster recovery plans down the line. If you run any sort of SaaS offering or sell your product through a site, disaster recovery will eventually rear its head as a requirement. If all your services deploy with the same deployment pipeline, to the point where teams can ship each other’s code if needed, you’ll be able to execute disaster recovery plans more quickly, and with fewer people and less confusion.

So build staging mindfully and make it match production as closely as possible. You should be shipping the same code between staging and production, using environment variables to switch between network endpoints and databases. Said network endpoints and databases should have the same configurations and schemas as production, only running at smaller scale with dummy data.
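One way to picture "same code, different environment variables" is a small settings loader. This is only a sketch: the variable names `APP_API_BASE_URL` and `APP_DATABASE_URL`, the `Settings` class, and the example URLs are all made up for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Values that differ per environment; the code path stays identical."""
    api_base_url: str
    database_url: str


def load_settings(env=None):
    """Build Settings from environment variables (the names are hypothetical)."""
    env = os.environ if env is None else env
    required = ("APP_API_BASE_URL", "APP_DATABASE_URL")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail loudly rather than silently falling back to a production default.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        api_base_url=env["APP_API_BASE_URL"],
        database_url=env["APP_DATABASE_URL"],
    )


# Staging and production run this same function; only the values change.
staging = load_settings({
    "APP_API_BASE_URL": "https://api.staging.example.com",
    "APP_DATABASE_URL": "postgresql://app@db.staging.example.com/app",
})
print(staging.api_base_url)
```

Refusing to start when a variable is missing is a deliberate choice here: a default value is how a staging service ends up quietly talking to a production database.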

Use the same load balancers in staging and production, use the same security group settings, use the same deployment tooling. The very point of staging is to address the known-unknowns, and you can’t do that with conditions that don’t exist in prod. Last but certainly not least, ensure you’re sufficiently monitoring production for the metrics that matter to your engineers and your business. Then apply that same monitoring product, with the same depth of visibility, to staging.
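If you can export each environment's settings as flat key/value pairs, a parity check can be as small as the sketch below. The keys and values are invented for the example, and `config_drift` is a hypothetical helper, not part of any tool.

```python
def config_drift(prod, staging, allowed=frozenset()):
    """Return {key: (prod_value, staging_value)} for every key that differs.

    Keys in `allowed` (instance counts, for example) are expected to differ
    between environments and are skipped.
    """
    drift = {}
    for key in sorted(set(prod) | set(staging)):
        if key in allowed:
            continue
        prod_value = prod.get(key, "<missing>")
        staging_value = staging.get(key, "<missing>")
        if prod_value != staging_value:
            drift[key] = (prod_value, staging_value)
    return drift


# Hypothetical flattened settings for each environment.
prod_config = {"lb.type": "application", "sg.ingress_ports": [443], "instance_count": 12}
staging_config = {"lb.type": "network", "sg.ingress_ports": [443], "instance_count": 3}

print(config_drift(prod_config, staging_config, allowed={"instance_count"}))
```

Anything this reports is a condition staging is testing that does not exist in prod, or the other way around.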

Eventually people ask: “Can we fully replicate production traffic in staging?” The answer is no. A better question is: “Can we get close and should we try?” As with many complex things, it depends.

Are you building a new backend for your heavily trafficked data intake pipeline and worried it won’t be as performant as your current one? Are you launching something shiny that you expect Reddit will hug to death while trying to post a thousand hot takes? When you’re shipping something big and new, there’s a good case for trying to replicate production-level traffic.

There are two main methods for this: Write your own dummy data and play it back at production-level load, or use some percentage of actual production data. There are pros and cons to each.

Dummy data is great because you don’t have to worry about accidentally storing customer data in insecure ways, and you can use it to test specific use cases you may be worried about. However, generating the data itself can lead to blind spots in your testing. For example, if you’re testing database performance, make sure you’re using multiple fake customer accounts, or you might end up writing to the same row over and over again instead of testing a true insert pattern.
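To make the multiple-accounts point concrete, here is a rough sketch of a generator that spreads fake writes across many fake customers. The field names and ID formats are assumptions, not a real schema.

```python
import random


def generate_events(n_events, n_accounts, seed=0):
    """Yield fake write events spread across many fake customer accounts."""
    rng = random.Random(seed)  # seeded so the same load test can be replayed
    for i in range(n_events):
        account = rng.randrange(n_accounts)
        yield {
            "account_id": f"acct-{account:05d}",
            "event_id": f"evt-{i:08d}",  # unique per event, so each write is a new row
            "payload_bytes": rng.randint(100, 2000),
        }


events = list(generate_events(n_events=1000, n_accounts=50))
distinct_accounts = {event["account_id"] for event in events}
print(len(events), "events across", len(distinct_accounts), "accounts")
```

Uniform random accounts are still an invented usage pattern. If you know a handful of real customers produce most of your writes, skew the distribution to match.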

Splitting off a percentage of production traffic produces far more organic results and means you don’t have to invent usage patterns yourself. However, you’re now storing production data in a non-production environment. This is scary, and might even violate security measures in customer contracts. You’ll need to make sure the data you’re saving doesn’t contain anything private about your users. This includes everything from passwords to IP addresses; generalized geolocation data is typically fine to help understand traffic patterns. Such restrictions mean this approach is mostly suited for understanding user flow on your frontend, but even then it might not be the right match for your needs. For example, if you’re rolling out a new UI, your production traffic won’t match the endpoints you’re trying to test.
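A scrubbing step for captured traffic might look something like the sketch below. It keeps only an allowlist of fields rather than trying to name every sensitive one; the record shape and all field names are assumptions made for the example.

```python
ALLOWED_FIELDS = ("method", "path", "status", "latency_ms")  # assumed field names
COARSE_GEO_FIELDS = ("country", "region")  # generalized location only


def scrub(record):
    """Return a copy of a captured request record with only allowlisted fields."""
    clean = {key: record[key] for key in ALLOWED_FIELDS if key in record}
    geo = record.get("geo") or {}
    coarse = {key: geo[key] for key in COARSE_GEO_FIELDS if key in geo}
    if coarse:
        clean["geo"] = coarse
    return clean


captured = {
    "method": "POST",
    "path": "/login",
    "status": 200,
    "password": "hunter2",
    "ip_address": "203.0.113.7",
    "geo": {"country": "US", "region": "OR", "lat": 45.52, "lon": -122.68},
}
print(scrub(captured))
```

An allowlist fails safe: a new field added to the capture format is dropped until someone decides it is fine to keep. Paths and query strings can carry private values too, so a real scrubber would need to look inside those as well.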

Conducting load tests with production data can certainly inspire confidence. But if you anticipate doing large-scale rollouts multiple times a year, you’re probably better served doing A/B testing in production and using canary rollouts to test traffic behavior.
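For a sense of how a percentage-based canary decides who sees new code, here is a minimal sketch that hashes a user ID into one of 100 buckets. The function name, the feature name, and the bucket scheme are illustrative assumptions, not a description of any particular feature-flag product.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


users = [f"user-{i}" for i in range(10_000)]
enabled = sum(in_rollout(user, "new-checkout", 5) for user in users)
print(f"{enabled} of {len(users)} users see the new code path")
```

Because the bucket comes from a hash rather than a random draw, a given user gets the same answer on every request, and raising the percentage only adds users without reshuffling the ones already in.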

For all their advantages, if staging environments are built incorrectly or used for the wrong reasons, they can sometimes make products less stable and reliable.

A common argument against staging is that it adds unnecessary friction and time to the deployment pipeline. This is usually because the company underinvested in staging and went against the build-out patterns discussed above. If your staging environment is running on older, slower hardware (or smaller cloud instances), full of dead code paths, and being used for reasons other than as a pure staging environment, you’re going to have a bad time.

To help mitigate the potential for trouble, let’s go over some Staging Don’ts:

Don’t underprovision staging

Staging should be scaled in proportion to production. The more your company grows, the more traffic and code paths will be in staging, and the more you’ll need to scale it to match. Staging environments always start out slick and end up stodgy. If this happened in production, you’d be running benchmarks and dedicating a sprint to improving performance. Staging should be no different.

This also means staging should run on the same hardware or cloud instances as production. Don’t run staging on out-of-warranty commodity boxes when production is on next-gen specs. And don’t run staging in the cloud when production is all bare metal. Not only will your performance estimates be off, but unexpected low-level interactions can surface if you’re, say, running a custom kernel. You want to catch these bugs before your users do.

Don’t treat staging as precious

Staging is meant to be like production, but only in ways that benefit you. Staging is meant to be broken—which means it must in turn be easy to fix. How you do this depends on the size of your codebase and whether or not you have a service-oriented architecture. Consider having tooling that gives anyone the ability to roll back the last deploy, no matter what team it came from, so no one’s work is blocked by someone else’s bug. An engineer sees an issue, performs a rollback, and the deployment owner gets an email notifying them of said action. Friction is reduced.
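The anyone-can-roll-back flow could be sketched roughly like this. `DeployHistory` is an in-memory stand-in for whatever actually records your deploys, and `notify` is whatever sends your email; both are hypothetical, and the sketch only updates the record rather than redeploying anything.

```python
from dataclasses import dataclass


@dataclass
class Deploy:
    service: str
    version: str
    owner_email: str


class DeployHistory:
    """In-memory stand-in for a real deploy log (hypothetical)."""

    def __init__(self):
        self._by_service = {}

    def record(self, deploy):
        self._by_service.setdefault(deploy.service, []).append(deploy)

    def rollback(self, service, requested_by, notify):
        """Drop the latest deploy of `service` and tell its owner who did it."""
        deploys = self._by_service.get(service, [])
        if len(deploys) < 2:
            raise ValueError(f"no earlier deploy of {service} to roll back to")
        rolled_back = deploys.pop()
        current = deploys[-1]
        notify(
            rolled_back.owner_email,
            f"{requested_by} rolled back {service} from "
            f"{rolled_back.version} to {current.version} in staging",
        )
        return current


history = DeployHistory()
history.record(Deploy("billing", "v41", "ana@example.com"))
history.record(Deploy("billing", "v42", "raj@example.com"))

sent = []  # collects (recipient, message) pairs instead of sending real email
history.rollback(
    "billing",
    requested_by="sam@example.com",
    notify=lambda to, message: sent.append((to, message)),
)
print(sent)
```

Note that there is no permission check on who may call `rollback`. In staging, that is the point.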

Don’t become dog food

Dogfooding, the act of using your own product, is a popular term in the industry. But there’s a distinct difference between tasting your own dog food and putting yourself into the grinder. Dogfooding too much makes staging an emaciated extension of production and the entire system more brittle.

The worst cases of dogfooding tend to be in SaaS companies, so let’s be more specific:

If you are a hosting platform, don’t host your production applications in staging. It may seem like a sensible idea at first, but it only takes one buggy code deploy to derail that notion. Host your production applications in production (or, better yet, someone else’s production) and host their staging counterparts in staging.

If you are a monitoring platform, don’t monitor production with staging. Again, doing so may make sense at first—but now you don’t have a staging environment. You have a monitoring environment that is crucial to the health of production. This means you can’t break staging, and you now have on-call rotations for non-production services. When you hear yourself saying, “Staging code freeze over the holiday weekend,” you’ll know it’s time to backpedal.

We’ve talked about staging environments, why you should have one, and how to avoid making common mistakes with yours. However, a staging environment is just one part of a healthy breakfast. Let’s briefly chat about my ideal deployment pipeline:

Dev local

Engineers can spin up their code on personal machines to see how it looks in their browser or how it interacts with dummy local databases. This is where code is written and first tested, in one specific environment, to give it a head start on being trusted.

Integrated testing environment

Code auto-deploys here once a pull request is approved, and is hit with a battery of tests. These tests are designed to be evil, apply edge cases, and put the code through its paces. I lovingly refer to this stage as Thunderdome.

Staging

See everything written in this piece.

Partial production rollout

Roll out your code using canaries, feature flags, A/B tests, or whatever model you want. The point is to introduce code that can handle known-unknowns to the unknown-unknowns of real user behavior.

Production

Set your code free with the confidence that you did all you could. Make sure everyone can roll back quickly and safely.

Does this pipeline seem exhausting and tedious? Maybe—but it’s no more exhausting or tedious than trying to roll back bad code at 2 a.m. with your Most Expensive Customer on the phone, or filing yet another report for why that thing broke again.

Caring about your cash flow means caring about your users, which means caring about the stability of your platform. If you invest in a solid, sleek, and well-maintained staging environment, you are one very big step closer to that stability.