Dark Architecture: Upgrading Infrastructure With Agile Principles

There's more than one way to teach your old dog some new agility tricks. Image: Stefano — etino/Flickr

The goal of any company is to grow its business, but growth requires constant evolution of every piece of the company, not the least of which is its technical infrastructure. Yet many companies still take an archaic approach to upgrading that infrastructure, even though there is an easy solution.

Companies typically upgrade their infrastructure and system architecture for one of three reasons:

1) For scale (targeting 10x or 100x current capacity)
2) For performance (trying to remove a system bottleneck to process work faster)
3) For decoupling (for greater reliability, maintainability, and future scaling efforts)

These efforts historically are planned as large “forklift” upgrades. A three-month plan is developed and people begin working diligently on the new system, hoping to flip a switch three months later when they’re ready to go live.

Often what actually happens is that three months turns into six months (things inevitably take longer than expected), and the business is forced through a high-risk, all-or-nothing "flag day" exercise to migrate over to the new system. When the systems are cut over, 100% of the functionality is serviced by the new system, and the legacy system is sent off to the farm to live out its final days.

All the while during those six months, the business has no flexibility for other efforts or changes in priorities. The forklift upgrade is an all-or-nothing proposition, and until the team finishes the effort and migrates over to the new system, no business value is delivered.

There are specifically three things that you’d want to improve about this approach:

1) Deliver value sooner to our customers (we don’t want to have to wait for the whole new system to be completed and put live before we deliver any value)
2) Reduce the risk of failure on introduction to production (we want to avoid an “all or nothing” migration plan)
3) Offer flexibility to the business to switch priorities and at least have delivered the most critical value (if we need to switch gears, we want to know that we’ve solved and delivered the important solutions first)

The Solution: A Dark Architecture Approach

Dark Architecture is both a way of thinking about the scale/performance/coupling problems and a technical approach to solving them, one that enables the business to succeed while maintaining the sanity of your staff. We do this by:

1) Prioritizing migration of “flows” through a system rather than components of a system
2) Running legacy and dark architectures in parallel
3) Sending system inputs to both systems, collecting two outputs, comparing values of outputs, but throwing one away

Here’s how things would play out applying a Dark Architecture approach:

Before touching code and systems, begin by prioritizing the “flows” of data through the system in order of pain, opportunity, business value, or whatever metric makes sense for your business.

Rather than speaking in component terms (e.g., swap the reporting database backend from MySQL to Cassandra), think in flow terms (e.g., rendering a graph of wildcard queries for customer X is taking 40 seconds, while all other graph types for this customer render perfectly quickly). This exercise will force you to hone the scope to exactly where the pain is, so you can focus on solving that pain first and save the others for later.
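To make the prioritization concrete, here is a minimal sketch in Python that ranks flows by observed latency. The Flow type, the prioritize function, and every flow name and number other than the 40-second wildcard graph are hypothetical and purely illustrative; your own ranking metric might be pain, opportunity, or business value instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    """One path of data through the system (hypothetical type)."""
    name: str
    render_seconds: float  # observed latency for this flow


def prioritize(flows: List[Flow]) -> List[Flow]:
    """Return flows ordered from most to least painful (slowest first)."""
    return sorted(flows, key=lambda f: f.render_seconds, reverse=True)


# Illustrative values only: the 40 s figure echoes the example above,
# the other flows and timings are made up for this sketch.
flows = [
    Flow("bar graph, customer X", 0.8),
    Flow("wildcard query graph, customer X", 40.0),
    Flow("line graph, customer X", 0.6),
]

ranked = prioritize(flows)
```

The first entry of ranked is the flow to migrate first; everything after it can wait.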

Once you have your priority flow through the system (let’s assume it’s 2% of the overall functionality), you’re not actually going to start building that functionality. Instead, build the scaffolding around it that sends each input to both systems and collects both outputs (comparing the outputs, logging when they differ, and throwing one output away).

Practically speaking, this might be duplicating web service calls (one legacy, one new) or duplicating database interaction calls (one legacy, one new), and then comparing the return values and logging to a file or server or message bus when they’re different.
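As a rough illustration, the scaffolding might look something like the Python sketch below. The call_both function and the logger name are assumptions made for this example, not part of any particular framework; legacy and dark stand in for whatever web service or database calls you are duplicating.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("dark_architecture")  # hypothetical logger name


def call_both(
    legacy: Callable[..., Any],
    dark: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Send the same input to both systems and return only the legacy output."""
    legacy_result = legacy(*args, **kwargs)
    try:
        dark_result = dark(*args, **kwargs)
    except Exception:
        # A failure in the dark path must never reach the customer.
        logger.exception("dark path raised for args=%r kwargs=%r", args, kwargs)
        return legacy_result
    if dark_result != legacy_result:
        # Log the difference for later inspection, then discard the dark output.
        logger.warning(
            "output mismatch for args=%r kwargs=%r: legacy=%r dark=%r",
            args, kwargs, legacy_result, dark_result,
        )
    return legacy_result
```

Wrapping the dark call in a try/except is a design choice in this sketch: while the new system is still unproven, its errors are logged rather than surfaced, so the legacy behavior stays untouched.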

With the input/output scaffolding in place, you’re now ready to start writing functionality. Implement that most painful 2% flow of the system, and as soon as it’s ready, push it to the Dark Architecture in production. It will receive production input, but you’ll throw away its output after comparing it to the legacy system’s output. If the two differ, log the difference so you can inspect it. Along the way, you’ll be measuring the performance improvement the new system delivers and gaining operational experience in working with it.
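One way to instrument that comparison is to time both paths on the same input. The sketch below is a minimal example under assumed names (timed and compare_latency are hypothetical helpers); in practice you would likely send these numbers to your metrics system rather than return them in a dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_latency(
    legacy: Callable[..., Any],
    dark: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Run both paths on the same input and report timings and agreement."""
    legacy_result, legacy_seconds = timed(legacy, *args, **kwargs)
    dark_result, dark_seconds = timed(dark, *args, **kwargs)
    return {
        "match": legacy_result == dark_result,
        "legacy_seconds": legacy_seconds,
        "dark_seconds": dark_seconds,
        "result": legacy_result,  # the dark output is still thrown away
    }
```

This version runs the two paths one after the other for simplicity; running the dark path asynchronously would avoid adding its latency to the customer's request.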

Once you’re confident this new system works and delivers the desired performance or scalability improvement, switch which output gets thrown away, thereby realizing the value of the new system for solving the most painful 2% of the functionality, while still relying on the legacy system to service the remaining 98% of the functionality.
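Switching which output gets thrown away can be as small as a per-flow flag. The Python sketch below assumes a hypothetical FlowRouter class; flows with no dark handler are served by legacy alone, and a promoted flow can be rolled back by flipping the flag again.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Handler = Callable[[Any], Any]


class FlowRouter:
    """Decides, per flow, whose output the customer sees (hypothetical helper)."""

    def __init__(self, legacy: Callable[[str, Any], Any]) -> None:
        self.legacy = legacy
        self.dark_handlers: Dict[str, Handler] = {}
        self.promoted: Set[str] = set()
        self.mismatches: List[Tuple[str, Any, Any, Any]] = []

    def add_dark_handler(self, flow: str, handler: Handler) -> None:
        """Register the new implementation for one flow."""
        self.dark_handlers[flow] = handler

    def promote(self, flow: str) -> None:
        """Start returning the dark output for this flow."""
        if flow not in self.dark_handlers:
            raise KeyError(f"no dark handler registered for {flow!r}")
        self.promoted.add(flow)

    def rollback(self, flow: str) -> None:
        """Go back to returning the legacy output for this flow."""
        self.promoted.discard(flow)

    def handle(self, flow: str, request: Any) -> Any:
        legacy_result = self.legacy(flow, request)
        dark = self.dark_handlers.get(flow)
        if dark is None:
            return legacy_result  # flow not migrated yet
        dark_result = dark(request)
        if dark_result != legacy_result:
            self.mismatches.append((flow, request, legacy_result, dark_result))
        # Both systems ran; the flag only chooses which output is discarded.
        return dark_result if flow in self.promoted else legacy_result
```

Because both systems keep running after promotion, the comparison data keeps flowing, and a rollback is a one-line change rather than a redeployment.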

At this point, your team has successfully delivered to the customers a solution to the most painful 2% of system functionality in a fraction of the time it would take to re-implement 100% of the functionality.

Benefits?

1) Morale is high (what technologist doesn’t love seeing their work put to use?)
2) Customers are cheering (their pain is solved)
3) The business assumed little risk (the functionality was de-risked by running in production with one output thrown away)

Let’s assume there are a few more high priority flows to tackle, representing 20% of the overall system flows. Following a Dark Architecture approach, the business will soon find itself with a choice:

1) Continue upgrading flows of functionality until 100% has been migrated, or
2) Assess the remaining 80% of functionality against other business priorities

This is a powerful difference between the legacy approach to upgrading infrastructure and a Dark Architecture approach: the business now has a choice partway through the effort. Some circumstances may warrant completing 100% of the functionality migration. Some may warrant shelving future migration altogether and accepting two systems operating in parallel as a perfectly reasonable solution (not ideal technically or operationally, but it’s business!). Others may warrant slowly migrating the remaining functionality as technical debt while also pursuing more pressing endeavors.

That opportunity for choice is a cornerstone of an agile process, and having it in our toolbox for evolving our systems has been pivotal for achieving our scale.

Would you consider using Dark Architecture?

As the CTO, Cory von Wallenstein leads technical strategy, innovation and development across IaaS offerings at Dyn.