Recently, this amazing video came across my Twitter feed. It shows the cutover process from one switching technology to another at an AT&T central office back in 1984. These cutovers were a delicate process because they involved a service outage during the transition, including emergency service. By the time this video was made, Western Electric had honed the process to a science and could complete a cutover in well under a minute. Seriously, you should watch the video. It involves actual giant wire cutters. And mustaches.

Of course, it was in that same year that the federally mandated breakup of the Bell System went into effect, in which AT&T was forced to divest itself of the regional carriers. Previously, Western Electric had been the sole supplier of all components of the public telephone network, from long distance to the central office to the wiring and telephones in your own home. As a consequence of the 1984 breakup, the regional carriers were no longer required to purchase all of their equipment from Western Electric. Incidentally, ADTRAN was founded in 1985.

These days, at ADTRAN, we have a different kind of cutover problem. We use Scala quite heavily in our Mosaic Cloud Platform product. With Scala, updates from one major version (they use an <epoch>.<major>.<minor> versioning scheme) to the next are source compatible, but are not binary compatible. That means that Scala libraries must be compiled and published separately for each major version of Scala that they wish to support. The convention is to indicate which version of Scala a library is compiled against by appending a suffix to the artifact name (for example, my-library_2.11 versus my-library_2.12).
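
To make the naming convention concrete, here is a minimal sketch of how the suffix could be derived from a full Scala version. The `CrossVersion` object and its method names are hypothetical, invented for illustration, and the logic assumes Scala 2.x, where the binary version is the first two components of the version number.

```scala
object CrossVersion {
  // For Scala 2.x, the binary-compatible version is <epoch>.<major>,
  // for example "2.12" for the full version "2.12.18".
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // Append the binary version as a suffix to the base artifact name.
  def artifactName(base: String, scalaVersion: String): String =
    s"${base}_${binaryVersion(scalaVersion)}"
}
```

In practice you rarely compute this by hand. In sbt, for instance, the `%%` operator in a dependency declaration appends the suffix for the Scala version in use.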

Building complex systems is, by definition, hard. The more components, technologies, developers, product managers, and whiteboards that are involved in designing a system, the more likely it becomes that a project will struggle or fail to meet requirements, deadlines, and budgets. Dealing with this complexity is the practice of software architecture. A good architecture encapsulates complexity, facilitates collaboration, enables evolution, and supports a sustainable development cycle.

To design and develop software that fits into modern products and pipelines, development areas at ADTRAN are a buzz of activity. Engineers use a suite of hard and soft skills to collaborate on software. This involves verbal communication, whiteboard design, collaborative and individual coding, troubleshooting, and organization. Activities require group and individual work, usually planned out at the daily group stand-up:

The Matrix. Mr. Robot. That '90s movie with Angelina Jolie. When we talk to most people about hacking, their first thoughts go to cinema or television, where they envision computer wizards doing nefarious deeds on computer systems. So when I say we do a Hackathon at work, the first reaction is typically along the lines of: "Huh, why are you trying to break into stuff? I thought hacking was illegal." However, when we talk about hacking, we're not talking about trying to break into computer systems or trying to do anything illegal. Instead, we use the verb hacking much as we use the word creating. "Hey, I hacked up a quick prototype to test this out." "Come check out this cool tool that I hacked together." A Hackathon is simply an event in which we get to try out something we've never tried before, to mess around with new technologies and build something new.

Developers at ADTRAN cover a lot of ground, from developing embedded networking devices to developing microservices that run in datacenters.

ADTRAN Mosaic Cloud Platform (Mosaic CP) is a product that gives users a bird’s-eye view of their network and is where all of these development layers come together. A typical high-level integration test for Mosaic CP pulls together components from across the org: a collection of Mosaic CP microservices and a mix of real hardware devices and simulated hardware devices. Some tests also incorporate virtualized ADTRAN products, where embedded product code has been cross-compiled to run on commodity x86 hosts.

Pulling these pieces together for integration testing, while testing them with the various hypervisors that our customers prefer to use, is a fun orchestration challenge.

Today I want to talk about a service we've developed in-house to make creating these types of integration test environments convenient for everyone inside the org: testers, developers, salespeople, etc.

We call the system TestBed as a Service (TBaaS). Nathan, a developer here, touched on TBaaS in a previous blog post. Today I will share some of our design decisions and motivations for building TBaaS, and discuss how TBaaS is used within ADTRAN.

These represent the full extent of the family rules in the Alderson household. With two adults, two teenagers, and two youngsters living in one house, conflict is inevitable. However, rather than attempting to enumerate the infinite list of behaviors we do and don't want from our children (and ourselves!), we instead choose to focus on a few key ideas and let the rest flow from there. Brushing your teeth and doing your homework? That falls under being responsible. Slamming doors? Let's try that again with respect. Pillow fight? Have fun (but no hurts)! The point is, these four rules aren't really rules at all; they're more like guiding principles.

In a previous post, I introduced Firefly as our microservices cloud platform. I described how we faced two separate scale problems, and how those drove us to certain architectural decisions. I also described some technology choices we had made. Underlying all of these decisions, however, is a set of principles and thought processes, along with a whole lot of study and prototyping. When we began the Firefly project, Jonathan and I felt that it was important to capture these thoughts explicitly. Since then, we have continued to review them periodically, and we have indeed found them valuable in steering our ongoing design processes.

The Problem

One of the requirements in the projects we are working on is to handle incoming requests asynchronously, so Scala Futures are all around. Because Futures implement map and flatMap, they can be composed elegantly using for-comprehensions.
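
As a minimal sketch of that idea, the example below chains two asynchronous calls with a for-comprehension and shows the equivalent explicit flatMap/map form. The `fetchUserId` and `fetchScore` functions are hypothetical stand-ins for real service calls, and the global execution context is used only to keep the example short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureComposition {
  // Hypothetical asynchronous lookups standing in for real service calls.
  def fetchUserId(name: String): Future[Int] = Future(name.length)
  def fetchScore(userId: Int): Future[Int] = Future(userId * 10)

  // The for-comprehension is rewritten by the compiler into
  // flatMap and map calls on Future.
  def scoreFor(name: String): Future[Int] =
    for {
      id    <- fetchUserId(name)
      score <- fetchScore(id)
    } yield score

  // The same pipeline written with flatMap and map explicitly.
  def scoreForExplicit(name: String): Future[Int] =
    fetchUserId(name).flatMap(id => fetchScore(id).map(score => score))
}
```

Both versions run the second call only after the first completes. Blocking with `Await.result(FutureComposition.scoreFor("alice"), 5.seconds)` is handy in a test or REPL session, but production code would normally keep composing the Future instead of blocking on it.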

During World War II, with the advent of Blitzkrieg-style warfare, it suddenly became clear that entire divisions of armored vehicles needed to be able to advance quickly over varied and unpredictable terrain without losing speed. One common challenge was the need to cross streams and other obstacles without having to concentrate forces at vulnerable fords or bridges. This led to the introduction of armored vehicle-launched bridges (AVLBs). Rather than building a new bridge, these vehicles were capable of deploying improvised bridges quickly and on short notice. This AVLB from 1960 is capable of deploying a bridge spanning up to sixty feet in just three minutes. Once the bridge was laid, tanks, jeeps, troop carriers, and all manner of support vehicles could cross rapidly. Finally, the bridge layer itself could cross and retrieve the bridge, ready to deploy it again at the next obstacle.

Have you ever experienced the frustration of trying to use an unfamiliar microwave oven? I certainly have. All I want to do is warm up my lunch, but I end up having to study the text on several of the buttons to figure out how to make the machine go. Sometimes I even have to start pushing buttons to try to discern which combination will make it start cooking my food.

Why is this so hard? Microwave ovens have been around for decades. Everybody has at least a basic understanding of how they work – the power comes on and your food cooks. The longer the power stays on, the hotter your food gets. Pretty simple. So how does a competent engineer with a master’s degree like me have trouble getting my food to cook? It’s all about the interface. Every model of microwave oven seems to come with its own user interface to access all the whiz-bang features. But if you’re not familiar with the interface, it can be hard to even warm up your lunch.

In software development, it’s very important to have a good Application Programming Interface (API). This API defines how to communicate with your software. It tells other application programmers how to interface with your software. Whether it is a function prototype, a data structure format or a remote procedure call definition, this API defines how to interact with your software. It’s like all the buttons on the microwave oven – it’s what you have to do to make your software "go".

Several years ago we were faced with a dilemma. The industry was changing and we could see it coming. Terms like SDN and NFV were starting to appear on the conference circuit. Carriers were becoming tired of vertically integrated monolithic devices and software stacks. Vendor lock-in was becoming an issue at the forefront of everyone’s mind. A migration to more open source technologies was becoming evident both within our own company and those around us. These were the tidings of change. There was another real, very simple reality. Few, if any, knew exactly what was about to happen. What would our users expect from these new systems? How would we make them perform well? How were we going to deal with these new products that we hadn’t even seen yet?

There are no simple answers here. In true engineer fashion, we decided the scientific method was best. We would hypothesize a solution and test that hypothesis with our users. In the software world this means more frequent high-quality releases so that we can garner feedback on faster timelines. We couldn't wait months to get feedback on the functionality of our product because the industry was (and still is) simply moving too fast for timelines of that scale to be tenable. We needed feedback cycle timelines on the scale of days or weeks if we wanted to succeed.

Cassandra is a powerful NoSQL database that can be easily scaled. This makes a large distributed Cassandra cluster highly fault-tolerant. Depending on the size of the cluster, Cassandra can survive the failure of one or more nodes without any interruption in service. It then may not be obvious why backups are even needed. There is, of course, the very unlikely catastrophic failure that will require you to rebuild your entire cluster. More likely though, data can become corrupt. In either case it would be useful to roll back the cluster to a known good state.

Cassandra provides a useful command-line tool called nodetool for creating snapshots of the data. Nodetool has many other uses, but for this post we'll look specifically at the snapshot command. The documentation for snapshot can be found on the DataStax website. As a quick overview, nodetool snapshot flushes all data in memory to the disk. The data is then stored in a snapshot directory alongside the existing data files. You can provide a tag for the snapshot using the -t flag; otherwise, the snapshot tool will tag it with a timestamp. Restoring a node is a mostly manual procedure that requires you to delete all commit logs and data. You must then copy all of the data from a snapshot into the data directories. More detailed instructions can be found here.
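
As a rough sketch of how the snapshot step might be scripted, the example below builds the `nodetool snapshot` command line, with or without a `-t` tag, and runs it through Scala's `scala.sys.process`. The `CassandraSnapshot` object and its method names are hypothetical, and the code assumes nodetool is on the PATH of the node where it runs.

```scala
import scala.sys.process._

object CassandraSnapshot {
  // Build the argument list for `nodetool snapshot`.
  // With no tag, nodetool labels the snapshot with a timestamp.
  def snapshotCommand(tag: Option[String]): Seq[String] =
    Seq("nodetool", "snapshot") ++ tag.toSeq.flatMap(t => Seq("-t", t))

  // Run the command on the local node and return its exit code.
  // Assumes nodetool is installed and on the PATH.
  def takeSnapshot(tag: Option[String]): Int =
    snapshotCommand(tag).!
}
```

Keeping command construction separate from execution makes the argument handling easy to check without a running Cassandra node. The restore procedure is deliberately left out, since it is manual and destructive.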