Introducing Firefly Part I - Two Scale Problems

Last summer, I took one of my sons camping at Buck's Pocket State Park here in Alabama. We hiked through some beautiful wooded terrain to an incredible overlook, rappelled down a 100-foot cliff, and spent the evening watching the sunset over a lake while keeping an eye out for bald eagles. The most spectacular sight of the trip, however, came after all of that when we headed back to our campsite, nestled in a canyon by a dry creek bed. Night had fully fallen deep in that gorge, and what greeted us as we descended was the most dazzling display of fireflies I have ever witnessed. The entire forest was ablaze with little twinkling lights like some kind of fairyland prom scene. It was breathtaking.

Fireflies scale beautifully. An individual firefly (or "lightning bug," as I grew up calling them here in the Southern U.S.) can create a complex chemical reaction in its abdomen allowing it to blink or glow with a yellowish-green light. They're easy to catch in your hand and inspect up close. One flashing firefly quickly attracts others, though. As dusk falls on a summer night, you might find yourself, as I did, amid a cacophony of bioluminescence as thousands of fireflies all communicate in parallel by blinking out little love messages to each other. Some species can even precisely synchronize their blinking.

As Adtran has transitioned into the world of Software Defined Networking (SDN) (see our last blog post), we quickly realized that we faced two separate but related scale problems.

The first was technical. Our previous network management system was built as a traditional enterprise server application, and it served us and our customers well for many years. It ran on a single server, with a "warm standby" for redundancy. As network sizes increase from tens of thousands to millions of network elements, however, we knew we would need to harness new technologies to keep up with the load. Instead of requiring even-more-powerful servers, we wanted to scale horizontally across large clusters of commodity servers. Instead of a single standby node, we wanted an all-active cluster with no single points of failure. As SDN and NFV push more and more intelligence into the cloud, we needed an architecture that could efficiently handle more requests and more data for our customers. That was one scale problem we had to address.

Our second scale problem was different, but no less intimidating: our development processes didn't scale. With our previous management system, we had armies of hardware and software engineers rapidly producing new products and features for our network elements. Meanwhile, we had just a handful of teams building the management system to support all of those features. The management system was maintained as a single monolithic codebase, with (sometimes) messy layers of legacy code half-migrated to newer systems, and a "build automation system" that was often bottlenecked by a single person who knew all the right incantations to go from version control to a shippable product. We needed an architecture that could support simultaneous development by hundreds of developers across multiple countries, with shorter release schedules and higher quality than ever before.

To solve these two scale problems, we took inspiration from the web community (and from those little blinky insects) and created a microservices-based software platform that we call Firefly.

In a microservices architecture, each component of the system is small and focused. Ours are maintained as individual repositories. A microservice can be developed, tested, deployed, and scaled independently of other components. We chose to network our microservices together by exchanging REST-like messages over a brokered messaging system. This architecture means that microservices can either be collocated on a server or deployed across a whole cluster of servers. Multiple instances of the same microservice can also be deployed throughout the cluster, allowing us to load balance requests. Also, because microservices only interact by message passing, they are largely language independent. Our performance-critical microservices are built using reactive programming techniques in Scala, but we also fully support microservices written in Python. Additional language bindings can be easily added, and prototypes have already been built using JavaScript, Ruby, and C++.
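To make the idea concrete, here is a minimal, in-process sketch of brokered REST-like messaging between microservices. The broker, the topic names, and the envelope fields (`method`, `path`, `status`, `body`) are illustrative assumptions for this post, not Firefly's actual wire format or APIs:

```python
# Toy sketch of brokered REST-like messaging (not Firefly's real protocol).

class Broker:
    """A minimal in-process message broker: routes request envelopes
    to whichever handler is registered on a topic."""

    def __init__(self):
        self.handlers = {}  # topic -> handler function

    def register(self, topic, handler):
        # In a real cluster, many instances of a service could register
        # on the same topic, and the broker would load balance among them.
        self.handlers[topic] = handler

    def request(self, topic, envelope):
        return self.handlers[topic](envelope)


# A "microservice" here is just a handler for REST-like envelopes.
def inventory_service(envelope):
    if envelope["method"] == "GET" and envelope["path"] == "/elements":
        return {"status": 200, "body": ["ne-1", "ne-2"]}
    return {"status": 404, "body": None}


broker = Broker()
broker.register("inventory", inventory_service)

reply = broker.request("inventory", {"method": "GET", "path": "/elements"})
```

Because the services never call each other directly, only the envelope format is shared; the handler behind a topic could just as easily be written in Scala, Python, or any other language with a binding.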

In addition to the microservices themselves, the Firefly architecture also includes:

- The message broker that allows them to communicate
- The Northbound Interface, which serves as the API gateway for the system (really just another microservice)
- An authentication, authorization, and accounting (AAA) service (again, just another microservice)
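These pieces compose naturally because the gateway and AAA service are themselves just microservices. The sketch below models that flow with plain functions standing in for services; the token store, service names, and request shape are all hypothetical, and real Firefly services would of course exchange these messages over the broker rather than via direct calls:

```python
# Illustrative sketch only: gateway + AAA modeled as plain functions.

def aaa_service(token):
    """Toy AAA check: authenticates a token and returns the caller's role."""
    users = {"secret-token": "admin"}  # hypothetical credential store
    role = users.get(token)
    return {"authenticated": role is not None, "role": role}


def northbound_interface(request, backend):
    """API gateway: consult AAA first, then forward to a backend service."""
    auth = aaa_service(request.get("token"))
    if not auth["authenticated"]:
        return {"status": 401, "body": "unauthorized"}
    return backend(request)


def config_service(request):
    """An example domain microservice behind the gateway."""
    return {"status": 200, "body": {"path": request["path"]}}


ok = northbound_interface(
    {"token": "secret-token", "path": "/config"}, config_service)
denied = northbound_interface(
    {"token": "bad", "path": "/config"}, config_service)
```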

Defining an architecture using blocks and arrows is all well and good, but that doesn't tell anyone how to actually build the thing. That's why Firefly isn't just an architecture, but a platform. A platform, to us, includes not just a set of principles and design decisions, but also the tools, processes, and people to make those decisions a reality. We created a Firefly platform team (I'm currently the product owner) whose main charter is to make Firefly development as fun and productive as possible.

We have done a lot to enable developers to focus on delivering value without having to worry about all the "other stuff." For example, we provide new-project scaffolds for both Scala and Python so you don't have to start from a blank page. Out of the box, a scaffolded project makes it easy to write your code, manage dependencies, write and run tests (including unit, component, and integration tests), write and build documentation, build your artifacts, and deploy your project into a running Firefly instance, all right from the command line. We have a tool we call Kaylee (in reference to another kind of Firefly) that lets developers easily spin up a local instance of Firefly and deploy new microservices into it. We have a continuous delivery pipeline built around Jenkins that can take a code change from check-in, through numerous test phases, to packaging up a deployable product in about an hour. And finally, we have a team of folks who allocate an explicit percentage of their time to supporting other people as they use all of these tools (and contribute back to them!).

The Firefly platform team is also responsible for most of the deployment and clustering aspects of the system. We use Docker to containerize each component of the platform. We have found that this brings a lot of consistency and determinism to our development pipeline. A local Firefly instance created by Kaylee, a test deployment in our CI pipeline, and a large-scale clustered deployment in production are all essentially the same thing: a bunch of orchestrated Docker containers. We have developed some really cool tools in-house to do Docker orchestration, but the ecosystem around Docker is rapidly evolving, so we are also keeping a close eye on many projects, including Kubernetes, Swarm, Rancher, and others. For us, that means trying things out for ourselves, so someone on the team is constantly saying "hey, come look at what I just did!" It's an exciting place to be.
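To give a rough feel for what "a bunch of orchestrated Docker containers" looks like, here is a Compose-style description of a local instance. Every service and image name below is made up for illustration; this is not our in-house tooling, Kaylee, or any actual Firefly artifact:

```yaml
# Hypothetical Compose-style sketch of a local Firefly-like instance.
# All service and image names are illustrative, not real artifacts.
version: "3"
services:
  broker:                            # the message broker everything shares
    image: example/message-broker:latest
  nbi:                               # Northbound Interface (API gateway)
    image: example/nbi:latest
    depends_on: [broker]
    ports: ["8080:8080"]             # the only piece exposed outside the cluster
  aaa:                               # authentication/authorization/accounting
    image: example/aaa:latest
    depends_on: [broker]
  inventory:                         # an example domain microservice; could be
    image: example/inventory:latest  # scaled to multiple instances for
    depends_on: [broker]             # load balancing
```

The same container images move unchanged from a developer's laptop to CI to a production cluster; only the orchestrator and the number of instances differ.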

Many of these topics will become the subject of future blog posts, so check back often. We look forward to sharing more about the exciting things we're doing!

Update 10/31/16: Part II of this post, which dives into the principles behind the Firefly platform, can be found here.