December 20, 2015

We’ve read a lot of great ideas so far. However, one crucial piece is missing from these conversations - the network!

Automation has moved from a new idea to something we take for granted in the new DevOps paradigm. For too long the network has been shut out of this awesome paradise and left to suffer in the dark days of manual configuration. Some engineers think this is because network engineers simply don’t care about modernizing their ways, but for many that is not true! The truth is that the tools for interacting with switches and routers are still in their infancy. Cisco’s OnePK, which created an API for Cisco switches and routers, was only released to the general public in early 2014! CFEngine began in 1993! The systems world has a head start of over 20 years on the network world! There has been much debate over why this lag occurred, but I am not going to jump into the middle of that. Today I am going to tell you how to bridge this gap.

Collaboration

The fact that network teams deal with the world differently than systems teams is not only a technological problem. These differences prevent easy collaboration. Systems and network teams are often separate silos with ticket walls between them. In many companies this causes friction or even hostility between teams, instead of promoting empathy and collaboration. As an example, take new system setup. The network team has to configure ports and assign VLANs. The systems team has to turn up the machine and get it running in its proper roles. In a better world, one team could just send a pull request over to the other and it would be a quick review. In a normal company, a ticket is opened. The network team has more urgent and interesting tasks to do than port configuration. The systems team is left waiting, feeling helpless and frustrated. Resentment builds up instead of harmony and cooperation!

Software

Rancid has been the one and only way to deal with gathering network configurations. It’s not a bad tool - it works with almost everything. The problem is that Rancid deals with the world through the lens of an older paradigm. Rancid logs in to network gear with passwords stored in plain text and then uses expect scripts to screen-scrape data and push it into a CVS or SVN repository.

Network teams do not have to reinvent the wheel; they can use the same software that the systems teams use. The popular configuration tools, like Chef, Puppet, Ansible, and Salt, are starting to interact with switches and routers. The bad news is that these tools often cover only a subset of the possible configuration options and only work on a few models of switches. While we wait for all of our gear to be easily supported, the network world needs to take some intermediate steps towards that future.

The DevOps movement has evolved the systems world into one where infrastructure is defined by code, with all of the benefits that brings. The network world needs to join in, with NetDevOps.

Templatize configurations

Many switches and routers still don’t support automation tools. However, most switches share the same basic configuration, with only details like ports and IP addresses varying from one to the next. We can use our automation tools and templates to generate those configurations automatically.
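To make the idea concrete, here is a minimal sketch of rendering per-port configuration from a template. It uses Python's standard-library string.Template as a stand-in for whatever template engine your automation tool provides. The IOS-like syntax, the port data, and the names PORT_TEMPLATE, PORTS, and render_ports are all illustrative assumptions, not something taken from a particular tool or vendor.

```python
from string import Template

# Hypothetical access-port template in an IOS-like syntax; adjust for your platform.
PORT_TEMPLATE = Template(
    "interface $interface\n"
    " description $description\n"
    " switchport mode access\n"
    " switchport access vlan $vlan\n"
)

# Made-up example data; in practice this would live in version control
# next to the template.
PORTS = [
    {"interface": "GigabitEthernet0/1", "description": "web01", "vlan": 10},
    {"interface": "GigabitEthernet0/2", "description": "db01", "vlan": 20},
]


def render_ports(ports):
    """Render one config stanza per port, separated by '!' lines."""
    return "!\n".join(PORT_TEMPLATE.substitute(port) for port in ports)


if __name__ == "__main__":
    print(render_ports(PORTS))
```

The useful property is that only the data changes from switch to switch. The template is reviewed once, and a missing variable fails loudly (substitute raises KeyError) instead of producing a half-finished configuration.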

Tear down the ticket wall

Now that your configurations are code, you can use git pull requests for all of those little things you used to ticket. Breaking down the ticket wall between team silos improves everyone’s productivity and empathy. Spotify calls this an internal open source model and uses it to help make the company so successful.

Code Reviews

Now that all changes arrive as pull requests, you can use tooling to enforce that workflow and to require a code review by a second person. Without version control to centralize and track your configuration changes, the only possible code review is someone looking over another person’s shoulder before they hit enter. Continuous integration tools like Jenkins and Travis CI aren’t just for application code. You can write tests that check configurations for syntax or spelling errors! Typos have brought down most major websites at some point in time (http://www.cnet.com/news/widespread-google-outages-rattle-users/). Computers are so much better at catching these mistakes than humans.
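As a rough illustration of the kind of check a CI job could run on every pull request, the sketch below scans configuration text for two easy-to-make typos: a VLAN ID outside 1-4094 and a malformed IP address or netmask. The function names lint_config and check_files, and the IOS-like lines it looks for, are assumptions made up for this example; a real check would need to match your own platform's syntax.

```python
import ipaddress
import re

# Patterns for two IOS-like statements (an assumption; adapt to your gear).
VLAN_RE = re.compile(r"^\s*switchport access vlan (\S+)\s*$")
IP_RE = re.compile(r"^\s*ip address (\S+) (\S+)\s*$")


def lint_config(text):
    """Return a list of (line_number, message) problems found in a config."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        vlan = VLAN_RE.match(line)
        if vlan:
            value = vlan.group(1)
            valid = value.isascii() and value.isdigit() and 1 <= int(value) <= 4094
            if not valid:
                problems.append((number, f"invalid VLAN id: {value}"))
        ip = IP_RE.match(line)
        if ip:
            try:
                # ip_interface accepts "address/netmask" and validates both parts.
                ipaddress.ip_interface(f"{ip.group(1)}/{ip.group(2)}")
            except ValueError:
                problems.append((number, f"invalid IP address or mask: {line.strip()}"))
    return problems


def check_files(paths):
    """Lint each file, print any problems, and return a CI-friendly exit status."""
    failed = False
    for path in paths:
        with open(path) as handle:
            for number, message in lint_config(handle.read()):
                print(f"{path}:{number}: {message}")
                failed = True
    return 1 if failed else 0
```

A CI job could call check_files on the changed files and fail the build on a non-zero result, so a typo like "vlan 5000" is caught at review time rather than at 3am.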

Testing

Virtualization and test environments aren’t just for servers any more. And you don’t need to buy an entire second set of hardware just to test out your changes. GNS3 is a great, open source tool that has VM support for most major vendor platforms. You can make a model of your network and test out that BGP change before your 3am maintenance window. No more “I’m pretty sure this will work in production!”

The future starts with you

If you want your networking equipment to support automation clients directly, the best thing you can do is to pressure your salespeople. Until about a month ago I was working for a vendor, and I can tell you that a large amount of feature prioritization is based on customer demand and feedback (these are for-profit corporations, after all!). We’re not in the old-fashioned world where Cisco can dictate how your network is run. There are so many choices nowadays. If we demand that the network vendors help us run our networks just like we run our servers, they will listen, because we can vote with our wallets.

Hopefully soon, the tales of how we used to log in to a router via telnet and type commands directly on the command line will be ghost stories, told around the campfire to scare the new junior admins!