5 Network automation tips and tricks for NetOps

Despite what some people say, automation is not for the lazy. That opinion probably comes from the fact that the whole point of automation is to reduce repetitive tasks and make your life easier. Indeed, automation can do just that, and it can give you back hours each week for other tasks.

But getting your automation off the ground to begin with can be a challenge. It’s not as if you just decide, “Hey, we’re going to automate our network now!” and then you follow a foolproof, well-defined process to implement network automation across the board. You have to make many decisions that require long discussions, and necessitate ambitious and careful thinking about how you’re going to automate.

Just as with anything else in the IT world, there are no one-size-fits-all solutions, and no “best practices” that apply to every situation. But there are some common principles and crucial decision points that do apply to all automation endeavors.

In this post, I’ll give you five network automation tips and tricks to get clarity around your automation decisions and reduce any friction that may be inhibiting (further) adoption of network automation.

1. Choose whether you want flexibility or simplicity

Automating your network requires treating your network as code. It’s literally programming your network, and when it comes to programming, there are several ways to accomplish the same objective. Unlike a traditional CLI, where there may be at most two ways to enable OSPF on an interface, there may be six or more ways with automation.

Think of flexibility and simplicity as sitting at opposite ends of a spectrum. At the simplicity end of the spectrum, you can automate a task in a way that’s quick and simple, but not very scalable. At the flexibility end of the spectrum, you can automate a task in a way that’s initially difficult and requires a lot of careful thought and testing, but is massively scalable. Whether you go for flexibility or simplicity depends on how comfortable you are with automation and programming in general.

Simplicity

If you’re just starting out, sticking close to “the way we’ve always done things” will be easier. You can take existing CLI commands and port them with few changes into your automation infrastructure. Take the following CLI configuration, for example:

router ospf 1
 network 0.0.0.0 area 0.0.0.0

If you’re using Ansible for automation, you could port the preceding configuration to declarative code like this:

nclu:
  commands:
    - add ospf network 0.0.0.0 area 0.0.0.0

There’s almost no difference, and it’s clear what the code does, even to someone who isn’t familiar with Ansible. I call this the copy-and-paste approach. You can take lines from your existing configuration and paste them into a template to create declarative code. It doesn’t get much simpler than this!

At first, this approach may seem pointless. Why bother with automation at all if you’re essentially just porting a CLI configuration to a different format? It doesn’t take advantage of the powerful features of programming languages, such as loops and conditionals. Even so, it delivers a big benefit of automation: improved stability through predictable (and working) network configurations. If someone surreptitiously makes an ad-hoc change on one switch and brings down part of the network, it can be difficult to determine both what changed and where. But with automation, you can push a button and safely put everything back to the way it was.

There’s another consideration in favor of simplicity: who’s going to be looking at the code. In a small environment, there’s a good chance a non-network person will need to analyze the code to diagnose a problem (often when the only full-time network person is out to lunch). Simple, straightforward code makes it easier for them to understand what’s going on.

However, the simple approach isn’t all roses. If you build your automation by copying and pasting, you end up with a lot of duplication, and this doesn’t scale. For example, suppose you have a few switches, each with its own unique configuration. You can create a single Ansible playbook that contains what amounts to a verbatim copy of each of these configurations, and this works fine.

But this gets unwieldy when you want to add more switches or make a sweeping network-wide change, such as switching your IGP from OSPF to EIGRP, or implementing BGP for the first time. As your network grows and changes, you’ll end up having to refactor your code, which means potentially breaking things that work today. This is risky and requires you to go and retest everything.

Flexibility

In a larger environment where you may have a full-time network team, flexibility is more important. It’s also more complicated. You have to think less like a network engineer and more like a programmer. That means separating the logic of your code from the data that’s unique to each individual device. Most automation platforms support this natively, although again, there are many ways to go about it. Regardless of the platform you use, you generally break your code across multiple files.

One file contains a list of devices, usually grouped by role. For example, you may have all of your spine switches in one group and all of your leaf switches in another. Then you may have a supergroup containing both the spine and leaf groups.

Another file contains variables that are unique to each device or device group. Some examples include IP addresses, ASNs, network statements, access lists, prefix lists, SNMP settings and so on.

Finally, another file contains the logic and the configuration directives. Unlike a traditional CLI configuration that may contain repetitive commands (like ip prefix-list), this file can contain iterative logic to loop through one section of code many times. This makes it less obvious what the code does, but the tradeoff is that it’s much more scalable.
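To make the three-way split concrete, here is a minimal sketch in plain Python rather than in any particular automation platform. The device names, the group variables, the prefix-list name ALLOWED and the helper functions are all made up for illustration, and the prefixes come from documentation address ranges. A real platform would keep the inventory, the variables and the logic in separate files; they share one snippet here only for brevity.

```python
# Inventory (hypothetical): devices grouped by role.
INVENTORY = {
    "spine": ["spine01", "spine02"],
    "leaf": ["leaf01", "leaf02"],
}

# Variables (hypothetical): data that differs per group.
GROUP_VARS = {
    "spine": {"prefixes": ["192.0.2.0/24", "198.51.100.0/24"]},
    "leaf": {"prefixes": ["203.0.113.0/24"]},
}


def render_prefix_list(name, prefixes):
    """Logic: one loop replaces the repetitive ip prefix-list lines."""
    return [
        f"ip prefix-list {name} seq {10 * (index + 1)} permit {prefix}"
        for index, prefix in enumerate(prefixes)
    ]


def render_all(inventory, group_vars):
    """Return a mapping of device name to its rendered configuration lines."""
    configs = {}
    for group, devices in inventory.items():
        lines = render_prefix_list("ALLOWED", group_vars[group]["prefixes"])
        for device in devices:
            configs[device] = list(lines)  # each device gets its own copy
    return configs


for device, lines in render_all(INVENTORY, GROUP_VARS).items():
    print(device)
    for line in lines:
        print(" ", line)
```

Adding a switch means adding one name to the inventory, and adding a prefix means adding one entry to the variables. The logic stays untouched, which is where the scalability comes from.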

Flexibility and simplicity?

Can you settle somewhere between flexibility and simplicity, perhaps enjoying a little bit of both? The short answer is no. It’s not that you can’t, but it’s not a good idea. Although combining the ease of copy-and-paste with powerful programming logic gives you the best of both worlds, it also gives you the worst. It becomes much more difficult to understand and predict how your automation platform will actually configure your devices. Will those copied-and-pasted BGP configuration statements that apply just to spine01 get overwritten by that loop that applies to all switches? Mixing and matching approaches is more trouble than it’s worth.

2. Build one-offs into your automation

One of the biggest barriers to network automation is the inevitable presence of ad-hoc or one-off configurations. You know, things like that one access list entry on that one switch that someone put there to satisfy an IT auditor way back when. Rather than trying to eliminate these, embrace them and make them a part of your automation solution. Adopt the mindset that if it’s not in the automation code, it doesn’t exist in the running configuration.

Going to the trouble of automating a single statement on one device does take time and effort that may seem wasted, but it’s actually quite the opposite. Failing to adopt the one-offs into your automation family will inevitably result in a broken network. You’ll eventually encounter a situation where either the automation platform has overwritten your one-off, or your one-off has created a conflict with some new configuration you pushed out via automation.

Such an ugly event has to happen only once before management declares that automation is off the table, and that all changes must be done manually (after going through rigorous change control, of course). Investing extra time up front to automate your one-offs is far preferable to continuing to do everything manually.

Note that this does not mean that you have to automate everything right away. It’s wise to start by automating small, simple tasks. But don’t stop there. The goal is to get everything under the automation umbrella and do away with manual configurations once and for all!
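One rough way to hunt down one-offs before they bite is to compare what’s running on a device with what your automation intends to push. The sketch below is plain Python, not a feature of any automation platform. The function name and both sample configurations are hypothetical, and it compares lines as flat strings, so it ignores which parent section an indented command belongs to.

```python
def find_unmanaged_lines(running_config, intended_config):
    """Return lines present on the device but absent from the automation code.

    Simplification: lines are compared as flat strings, so the parent
    context of indented commands is ignored.
    """
    intended = {line.strip() for line in intended_config if line.strip()}
    return [
        line.strip()
        for line in running_config
        if line.strip() and line.strip() not in intended
    ]


# Hypothetical configurations; the access-list entry plays the auditor's one-off.
intended_config = [
    "router ospf 1",
    " network 0.0.0.0 area 0.0.0.0",
]
running_config = [
    "router ospf 1",
    " network 0.0.0.0 area 0.0.0.0",
    "access-list 10 permit 192.0.2.10",
]

for line in find_unmanaged_lines(running_config, intended_config):
    print(f"not in automation: {line}")
```

Every line this reports is a candidate to move into your automation data, so that the next push preserves it instead of silently removing it.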

3. Use a single automation platform

There’s no getting around it. Automation requires treating infrastructure as code, and every automation platform has its own language. Ansible playbooks are written in YAML (Ansible itself is built on Python), Chef recipes are written in Ruby, and Puppet uses its own declarative language (Puppet itself is implemented in Ruby). Therefore, it’s important that when choosing your platform, everyone who will use the platform agrees on a common language.

This doesn’t mean that everyone (or anyone) has to know the language starting out. You and others may have to learn it, but the important thing is picking one automation platform and running with it.

If you have developers or a DevOps team that already uses automation, ask them for recommendations. Find out how they’re using it. If they’re automating only a handful of servers using ad-hoc configurations, they may not be in the best position to advise you on how to automate the network.

Also, be cautious about choosing a platform just because it’s someone’s favorite. The people who are going to use it must like it. I call this the ice cream test. The developers may all like vanilla, but if you and your colleagues prefer chocolate, you should choose chocolate, even if the developers have some technical arguments against it. At the end of the day, if you don’t like the automation platform, you’re not going to use it.

Realistically, if you’re on a network team, you might not necessarily see eye-to-eye with your colleagues when it comes to favorite programming languages or automation platforms. But you must decide on a single platform for automating your network.

4. Use version control

All automated device configurations should be kept in a centralized repository using a version control system such as Git. This has a couple of advantages.

First, the repository is the authoritative source for all configurations. Although it takes a while to get to this point, ultimately the goal is that if the configuration is not in the repository, it doesn’t exist on any device. This is the ideal and not a rule because the reality is that, if you’re going to introduce automation bit-by-bit, what’s in your repo will be only a small portion of the actual device configurations.

The other advantage is that version control lets you keep a record of changes so you can roll back easily. If you add one too many spaces (something Ansible is not forgiving of) or inadvertently delete a line of code, a version control system can tell you exactly what changed. Better yet, correcting the change doesn’t require manually fixing the code. You simply revert to a previous version, and everything is back to the way it was before the mistake.
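Git produces this kind of report for you, so you would not normally write any code for it. Purely to illustrate what a line-level diff reveals, the sketch below uses Python’s standard difflib module on two made-up versions of a playbook fragment, where the second version has one space too many in front of commands. The variable and function names are hypothetical.

```python
import difflib

# Two hypothetical versions of the same playbook fragment.
previous = [
    "nclu:",
    "  commands:",
    "    - add ospf network 0.0.0.0 area 0.0.0.0",
]
current = [
    "nclu:",
    "   commands:",  # one extra leading space crept in
    "    - add ospf network 0.0.0.0 area 0.0.0.0",
]


def describe_change(old_lines, new_lines):
    """Return a unified diff of the two versions as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="previous", tofile="current", lineterm=""
        )
    )


for line in describe_change(previous, current):
    print(line)
```

The removed line is prefixed with a minus sign and the added line with a plus sign, which makes even a single stray space easy to spot.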

The key to effective version control is to track all of your changes, even the little ones. If you make a change and suddenly the network gets slow, version control can help you prove that your change wasn’t the culprit. But there’s one thing that version control can’t track: the state of the network.

5. Validate and monitor your network using Cumulus NetQ

Regardless of where you are on your automation journey, it’s a smart idea to make sure NetQ is up and running early on.

Whereas version control tracks changes to your network configurations, NetQ tracks changes to the state of the network itself. In other words, NetQ can tell you when the state of the network has changed and why it changed.

Even if your network is only partially automated, NetQ can still track every state change – even the manual ones. This eliminates the blind spot left by partial automation. NetQ can also help you validate that your changes had the expected effect.

For example, if you implement BGP across your network for the first time, NetQ can give you an instant, real-time view of the BGP status of every device. This saves you the time and trouble of logging into and checking each device manually.

Get more hours back

Implementing automation is a manual process that requires careful thought and planning. It’s not just a matter of choosing Ansible or Puppet and learning it. There’s a learning curve, but it’s well worth it. If done correctly, you’ll end up with a more stable and predictable network. And in the long run, you’ll get back hours that you can devote to other things. After all, the whole point of automation is to let a machine do the work so you don’t have to!

Ben Piper is an IT consultant and the author of "Learn Cisco Network Administration in a Month of Lunches" from Manning Publications. His certifications include AWS Certified Solutions Architect, Cisco CCNA/CCNP, and Citrix CCA. He’s a Pluralsight author, and he blogs at benpiper.com.