Tales from the field: Best practices for initial provisioning (Part 1)


On the Cumulus Professional Services team, we have the privilege of seeing how many different folks use and operationalize Cumulus Linux. Over time, we’ve learned many lessons and best practices that can benefit others who are just getting started on the journey. It’s for that reason that we’re putting virtual pen to virtual paper and writing this post. This article is the first in a two-part series on how to use Zero Touch Provisioning (ZTP) and automation tools together for maximum efficiency in your initial provisioning. This post focuses on ZTP; the next one will focus on automation tooling.

Cumulus Linux: Batteries included

Out of the box, Cumulus Linux ships with a number of sensible defaults, including:

Control Plane Policing (CoPP) rules that limit many different types of control traffic

SSHv2 enabled by default

NTP enabled by default

You’ll notice that we’ve said nothing about interface configuration. Like other network switches and routers, Cumulus Linux starts with a fairly blank slate from an interface configuration perspective. We use ZTP to give the node just enough initial configuration to get on the network.

The appropriate configuration varies by environment. In every case, the node needs an IP address that we can use to log in and communicate with it later on. For instance, a management switch will probably need an IP address configured on an in-band VLAN interface (SVI) to be reachable on the network, whereas a switch in the data plane might use DHCP to get an IP address on its management port (eth0).
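To make the in-band option above more concrete, here is a minimal sketch of a hypothetical helper a ZTP script could use to print an ifupdown2 stanza for an SVI on a VLAN-aware bridge, reached through a single front-panel uplink that carries a tagged management VLAN. The function name, port, VLAN ID, and addresses are assumptions for the example, not values from this article; adapt them to your environment.

```bash
#!/bin/bash
# Hypothetical ZTP helper: print an ifupdown2 stanza that gives a management
# switch an in-band address on an SVI (vlan<ID>) behind a VLAN-aware bridge.
# The uplink port carries the management VLAN tagged.
set -euo pipefail

# Usage: render_inband_svi <uplink port> <vlan id> <address/prefix> <gateway>
render_inband_svi() {
  local port=$1 vid=$2 addr=$3 gw=$4
  cat <<EOF
auto ${port}
iface ${port}

auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports ${port}
    bridge-vids ${vid}

auto vlan${vid}
iface vlan${vid}
    address ${addr}
    gateway ${gw}
    vlan-raw-device bridge
    vlan-id ${vid}
EOF
}
```

A ZTP script could append the output to `/etc/network/interfaces` (for example `render_inband_svi swp1 10 192.0.2.10/24 192.0.2.1 >> /etc/network/interfaces`) and then apply it with `ifreload -a`. The out-of-band case usually needs nothing rendered, since the default configuration typically already runs DHCP on eth0.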

With that in mind, let’s talk about what should be included in your ZTP script and your initial turnup routines.

ZTP and initial turnup

ZTP in Cumulus Linux is a fairly robust process and supports delivery over either USB or DHCP. The recommendations we make here apply to both delivery mechanisms.
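Whichever delivery mechanism you use, the switch runs the same script, and Cumulus Linux ZTP only executes scripts that contain the string `CUMULUS-AUTOPROVISIONING`. A cheap way to avoid a failed turnup is to check the script before you publish it to your web server or copy it to a USB stick. The sketch below is a hypothetical pre-publish check: the function name, and the extra interpreter-line and `bash -n` syntax checks, are our own assumptions rather than anything ZTP itself requires.

```bash
#!/bin/bash
# Hypothetical pre-publish check for a ZTP script (run on your workstation
# or in CI, not on the switch).
set -euo pipefail

check_ztp_script() {
  local script=$1
  # Cumulus Linux ZTP only runs scripts that contain this marker string.
  if ! grep -q 'CUMULUS-AUTOPROVISIONING' "$script"; then
    echo "missing CUMULUS-AUTOPROVISIONING marker: $script" >&2
    return 1
  fi
  # Assumption: we also insist on an interpreter (#!) line.
  if [[ "$(head -n 1 "$script")" != '#!'* ]]; then
    echo "missing #! interpreter line: $script" >&2
    return 1
  fi
  # Parse the script without running it (assumes a bash script).
  bash -n "$script"
}
```

Note that `bash -n` only parses the script; it cannot catch commands that fail at run time on the switch.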

Concept: ZTP as a minimum viable product

We tend to see a couple of different mindsets when it comes to ZTP. These schools of thought can be summarized in the following categories:

ZTP is simple

ZTP covers only the bare minimum of services needed to get the node communicating with the automation station

While it is absolutely possible to write a complex “kitchen-sink” ZTP script, we generally don’t recommend going down that path, for a number of reasons:

1. Hard to build and maintain

a. Few people on the team understand the operation of the complex ZTP script
b. Scripting skillsets tend to be hard to find on most teams
c. Debugging ZTP scripts can be difficult

2. Better tools exist for pushing configuration

a. When configurations change, rerunning an automation utility is typically easier than rerunning ZTP
b. Automation utilities have better abstractions for common tasks than writing code to do it all yourself
c. Automation utilities typically have MUCH MORE ROBUST error handling than any code written by you or me

“The right tool makes any job easy.” ~Eric’s Dad

If your ZTP script is so long that working with it becomes difficult, or debugging it consumes hours of your time, take that as a sign that ZTP may not be the best tool for what you’re trying to do.

Corollary: Maintenance of long, complicated ZTP scripts tends to fall to a single individual, the original creator of the script… which is bad for other reasons.

Configuration management tools like Ansible, Puppet, Chef, and Salt exist for a reason. They’re good at what they do, and what they do is deploy and manage configuration, so why not use the right tool for the job? ZTP scripts are incredibly powerful, but for most tasks a higher-level abstraction from one of these tools is easier to debug and maintain, especially if the rest of your team aren’t Linux pros. Learning these tools is also easier than ever before, and we promise that with only a couple of hours of your time you can do some pretty incredible things.

ZTP is meant to run before the switch is in service, which means the actions you take in your ZTP script often can’t safely be repeated on your node once it’s deployed in production. It is possible to write a script that can be executed again on a production node via `ztp -r http://somecomputer/somescript.sh`, but doing so adds significant complexity to your ZTP script. Automation utilities handle this better thanks to the idempotency (the ability to re-run automation and get the same end result) that is typically built into their abstractions.
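To see why re-runnability adds complexity, consider about the simplest guard you could bolt onto a ZTP script: record each completed step in a state directory and skip it on the next run. This is a minimal sketch under our own assumptions; the `run_once` helper and the `STATE_DIR` location are hypothetical and not part of ZTP.

```bash
#!/bin/bash
# Sketch: a "run once" guard so that re-running a ZTP script (for example
# with ztp -r) skips steps that already completed. STATE_DIR is a
# hypothetical location chosen for this example.
set -euo pipefail

STATE_DIR="${STATE_DIR:-/var/lib/ztp-steps}"

# run_once <step name> <command...>
# Runs the command only if the step is not yet recorded as done. If the
# command fails, set -e stops the script before the step is recorded.
run_once() {
  local step=$1
  shift
  mkdir -p "$STATE_DIR"
  if [ -e "$STATE_DIR/$step.done" ]; then
    echo "skip: $step (already done)"
    return 0
  fi
  "$@"
  touch "$STATE_DIR/$step.done"
}
```

Even this guard only records that a step ran, not that its result is still in place: if someone later edits the file by hand, a re-run will not repair it. Configuration management tools check the actual state instead (Ansible's `lineinfile` module, for example, only changes a file when the desired line is missing), which is the kind of idempotency you get for free with them.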

Contents of simple ZTP scripts
OK, so you’ve bought in to writing simple ZTP scripts and leaving the more specific configuration to an automation tool. What should your ZTP script look like?

In the minimum viable product (MVP), ZTP should perform the following actions:

Deploy a license*

Deploy a minimal subset of interface configuration*

Prepare the switch for second-stage automation tools

For agentless tools (Ansible)†

Create a user for automation tools to use

Install a public SSH key from the automation server

Initiate an API call to the automation tooling to kick off provisioning

For agent-based automation tools (Puppet, Chef, Salt)

Install and configure the agent

* Only if access to the automation server occurs in-band, over a front-panel port

† Discussed in greater detail in the second article
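For the agentless path, the last step is the API call that tells your automation tooling a new switch is ready. Every tool exposes this differently, so the endpoint URL, the JSON field names, and the helper functions below are hypothetical placeholders. The sketch only shows the general shape of the step: gather the hostname and management address, then POST them with a few retries.

```bash
#!/bin/bash
# Sketch: notify the automation server that this switch is ready for
# second-stage provisioning. AUTOMATION_URL and the JSON fields are
# hypothetical; use whatever your automation tooling's API expects.
set -euo pipefail

AUTOMATION_URL="http://automation.example.com/api/provision"   # hypothetical

# Build the JSON body (hostnames cannot contain quotes, so printf is enough).
build_payload() {
  printf '{"hostname":"%s","mgmt_ip":"%s"}\n' "$1" "$2"
}

# First IPv4 address on an interface (eth0 by default), without the prefix.
mgmt_ipv4() {
  ip -4 -o addr show dev "${1:-eth0}" | awk '{ split($4, a, "/"); print a[1]; exit }'
}

notify_automation_server() {
  local payload
  payload=$(build_payload "$(hostname)" "$(mgmt_ipv4)")
  # -f: treat HTTP errors as failures; --retry: retry transient errors.
  curl -fsS --retry 5 --retry-delay 10 \
       -H 'Content-Type: application/json' \
       -d "$payload" "$AUTOMATION_URL"
}
```

Keeping the payload builder separate from the network call makes the part most likely to be wrong easy to check off the switch.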

Everything that is NOT in the list above should be deployed with an automation tool.

I’ve spent a lot of time debugging ZTP scripts that were hundreds of lines long, didn’t work as expected, and produced less-than-helpful error output. Consider saving yourself and your team some effort by keeping your ZTP simple. Take a peek at the end of this guide to see what we recommend.
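Even a short ZTP script benefits from failing fast and saying exactly where it failed. A minimal sketch, assuming a bash script: `set -Eeuo pipefail` stops at the first failing command (including inside functions), and an ERR trap reports the exit status, line number, and command on stderr and to syslog via `logger`. The `on_error` name and the `ztp` syslog tag are arbitrary choices for this example.

```bash
#!/bin/bash
# Sketch: make a ZTP script stop at the first error and report where it failed.
set -Eeuo pipefail

# Report the failure on stderr and, best effort, to syslog under tag "ztp".
on_error() {
  local status=$1 line=$2 cmd=$3
  echo "ZTP ERROR (exit ${status}) at line ${line}: ${cmd}" >&2
  logger -t ztp "ERROR (exit ${status}) at line ${line}: ${cmd}" 2>/dev/null || true
}
trap 'on_error "$?" "$LINENO" "$BASH_COMMAND"' ERR
```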

User creation

Cumulus switches ship with two user accounts by default: root and cumulus. When using automation tooling, we strongly recommend creating a third account dedicated to automation. This automation user should use local authentication with pre-shared public SSH keys. That minimizes the chance of a password being compromised, and it lessens the “house of cards” failures that can occur during outages, when a dependency on network services like TACACS or LDAP prevents you from updating the configuration on a network device. A dedicated automation user also makes accountability and auditing of changes on the device significantly easier.

The SSH keys can be installed directly by the ZTP script so that the automation station can control a newly powered-on switch immediately.
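A sketch of that step, assuming bash and a hypothetical account name such as `automation`: create the user only if it does not already exist, then install the automation server's public key with the ownership and permissions sshd expects. The function names and arguments are our own, not a Cumulus-provided API.

```bash
#!/bin/bash
# Sketch of the user-creation step of a minimal ZTP script.
set -euo pipefail

# Create the dedicated automation account if it does not already exist.
# Needs root. No password is set by this sketch.
create_automation_user() {
  local user=$1
  id -u "$user" >/dev/null 2>&1 || useradd -m -s /bin/bash "$user"
}

# Install a public key so the automation server can log in right away.
# Usage: install_authorized_key <user> <home directory> <public key line>
install_authorized_key() {
  local user=$1 home=$2 key=$3
  local ssh_dir="$home/.ssh"
  mkdir -p "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # Append only if this exact key line is not already present (safe to re-run).
  grep -qxF -- "$key" "$ssh_dir/authorized_keys" || echo "$key" >> "$ssh_dir/authorized_keys"
  # sshd's default StrictModes check rejects key files that are writable by
  # others or owned by another user, so lock down modes and ownership.
  chmod 700 "$ssh_dir"
  chmod 600 "$ssh_dir/authorized_keys"
  chown -R "$user:" "$ssh_dir"
}
```

Run as root from the ZTP script, this would typically be called as `create_automation_user automation` followed by `install_authorized_key automation /home/automation "ssh-ed25519 AAAA... automation@server"`, where the key is a placeholder for your automation server's real public key.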

Concept: Human users — thanks but I’ll pass

We haven’t talked at all about administrator users or accounts for humans so far. That is on purpose: since we’re describing the ideal, we should point out that automation is the best way to drive any network. If that’s not operationally possible, consider adding TACACS, LDAP, or RADIUS setup to the configuration delivered by your automation tooling. You can also keep the cumulus account as a local account of last resort, once its password has been changed from the default.

You can see in the example above that the entire script is devoted to creating a user for automation to use. The license, interface configuration, and network services will all be configured with an automation tool.

Next steps

We’re going to build on this example in our next article, so stay tuned! In the meantime, for more examples of what we recommend for ZTP, see the best practices section of our tech docs, which includes examples with error handling for some common tasks performed by ZTP.

If you’d like to chat with any of the folks involved with this article, feel free to join our public Slack at https://slack.cumulusnetworks.com and ping @eric. You can also reach out to our Professional Services team for additional help designing your ideal initial provisioning setup; we’re always happy to help!


Eric Pulvino is a Senior Consulting Engineer on our Professional Services team. Before he became Cumulus Curious(TM), he worked for Cisco consulting on large service provider networks for various household names. Today he works with customers in all stages of the open networking pipeline, from initial product training, on to architecture and design, as well as the deployment and operation phases. He is not sure if he loves Linux or networking more, but is happy to work at Cumulus Networks where he doesn't have to choose. When not on the clock, he can frequently be found annoying his family by writing all kinds of Python-based home automation.