Salt use cases for Cumulus Linux

A talking point I often lean on when speaking to customers is, “It’s Linux, so use whatever tool you like.” That much freedom can be especially paralyzing for customers who are just getting started with automating their network and compute infrastructure in a uniform way. In those situations, diving into the numerous articles that pit the various automation tools against each other can be counterproductive. Instead, I often find the most value in following along hands-on with a few examples of a particular tool addressing a use case that is relevant to me.

Salt frequently comes up as one of the options in the infrastructure configuration management conversation; its main differentiators are its message bus architecture and its ability to react to events in real time. While that sounds a bit abstract, the main question we should be asking ourselves is: how will this simplify the day-to-day management of my infrastructure? In this post, we’ll step through bringing the configuration of a couple of Cumulus switches under full management with Salt, and end with a practical event-based workflow for adding and replacing devices in our infrastructure.

Configuration Management

In a previous article, we touched on the basic Salt terminology and showed how easy it is to install the Salt minion agents on Cumulus and get them communicating with the Salt master. Once the master is successfully communicating with the minions, we’re ready to focus on how we’ll be managing the configuration of our two switches. Our end goal is to have the Salt master populate both the /etc/network/interfaces and /etc/frr/frr.conf files on our end devices. While it’s certainly possible to have static local copies of each switch’s configuration stored on the Salt master, the more dynamic approach is to templatize these configurations with Jinja2, so that we can manage them in a consistent and reusable manner.

For starters, let’s edit the /etc/salt/master file and tell our Salt master where the root Salt directory will reside, as well as the pillar root directory.
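A minimal sketch of the relevant master settings follows. It assumes a single default environment named base and the /srv/salt and /srv/pillar paths used throughout this post; everything else in the file is left untouched.

```yaml
# /etc/salt/master (excerpt)
# Assumption: one "base" environment is enough for this setup.

# Root directory for state files and templates (served as salt://...)
file_roots:
  base:
    - /srv/salt

# Root directory for pillar data
pillar_roots:
  base:
    - /srv/pillar
```

The salt-master service has to be restarted for changes to this file to take effect.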

We will be placing our static device attributes in the pillar directory (represented as YAML) so that we can later use this information to dynamically build out our switch configurations. Let’s go ahead and configure these attributes for the first switch in /srv/pillar/leaf1.sls.
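As an illustration only, a pillar file for the first switch could look something like the sketch below. Every key name and value here is a hypothetical placeholder, not the attribute set from the original setup. Nesting the data under the switch’s hostname is also an assumption; it lets a shared template pick out the right entry by minion ID.

```yaml
# /srv/pillar/leaf1.sls
# All key names and values below are hypothetical placeholders.
leaf1:
  loopback: 10.0.0.11/32   # loopback address for this switch
  bgp_asn: 65011           # BGP autonomous system number
  uplinks:                 # switch ports facing the spines
    - swp51
    - swp52
```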

Let’s also specify that all of the minions (our two switches) should have access to this pillar data, via the top pillar file in /srv/pillar/top.sls.
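A top pillar file along these lines would do that. This is a sketch that assumes a second pillar file, leaf2.sls, exists alongside leaf1.sls for the second switch.

```yaml
# /srv/pillar/top.sls
base:
  '*':          # match every minion
    - leaf1     # /srv/pillar/leaf1.sls
    - leaf2     # /srv/pillar/leaf2.sls (assumed to exist for the second switch)
```

Because every minion receives both files with this layout, it works best when each file nests its data under a unique key such as the hostname, so the two sets of values do not collide.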

Now that we have the device-specific attributes defined, we’re ready to build out our templatized configuration files. Let’s look at the interfaces file template that we’ll be defining in /srv/salt/interfaces.jinja. As you can see, this is a standard Linux interfaces file with variables used in place of the static per-device values. This allows us to reuse the same template file for all of our leaf devices, while only specifying device-specific values in the pillar directory.

The same logic will apply to the routing configuration template, specified in /srv/salt/frr.jinja.

Now we’re ready to put it all together in the main state file that we’ll use to push the configuration to our devices, /srv/salt/main.sls.
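A state file for this purpose might look roughly like the following sketch. The state IDs, the context variable named device, and the choice of ifreload -a to apply interface changes are assumptions made for illustration; it also assumes that the minion ID equals the switch hostname and that the templates use it to look up their pillar entry.

```yaml
# /srv/salt/main.sls

# Render the interfaces template and install it on the minion
/etc/network/interfaces:
  file.managed:
    - source: salt://interfaces.jinja
    - template: jinja
    - context:
        # Assumption: the template uses this to select its pillar entry
        device: {{ grains['id'] }}

# Render the routing template and install it on the minion
/etc/frr/frr.conf:
  file.managed:
    - source: salt://frr.jinja
    - template: jinja
    - context:
        device: {{ grains['id'] }}

# Reload the interface configuration only when the file changed
reload_interfaces:
  cmd.run:
    - name: ifreload -a
    - onchanges:
      - file: /etc/network/interfaces

# Keep FRR running, and restart it when its configuration changes
frr:
  service.running:
    - enable: True
    - watch:
      - file: /etc/frr/frr.conf
```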

As you can see, there are quite a few things going on here, so let’s step through them. The first two sections specify that we’d like Salt to manage both the /etc/network/interfaces and /etc/frr/frr.conf files on our minions. We’re dynamically generating these files from the respective Jinja2 files in the root directory (/srv/salt) and specifying that the individual devices should grab their pillar-specific configuration from a file matching their hostname. Further down, we’re reloading the interfaces file on the devices for the configuration to take effect, as well as starting the FRR routing process.

Just like we previously did in the pillar directory, we’ll need to reference our main.sls file in our top-level state file, /srv/salt/top.sls.
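A minimal sketch of that top file, assuming every connected minion should receive the main state:

```yaml
# /srv/salt/top.sls
base:
  '*':        # match every minion
    - main    # /srv/salt/main.sls
```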

We’re now ready to push state to our devices:

Once the state is pushed, we no longer need to rely on manual switch-by-switch configuration and can leverage the pillar and template files we’ve created as a source of truth for both initial provisioning and ongoing operations.

Event Driven Workflow

When treating your switches in a commoditized and fully automated fashion, zero touch provisioning (ZTP) becomes a critical part of your workflow when adding or replacing devices. While it might initially seem like a good idea to delegate as much decision making as possible to this process, the best approach is to let ZTP do the bare minimum before registering the device with your orchestration system. In our case, we’re going to look at a common real-world use case where an existing switch needs to be replaced. The value of using the same provisioning process for adding devices to the network as the one used to replace existing devices (while constantly testing it) cannot be overstated. The end goal will be to let the replacement switch boot up onto the network via ZTP and re-register itself with Salt.

For starters, let’s configure our ZTP bash script, which, as stated before, will do the bare minimum required before handing operations over to Salt.

Starting at the top, we’re going to pull the SSH public key from the management server, to allow credential-less login. Even though Salt itself doesn’t rely on SSH, I’ll explain why this access will be necessary for our workflow shortly. Next, we’re going to grab a Cumulus license from our management server and restart the switchd process for the license to take effect. We’ll then install the salt-minion package and register our new switch with the master. This particular ZTP script will reside on a centralized management server (our Salt master in this case) and will be provided during the initial boot process by the ZTP option handed down to the switch by DHCP.

Since we’ll be replacing an existing device, when the new switch attempts to register with the Salt master, it will fail due to a key mismatch. To address this, we’ll need to delete the old Salt key that the master has for the minion and automatically accept the new one. For this event-based logic, we’ll be using a Salt reactor and building upon an example mentioned in the reactor documentation. To start, let’s configure two reactor state files. The first will be /srv/reactor/auth-pending.sls, a workflow that is triggered by a failure to authenticate with the master.
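The sketch below adapts the example from the Salt reactor documentation to hostnames starting with leaf. The target salt-master.domain.tld is a placeholder for the ID of a minion running on the master itself, and the restart command assumes a systemd-based switch and an SSH login with enough privileges to restart the service. Older Salt releases may expect a different argument syntax in reactor files.

```yaml
# /srv/reactor/auth-pending.sls

{# A leaf switch failed to authenticate: remove the key the master holds #}
{% if not data['result'] and data['id'].startswith('leaf') %}
minion_remove:
  wheel.key.delete:
    - args:
      - match: {{ data['id'] }}

{# SSH to the switch from the master and restart its minion process #}
minion_rejoin:
  local.cmd.run:
    - tgt: salt-master.domain.tld   # placeholder: a minion on the master
    - args:
      - cmd: ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no "{{ data['id'] }}" 'sleep 10 && systemctl restart salt-minion'
{% endif %}

{# A leaf switch is sending a new key: accept it #}
{% if 'act' in data and data['act'] == 'pend' and data['id'].startswith('leaf') %}
minion_add:
  wheel.key.accept:
    - args:
      - match: {{ data['id'] }}
{% endif %}
```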

Starting up top, if a minion whose hostname starts with leaf attempts to authenticate to the master and the master already has a key for that minion, we will delete that key. In addition, we’ll SSH to that minion (via the SSH keys we previously set up in the ZTP script) and restart the minion process. Finally, if a new or replacement switch whose hostname starts with leaf sends a new key to the master, we will accept it.

The second reactor state file (/srv/reactor/auth-complete.sls) will ensure that once a minion properly authenticates to the master, it will automatically grab its configuration.
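Again following the pattern in the reactor documentation, a minimal sketch of this file could be as short as the following. The state ID highstate_run is just a label.

```yaml
# /srv/reactor/auth-complete.sls

{# When a switch connects, apply its full state (top.sls) #}
highstate_run:
  local.state.apply:
    - tgt: {{ data['id'] }}
```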

The final piece of this workflow will be our reactor configuration file, /etc/salt/master.d/reactor.conf.
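A sketch of that mapping, assuming the two reactor files live in /srv/reactor as described above and that minion IDs begin with leaf:

```yaml
# /etc/salt/master.d/reactor.conf
reactor:
  # Any authentication event runs the key cleanup and accept logic
  - 'salt/auth':
    - /srv/reactor/auth-pending.sls
  # A leaf minion coming online triggers a state run
  - 'salt/minion/leaf*/start':
    - /srv/reactor/auth-complete.sls
```

As with other master configuration, the salt-master service needs a restart before the reactor mapping is picked up.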

This configuration indicates that if a minion attempts to authenticate with the master, we’ll run our auth-pending state file, while the auth-complete state file will run after a minion whose hostname starts with leaf has successfully authenticated.

As we can see, the existing Salt key was deleted when the replacement switch attempted to authenticate, after which a new key was accepted and config state was successfully pushed. Looking at the event bus in real time becomes crucial when troubleshooting or even verifying these multi-step processes.

In closing, our goal here was to look at a couple of use cases that would get us more familiar with Salt and simplify the day-to-day management of our infrastructure. Hopefully, looking at the configuration management example and seeing how the various Salt components tie together into a real-world, event-based workflow provided some value for those looking to get started with Salt on Cumulus.

Anthony Miloslavsky has spent the last 15 years in the technology industry focusing on various aspects of network engineering and architecture. After multiple stints in financial services doing traditional networking, his recent focus has been on all things cloud and automation. He is currently working as a Systems Engineer for Cumulus Networks in the New York City metro area, helping financial service organizations (among others) embrace network disaggregation and automation.