Introduction

Have you ever thought you knew how something worked, only to have your mind blown when you learned more? We recently ran into a scenario that forced us to re-examine our understanding of how the Splunk deployment server behaved with respect to local changes on a deployment client. This was a learning experience for me, so I wanted to share this knowledge in the hope that it can help someone else too.

It was our original belief that this was a bidirectional relationship, whereby the client would compare the hash of its app to what was on the deployment server and, if there was a difference, automatically update the app to match the deployment server. In fact, there were some apps - such as the OPSEC LEA app - that (at least in older versions) would write data locally and were considered incompatible with a deployment server due to this mechanism.

This understanding, however, was not quite correct.

What actually happens is that the checksum for an app is calculated once (at app installation) and maintained on the deployment client, in the file serverclass.xml ($SPLUNK_HOME/var/run/serverclass.xml). We’ll cover this file in more detail below, if you’re interested.

Note that the checksum is stored in this file. The value is not recalculated after app installation; it only changes when a new version of the app is added to the deployment server (and the deployment server is reloaded).

Any local changes made to the app after it is deployed will not change this stored checksum or cause the app to be re-downloaded from the deployment server. The only exception is removal: if the app is deleted from the client, it will be re-downloaded on the next check-in.
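For context, how often a client "checks in" (phones home) is controlled on the client side in deploymentclient.conf. A minimal sketch follows; the hostname, port, and interval shown are illustrative values, not recommendations:

```
# deploymentclient.conf on the forwarder -- values are illustrative
[deployment-client]
phoneHomeIntervalInSecs = 60

[target-broker:deploymentServer]
targetUri = deploy.example.com:8089
```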

Why Does This Matter?

Because checksums only change when the app is modified on the deployment server, this behavior can lead to configuration issues down the road. Consider this scenario:

1. You deploy an app from the deployment server to a forwarder.

2. Another administrator fails to recognize that this app is managed by the deployment server, and makes a local change to its inputs.conf file.

3. This configuration works without issue for a very long time, until months down the road...

4. The app in question is updated on the deployment server (which changes several configuration files), and the admin reloads the server class.

5. The next time the deployment client checks in, everything that was working stops working. This is because the entire app on the deployment client is replaced, and the local configuration is lost.

This example illustrates the importance of tracking changes appropriately, and ensuring that any apps managed by the deployment server are not modified locally.

How It Works Under the Hood

For those of you who like technical details and explanations, let’s dig into the config files a little bit deeper:

A Look at serverclass.xml

As mentioned, the file that controls all of this behavior is $SPLUNK_HOME/var/run/serverclass.xml. This file is included in a Splunk diag, and can also be accessed locally on any system running Splunk or a Splunk Universal Forwarder.

Digging into this file can provide a ton of information about what our deployment client is (supposed to be) doing. Let’s explore a somewhat simplified example:
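The original example isn't reproduced here, but a hypothetical, heavily abridged sketch of the file's shape (real files carry additional elements and attributes, and the checksum values are elided) looks something like this:

```xml
<!-- Hypothetical, abridged sketch of $SPLUNK_HOME/var/run/serverclass.xml;
     real files include more attributes, and checksums are elided here. -->
<serverClasses>
  <serverClass name="all_HeavyForwarders">
    <app name="if_syslog_inputs" checksum="..."/>
    <app name="infra_outputs" checksum="..."/>
  </serverClass>
  <serverClass name="all_SplunkInfrastructure">
    <app name="infra_license" checksum="..."/>
    <app name="infra_authentication" checksum="..."/>
  </serverClass>
  <serverClass name="all_linux_servers">
    <app name="baseline_linux_inputs" checksum="..."/>
  </serverClass>
  <serverClass name="all_splunk">
    <app name="all_splunk" checksum="..."/>
  </serverClass>
</serverClasses>
```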

Each serverClass stanza of this XML file represents a server class in the serverclass.conf configuration on the deployment server. We can see that this example system is a member of the following server classes:

all_HeavyForwarders

all_SplunkInfrastructure

all_linux_servers

all_splunk

Within each serverClass stanza, we see the associated apps. In this example, this works out to the following structure:

ServerClass: all_HeavyForwarders
    Includes app: if_syslog_inputs
    Includes app: infra_outputs

ServerClass: all_SplunkInfrastructure
    Includes app: infra_license
    Includes app: infra_authentication

ServerClass: all_linux_servers
    Includes app: baseline_linux_inputs

ServerClass: all_splunk
    Includes app: all_splunk

As you can see, this file is a straightforward way to see which apps on a given deployment client are managed by the deployment server and which deployment parameters apply to them.

A Word on Checksums

Based on our testing, the checksum calculated on the deployment server is highly dependent on the timestamps within a given app. This means that something as innocuous as touching a file (that is, updating its timestamp) will cause the checksum to change. Be aware of this when working with deployment apps: depending on how your server classes and apps are configured, it could lead to unintended Splunk restarts when you reload the server class or restart the deployment server.

For those of you who like proof (thanks to Brian Glenn for doing the testing): simply changing the timestamp on app.conf is enough to cause a new bundle to be generated.
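You can get a feel for why this happens with a small shell analogy. This is not Splunk's actual bundling code, and the names (demo_app, bundle1.tar) are hypothetical; it simply shows that an archive which records mtimes, hashed with md5sum, produces a different checksum when only a timestamp changes:

```shell
# Analogy only: tar records file mtimes in its headers, so hashing the
# archive is timestamp-sensitive, much like the default bundle checksum.
mkdir -p demo_app/default
printf '[install]\nstate = enabled\n' > demo_app/default/app.conf

tar -cf bundle1.tar demo_app
sum1=$(md5sum bundle1.tar | cut -d' ' -f1)

sleep 1                            # ensure the mtime actually differs
touch demo_app/default/app.conf    # update the timestamp only
tar -cf bundle2.tar demo_app
sum2=$(md5sum bundle2.tar | cut -d' ' -f1)

echo "before: $sum1"
echo "after:  $sum2"               # differs, though no content changed
```

The file contents are byte-for-byte identical both times; only the recorded mtime changed, yet the two checksums differ.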

If this functionality is not desired, it can be changed with the crossServerChecksum setting in serverclass.conf. Setting this to true results in the md5sum not changing when only the timestamp on a file in a deployment app is modified.
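A minimal sketch of where the setting lives on the deployment server (stanza placement only; no other settings shown):

```
# serverclass.conf on the deployment server
[global]
crossServerChecksum = true
```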

While the bundle ID is regenerated (the .bundle file has a new name), the checksum remains the same. This would result in the app not being redeployed due to a difference in checksum (because there isn’t one).

In Conclusion

Hopefully this write-up will save you some pain and suffering when making changes to your deployment apps (or at least help you understand the cause of your pain and suffering if you find out about this too late). If you found this tutorial helpful and there’s another aspect of Splunk configuration you would like to learn more about, let me know!