When things are managed automatically with good tools, they work. When people manage them, they often work. Here we talk about automation, managing systems, monitoring, discovery, open source, and related topics.

30 August 2012

Injecting Nanoprobes into Servers - What's that about?

I've recently had some people ask how nanoprobes work: are they clients, or what exactly are they? They start out like clients, behave in some ways like peers, and maybe a bit like servers. So what the heck are they? The simplest explanation is that they are autonomous delegates of the central management authority. Read on to find out more about how this unconventional model works and why this authority model is key to unprecedented scalability and stealth discovery™ in the discovery-driven Assimilation monitoring project.

In the Assimilation Monitoring project, there are two primary kinds of entities: the Central Management Authority (CMA), of which there is exactly one, and nanoprobes, of which there are many (one per operating system image being monitored).

The simplest way to understand the relationship between the CMA and its nanoprobes is that nanoprobes are extensions of the CMA with limited authority. They are extended limited trust: enough to do the tasks requested of them, and more than a client gets in a typical client-server model. In this view of the world, they are autonomous agents, trusted to carry out the requested tasks without continual supervision. So if a nanoprobe is asked to monitor a web server every 10 seconds, it does so silently unless a failure occurs. This is our “no news is good news” philosophy: when nothing is broken and no configuration is changing, the nanoprobes say nothing to the CMA.
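To make “no news is good news” concrete, here is a minimal Python sketch of a monitor that checks a web server with an HTTP GET and calls its report hook only when the result changes state. It is an illustration under stated assumptions, not the nanoprobe's actual code: `QuietMonitor`, `http_check`, and the `report` callback are hypothetical names, and treating the service as healthy at startup (so a healthy service never produces a message) is a choice made for this sketch.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_check(url: str, timeout: float = 5.0) -> bool:
    """Healthy if an HTTP GET on the URL succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError, timeouts, refused connections, bad URLs, bad responses
        return False


class QuietMonitor:
    """Runs a check repeatedly but reports only when the result changes."""

    def __init__(self, name: str, check: Callable[[], bool],
                 report: Callable[[str, str], None]) -> None:
        self.name = name
        self.check = check
        self.report = report   # hypothetical hook that would message the CMA
        self.last_ok = True    # assume healthy at start: a healthy service stays silent

    def poll(self) -> None:
        ok = self.check()
        if ok != self.last_ok:  # state changed: this is news
            self.report(self.name, "up" if ok else "down")
        self.last_ok = ok

    def run(self, interval: float = 10.0) -> None:
        while True:
            self.poll()
            time.sleep(interval)


# Simulated check results instead of a real web server
results = iter([True, True, False, False, True])
messages = []
monitor = QuietMonitor("webserver", lambda: next(results),
                       lambda name, state: messages.append((name, state)))
for _ in range(5):
    monitor.poll()
print(messages)  # [('webserver', 'down'), ('webserver', 'up')]
```

Five polls produce only two messages: one when the server goes down and one when it comes back. In live use the loop would be `monitor.run(10.0)` with a check such as `lambda: http_check("http://10.0.0.5/")`.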

They receive orders from the CMA and carry them out, whatever they are. These orders include requests to perform repetitive discovery actions and to monitor services that involve only the current OS image. But they also include some actions that involve other entities in the network. For example, nanoprobes send heartbeats to peer nanoprobes, and are told to expect heartbeats from those peers within a certain interval. Similarly, they can be asked to monitor services provided by appliances and other things we can't put nanoprobes in, and the nanoprobe likely won't even know that the thing being monitored is remote. For example, when monitoring a web server, it performs an HTTP GET on an IP address. Normally this is a local address, but the monitoring agent doesn't know or care whether it's remote. Of course, the CMA cares whether it's local or remote, but the nanoprobe doesn't.
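The heartbeat side of this can be sketched as a small deadline tracker: the nanoprobe is told which peers to expect and how long silence may last, and only peers that stay silent too long are worth mentioning to the CMA. This is a hedged Python illustration; the class name, the `deadtime` parameter, and the choice to ignore heartbeats from unexpected senders are assumptions, not the project's actual design. Times can be passed in explicitly so the logic is easy to test.

```python
import time
from typing import Dict, Iterable, List, Optional


class HeartbeatTracker:
    """Remembers when each expected peer was last heard from."""

    def __init__(self, peers: Iterable[str], deadtime: float,
                 now: Optional[float] = None) -> None:
        start = time.monotonic() if now is None else now
        self.deadtime = deadtime
        # Every expected peer starts its countdown at construction time
        self.last_heard: Dict[str, float] = {peer: start for peer in peers}

    def heard_from(self, peer: str, now: Optional[float] = None) -> None:
        # Heartbeats from peers we were not told to expect are ignored
        if peer in self.last_heard:
            self.last_heard[peer] = time.monotonic() if now is None else now

    def overdue(self, now: Optional[float] = None) -> List[str]:
        """Peers silent for longer than deadtime: the only thing worth reporting."""
        now = time.monotonic() if now is None else now
        return sorted(peer for peer, t in self.last_heard.items()
                      if now - t > self.deadtime)


tracker = HeartbeatTracker(["10.0.0.2", "10.0.0.3"], deadtime=5.0, now=0.0)
tracker.heard_from("10.0.0.2", now=4.0)
tracker.heard_from("10.9.9.9", now=4.0)   # unexpected sender: ignored
print(tracker.overdue(now=6.0))           # ['10.0.0.3']
```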

This is another important aspect of the authority scheme: nanoprobes have no idea why they're being asked to do anything, and they don't need to. They are policy-free. This makes them lighter-weight and easier to keep updated, because they embody less knowledge that might need updating. Given that there may be a million of them, this is a really good idea.
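One way to picture a policy-free agent is as a plain dispatch table: an order arrives as data, the nanoprobe looks up the matching mechanism, and it runs it with the parameters it was given, never asking why. The sketch below assumes a hypothetical JSON order format (`{"op": ..., "args": ...}`) and hypothetical order names; the handlers just record what they would do rather than doing it.

```python
import json
from typing import Any, Callable, Dict, List

performed: List[str] = []   # stands in for actually doing the work


def do_monitor(args: Dict[str, Any]) -> None:
    performed.append(f"monitor {args['url']} every {args['interval']}s")


def do_discover(args: Dict[str, Any]) -> None:
    performed.append(f"run discovery script {args['script']}")


def do_heartbeat(args: Dict[str, Any]) -> None:
    performed.append(f"heartbeat to {', '.join(args['peers'])}")


# Hypothetical order names; this table is mechanism only, with no policy in it
HANDLERS: Dict[str, Callable[[Dict[str, Any]], None]] = {
    "monitor": do_monitor,
    "discover": do_discover,
    "heartbeat": do_heartbeat,
}


def execute(raw_order: str) -> None:
    """Carry out one order from the CMA without knowing why it was sent."""
    order = json.loads(raw_order)
    handler = HANDLERS.get(order.get("op"))
    if handler is None:
        raise ValueError(f"unknown order: {order.get('op')!r}")
    handler(order.get("args", {}))


execute('{"op": "monitor", "args": {"url": "http://10.0.0.5/", "interval": 10}}')
execute('{"op": "heartbeat", "args": {"peers": ["10.0.0.2", "10.0.0.3"]}}')
print(performed)
```

All the "why" (which services matter, which peers pair up) lives in the CMA, so changing policy means changing the CMA, not a million agents.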

Security of nanoprobes is important, particularly when they act as peers. For this reason, nanoprobes use public-key cryptography to sign all their packets, and the actions peers can perform are limited. Currently peers only send heartbeats to each other, an action with minimal security implications. Nanoprobes obey commands only from the CMA, whose public key each one knows.
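The acceptance rule described above fits in a few lines. In this hedged Python sketch, signature checking is abstracted as a `verify` callable, assumed to validate the packet against a public key the nanoprobe already trusts (how peer keys are distributed is not covered here). The `Packet` fields and packet type names are hypothetical, and `trust_all` is a test stub that performs no cryptography at all.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Packet:
    kind: str         # hypothetical packet types, e.g. "heartbeat" or "command"
    sender_key: str   # identifies the public key the packet claims to be signed with
    payload: bytes
    signature: bytes


# The only thing a peer nanoprobe may currently send
PEER_ALLOWED = frozenset({"heartbeat"})


def accept(pkt: Packet, cma_key: str,
           verify: Callable[[Packet], bool]) -> bool:
    """Decide whether a nanoprobe should act on an incoming packet."""
    if not verify(pkt):
        return False                    # unsigned or forged: drop it
    if pkt.sender_key == cma_key:
        return True                     # the CMA may send any order
    return pkt.kind in PEER_ALLOWED     # peers may only send heartbeats


# Test stub only: accepts every signature and does no cryptography
def trust_all(pkt: Packet) -> bool:
    return True


cmd_from_cma = Packet("command", "cma-key", b"{}", b"sig")
cmd_from_peer = Packet("command", "peer-key", b"{}", b"sig")
hb_from_peer = Packet("heartbeat", "peer-key", b"", b"sig")
print(accept(cmd_from_cma, "cma-key", trust_all))   # True
print(accept(cmd_from_peer, "cma-key", trust_all))  # False
print(accept(hb_from_peer, "cma-key", trust_all))   # True
```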

Nanoprobe startup resembles DHCP. A nanoprobe starts up, sends a “phone home” message to the CMA announcing that it's here and what its local network configuration is, and waits for orders from the CMA. The CMA then sends it a configuration packet, indicating that it heard the request for configuration, followed by a series of other packets telling it who to send heartbeats to and what discovery actions to run. By default, the initial phone home message will be sent as a multicast packet to a reserved multicast address. For many sites, this means that the Assimilation system will require no configuration at all to get started. For sites that don't support multicast, such as clouds, the address of the CMA will have to be configured for startup.
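The first step of that startup can be sketched in Python as building the phone-home announcement and choosing where to send it: a configured CMA address if one exists, otherwise the multicast default. The multicast group, port, and message fields below are hypothetical placeholders, since the post does not name the reserved address or the wire format; waiting for the CMA's configuration reply is left out.

```python
import json
import socket
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, int]

# Hypothetical group and port: the real reserved multicast address is not given here
DEFAULT_MULTICAST: Address = ("239.255.0.1", 5555)


def phone_home_destination(configured_cma: Optional[Address] = None) -> Address:
    """Use a configured CMA address if there is one (e.g. in clouds), else multicast."""
    return configured_cma if configured_cma is not None else DEFAULT_MULTICAST


def phone_home_message(hostname: str, interfaces: List[Dict[str, str]]) -> bytes:
    """Announce 'I am here' plus the local network configuration (fields illustrative)."""
    return json.dumps({"type": "phone_home", "hostname": hostname,
                       "interfaces": interfaces}, sort_keys=True).encode("utf-8")


def send_phone_home(message: bytes, dest: Address) -> None:
    """Send the announcement as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Keep multicast on the local segment; has no effect on unicast sends
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(message, dest)


msg = phone_home_message("web1", [{"name": "eth0", "address": "10.0.0.2/24"}])
print(phone_home_destination())                       # ('239.255.0.1', 5555)
print(phone_home_destination(("192.0.2.10", 5555)))   # ('192.0.2.10', 5555)
```

The zero-configuration case is simply `send_phone_home(msg, phone_home_destination())`; a cloud deployment would pass the configured CMA address instead.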

Because we have continually running active agents on every monitored machine, we can monitor things besides services: we can monitor anything for which an agent has been written. This is how stealth discovery works. Our active agents can monitor the set of ports being listened on and notify the CMA when new services are added or old ones go away, just by using netstat. Because we run on the machines, our nanoprobes can discover anything internal to the machine and then watch what they've discovered for configuration changes. Since the output of discovery agents is JSON, it's easy to see whether it has changed and, in the spirit of no news is good news, to send it to the CMA only when it changes. In some sense, discovery is simply monitoring the metadata of the data center.
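Here is a hedged Python sketch of that loop for listening ports: parse `netstat -ltn` output (the Linux net-tools format is assumed), turn it into canonical JSON, and hand it on only when it differs from what was last sent. A canned sample stands in for live output so the example runs anywhere; `read_netstat`, `parse_listeners`, and `ChangeReporter` are hypothetical names.

```python
import json
import subprocess
from typing import Dict, List, Optional, Union

Listener = Dict[str, Union[str, int]]

SAMPLE = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp6       0      0 :::80                   :::*                    LISTEN
"""


def read_netstat() -> str:
    """Listening TCP sockets, numeric, as printed by Linux net-tools netstat."""
    return subprocess.run(["netstat", "-ltn"], capture_output=True,
                          text=True, check=True).stdout


def parse_listeners(text: str) -> List[Listener]:
    found = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[-1] == "LISTEN":
            addr, port = fields[3].rsplit(":", 1)   # handles IPv4 and IPv6 forms
            found.add((fields[0], addr, int(port)))
    # Sorting makes the result canonical, so the same state yields the same JSON
    return [{"proto": p, "address": a, "port": n} for p, a, n in sorted(found)]


class ChangeReporter:
    """Returns discovery JSON only when it differs from what was last sent."""

    def __init__(self) -> None:
        self.last_sent: Optional[str] = None

    def update(self, result: List[Listener]) -> Optional[str]:
        current = json.dumps(result, sort_keys=True)
        if current == self.last_sent:
            return None              # no news is good news
        self.last_sent = current
        return current               # this is what would go to the CMA


reporter = ChangeReporter()
first = reporter.update(parse_listeners(SAMPLE))    # initial result: sent
again = reporter.update(parse_listeners(SAMPLE))    # unchanged: None
changed = reporter.update(parse_listeners(
    SAMPLE + "tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN\n"))
print(changed)
```

A new listener on port 443 changes the JSON and so produces a report; an unchanged scan produces nothing.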

Since discovery is just running scripts that produce JSON, it's quite simple to discover more or less anything on a server: hardware configuration, software configuration, ARP neighbors, and so on. Of course, the CMA puts this information into the Neo4j graph database. These kinds of scripts are what SysAdmins are typically quite good at writing, so our hope is that we can create a community of people collaborating on stealth discovery, similar to the community around OCF resource agents. One good sign: Ted Zlatanov of the CFEngine3 project has decided to collaborate with us on this effort.
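To show how small such a script can be, here is a minimal Python discovery agent, assuming only that an agent's job is to write JSON describing what it found to standard output. The field names are illustrative, not a schema the Assimilation project defines, and everything it reports comes from the Python standard library.

```python
import json
import os
import platform
import socket
from typing import Any, Dict


def discover_os() -> Dict[str, Any]:
    """Collect a few facts about this OS image; field names are illustrative."""
    uname = platform.uname()
    return {
        "hostname": socket.gethostname(),
        "system": uname.system,        # e.g. "Linux"
        "release": uname.release,      # kernel release string
        "machine": uname.machine,      # e.g. "x86_64"
        "cpu_count": os.cpu_count(),   # None if it cannot be determined
    }


if __name__ == "__main__":
    # The agent's whole contract in this sketch: print JSON on stdout
    print(json.dumps(discover_os(), indent=2, sort_keys=True))
```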

In many ways, this autonomous delegation methodology is exactly what lets the Assimilation project do what it does. Treating discovery as the monitoring of metadata makes this work elegantly, and is the key to our approach. In my view, combining discovery and monitoring into discovery-driven monitoring yields something truly greater than the sum of its parts.

What's your take on all this? Does this slightly oddball way of working make sense to you? What questions do you have about it? Does this all hang together for you, or just sound like another self-congratulatory load of organic plant fertilizer?