When things are managed automatically with good tools, they work. When people manage them, they often work. Here we talk about automation, managing systems, monitoring, discovery, open source, and related topics.

13 December 2010

In my previous article on scalable monitoring, I described an interesting and highly scalable low-level infrastructure for monitoring servers and services. Although that proposed architecture would work without network topology information, it works best with a certain minimal amount of it - which machines are connected to which switches, and which subnets these various servers are connected to.

This article proposes a high-level design for collecting and managing this network topology information.

Basic Information Outline

The previous article described a three-level ring structure with each ring representing a different level in the network hierarchy. Although other topologies are possible, and some may even be especially interesting, this article will expand on the three-level ring topology. These three levels were (from top to bottom):

Collections of subnets

Collections of switches providing a given subnet

Collections of hosts attached to switches

Towards this end, there are a few types of data items that need to be collected. These include:

A (hostid, switchid) tuple. Depending on the protocol (CDP or LLDP), a given switch might report different types of switchids. One switch might announce its fully qualified domain name, while another might identify itself by its base MAC address.

A set of one or more (switchid, subnet-description) pairs collected per-host. For IPv4, the subnet description can be a standard CIDR address/mask format. I don't know what the right terminology is for a corresponding IPv6 physical network segment.

This information is sufficient for creating a basic network topology map from which one can create the three-level ring structure as described in my previous blog post.
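As a concrete sketch, the two data items above might be represented like this. The class and field names are my illustrative choices, not part of the original design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwitchConnection:
    """Which switch a host is plugged into.

    switch_id may be a fully qualified domain name or a base MAC
    address, depending on whether it came from LLDP or CDP."""
    host_id: str
    switch_id: str

@dataclass(frozen=True)
class SubnetAttachment:
    """Which subnet a given switch provides, as seen from one host.

    For IPv4 the subnet is standard CIDR notation, e.g. '10.1.2.0/24'."""
    switch_id: str
    subnet: str   # CIDR address/mask

# Example records as they might arrive from one host:
conn = SwitchConnection(host_id="web01.example.com",
                        switch_id="sw-3.example.com")
attach = SubnetAttachment(switch_id="sw-3.example.com",
                          subnet="10.1.2.0/24")
```

Joining these two record types on switchid is what yields the (subnet, switch, host) hierarchy the three rings are built from.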

Some Considerations

For both LLDP and CDP, the client cannot solicit an announcement - you just have to wait for it. The default interval for CDP announcements is one per minute. This creates some complications in handling newly booted hosts since there is no way to tell when, if ever, an LLDP or CDP packet will arrive. In addition, since one can move switch ports around easily, the affiliation of a given host with a given switch can change dynamically.

This implies that a general design for managing network topology must be prepared for a host to have no switch identification, for it to be supplied later, and for it to change.
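In code terms, that means a host's switch affiliation starts out absent and is treated as mutable. A minimal sketch of that idea (names are illustrative):

```python
from typing import Optional

class HostTopology:
    """Tracks a host's switch affiliation, which may be unknown at
    boot time and may change when the host moves to another port."""

    def __init__(self, host_id: str):
        self.host_id = host_id
        self.switch_id: Optional[str] = None  # unknown until LLDP/CDP heard

    def update_switch(self, switch_id: str) -> bool:
        """Record a (possibly changed) switchid; return True if it changed."""
        changed = switch_id != self.switch_id
        self.switch_id = switch_id
        return changed

h = HostTopology("web01")
assert h.switch_id is None          # newly booted: no announcement heard yet
assert h.update_switch("sw-3")      # first packet arrives
assert not h.update_switch("sw-3")  # repeated announcement: nothing changed
assert h.update_switch("sw-7")      # host was moved to a different switch
```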

Minion Design Outline

As you will recall from my previous article, I use the term minion to refer to a server which sends and receives heartbeats and reports failures. For a minion, the job is relatively simple. Here are a few elements of what it has to do beyond the tasks identified in the previous article:

On initialization, send a startup packet containing network segment information for the host.

Forward LLDP or CDP packets to the overlords when it first hears one and when something significant changes in the packets it hears. Significant changes include a change in the identity of the switch, a change in which port the host is connected to, and perhaps a small number of other conditions. Minions only have to understand a small number of fields in CDP or LLDP packets.
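The "significant change" test might look something like the following sketch. The packets are modeled as simple dicts, and the watched field names are placeholders rather than actual LLDP/CDP TLV names:

```python
def is_significant_change(old, new):
    """Decide whether a newly heard CDP/LLDP packet should be
    forwarded to the overlords. A minion only cares about a handful
    of fields; everything else (TTL, capabilities, etc.) is ignored."""
    if old is None:
        return True                      # first packet ever heard
    watched = ("switch_id", "port_id")   # switch identity, and our port
    return any(old.get(f) != new.get(f) for f in watched)

first = {"switch_id": "sw-3", "port_id": "Gi0/12", "ttl": 120}
assert is_significant_change(None, first)        # forward: first packet heard

repeat = dict(first, ttl=180)
assert not is_significant_change(first, repeat)  # ignore: only TTL differs

moved = dict(first, port_id="Gi0/15")
assert is_significant_change(first, moved)       # forward: port changed
```

Filtering at the minion keeps the steady-state load on the overlords near zero: a stable host forwards exactly one packet and then goes quiet.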

Overlord Topology Design Outline

Overlord is a term I used in my first article on this topic to refer to a host that is performing a management function. There may be many overlord machines performing a variety of functions, and they may be clustered or use some other method of enhancing reliability. For the purposes of this article, I will describe a single-node implementation of the topology management function.

For managing topology, an overlord has to perform the following functions:

Receive announcements from minions announcing their startup. When this happens, the overlord should initially add the new host to its subnet ring as described below.

Receive CDP/LLDP update packets from hosts. Depending on how things have changed, one or more of the following actions may be taken:

Move it from the subnet ring into the ring for its switch.

Move it from one intra-switch ring to another.

Create new higher level ring(s) for its switch and/or subnet.

Assign it a position in a higher-level ring.

Take away its position in a higher-level ring.

Give its position in higher level ring(s) to other nodes.

Remove subnets and/or switches from the active configuration.

Update the active configuration database with the forwarded CDP or LLDP packet.

Receive "apparent death of host" notices from minions announcing that another minion has apparently died. It will need to take one or more of the following actions when this happens.

Put a message in syslog for the death of this host.

Send a detailed message to a higher-level management entity.

Close the intra-switch ring this host was a member of (if any) by sending the dead machine's neighbors directives telling them to send heartbeats to each other instead of the dead machine.

If the ring had two nodes before, move the surviving node to its subnet ring.

Assign its role in higher level rings to another eligible host.

Remove the host information from the active configuration.

Remove a subnet or switch from the active configuration.
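Closing a ring around a dead host is essentially deletion from a doubly-linked circular list, with the twist that the "pointers" are really heartbeat directives sent to the neighbors. A minimal sketch of the pointer manipulation (the directive messages themselves are omitted, and all names are illustrative):

```python
class RingNode:
    """A minion's position in a heartbeat ring (circular, doubly linked)."""
    def __init__(self, host_id):
        self.host_id = host_id
        self.next = self   # a one-node ring points at itself
        self.prev = self

def insert_after(node, newnode):
    """Splice newnode into the ring just after node."""
    newnode.prev, newnode.next = node, node.next
    node.next.prev = newnode
    node.next = newnode

def close_ring(dead):
    """Remove a dead node so its neighbors heartbeat each other.
    Returns the pair of hosts that need new heartbeat directives."""
    dead.prev.next = dead.next
    dead.next.prev = dead.prev
    return (dead.prev.host_id, dead.next.host_id)

a, b, c = RingNode("a"), RingNode("b"), RingNode("c")
insert_after(a, b)
insert_after(b, c)
left, right = close_ring(b)  # b dies; a and c now heartbeat each other
assert (left, right) == ("a", "c")
assert a.next is c and c.prev is a
assert a.prev is c           # now a two-node ring: a and c are mutual neighbors
```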

A few special cases need to be accounted for:

A two-node ring - each node needs to send the other only a single packet per interval, not two.

A configuration with only one minion.

Although there are a number of cases to be taken care of, the details resemble the well-known algorithms for inserting into and deleting from doubly-linked circular lists. Of course, this case is more complex because things might fail in the middle of these various rearrangements. Finite state machines come to mind as a way to handle timeouts and other failure conditions appropriately.
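One way to sketch such a state machine, for a single ring-rearrangement operation guarded by a timeout. The states and transitions here are illustrative assumptions, not a complete design:

```python
import enum

class MoveState(enum.Enum):
    """States for one ring rearrangement, e.g. moving a host from the
    subnet ring into its switch ring."""
    IDLE = enum.auto()
    DIRECTIVES_SENT = enum.auto()  # told host + new neighbors about new links
    CONFIRMED = enum.auto()        # all acknowledgements received
    FAILED = enum.auto()           # deadline passed: caller must re-plan

class MoveOperation:
    def __init__(self, host_id, acks_needed):
        self.host_id = host_id
        self.acks_needed = acks_needed
        self.state = MoveState.IDLE

    def send_directives(self):
        self.state = MoveState.DIRECTIVES_SENT

    def on_ack(self):
        self.acks_needed -= 1
        if self.acks_needed == 0:
            self.state = MoveState.CONFIRMED

    def on_timeout(self):
        if self.state is not MoveState.CONFIRMED:
            self.state = MoveState.FAILED  # partial rearrangement: roll back/retry

op = MoveOperation("web01", acks_needed=3)  # the host plus its two new neighbors
op.send_directives()
op.on_ack()
op.on_ack()
op.on_timeout()  # one acknowledgement never arrived before the deadline
assert op.state is MoveState.FAILED
```

The important property is that every rearrangement either reaches CONFIRMED or lands in FAILED, so a half-completed move can never silently leave a host unmonitored.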

No doubt there are things I've overlooked, but this should be enough to demonstrate feasibility and the approximate level of difficulty.