Two major technology trends have the potential to move the networking industry away from managing discrete devices and toward thinking of the network as a single distributed system. Software-defined networking (SDN) aspires to manage the network through centralized control, and DevOps aims to treat configuration like code that is managed and deployed collectively rather than device by device.

The question is how to take devices explicitly designed to operate discretely and make them behave as a single cohesive unit.

What is a single system image?

There are a lot of technical requirements for the purists, but the basic idea behind a single system image is that a number of devices act in concert as part of a distributed system. In networking terms, this means that the network itself (or, more likely, domains within the network) can be provisioned, monitored, and troubleshot as a single cohesive entity.

In many regards, this is part of the objective behind solutions like Ethernet fabrics. If the datacenter network can be managed as a single logical switch, then behavior can be specified at a global level and then translated into device-specific behavior without needing to touch every switch in the datacenter.
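
As a rough illustration of "specify globally, translate per device," the sketch below renders one fabric-wide intent (a VLAN) into per-switch configuration. The `Switch` type, the vendor families `alpha` and `beta`, and both output syntaxes are invented for this example; they do not describe any particular fabric product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Switch:
    name: str
    vendor: str  # hypothetical vendor family: "alpha" or "beta"


def render_vlan(switch, vlan_id, vlan_name):
    """Translate one global intent into a device-specific snippet.

    Both syntaxes below are made up for illustration.
    """
    if switch.vendor == "alpha":
        return f"vlan {vlan_id}\n name {vlan_name}"
    if switch.vendor == "beta":
        return f"set vlans {vlan_name} vlan-id {vlan_id}"
    raise ValueError(f"no renderer for vendor {switch.vendor!r}")


def render_fabric(switches, vlan_id, vlan_name):
    """Apply the same intent to every switch in the domain."""
    return {s.name: render_vlan(s, vlan_id, vlan_name) for s in switches}


fabric = [Switch("leaf1", "alpha"), Switch("leaf2", "beta")]
configs = render_fabric(fabric, 100, "web")
```

The operator states the intent once; the per-device differences live in the renderer rather than in the operator's head.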

The downside of a single system

The Internet is based on distributed, autonomous behavior. The entire design premise of networking is that devices acting locally can ultimately route around failures in the network. If a particular node fails, surrounding nodes can detect the failure, adjust their forwarding behavior, and distribute their updated information.
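
The effect of routing around a failure can be sketched with a toy topology. Note the simplification: real routing protocols reach this result through distributed exchange between neighbors, while the example below computes it in one place with a breadth-first search. The topology and node names are assumptions made for the illustration.

```python
from collections import deque


def shortest_path(links, src, dst, failed=frozenset()):
    """Breadth-first search that skips failed nodes.

    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    if src in failed or dst in failed:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in failed and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical five-node topology with two routes from A to D.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

normal = shortest_path(links, "A", "D")
rerouted = shortest_path(links, "A", "D", failed={"B"})
```

With every node healthy, traffic takes the short route through B; once B is marked failed, the search falls back to the longer route through C and E.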

Of course, this is most useful in sprawling, heterogeneous networks whose domains span various control spaces. When the physical path that traffic must take crosses multiple networks, each owned by a separate player, the only way to make it all work is through this type of design. But the datacenter is a different beast entirely. It is typically contained within well-defined boundaries, and the network within is usually managed not just by a single corporate entity but also by a single network team (forgetting for a moment that multiple device types sometimes lead to a separation between network, security, and appliance teams).

Does the datacenter have a single system image today?

While the datacenter lends itself more easily to a single system image approach, datacenters do not operate this way today. Most datacenters are heterogeneous environments featuring kit from a variety of vendors. Even in single-vendor deployments, the disparity between device types is usually prohibitive for those who would manage the network as a single entity.

Fabric-based solutions and virtual chassis implementations remove some of these barriers. But even in these cases, the boundary of the single system image can stretch only up to and not including the WAN. Whether the WAN is implemented using optical gear or MPLS-enabled routers, it creates a barrier beyond which control does not extend.

Can SDN help?

The entire premise of SDN is that control can be separated from forwarding. This separation is not so much about whether the control plane physically resides within the sheet metal that defines the box. Indeed, in virtually every modern networking device, the separation of control and forwarding already exists. But if that control plane is logically central and it encompasses a networking domain, then the set of boxes (or virtual devices) within that domain can be managed collectively.

But even here, SDN has limits. Most SDN efforts in the market today are incremental additions to devices designed under the traditional model of discrete management and behavior. So while you can centralize some aspects of management, not everything ends up being central. For example, you have to wait for OpenFlow 2.0 to get QoS support. The result is that deployments will end up being hybrids, with some centralized behavior and some distributed behavior.

While this is a useful step, it falls short of treating the network as a single system image. You get the benefits of centralized control to help make more intelligent control decisions, but you still carry the burden of discrete device management for the bulk of what makes a typical piece of networking equipment function.

What about adding DevOps?

DevOps (or NetOps for the more specific networking version) is all about treating configuration the way traditional software projects treat code. Conceptually, if you imagine that an individual device configuration is analogous to a piece of code, the collection of all those configurations makes up the source code that drives network behavior. Using this model, configuration can be managed much like source code is managed. When provisioning changes are made, the configuration is compiled (built and tested) and then deployed to all the devices that are impacted.
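
A minimal sketch of that build-then-deploy flow might look like the following. Everything here is hypothetical: the dictionary standing in for a configuration repository, the two validation rules (including the MTU range), and the `push` callback standing in for whatever actually talks to a device.

```python
def validate(config):
    """Return a list of problems found in one device's configuration."""
    errors = []
    if not config.get("hostname"):
        errors.append("missing hostname")
    mtu = config.get("mtu")
    # The 576-9216 range is an assumed policy, not a universal rule.
    if not isinstance(mtu, int) or not 576 <= mtu <= 9216:
        errors.append(f"mtu out of range: {mtu!r}")
    return errors


def build(repo):
    """'Compile' the whole repository: collect errors for every bad device."""
    failures = {}
    for name, config in repo.items():
        errors = validate(config)
        if errors:
            failures[name] = errors
    return failures


def deploy(repo, push):
    """Push nothing unless every configuration in the repository builds."""
    failures = build(repo)
    if failures:
        raise ValueError(f"build failed: {failures}")
    for name, config in repo.items():
        push(name, config)


repo = {
    "leaf1": {"hostname": "leaf1", "mtu": 9000},
    "leaf2": {"hostname": "leaf2", "mtu": 9000},
}
pushed = []
deploy(repo, lambda name, config: pushed.append(name))
```

The point of the gate in `deploy` is that a mistake in any one device's configuration blocks the change for the whole domain, just as one compile error blocks a software release.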

In this model, you get a lot closer to treating the network as a single system image. The configuration of the network can be done as if the network is a single cohesive entity, and since DevOps is more about the configuration and less about the behavior, the issues of dealing with different device types are not as important (though you obviously need to account for different configuration syntaxes and device capabilities).

What is missing?

Even though you can centralize configuration management through DevOps, the devices themselves still act as autonomous entities. This creates some interesting failure scenarios. For example, if there are 100 devices in a network domain, you might use DevOps tools to deploy new configuration to those devices. But what determines a successful change?

If 99 out of the 100 devices successfully commit the configuration change but one does not, what is the desired behavior? Imagine the case where configuration changes are pushed to these 100 devices, but one device times out and does not yield a positive response. The likely resolution is to back out all of the changes across the other 99 devices and try again. I don’t want to suggest this is not manageable, but the failure scenarios get interesting.
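
The back-out behavior described above can be sketched as an all-or-nothing push. The device names, the `apply` and `rollback` callbacks, and the in-memory `state` dictionary are all stand-ins invented for this example; the sketch also assumes rollback itself always succeeds, which is one of the interesting failure cases it does not cover.

```python
def push_all_or_nothing(devices, apply, rollback):
    """Try every device; if any fails, back out the ones that committed.

    Returns the list of devices that failed (empty on full success).
    """
    committed, failed = [], []
    for device in devices:
        try:
            apply(device)
            committed.append(device)
        except Exception:  # a timeout or any other commit failure
            failed.append(device)
    if failed:
        for device in committed:
            rollback(device)
    return failed


devices = [f"sw{i:03d}" for i in range(100)]
state = {device: "old" for device in devices}


def apply(device):
    # Simulate one device that never acknowledges the commit.
    if device == "sw042":
        raise TimeoutError(f"{device} did not acknowledge the commit")
    state[device] = "new"


def rollback(device):
    state[device] = "old"


failed = push_all_or_nothing(devices, apply, rollback)
```

Between the first successful `apply` and the last `rollback`, the simulated network is in a mixed state, with some devices on the new configuration and others on the old one.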

For a timeout-related failure, the problem is that some operators will deal with the behavior by adjusting timeout parameters. If you set the timeout to some arbitrarily high value, what happens in the interim between issuing a configuration change and flagging the timeout? During those seconds or minutes, the network has 99 entities acting under one set of rules and the other acting under the previous set. These types of situations can be difficult to identify, and depending on the changes can be service impacting for applications and users on the network.

What is needed?

Ultimately, if we want the network to behave as a single system image (not just from a configuration management perspective but also from a behavior perspective), we likely need to look at how we develop networking equipment. Making things that are not one appear as if they are gives us the illusion of cohesiveness. In many cases, this can be an extremely powerful thing. But when you start looking at end-to-end service deployments across multiple domains (a single Layer-2 domain stretching across two datacenters using an interconnect network in between, for instance), it’s the details that will matter. Even slight mismatches in policy across multiple domains can create resource islands in the datacenter, which defeats the entire purpose of moving to distributed architectures.

The bottom line

SDN and DevOps are absolutely critical additions to the networker’s arsenal. They help to alleviate some of the issues that stem from the distributed nature of networking. But simply adding layers on top of the network doesn’t change the fundamental architecture on top of which applications reside. If applications continue their move towards scale-out architectures (Big Data, clustered compute, clustered storage, and the list goes on), the underlying network infrastructure will have to undergo similar transformations. How tightly we cling to legacy constructs will determine how quickly we move to something more capable of operating in conjunction with the new breed of applications.

- See more at: http://www.plexxi.com/2014/09/sdn-implications-single-system-image/?utm_source=feedly&utm_reader=feedly&utm_medium=rss&utm_campaign=sdn-implications-single-system-image#sthash.3YPcHT8n.dpuf