It's all in the fabric for the data centre network

IT infrastructure is worth exactly nothing if the network doesn't work. The network designs we have grown so comfortable with over the past 15 years or so are wholly inadequate if you are building a cloud or merely undergoing a refresh that will see more virtual machines packed into newer, more capable servers.

The traditional network was a hierarchical, oversubscribed affair. Servers would be linked to a top-of-rack switch through 1Gb ports. That switch would connect to the end-of-row switch via one or more 10Gb ports, and the end-of-row switches would talk to the core at 40Gb. Dial the numbers up or down depending on your budget.

Trunking allowed you to lash multiple ports together into a single logical link. For some time, 10Gb to a single server was considered fast enough for converged networking, with storage and userspace traffic sharing the same link.

Things have changed so drastically that the problem can no longer be solved merely by turning the knobs on port speeds.

All points of the compass

Get networking types talking about network design and you will inevitably hear them discuss "north-south traffic" versus "east-west traffic".

The nomenclature is derived from the standard hierarchical network design. The core switch was represented as the topmost item on a network diagram, thus it was north. Individual servers – or more accurately the virtual machines and applications they hosted – were at the bottom, or south.

In the early 1990s, when the client-server revolution really took off and the World Wide Web was just coming into being, virtually all traffic in a data centre flowed between servers in the south and users on the far side of that core switch in the north.

A server on the east side of the map wanting data from a server on the west side had to cross several hops to get it: its rack switch, its row switch, the core switch, then another row switch and another rack switch to reach the destination server, and the same path back again.

Typically, each of these links was oversubscribed. You might have 30 servers in a rack, each talking to the rack switch at 1Gb. That rack switch would then have only a 10Gb link to the end of the row.

Twenty racks in a row would share a single 40Gb link back to core, and so forth. So long as there wasn't a lot of east-west traffic – and so long as server densities didn't rise too high – this worked quite well.
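To put numbers on that oversubscription, here is a back-of-envelope sketch using the illustrative figures above (30 servers per rack on 1Gb, a 10Gb rack uplink, 20 racks sharing a 40Gb link to core). It assumes every link is fully loaded and ignores trunking, so treat it as a worst case rather than a measurement.

```python
from fractions import Fraction

# Illustrative figures from the text: 30 servers per rack on 1Gb ports,
# one 10Gb rack uplink, 20 racks per row, one 40Gb row uplink to the core.
SERVERS_PER_RACK = 30
SERVER_LINK_GB = 1
RACK_UPLINK_GB = 10
RACKS_PER_ROW = 20
ROW_UPLINK_GB = 40


def ratio(offered_gb: int, uplink_gb: int) -> Fraction:
    """Oversubscription: bandwidth offered from below per unit of uplink capacity."""
    return Fraction(offered_gb, uplink_gb)


# Rack tier: 30 x 1Gb of server ports share a single 10Gb uplink.
rack_ratio = ratio(SERVERS_PER_RACK * SERVER_LINK_GB, RACK_UPLINK_GB)

# Row tier: 20 x 10Gb of rack uplinks share a single 40Gb link to the core.
row_ratio = ratio(RACKS_PER_ROW * RACK_UPLINK_GB, ROW_UPLINK_GB)

# East-west traffic that has to cross the core suffers both ratios in series.
end_to_end = rack_ratio * row_ratio

# Fair share per server (in Mb) if every server in the row pushes through the core at once.
per_server_mb = Fraction(ROW_UPLINK_GB * 1000, RACKS_PER_ROW * SERVERS_PER_RACK)

print(f"rack {rack_ratio}:1, row {row_ratio}:1, end-to-end {end_to_end}:1")
print(f"about {float(per_server_mb):.0f}Mb per server when a whole row hits the core")
```

That works out to 3:1 at the rack, 5:1 at the row and 15:1 overall, or roughly 67Mb for each 1Gb server. It only bites when most traffic has to climb to the core, which is exactly what east-west traffic does in this design.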

Fast-moving machines

Things changed. Network traffic became predominantly east-west. Depending on who you talk to you will get different reasons for this shift.

Brocade's Julian Starr believes that a shift towards new application architectures based on message buses, tokenised passing (via things like XML) and similar modern web architectures is responsible.

I argue that virtualisation was the enabler. Before virtualisation we ran all of those pieces on a single server. After virtualisation we started breaking them out, one application per virtual machine, and those virtual machines were mobile.

They wouldn't necessarily keep their inter-app conversations within a single host. Regardless of which came first – app model or infrastructure – it was at this point that the change from north-south to east-west really began.

Two major changes affecting networking occurred at the same time: a shift towards centralised storage, and a push to drive server utilisation as close to the red line as possible. All of a sudden servers were talking east-west to reach their storage while also chattering among themselves for application traffic.


To make matters worse, blades and other high-density solutions became popular, converting 30 servers per rack into well over 100. The traditional hierarchical network wasn't dead – there was nothing to replace it with yet – but it was certainly on life support.

QoS and binding ever more ports together into ever wider links bought everyone some time. Storage was kept on its own network, but this was expensive and, more critically, inflexible.

Applications in a hierarchical network are still very siloed; they cannot stray far from their storage, and that storage shares the same hierarchical limitations as the rest of the network.

Workloads were becoming dynamic. Different virtual servers would need access to the same chunk of storage at different times: traditional north-south userspace stuff during the day, big-data number crunching at night and backups in the wee hours of the morning.

As data centre workloads grew more complex, it became increasingly difficult to keep all this traffic close enough to the data that the network did not become an undue bottleneck.

Another fine mesh

The need for a new topology – a mesh, be it full or partial – has become painfully apparent. Servers need to be able to talk east to west with as little contention as possible without sacrificing north-to-south connectivity along the way.
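To see why a mesh helps, the sketch below models the east-west trip described earlier as a graph and compares it with a leaf-spine layout, one common partial-mesh design in which every rack (leaf) switch is cabled to every spine switch. The switch names and the two-leaf, two-spine size are hypothetical. A breadth-first search counts how many switches sit on the shortest path and how many equally short paths exist.

```python
from collections import deque


def build(edges):
    """Turn a list of cable runs into an undirected adjacency list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def bfs_paths(graph, src, dst):
    """Return (links traversed, number of distinct shortest paths) from src to dst."""
    dist, ways = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:                  # first visit: record distance
                dist[nxt] = dist[node] + 1
                ways[nxt] = ways[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:    # another equally short route
                ways[nxt] += ways[node]
    return dist[dst], ways[dst]


# Hierarchical tree: rack switch -> row switch -> core -> row switch -> rack switch.
hierarchy = build([
    ("server-east", "rack-1"), ("rack-1", "row-1"), ("row-1", "core"),
    ("core", "row-2"), ("row-2", "rack-2"), ("rack-2", "server-west"),
])

# Leaf-spine partial mesh: every leaf (rack) switch is cabled to every spine switch.
leaves, spines = ["leaf-1", "leaf-2"], ["spine-1", "spine-2"]
fabric = build([("server-east", "leaf-1"), ("server-west", "leaf-2")]
               + [(leaf, spine) for leaf in leaves for spine in spines])

for name, graph in [("hierarchy", hierarchy), ("leaf-spine", fabric)]:
    links, paths = bfs_paths(graph, "server-east", "server-west")
    print(f"{name}: {links - 1} switches per trip, {paths} equal-cost path(s)")
```

The tree forces five switches and a single path; the leaf-spine layout needs three switches and offers two paths. Adding a third spine raises the path count to three without lengthening the trip.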

Switches need to be able to determine the best path for a packet without needing to get into full-on layer-3 routing.

Getting the packet from A to B needs to be a layer-2 affair: something that doesn't require routing based on IP addresses and where getting more speed between two switches is as simple as plugging in another cable between them.
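The "plug in another cable" property depends on spreading traffic over parallel links. One common way switches do this is to hash each flow's header fields and use the result to pick a link, so every frame of a flow takes the same path and stays in order. The sketch below assumes SHA-256 as the hash and (source MAC, destination MAC, VLAN) as the flow key purely for illustration; real switches use hardware hash functions and vendor-specific field selections.

```python
import hashlib
from collections import Counter


def pick_link(flow: tuple, n_links: int) -> int:
    """Map a flow to one of n_links parallel links by hashing its header fields.

    The same flow always lands on the same link, so its frames stay in order;
    different flows spread across all available links."""
    digest = hashlib.sha256("|".join(map(str, flow)).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_links


def spread(flows, n_links):
    """Count how many flows land on each link."""
    load = Counter(pick_link(f, n_links) for f in flows)
    return [load[i] for i in range(n_links)]


# Hypothetical flows between two switches: (source MAC, destination MAC, VLAN).
flows = [(f"02:00:00:00:{i // 256:02x}:{i % 256:02x}", "02:00:00:00:ff:01", 10)
         for i in range(1000)]

for n in (2, 3, 4):   # "plugging in another cable" means incrementing n
    print(f"{n} links: {spread(flows, n)} flows per link")
```

The trade-off is that a whole flow is pinned to one link, so a single very large flow still gets only one cable's worth of bandwidth. The extra cables pay off when there are many flows, which is the normal case for east-west traffic between many virtual machines.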

What is more, the human element of networking has become a problem. Modern data centres are heavily automated. New virtual machines are created and destroyed much faster than a network administrator can manually configure a network port or a storage administrator can assign storage.

Network configuration needs to be automated – something that traditional network equipment and management platforms just aren't good at.
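A minimal sketch of what that automation can look like, assuming a hypothetical inventory in which each VM is tagged with a host and a VLAN and each host is cabled to a known switch port. It follows the declarative "desired state" pattern: derive what each port should carry from the VM inventory, compare it with what is configured, and emit only the differences. The port names, data model and change strings are invented for illustration; a real deployment would push the changes through a vendor API or an SDN controller.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """Hypothetical model of a switch port facing a hypervisor host."""
    vlans: set[int] = field(default_factory=set)


def desired_state(vms: list[dict], host_ports: dict[str, str]) -> dict[str, set[int]]:
    """Derive the VLANs each host-facing port must carry from the VM inventory."""
    want = {port: set() for port in host_ports.values()}
    for vm in vms:
        want[host_ports[vm["host"]]].add(vm["vlan"])
    return want


def reconcile(current: dict[str, Port], want: dict[str, set[int]]) -> list[str]:
    """Bring the switch in line with the desired state; return the changes made."""
    changes = []
    for port, vlans in sorted(want.items()):
        have = current.setdefault(port, Port()).vlans
        changes += [f"{port}: add vlan {v}" for v in sorted(vlans - have)]
        changes += [f"{port}: remove vlan {v}" for v in sorted(have - vlans)]
        current[port].vlans = set(vlans)
    return changes


# Hypothetical inventory: which switch port each host is cabled to, and the VMs now running.
host_ports = {"host-a": "eth1/1", "host-b": "eth1/2"}
switch = {"eth1/1": Port({10}), "eth1/2": Port({10, 20})}
vms = [
    {"name": "web-1", "host": "host-a", "vlan": 10},
    {"name": "db-1", "host": "host-a", "vlan": 30},   # just created by the orchestrator
]

changes = reconcile(switch, desired_state(vms, host_ports))
print("\n".join(changes))
```

Running the reconciliation a second time produces no changes, which is what makes this style safe to trigger on every VM create or destroy event.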

This break with the traditional hierarchical network is one of the foundational considerations behind software-defined networking (SDN), and the most important movement in data centre networking in decades. A network built along these lines is referred to as a network fabric.

Instead of a pyramid with a core router at the top, picture a tapestry of interwoven threads which intersect in an almost haphazard fashion but ultimately give rise to an elegance that belies the chaos of the individual elements.

Command and control

Currently, there are a number of approaches to making some or all of the elements of a modern network happen. Transparent Interconnection of Lots of Links (Trill) and shortest path bridging (SPB) stitch layer-2 networks together into a fabric. Other approaches go a step further by completely separating the control plane from the data plane.

Traditional switches are little islands that intercommunicate. Each holds its own configuration and needs to be babied along. Its ports are configured individually, setup is handled separately, and generally there is a lot of rather unnecessary labour involved.

Modern switches are starting to be capable of SDN. This means they can be controlled centrally. The industry terminology is "separation of the control plane from the data plane" but that's not exactly helpful.

Put simply, SDN is about separating the decision-making and configuration widget from the device that actually does the work.

For infrastructure guys, RAID controller software makes a good analogy. Each RAID controller does the work of turning groups of disks into a single volume, and each RAID controller can be accessed and configured individually if absolutely necessary. This is the equivalent of the data plane that network types go on about.

The control plane is the centralised application from which an entire data centre's worth of RAID cards can be managed, maintained, configured, monitored and so forth.

Move up a level from RAID cards to storage area networks (SANs) and that control plane has the ability to do things such as inter-system replication, mirroring across devices and so forth.

With SDN, routing decisions, whether at layer 2 or layer 3, are made by a separate controller that can see what is happening across the entire network.

Switches are reconfigured automatically, not only when a server is added or a virtual machine is created, but also when the controller detects a downed link, a change in traffic patterns or even an alert from a network security system.

OpenFlow is emerging as the most popular way to do this, though there are other attempts at open standards and some proprietary versions too.
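The controller model described above can be sketched in a few lines. The classes here are hypothetical and deliberately naive: switches only look up a table, while a separate controller with a view of every link computes those tables and recomputes them when told a link has gone down. A simple breadth-first shortest-path search stands in for a real controller's path computation, and in practice the table updates would travel over a protocol such as OpenFlow rather than a Python method call.

```python
from collections import deque


class Switch:
    """Data plane: forwards by looking up a table it does not compute itself."""

    def __init__(self, name: str):
        self.name = name
        self.table = {}   # destination -> next hop (None if unreachable)

    def next_hop(self, dst: str):
        return self.table.get(dst)


class Controller:
    """Control plane: holds the whole topology and programs every switch."""

    def __init__(self, links):
        self.links = {frozenset(link) for link in links}
        self.switches = {name: Switch(name) for link in links for name in link}

    def neighbours(self, node: str) -> list[str]:
        return sorted(m for link in self.links if node in link for m in link if m != node)

    def recompute(self):
        """Run a BFS from each destination; each node's BFS parent is its next hop."""
        for dst in self.switches:
            parent = {dst: None}
            queue = deque([dst])
            while queue:
                node = queue.popleft()
                for nb in self.neighbours(node):
                    if nb not in parent:
                        parent[nb] = node        # nb reaches dst via node
                        queue.append(nb)
            for name, sw in self.switches.items():
                if name != dst:
                    sw.table[dst] = parent.get(name)

    def link_down(self, a: str, b: str):
        """Failure event: forget the link and reprogram every switch."""
        self.links.discard(frozenset((a, b)))
        self.recompute()


ctl = Controller([("leaf-1", "spine-1"), ("leaf-1", "spine-2"),
                  ("leaf-2", "spine-1"), ("leaf-2", "spine-2")])
ctl.recompute()
print(ctl.switches["leaf-1"].next_hop("leaf-2"))   # spine-1
ctl.link_down("leaf-1", "spine-1")
print(ctl.switches["leaf-1"].next_hop("leaf-2"))   # spine-2
```

None of the switches noticed the failure or chose a new path on its own; the controller, which sees every link, rewrote their tables.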

Brocade gets a nod for "old to new transition therapy": the latest version of its NetIron software can run ports in hybrid mode, allowing both OpenFlow and traditional routing to operate on the same port.
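OpenFlow switches forward using flow tables: each entry has a priority, a set of header fields to match (unlisted fields act as wildcards) and an action. The OpenFlow specification also defines a reserved NORMAL port that hybrid switches can offer, handing a packet to their traditional forwarding pipeline. The simplified model below borrows that idea to picture hybrid operation: controller-installed entries take precedence, and anything they don't match falls through to conventional switching. The field names and classes are illustrative only; this is neither the OpenFlow wire format nor Brocade's implementation.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    priority: int
    match: dict    # header field -> required value; fields not listed are wildcards
    action: str    # e.g. "output:3", "drop", or "NORMAL" (hand to the legacy pipeline)


def lookup(table: list[FlowEntry], packet: dict) -> str:
    """Return the action of the highest-priority entry whose match fits the packet."""
    for entry in sorted(table, key=lambda e: e.priority, reverse=True):
        if all(packet.get(k) == v for k, v in entry.match.items()):
            return entry.action
    return "drop"   # no match and no table-miss entry: drop, as OpenFlow 1.3 does by default


table = [
    FlowEntry(200, {"vlan": 30, "dst_port": 22}, "drop"),   # controller-installed policy
    FlowEntry(100, {"vlan": 30}, "output:3"),                # steer this VLAN out of port 3
    FlowEntry(0, {}, "NORMAL"),                              # table miss: traditional switching
]

print(lookup(table, {"vlan": 30, "dst_port": 443}))   # output:3
print(lookup(table, {"vlan": 30, "dst_port": 22}))    # drop
print(lookup(table, {"vlan": 10, "dst_port": 443}))   # NORMAL
```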

Name a price

We are at the beginning of the SDN revolution. The standards and patent wars have barely begun.

There is an incredible amount of FUD being flung about, and a great deal of defensive hand-wringing from those who haven't adapted to changing requirements as well as others have.

Amid all the hullabaloo about capabilities or performance, price is a very real consideration. All the sexy automation in the world doesn't help you if you can't afford it, or if the minimum buy-in to make it happen is an order of magnitude larger than your current data centre deployments look set to be.

You can lower the cost of entry if your vendor offers a port-based licensing approach or a subscription alternative. ®