Demystifying DCN Topologies: Clos/Fat Trees – Part1

In this post, I will focus on a topic from the datacenter domain, an area I haven’t covered in the past. But since I find topologies interesting, that was motivation enough for me to write this post.

I know that nowadays the most widely adopted topology for datacenter fabrics is some variation of a Clos network. But when I look at a Clos network it looks something like Fig.1, and then I see pictures in various marketecture articles similar to Fig.2, and none of them look the same.

Fig.1

Fig.2

This made me wonder whether there are any common patterns and a structured thought process behind these various topologies. It got worse when I noticed people referring to them as Fat-Trees, which made me wonder how the two are related.

In this blog post, we will try to answer these questions by starting with the fundamental concepts of Clos networks and their properties, then looking at Fat-Trees and a few fat-tree design options, and finishing by decomposing some of the publicly known topologies using what we have learned.

Terminology

Types of Network Topologies

There are fundamentally two types of topologies, Direct and Indirect. In a Direct network, the endpoints sit within the network: every node is both an endpoint and a switch. In an Indirect network, the endpoints sit outside the network and attach to dedicated switches. Our focus here will be Indirect networks.

Blocking vs Non-Blocking

The Blocking vs Non-Blocking concept is only applicable to networks exhibiting circuit switching type behavior and does not make sense for Packet switched networks. In a circuit switched network, a resource is tied up for the duration of a session and is not available to other sessions. This is not the case for Packet switched networks. We will still cover Blocking/Non-Blocking networks for the sake of concept clarity.

Blocking Network

Let’s say we have sources A and B connected via a single path to destinations C and D. If A is already using the path to talk to C and B wants to talk to D at the same time, it cannot, because there is no free path available from B to D. This is an example of a Blocking Network.

Non-Blocking Network

Now if we add another path between the Input and Output switches, then B can talk to D at the same time, as there is another path available to route its request. This is an example of a Non-Blocking Network.
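To make the difference concrete, here is a minimal sketch of the A/B to C/D example. The function name admit and the idea of modelling the paths as a simple counter are my own assumptions for illustration, and any free path is assumed to be usable by any request.

```python
def admit(requests, num_paths):
    """Admit circuit requests in arrival order while a free path remains.

    Assumption: every path between the input and output switch can carry
    any request, and an admitted request holds its path for the session.
    """
    free_paths = num_paths
    admitted = {}
    for src, dst in requests:
        if free_paths > 0:
            free_paths -= 1                # the path is now tied up
            admitted[(src, dst)] = True
        else:
            admitted[(src, dst)] = False   # blocked: no free path left
    return admitted


requests = [("A", "C"), ("B", "D")]
print(admit(requests, num_paths=1))  # single path: the second request is blocked
print(admit(requests, num_paths=2))  # two paths: both requests are admitted
```

With one path the B to D request is refused, and with two paths both sessions run at the same time.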

Strictly Non-Blocking Network

A network is strictly Non-Blocking if it can always route a request between a free input and a free output over the available paths without shuffling any existing connections. Put simply, there must be more communication paths than competing inputs, so that a free path always exists however the existing connections were routed.

Rearrangeable Non-Blocking Network

A network is a rearrangeable Non-Blocking network when it can always route a request, if necessary by rearranging the existing connections. If this doesn’t make sense, don’t worry, we will look at an example later which should clear things up.

Intro to Multistage Networks

We will start with an example and use that to build our concept. Let’s assume that we have 4 Inputs which want to talk to 4 Outputs. Our connectivity goal is that any input should be able to talk to any output if both are free, meaning the network should be Non-Blocking.

One way to achieve this is through a crossbar structure. If an Input wants to talk to an Output, we connect that crosspoint, assuming both are free. But the basic problem with the crossbar structure is that the crosspoint complexity is on the order of O(N^2), where N is the number of inputs. As you can see, we need 16 crosspoints for a 4×4 Crossbar switch. The advantage of a Crossbar structure is that we get a Non-Blocking network, but at the expense of higher crosspoint complexity.

Another idea which can be explored is to build small switches with a crossbar architecture and stack them together with some sort of connectivity pattern which reduces the crosspoint complexity for a large set of inputs and outputs. Let’s explore this thought by assuming that we have a 2×2 digital switch which has 2 inputs and 2 outputs. We can connect multiple 2×2 switches like below.
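A crossbar is simple enough to sketch in a few lines. The class below is a hypothetical model (the names Crossbar, connect and crosspoints are mine, not from any library): a crosspoint can be closed only if both its input and its output are free, and the number of crosspoints is inputs times outputs.

```python
class Crossbar:
    """Toy model of an N x M crossbar switch (illustrative only)."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.closed = {}  # input index -> output index of closed crosspoints

    @property
    def crosspoints(self):
        # One crosspoint per (input, output) pair, hence N^2 for an N x N switch.
        return self.n_inputs * self.n_outputs

    def connect(self, inp, out):
        """Close crosspoint (inp, out) if both the input and the output are free."""
        if not (0 <= inp < self.n_inputs and 0 <= out < self.n_outputs):
            raise ValueError("port index out of range")
        if inp in self.closed or out in self.closed.values():
            return False
        self.closed[inp] = out
        return True


xb = Crossbar(4, 4)
print(xb.crosspoints)    # crosspoints needed for the 4x4 example
print(xb.connect(0, 2))  # both ports free, so this succeeds
print(xb.connect(1, 3))  # a second, independent connection also succeeds
print(xb.connect(2, 3))  # output 3 is busy, so this request is refused
```

Note that the only refusal comes from a busy endpoint, never from the fabric itself, which is what makes the crossbar Non-Blocking.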

The problem is that it’s Blocking in nature. For instance, if A is talking to E then B does not have a free path to F.

Two Stage Non-Blocking Network

We can try to build a two stage Non-Blocking network, but the problem is that the crosspoint complexity of such a network is on the order of n^2 r^2 (n is the number of inputs per input switch and r is the number of input switches). This is worse than the crossbar crosspoint complexity. For example, if we add two more paths like below we get a Non-Blocking network, but with a crosspoint complexity of 32 (n=2, r=2), compared to 16 for the 4×4 crossbar.
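The figure of 32 can be reproduced with a small calculation. This is a sketch under my own assumption about the construction: every input switch gets n dedicated links to each of the r output switches, so each input switch is n x (n*r) and each output switch is (n*r) x n. The function names are hypothetical.

```python
def two_stage_crosspoints(n, r):
    """Crosspoints of a two stage network with n links between every switch pair.

    Assumption: r input switches of size n x (n*r) and
    r output switches of size (n*r) x n.
    """
    input_stage = r * (n * (n * r))
    output_stage = r * ((n * r) * n)
    return input_stage + output_stage


def crossbar_crosspoints(total_inputs):
    """Crosspoints of a single square crossbar with the same number of inputs."""
    return total_inputs ** 2


n, r = 2, 2
print(two_stage_crosspoints(n, r))   # two stage network from the example
print(crossbar_crosspoints(n * r))   # equivalent 4x4 crossbar
```

Under this assumption the two stage design always needs twice the crosspoints of the crossbar it replaces, which is why it is not an attractive option.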

Three Stage Non-Blocking Network

Building on the previous idea, we can introduce a middle stage between Input and Output. This was in fact the idea of Charles Clos. He demonstrated that one can build Non-Blocking networks by introducing middle stage switches. Extending our previous example, below is a three stage Non-Blocking network. Now B can route to F via the bottom middle switch if the top path is occupied.

Technically the above network is a Rearrangeable Non-Blocking network. Let’s look at an example to get a better understanding. Assume that in the below fig, the colored paths are occupied by B -> H and D -> E. Now if A wants to communicate with F, it has no free path. But if we rearrange the existing connections like on the right, A has a free path to F. I hope this makes sense. The crosspoint complexity of this three stage network is 24 (6 x [2 x 2] = 24).

If we want to make the above network strictly Non-Blocking, we can add a 3rd middle switch like below. You can see now that A can communicate with F without needing to rearrange any existing connections. The crosspoint complexity of this network is 36 (3 x [2×2] + 4 x [2×3]).
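The rearrangement argument can be checked by brute force. The sketch below assumes a port labelling that the figures are not available to confirm: inputs A, B sit on input switch 0 and C, D on input switch 1, outputs E, F sit on output switch 0 and G, H on output switch 1, and the occupied paths use middle switch 0 for B -> H and middle switch 1 for D -> E. All function names are hypothetical.

```python
from itertools import product

# Assumed port-to-switch mapping (not taken from the original figures).
IN_SWITCH = {"A": 0, "B": 0, "C": 1, "D": 1}
OUT_SWITCH = {"E": 0, "F": 0, "G": 1, "H": 1}


def is_valid(assignment):
    """assignment maps (src, dst) -> middle switch; each link carries one circuit."""
    in_links, out_links = set(), set()
    for (src, dst), mid in assignment.items():
        up = (IN_SWITCH[src], mid)      # input switch -> middle switch link
        down = (mid, OUT_SWITCH[dst])   # middle switch -> output switch link
        if up in in_links or down in out_links:
            return False
        in_links.add(up)
        out_links.add(down)
    return True


def free_middles(assignment, src, dst, m):
    """Middle switches that can carry src -> dst without touching existing circuits."""
    return [k for k in range(m) if is_valid({**assignment, (src, dst): k})]


def rearrange(connections, m):
    """Search all middle switch assignments for one that fits every connection."""
    for mids in product(range(m), repeat=len(connections)):
        candidate = dict(zip(connections, mids))
        if is_valid(candidate):
            return candidate
    return None


existing = {("B", "H"): 0, ("D", "E"): 1}
print(free_middles(existing, "A", "F", m=2))   # empty: A -> F is blocked as routed
print(rearrange([("B", "H"), ("D", "E"), ("A", "F")], m=2))  # a rearranged routing
print(free_middles(existing, "A", "F", m=3))   # with a 3rd middle switch
```

With two middle switches A -> F only fits after D -> E is moved to the other middle switch, while with three middle switches the new switch is free without touching anything. The exhaustive search is fine for this toy size but grows exponentially with the number of connections.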

What Clos showed is that with a 3 stage network, one can build a strictly Non-Blocking network with a crosspoint complexity of O(N^(3/2)), which is better than O(N^2). As you can see in the table below, the number of crosspoints for the strictly Non-Blocking network becomes smaller than for the crossbar at 36 inputs and beyond.
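The crossover point can be recomputed with a short script. This is a sketch that assumes a symmetric network with n inputs per input switch, r = n input switches (so N = n^2 inputs) and the strictly Non-Blocking choice of m = 2n - 1 middle switches. The function name is mine.

```python
def clos_strict_crosspoints(n, r):
    """Crosspoints of a three stage Clos network with m = 2n - 1 middle switches."""
    m = 2 * n - 1
    edge = 2 * r * (n * m)      # r input switches (n x m) plus r output switches (m x n)
    middle = m * (r * r)        # m middle switches, each r x r
    return edge + middle


print(f"{'N':>4} {'crossbar':>9} {'clos':>6}")
for n in range(2, 9):
    total_inputs = n * n        # assumption: r = n, so N = n^2
    crossbar = total_inputs ** 2
    clos = clos_strict_crosspoints(n, n)
    marker = "  <- Clos is smaller" if clos < crossbar else ""
    print(f"{total_inputs:>4} {crossbar:>9} {clos:>6}{marker}")
```

For N = 4 this gives the 36 crosspoints computed earlier, and N = 36 is the first row where the three stage network needs fewer crosspoints than the crossbar (1188 against 1296).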

Formal Introduction to Clos Network

Assume a general three stage network like below, where we have “r” Input and “r” Output switches. Every Input and Output switch is of “n x m” and “m x n” dimensions respectively. Every switch in the middle stage needs to connect to every input and output switch, which means the dimensions of the middle stage switches need to be “r x r” as there are “r” input/output switches, and there need to be “m” of them, one for each output port of an input switch.
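The structure described above can be written down as a small data model. The class and method names are hypothetical, and the switch labels like in0 or mid1 are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeStageClos:
    n: int  # inputs per input switch (and outputs per output switch)
    m: int  # number of middle switches
    r: int  # number of input switches (and of output switches)

    def switch_dimensions(self):
        """Dimensions (inputs, outputs) of the switches in each stage."""
        return {
            "input": (self.n, self.m),
            "middle": (self.r, self.r),
            "output": (self.m, self.n),
        }

    def links(self):
        """One link from every input switch to every middle switch, and
        one link from every middle switch to every output switch."""
        up = [(f"in{i}", f"mid{k}") for i in range(self.r) for k in range(self.m)]
        down = [(f"mid{k}", f"out{j}") for k in range(self.m) for j in range(self.r)]
        return up + down

    def total_inputs(self):
        return self.n * self.r


net = ThreeStageClos(n=2, m=2, r=2)
print(net.switch_dimensions())
print(net.total_inputs())
print(net.links())
```

Counting the links that end at any one middle switch gives r, which is the reason the middle switches have to be r x r.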

With the above structure in mind, Clos showed that one can build a strictly Non-Blocking network if we satisfy the inequality

m ≥ 2n − 1

and we can build a rearrangeable Non-Blocking network if we satisfy

m ≥ n

You can check these inequalities against our 2×2 examples: the rearrangeable network has n = 2 and m = 2, and the strictly Non-Blocking network has n = 2 and m = 3 = 2n − 1.
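The two standard Clos conditions, m ≥ 2n − 1 for strictly Non-Blocking and m ≥ n for rearrangeable Non-Blocking, are easy to turn into a quick check. The function name classify is my own.

```python
def classify(n, m):
    """Classify a three stage Clos network by n (inputs per input switch)
    and m (number of middle switches)."""
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"


print(classify(n=2, m=2))  # our network with two middle switches
print(classify(n=2, m=3))  # our network with the 3rd middle switch added
print(classify(n=2, m=1))  # a single middle switch is not enough
```

Note that r does not appear in either condition: the number of input and output switches affects the size of the middle switches, not the blocking behavior.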

Converting a 3 stage to 5 stage Clos network

Charles Clos also showed that the middle stage can be further decomposed, so we can recursively build 5, 7 or 9 stage Non-Blocking networks. Let’s go back to our rearrangeable Non-Blocking network and add two more input and two more output switches. This results in a network like the one below, with 4×4 middle switches and a total of 8 inputs (4 input switches with 2 inputs each). This is still a Rearrangeable Non-Blocking network as it satisfies the inequality m ≥ n, where m = 2 and n = 2.

But let’s say we only want to use 2×2 switches in our multistage network. We can achieve this by replacing each middle 4×4 switch with a three stage Clos network built from 2×2 switches and plugging that back into the original network.

This gives us a network made up entirely of 2×2 switches, and the resulting network is a 5 stage Clos network.
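The recursion can be sketched as a function that counts stages and 2×2 switches. It assumes n = m = 2 at every level and a number of inputs that is a power of two. The function name decompose is hypothetical.

```python
def decompose(num_inputs):
    """Return (stages, switch_count) for a rearrangeable Clos network built
    only from 2x2 switches, decomposing every middle switch recursively.

    Assumption: n = m = 2 at every level and num_inputs is a power of two.
    """
    if num_inputs < 2 or num_inputs & (num_inputs - 1):
        raise ValueError("num_inputs must be a power of two, at least 2")
    if num_inputs == 2:
        return 1, 1                    # a single 2x2 switch
    # Each of the 2 middle switches is itself a network with num_inputs / 2 inputs.
    inner_stages, inner_switches = decompose(num_inputs // 2)
    edge_switches = num_inputs         # N/2 input switches plus N/2 output switches
    return inner_stages + 2, edge_switches + 2 * inner_switches


for total in (4, 8, 16):
    stages, switches = decompose(total)
    print(total, stages, switches, switches * 4)  # inputs, stages, switches, crosspoints
```

For 4 inputs this returns the 3 stage network with 6 switches (24 crosspoints) from earlier, and for 8 inputs it returns 5 stages, matching the decomposition above.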