VMware SDDC Architecture: Rack Design

Let’s now discuss the Rack design, as it’s closely related. Make that intertwined, as you will see later. The rack is where all the Server, Network and Storage components have to live together, so it’s critical that everyone sits down together and plans it. By everyone I mean the Server Architect, Storage Architect and Network Architect, led by the SDDC Architect. Not too many people, else you may end up with a committee instead 🙂

With new technologies and form factors, the entire infrastructure fits inside a single rack. It is becoming common for customers to drastically save space when moving to a more efficient form factor. I think the era of gigantic, unique equipment is slowly coming to an end. I wrote that with a bit of sadness, as I used to sell the Sun Fire E12K – 25K and HDS 9990. If you are supporting 2000 VM, you should aim for 2 racks of space as your benchmark.

It’s worth repeating: 2000 VM = 2 racks.
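As a sanity check on that benchmark, here is a back-of-envelope density calculation. The 2000 VM and 2-rack figures come from the text; the 42U rack height is my assumption (a common standard), so treat this as an illustration rather than a spec:

```python
# Back-of-envelope density check for the "2000 VM = 2 racks" benchmark.
# Assumption (not from the text): standard 42U racks.

total_vms = 2000
racks = 2
rack_units_per_rack = 42  # assumed standard rack height

vms_per_rack = total_vms / racks
vms_per_ru = total_vms / (racks * rack_units_per_rack)

print(f"{vms_per_rack:.0f} VMs per rack")      # 1000 VMs per rack
print(f"{vms_per_ru:.1f} VMs per rack unit")   # ~23.8 VMs per RU
```

Roughly 24 VMs per rack unit, averaged across the whole rack including network and storage, gives you a feel for how dense modern form factors have become.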

Rack Design for 500 VM

In the example below, I have chosen a 2RU 4-Node form factor. You can find many models from Super Micro that use this form factor. From their web site, you can tell that they have a lot of other form factors to choose from, so you do not have to follow this one. Whatever you choose, avoid choosing more than 2 form factors. Standardisation could be the difference between camping in the data center and spending time with your loved ones.

I chose this form factor as I have used it before. It saves space and power. It is easier to handle, as it is lighter than a bigger chassis with the same number of nodes and sockets. It is also not a blade. There is no backplane and there are no proprietary switches. As a result, I do not have to cater for chassis failure. There is no active component in the chassis that needs patches or replacement; it’s just a metal chassis with no electronics. Because of this, I do not have to span a VMware vSphere Cluster across chassis to cater for chassis issues.

What’s the drawback? Cabling. Relative to blades, you have more cables, and they are visible.

Let’s look at the design. I’ll let you digest it for 60 billion nanoseconds…

What do you think? I hope now you see the complexity in rack design, and why I said it’s the work of the entire team.

I follow the standard placement of Network switches at the top and Storage at the bottom. The servers fill the space in the middle, and are further split into 2 groups:

Infrastructure:

Network Edge

Management

Workload:

VDI workload

Server workload

You notice something missing? Yes, there are 2 components missing:

UPS

KVM

I’m not familiar with UPS, hence I’m unable to provide advice. What I know is they can fit inside the rack. I’d place them at the bottom, which means Storage will sit above them. In general, put heavy equipment at the bottom.

On KVM, I think there is little need for it since we have iLO. I’m also a fan of the dark data center, and do not like to stand in front of a rack working on a console. I also think this improves security.

For Storage, I’m using Tintri 820. I will discuss the reason why in the Storage section, which I’m drafting while seeking advice from Jason Stegeman, a Tintri SE based in Australia. The blog post will hopefully appear post VMworld. Tintri takes up 4RU, which is a good space saving.

For Compute, the space required is 14 RU. This gives us 28 ESXi hosts. I only need 25, so there will be empty slots in the chassis. I’m using the following logic in placement:

NSX Edge Cluster at the top. This is because the network switch is at the top. Depending on my expansion plan, I may leave space for this cluster to grow. In my diagram, I’m leaving 2 slots, as I’m expecting to grow to 4 nodes.

Management Cluster below the NSX Edge Cluster. This is because I’m not expecting it to grow beyond 4 nodes. Depending on my expansion plan, I may leave space for this cluster to grow.

VDI Cluster below the Management Cluster.

Gap. This is a common expansion area for VDI workload and Server workload. I’m not sure how much each will grow, so by sharing common expansion slots, I’m giving myself flexibility.

Server Cluster above the Storage box.

The above Rack Design is for Primary Data Center. It has a total of 25 ESXi hosts.
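The compute sizing logic above can be sketched as a quick calculation. The 2RU 4-node figures and the 25-host requirement come from the text; the helper function itself is purely illustrative:

```python
import math

# Sketch of the compute sizing above. The 2RU 4-node form factor and
# the 25-host requirement come from the text; the function is only an
# illustrative helper.

RU_PER_CHASSIS = 2
NODES_PER_CHASSIS = 4

def compute_space(hosts_needed):
    """Return the rack space and slot count for a given host count."""
    chassis = math.ceil(hosts_needed / NODES_PER_CHASSIS)
    host_slots = chassis * NODES_PER_CHASSIS
    return {
        "chassis": chassis,
        "rack_units": chassis * RU_PER_CHASSIS,
        "host_slots": host_slots,
        "empty_slots": host_slots - hosts_needed,
    }

print(compute_space(25))
# {'chassis': 7, 'rack_units': 14, 'host_slots': 28, 'empty_slots': 3}
```

This matches the numbers in the design: 7 chassis consuming 14 RU, with 28 host slots and 3 left empty for growth.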

The Secondary Data Center has 27 ESXi hosts. The Rack Diagram is very similar.

Notice that there is plenty of space left in the rack. This means I could have used a standard 1RU server instead of the 2RU 4-Node building block. Using a 1RU server gives you a wider choice of vendors and models.

We have done the equipment placement. We know they will fit into the rack. But there are 2 more items we must consider. Can you guess?

Yes, power and cooling. Just because we can fit the equipment does not mean we can power it. Just because we can power it does not mean we can cool it. We also need to consider the UPS. All of this depends on the Data Center facility. Generally speaking, you can expect 32 Ampere x 2 ceeform per rack. In an older data center, you may only have 16A x 4 cables. Plan your power carefully, as you do not want to hit the limit. Generally speaking, I’d buffer 10%. So if I’m given 16A, I’d only use up to 14.4A.
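The 10% buffer rule can be expressed as a simple calculation. The 16 A feed and the 10% buffer come from the text; the 230 V line voltage is my assumption (typical outside North America), so adjust it for your facility:

```python
# Sketch of the 10% power buffer rule. The 16 A feed and 10% buffer
# come from the text; the 230 V line voltage is an assumption (typical
# for data centers outside North America).

def usable_power(feed_amps, volts=230, buffer=0.10):
    """Return the usable current (A) and power (W) after the buffer."""
    usable_amps = feed_amps * (1 - buffer)
    return usable_amps, usable_amps * volts

amps, watts = usable_power(16)
print(f"{amps:.1f} A usable, ~{watts:.0f} W per feed")
# 14.4 A usable, ~3312 W per feed
```

Run this against each feed in the rack, then check the sum against the nameplate draw of everything you plan to install.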

Scaling to 2000 VM

We’ve discussed the rack design for 500 Server VM + 1000 Desktop VM. What does it look like when we scale to 2000 server VM + 5000 Desktop VM?

As you can guess, because we are dealing with the physical world, the scaling cannot be done without physical re-wiring and relocation of equipment. This can certainly be difficult to execute in a production environment, which is why it is critical to plan ahead. You may even need to buy ahead, and be left with extra equipment that you do not yet need. One way to get better protection is to partner with vendors who are willing to invest in the future boxes, knowing that you will eventually buy them anyway.

Below is the draft Rack Design for 1000 Server VM + 2500 Desktop VM. This is for the Production Data Center, so it has around half the workload. This is an early draft, as I have not applied some of the principles I discussed above.

From the above diagram, can you notice something missing? It’s a big component.

Yes, you are right. It’s the shared Storage. I have not included the shared storage. For that, you have to wait for the next blog 🙂 I’m planning to use VSAN and Tintri 820 as the examples. One idea I’m exploring is to have a Tintri 820 per rack, serving the ESXi hosts on that rack only. This minimises cabling.

If you are wondering why I ended up with this architecture, the design considerations will help to explain it.