We're getting a pair of new 8Gb switches for our fibre channel fabric. This is a Good Thing since we're running out of ports in our primary datacenter, and it'll allow us to have at least one 8Gb ISL running between our two datacenters.

Our two datacenters are about 3.2km apart as the fibre runs. We've been getting solid 4Gb service for a couple of years now, and I have high hopes it can sustain 8Gb as well.

I'm currently figuring out how to reconfigure our fabric to accept these new switches. Due to cost decisions a couple of years ago, we are not running a fully redundant double-loop fabric; full redundancy was judged more expensive than the unlikely downtime of a switch failure was worth. That decision was made before my time, and things haven't improved much since.

I would like to take this opportunity to make our fabric more resilient in the face of a switch failure (or FabricOS upgrade).

Here is a diagram of what I'm thinking for a layout. Blue items are new, red items are existing links that will be (re)moved.

The red arrowed line is the current ISL switch link; both ISLs come from the same switch. The EVA6100 is currently connected to both of the 16/4 switches that have an ISL. The new switches will allow us to have two switches in the remote DC, so one of the long-range ISLs is moving to the new switch.

The advantage to this is that each switch is no more than 2 hops from another switch, and the two EVA4400's, which will be in an EVA-replication relationship, are 1 hop from each other. The EVA6100 in the chart is an older device that will eventually be replaced, probably with yet another EVA4400.

The bottom half of the chart is where most of our servers are, and I'm having some concerns about exact placement. Here's what needs to go in there:

At the moment the ESX cluster can tolerate up to 3, maybe 4, hosts going down before we have to start shutting VMs down for space. Happily, everything has MPIO turned on.

The current 4Gb ISL links haven't come close to saturation, as far as I've noticed. That may change with the two EVA4400's replicating, but at least one of the ISLs will be 8Gb. Looking at the performance I'm getting out of EVA4400-A, I'm very confident that even with replication traffic we'll have a hard time crossing the 4Gb line.
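As a rough sanity check on that claim, here's a back-of-envelope headroom calculation. The encoding overhead is standard for 4Gb FC (8b/10b on a 4.25Gbaud line), but the replication allowance is my own guess, not a measured number, and frame overhead is ignored:

```python
# Back-of-envelope ISL headroom check for a 4Gb FC link.
# Assumption: 4GFC runs at 4.25 Gbaud with 8b/10b encoding,
# giving roughly 425 MB/s of usable payload per direction
# (ignoring frame overhead).
LINE_RATE_BITS = 4.25e9          # 4Gb FC line rate, bits/s
USABLE_FRACTION = 8 / 10         # 8b/10b encoding overhead
usable_mb_s = LINE_RATE_BITS * USABLE_FRACTION / 8 / 1e6   # ~425 MB/s

peak_observed_mb_s = 300         # worst burst seen (SQL export window)
replication_budget_mb_s = 100    # guessed allowance for EVA replication

headroom = usable_mb_s - (peak_observed_mb_s + replication_budget_mb_s)
print(f"usable: {usable_mb_s:.0f} MB/s, worst-case headroom: {headroom:.0f} MB/s")
```

Even stacking the worst observed burst on top of a generous replication allowance, the link stays under the 4Gb line, which is why I'm not worried.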

The 4-node file-serving cluster can have two nodes on SAN1SW4 and two on SAN1SW1, as that'll put both storage arrays one hop away.

The 10 ESX nodes I'm somewhat head-scratchy over. Three on SAN1SW4, three on SAN1SW2, and four on SAN1SW1 is an option, and I'd be very interested to hear other opinions on layout. Most of these do have dual-port FC cards, so I can double-run a few nodes. Not all of them, but enough to allow a single switch to fail without killing everything.

The two MS-SQL boxes need to go on SAN1SW3 and SAN1SW2, as they need to be close to their primary storage and db-export performance is less important.

The LTO4 drives are currently on SAN1SW2, two hops from their main streamer, so I already know how that works. Those can remain on SAN1SW2 and SAN1SW3.

I'd prefer not to make the bottom half of the chart a fully-connected topology as that would reduce our usable port-count from 66 to 62, and SAN1SW1 would be 25% ISLs. But if that's strongly recommended I can go that route.
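For reference, here's the port arithmetic behind those numbers. The 66-port figure is from above; the current inter-switch link count is my reading of the diagram, so treat it as an assumption:

```python
# Port cost of fully meshing the bottom-half fabric.
# Assumptions: 4 switches (SAN1SW1-4); the current layout uses
# 4 ISLs between them (my reading of the chart).
from math import comb

switches = 4
mesh_links = comb(switches, 2)        # full mesh: 6 inter-switch links
current_links = 4                     # assumed current layout
extra_ports = (mesh_links - current_links) * 2   # each new link burns 2 ports

usable_now = 66
usable_mesh = usable_now - extra_ports
print(usable_mesh)   # 62, matching the figure above
```

With a full mesh, each switch carries three local ISLs, which on a 16-port switch plus an inter-site link is how SAN1SW1 ends up at roughly 25% ISLs.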

Update: Some performance numbers that will probably be useful. I had them all along; I just forgot that they're relevant to this kind of problem.

EVA4400-A in the above chart does the following:

During the work-day:

I/O ops average under 1000, with spikes to 4500 during file-server cluster ShadowCopy snapshots (lasting about 15-30 seconds).

Throughput generally stays in the 10-30MB/s range, with spikes up to 70MB/s, and up to 200MB/s during ShadowCopies.

During the night (backups) is when it really pedals fast:

I/O ops average around 1500, with spikes up to 5500 during DB backups.

Throughput varies a lot, but runs about 100MB/s for several hours, and pumps an impressive 300MB/s for about 15 minutes during the SQL export process.

EVA6100 is a lot more busy, since it is the home to the ESX cluster, MSSQL, and an entire Exchange 2007 environment.

During the day, I/O ops average about 2000 with frequent spikes up to around 5000 (more database processes), and throughput averages 20-50MB/s. Peak throughput happens during ShadowCopy snapshots on the file-serving cluster (~240MB/s) and lasts for less than a minute.

During the night, the Exchange Online Defrag that runs from 1am to 5am pushes I/O ops to 7800 (close to flank speed for random access with this number of spindles) at 70MB/s.

Do you know how many systems you are going to be CA'ing? We're seeing ~20Mbps for a "typical" departmental Oracle-based system.
– Simon Catlin, Nov 23 '10 at 20:08

@Simon Our Oracle stuff is in another environment entirely. Right now six servers talk across the long-range ISLs, only four of which do so continuously; the other two do large bursts 1-2 times a day. Throughput to that EVA averages about 15-30MB/s, with peaks up to 150MB/s during normal backups and 320MB/s during the SQL exports (lasting about 15 minutes).
– sysadmin1138♦, Nov 23 '10 at 20:42

1 Answer

Having had a look at what you've got and what you want to achieve, I have a few thoughts; here's a nice picture first...

There seems no point using an 8Gbps link between sites just now: you're constrained by the 4Gbps ports on the remote 4400, you've already got stable 4Gbps service, and the available bandwidth is much higher than the actual usage requirement. It just seems a waste, today, to put one of the 24x8 switches over there. I'd use two of the 16x4Gb switches at the remote site.

I'd be tempted to use the new 24x8 switches as your main 'core' switches - most of your traffic is server-to-6100, and the new boxes will be much faster. You should see some small performance gains, as the new switches have larger buffers and lower latency, plus you can pick and choose which servers to move to 8Gb as and when you like; the same goes for when you swap out the 6100 (the 4600's have native 8Gb ports, but that's not official yet ;) ).

We then get into a part of the design where we have two options, purely based on port count: keep or discard the two 16x4Gb 'middle' switches. If you use the 24x8 switches as core boxes, you only have 3 spare ports on each (18 for the 18 servers, plus 2 to the 6100 and one ISL, equalling 21 used). You could also connect the local 4400 to the 24x8 switches, which leaves exactly one port free for your tape drives; use that too and you're at zero free ports. What I'd be tempted to do instead is use the two 16x4Gb 'middle' switches either as secondary local switches to handle the local 4400 and tape drives, or possibly to handle the inter-site ISL links - although you'll have ports free on the 24x8Gb switches to do that directly if you wish. I haven't shown both options, as they're really very similar.
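The port budget in that paragraph works out like this, per 24-port core switch, using the counts given above (the two-port 4400 connection is my assumption for a dual-pathed array):

```python
# Per-switch port budget for a 24-port 8Gb core switch,
# using the counts from the paragraph above.
total_ports = 24
servers = 18            # one port per server
eva6100_links = 2       # two paths to the 6100
isl = 1                 # one ISL
spare = total_ports - (servers + eva6100_links + isl)    # 3 spare

local_4400_links = 2    # assumed: dual-path the local EVA4400 too
tape = 1                # one port left for the tape drives
spare_after = spare - local_4400_links - tape            # 0 free ports
print(spare, spare_after)
```

Zero free ports is why I'd push the local 4400 and tape onto the 16x4Gb middle switches instead of hanging everything off the cores.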

So those are my thoughts - there are tweaks to be had all over, but the general ideas are there. Feel free to come back to me with any clarifications.

Budget Ghods willing, the hope is that when we get around to replacing the 6100 we'll also be able to put a couple of ESX servers in the remote site. I'm perfectly happy to wait until the powers that be realize that giving the post-6100 array a replication partner in the remote site is The Thing, and to hold off on 8Gb inter-site ISLs until that project. When I get back to work I need to poke people about how likely those new ESX boxen are without the 6100 replacement.
– sysadmin1138♦, Nov 24 '10 at 15:19


After having coffee and thinking about it, I do have some comments. One of my goals is to be better at handling switch failures (or reboots), and the linear topology gets broken when that happens; a couple of extra ISLs will fix that. Keeping the 24/8's in one site is a very good idea that I'm keeping. Tasty 4600.
– sysadmin1138♦, Nov 24 '10 at 16:35