Network ZEN: Stackable or Chassis Switches

The Zen Master was meditating over his network, at oneness with the Flow. A student approached the master and asked, “May I not stack the Cisco C3750 switches to create more connections to the Flow?”

The Master looked carefully at the young apprentice and said, “It is believed that the sum of the parts is greater than the whole and that combining many into one creates more Flow.” And the student nodded, because that was his thought.

The Master smiled and then said, “But that which is many, always remains many and is never truly one.”

The failure rate on the C3750 is pretty bad. If you have a couple of hundred C3750 switches in your data centre, you can expect a failure at least once a week. Had you purchased C4500s or C6500s instead, you wouldn’t have those failures.
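To put “a failure at least once a week” in per-switch terms, here is a back-of-envelope sketch. It assumes a fleet of exactly 200 switches (reading “a couple of hundred” literally) and a constant failure rate, which real hardware does not strictly follow; the function names are illustrative only.

```python
# Back-of-envelope sketch: convert "one failure a week across ~200 switches"
# into per-switch figures. ASSUMPTIONS: a fleet of exactly 200 switches and a
# constant (memoryless) failure rate.

HOURS_PER_WEEK = 24 * 7
HOURS_PER_YEAR = 24 * 365


def implied_mtbf_hours(fleet_size, hours_between_fleet_failures):
    """Per-switch MTBF implied by a fleet-wide failure interval.

    With a constant per-switch failure rate, the fleet fails fleet_size times
    as often as any one switch, so per-switch MTBF is fleet_size times the
    fleet-wide interval.
    """
    return fleet_size * hours_between_fleet_failures


def annualized_failure_rate(mtbf_hours):
    """Expected failures per switch per year (a rate, not a probability)."""
    return HOURS_PER_YEAR / mtbf_hours


# "At least once a week" makes these upper bounds on MTBF (lower bounds on AFR).
mtbf = implied_mtbf_hours(200, HOURS_PER_WEEK)
afr = annualized_failure_rate(mtbf)
print(f"implied per-switch MTBF: {mtbf:,} hours (~{mtbf / HOURS_PER_YEAR:.1f} years)")
print(f"annualized failure rate: {afr:.0%}")
```

For 200 switches this works out to at most about 33,600 hours (roughly 3.8 years) between failures per switch, or an annualized failure rate of about 26% or worse.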

It’s not as bad as HP ProCurve though. Their entire product range is much worse.

Ethan has had it much worse than that, though. His experience has been really bad:

(1) Failure rate way beyond acceptable if you’re a 100% uptime shop. Most of the failures experienced have been due to stackwise port problems.

(2) Stackwise incompatibilities between IOS versions make upgrading without taking down the entire stack next to impossible unless you do temporary upgrades to interim versions. You want to get from 12.2(25)SEE to 12.2(50)SE2, but you can’t reload the whole stack at once because your environment has zero tolerance for downtime? You can’t do it in one step: if you reload one switch on the new 12.2(50) code, he won’t join the 12.2(25) stack because of Stackwise version incompatibilities between the old and new IOS. You’d have to step through intermediate IOS versions to get to 12.2(50). Multiply that problem by the 100 stacks you need to upgrade, and you’d rather take a 3750 stack to corporate in San Jose and set it on fire first.

(3) When replacing failed stack members, very often the shiny new member won’t join the stack. He’ll just sit there as “provisioned”, and/or will hang on load waiting for the stack master to send him port information. This happens even when he runs an IOS version identical to the rest of the stack he’s joining. Nothing like having a finite change control window to replace a failed switch while the stupid switch won’t join the stack and the clock goes tick…tick…tick.

(4) Stackwise ports flap excessively and eventually take the entire stack down, unless you unplug the failed port, which of course breaks stackwise backplane redundancy and puts the stack at risk of splitting until you can replace the failed switch. Comedy ensues if the stack actually does split and you use cross-stack etherchannels or layer 3 functionality.
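The rolling-upgrade problem in point (2) is essentially a shortest-path search: each hop is a pair of releases that can share a stack while members are reloaded one at a time. Below is a minimal Python sketch using breadth-first search. The compatibility table and the names “interim-A” and “interim-B” are hypothetical placeholders, not Cisco’s real Stackwise compatibility matrix, and the fleet of 100 stacks with 5 members each is an assumption for illustration.

```python
from collections import deque

# HYPOTHETICAL: pairs of IOS releases assumed able to run side by side in one
# stack during a rolling upgrade. "interim-A" and "interim-B" are placeholders,
# NOT real Cisco releases or Cisco's actual Stackwise compatibility matrix.
COMPATIBLE_PAIRS = [
    ("12.2(25)SEE", "interim-A"),
    ("interim-A", "interim-B"),
    ("interim-B", "12.2(50)SE2"),
]


def build_graph(pairs):
    """Undirected graph: an edge means the two releases can share a stack."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def upgrade_path(graph, start, target):
    """Breadth-first search for the fewest rolling-upgrade hops.

    Returns the list of releases from start to target, or None if the
    target cannot be reached without reloading the whole stack at once.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        version = queue.popleft()
        if version == target:
            path = []
            while version is not None:  # walk back to the start release
                path.append(version)
                version = previous[version]
            return path[::-1]
        for neighbour in sorted(graph.get(version, ())):
            if neighbour not in previous:
                previous[neighbour] = version
                queue.append(neighbour)
    return None


path = upgrade_path(build_graph(COMPATIBLE_PAIRS), "12.2(25)SEE", "12.2(50)SE2")
hops = len(path) - 1
members_per_stack, stacks = 5, 100  # hypothetical fleet size
print(" -> ".join(path))
print(f"{hops} rolling passes, {hops * members_per_stack * stacks} member reloads in total")
```

With this made-up table the search finds three hops, so 5 members across 100 stacks means 1,500 individual member reloads instead of 500 for a direct upgrade.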
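Point (4) follows from the ring topology of the stack cables. The simplified Python sketch below assumes member i is cabled to member (i + 1) mod n and treats each cable as either up or failed; it ignores master election and everything else Stackwise does, and the function name is hypothetical. One failed cable leaves a single daisy-chained stack with no redundancy; a second failure splits it.

```python
# Simplified model of a stack ring: cable i joins member i to member
# (i + 1) % members. Failed cables are removed and the remaining
# connected groups of members are returned.


def stack_partitions(members, failed_links):
    """Return sorted groups of members that can still reach each other."""
    neighbours = {m: set() for m in range(members)}
    for link in range(members):
        if link in failed_links:
            continue  # this cable is down (or unplugged)
        a, b = link, (link + 1) % members
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in range(members):
        if start in seen:
            continue
        # Depth-first walk collecting every member reachable from start.
        pending, group = [start], []
        seen.add(start)
        while pending:
            member = pending.pop()
            group.append(member)
            for other in neighbours[member]:
                if other not in seen:
                    seen.add(other)
                    pending.append(other)
        groups.append(sorted(group))
    return groups


print(stack_partitions(5, set()))   # [[0, 1, 2, 3, 4]]  full ring
print(stack_partitions(5, {2}))     # [[0, 1, 2, 3, 4]]  one stack, no redundancy
print(stack_partitions(5, {1, 3}))  # [[0, 1, 4], [2, 3]]  the stack has split
```

Once the result has more than one group, anything spanning groups, such as a cross-stack etherchannel, is exactly where the comedy begins.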

When dealing with Cisco on these issues, we asked them how other data centers were dealing with their 3750s, since a 6500 core/dist and 3750 access layer was a Cisco-recommended design back in the day, at least according to the SEs working our account 4 or 5 years ago. All we got were blank stares and nervous silence before they told us that no other data centers really use 3750s at the access layer, at least not in mission-critical environments. 6500s were the better choice (duh).

The 3750s are great in concept, but I’ve seen too many problems with the stackwise ports over the years to ever want to see another 3750 again.

As for VSS, it’s a similar concept with a very different execution. VSS lab testing has generated warm fuzzy feelings for me thus far (at least with the SXI code train; SXH was not so great). It’s on my list to create a “dual-active” scenario and test recovery before I’m completely overwhelmed by the warm fuzziness, though.

I just deployed two small six-switch stacks of 3750-48s for an office expansion. I hadn’t really heard about an MTBF issue; I couldn’t have been listening! Is it still current for the v2 series?

The office expansion should be OK, although my greater worry is that my client is seriously considering upgrading its HQ network by replacing 18 x 6509s (2 per floor) with 3750 stacks (5 per).

I’d rather they stayed with the 6500s and upgraded the supervisors to Sup32 for the edge and Sup720 for distribution and core. However, vendor C and its sales teams are pushing the 3750 religiously. I’ve raised my concerns but I’m not being heard. 🙁

I have one of these disconnected right now in a lab. To my amazement, if you do a “show switch stack-ring speed”, it only operates at 16 Gbps unless you put the stack cable on it and loop it back to itself. According to the QoS guide, everything (even traffic internal to the same switch) hits the stack ring. I know, I know, why would anyone ever have a standalone 3750? True, that is just a glorified 3560. I was just getting this one ready to redeploy, but it surprised me a bit.
