It just happened that my 30″ Sony WEGA TV finally gave up the ghost this past week. That beast weighed at least 150 lbs, and I wasn’t looking forward to getting it out of the wall hutch I had built almost 10 years ago. I had to ask both my wife and my 14-year-old for help getting it out of the hutch and out into the garage.

I had to make a quick purchase and I was limited by the dimensions that the wall hutch would allow, a width of 38 3/4″. I did some research and landed on the Vizio 39″ LED TV sold by Best Buy – yes, I still frequent Best Buy and enjoy having a brick-and-mortar store that I can run down to in a pinch to pick up an item. I decided to pick up a third Google Chromecast, already having two others elsewhere in the house.

HDMI-CEC

With Consumer Electronics Control (CEC) you can simultaneously turn on your TV and Chromecast, and even change the TV to the correct HDMI input, without ever touching your TV remote. I had to enable this feature in the Vizio TV (it was automatically enabled in my 40″ Samsung LED TV), but it makes using the Chromecast very user friendly. You just select the Chromecast on your smartphone, tablet or laptop and it turns on the TV, changes to the correct HDMI input and starts showing your content.

The Chromecast supports 2.4 GHz 802.11b/g wireless networks, although a 5 GHz version is rumored to be in the works. There’s a deployment guide from Cisco that details how the Chromecast works and what settings are needed in an enterprise wireless network. There’s a little-known tidbit about which data rate Multicast traffic is transmitted at:

Multicast applications, such as Chromecast, require special consideration when being deployed over a wireless network because a multicast in 802.11 is sent out as a broadcast so that all clients can hear it. The actual data rate used by the AP in order to transmit the Chromecast frames is the highest mandatory rate configured within that band. For 2.4 GHz, the default rate is 11 Mbps.

In order to optimize the delivery of these frames, it is important to tune the 802.11 data rates within the controller to allow multicast to be delivered at the highest rate that the coverage model of the network can support. For networks with a low density of APs, it may be necessary to keep the data rates at the default. For a network that does not have any requirement to support 802.11b clients, tuning the data rate to 12 Mbps mandatory and lower rates disabled will help to reduce multicast airtime utilization.
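On a Cisco controller running AireOS, the tuning described above looks roughly like the following. This is a sketch from memory, not a copy of the deployment guide; the exact command syntax varies by software release, and you should verify your coverage model before disabling the lower rates:

```
! The 802.11b/g network must be disabled before changing data rates
(WLC) >config 802.11b disable network
(WLC) >config 802.11b rate disabled 1
(WLC) >config 802.11b rate disabled 2
(WLC) >config 802.11b rate disabled 5.5
(WLC) >config 802.11b rate disabled 11
(WLC) >config 802.11b rate mandatory 12
(WLC) >config 802.11b enable network
```

With 12 Mbps as the lowest mandatory rate, multicast frames (including Chromecast discovery traffic) are transmitted at 12 Mbps instead of 11 Mbps or lower, but 802.11b-only clients will no longer be able to associate.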

I’ve run into similar issues in my enterprise wireless network with Apple TVs as opposed to Google Chromecasts, but the same issues apply.

If you’ve been following me recently you might recall that I’ve been chasing an issue with a Motorola WS5100 running v3.3.5.0-002R experiencing high CPU utilization. The problem came to a head this weekend and here’s my quick account of the experience.

The WS5100 would intermittently come under extreme load for 5-30 minutes, so much load that ultimately the entire wireless network would collapse as the Access Ports started experiencing watchdog resets and would just continually reboot themselves. This problem would come and go throughout the day or night; we could go 12 hours without an issue and then go the next 12 hours with issues every 30 minutes. The problem was affecting both the primary and secondary WS5100, so I eliminated the hardware almost out of the gate. I have firsthand experience running v3.3.5.0-002R software on a large number of WS5100s and have never had an issue with that software release, so I really didn’t suspect the software. This wireless solution had been in place for more than 18 months without any major issues or problems. The local engineers reported that there had been no changes, no new devices. So what was causing this problem? I immediately suspected an external catalyst, but how would I find it?

As with most highly technical problems it wasn’t until I could get my hands on some packet traces and I had time to dissect those packet traces that I could start to fully understand and comprehend what was actually going on.

Topology

A pair of Motorola WS5100 Wireless LAN Switches with 30 AP300 Access Ports, running software release v3.3.5.0-002R in a cluster configuration with one running as primary and the other as secondary. The network was comprised of a single Cisco Catalyst 4500 core with around ten individual Cisco Catalyst 2960S switches at the edge, each trunked to the core in a simple hub-and-spoke design. The entire network was one single flat VLAN. The WS5100s were attached to the Cisco Catalyst 4500 via a single 1 Gbps interface, one-arm router style. The peak number of wireless devices was around 200; the total number of MAC addresses on the network was around 525 (this includes the wireless devices).

Symptoms

The initial problem report centered around poor wireless performance, and sure enough I quickly found 30-40% packet loss while just trying to ping the WS5100. When I finally got logged into the WS5100 I could see that the CPU was running at 100%. The SYSLOG data showed me that the APs were rebooting because of watchdog timeouts. PRTG was showing me that there was a huge traffic surge being received from the WS5100. I quickly realized that the traffic spikes in the graph corresponded to the events during which users were experiencing problems.

Packet Traces

I directed the team to setup a SPAN port to capture the traffic that was flowing between the WS5100 and the Cisco Catalyst 4500 switch. This would provide me a better idea of what was actually on the wire and might provide a clue as to what was transpiring. The team setup Wireshark to continually capture to disk using a 100MB file size and allowing the file to wrap 10 times for a total of 1GB of captured data. The next time the problem occurred I was alerted within 15 minutes by the help desk and users but I found that we missed the start of the event. There was so much traffic Wireshark only had the past 3 minutes available on disk so we had to increase the filesize to 300MB and the number of wrap files to 25 giving us a total capacity of 7.5GB. That configuration would eventually allow me to capture the initial events along with the time needed to get to the laptop and copy the data before it was overwritten. While I waited for the problem to occur I took to setting up SWATCH to alert myself and the team when the problem started so we could quickly gather all the data points during the start of the event.
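The same ring-buffer capture can be driven from the command line with dumpcap, the capture engine behind Wireshark, which avoids the GUI overhead during a sustained flood. The interface name and output path below are assumptions for illustration:

```
# 25 files of roughly 300 MB each (dumpcap -b filesize is in kB),
# overwriting the oldest file once all 25 are full
dumpcap -i eth0 -b filesize:300000 -b files:25 -w /captures/ws5100.pcapng
```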

Using the data from the packet traces we were able to identify and locate two HP desktops that were apparently intermittently flooding the network with ICMPv6 Multicast Listener Reports.
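For anyone hunting the same thing in their own traces, a Wireshark display filter along these lines isolates the MLD report traffic so the source MAC/IP addresses stand out (type 131 is an MLDv1 Report, type 143 an MLDv2 Report):

```
icmpv6.type == 131 or icmpv6.type == 143
```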

We removed those HP desktops from the network and everything has been stable since.

Analysis

Here’s the current working theory, which I believe is fairly accurate. The HP desktops were intermittently flooding the network with ICMPv6 Multicast Listener Reports. Those packets were reaching the WS5100, and because the network at this location is a single flat VLAN the WS5100 needs to bridge those packets over to the wireless network. It does this by encapsulating them in MiNT, in a fashion very similar to CAPWAP or LWAPP. The issue here is the number of packets multiplied by the number of access points or access ports. In this case we had 30 APs connected to the WS5100, so let’s do some rough math:
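A back-of-the-envelope sketch of that math in Python. The AP count comes from the topology above; the packet rate, frame size and encapsulation overhead are assumed figures purely for illustration:

```python
# Rough math: the WS5100 must encapsulate and transmit one copy of every
# multicast frame per AP. AP count is from the topology above; the packet
# rate, frame size and MiNT overhead are assumptions for illustration.
APS = 30                 # AP300 Access Ports on the WS5100
PPS_IN = 1000            # assumed MLD reports/sec arriving during a flood
FRAME_BYTES = 90         # assumed size of one MLD report frame
MINT_OVERHEAD = 50       # assumed per-frame encapsulation overhead

pps_out = PPS_IN * APS                          # frames/sec the WS5100 transmits
bps_out = pps_out * (FRAME_BYTES + MINT_OVERHEAD) * 8

print(f"{pps_out} frames/sec out")              # 30000 frames/sec out
print(f"{bps_out / 1e6:.1f} Mbps out")          # 33.6 Mbps out
```

Even at these modest assumed numbers, a trickle of 1,000 packets per second in becomes tens of thousands of frames per second out of a single 1 Gbps one-arm interface.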

This explains the huge amount of traffic the WS5100 is transmitting. For every ICMPv6 Multicast packet (or broadcast packet for that matter) received by the WS5100, it needs to encapsulate and send a copy of that packet to each and every AP. If there are 30 APs then the WS5100 needs to copy each and every packet 30 times. Now multiply that by the number of ICMPv6 packets that were being received by the WS5100 and you have a recipe for disaster.

A quick Google search will reveal a number of well-documented issues with Intel NICs flooding ICMPv6 Multicast Listener Reports.

Summary

There were a few lessons learned here:

The days of the single flat network are gone. It’s very important to follow best practices when designing and deploying both wired and wireless infrastructures. In this case, if the wireless infrastructure had had dedicated VLANs for both the wireless client traffic and the AP traffic, this problem would never have impacted the WS5100. It may have impacted the Cisco Catalyst 4500 somewhat, but it wouldn’t have caused the complete collapse of the wireless infrastructure. Unfortunately, in this case everything was on VLAN 1: wired clients, APs, wireless clients, servers, IP phone systems, routers, everything.

Filtering IPv6, Multicast and broadcast traffic from the wireless infrastructure is especially important. I posted back in September 2013 how to filter IPv6, multicast and broadcast packets from a Motorola RFS7000; the same applies to the WS5100. Unless you are leveraging IPv6 in your infrastructure or have some special multicast applications, you should definitely look into filtering this traffic from your wireless network.

Validate those desktop and laptop images, especially the NIC drivers and WNIC drivers. In the early days of 802.1X I can remember documenting a long list of driver versions and Microsoft hotfixes required for Microsoft Windows XP (pre SP2) in order to get 802.1X authentication (Wireless Zero Configuration) to work properly.

I recently happened upon a familiar problem with IGMP Snooping on a Layer 2 topology comprised of Cisco Catalyst 6504 and 4948 switches. Another team was having issues getting Multicast traffic to pass between their Xen hosts, which were all on the same VLAN but were physically wired to the two different switches mentioned above. There was a trunk interface between the two switches passing all the VLANs, so there was nothing wrong with the basic Layer 2 forwarding. In general, Multicast frames will be flooded across all ports in the VLAN unless IGMP snooping is enabled, which it is by default on Cisco switches. I remember quite a few challenges with IGMP snooping back in the Nortel and Avaya days. Avaya eventually changed their default configuration such that IGMP snooping is now disabled by default.

In this specific case all the routing was being performed by a number of high-end Cisco ASA firewalls which didn’t have PIM routing configured or enabled, so I took the easy approach of just disabling IGMP snooping across the Cisco Catalyst 6504 and 4948 switches, and the problem was solved. The cleaner solution would have been to set up a Multicast Router (mrouter) on the VLAN to properly handle all the IGMP queries and reports.
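For reference, the change itself is a one-liner in Cisco IOS. The VLAN number below is an assumed example; scoping the change to the affected VLAN is safer than disabling snooping globally:

```
! Disable IGMP snooping for the affected VLAN only
Switch(config)# no ip igmp snooping vlan 100

! Or globally, which is what I did across both switches
Switch(config)# no ip igmp snooping
```

Keep in mind that with snooping disabled, every multicast frame is flooded to every port in the VLAN, which is exactly the behavior we wanted here but can be a problem on a VLAN carrying heavy multicast.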

As pointed out by a colleague, you can use a great little Python script written by Red Hat for testing Multicast on your Linux servers.
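I haven’t reproduced the Red Hat script here; the sketch below is my own minimal sender/receiver pair using only the Python standard library, pinned to the loopback interface so both ends can run on a single host for a sanity check. The group, port and TTL are arbitrary test values:

```python
import socket

GROUP = "239.255.1.1"   # arbitrary test group, same one as the VLC test
PORT = 5007             # arbitrary test port
IFACE = "127.0.0.1"     # loopback, so sender and receiver share one host

def open_receiver(group=GROUP, port=PORT, iface=IFACE):
    """Bind a UDP socket and join the multicast group on the given interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # struct ip_mreq: 4-byte group address followed by 4-byte interface address
    mreq = socket.inet_aton(group) + socket.inet_aton(iface)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def send_once(payload, group=GROUP, port=PORT, iface=IFACE, ttl=2):
    """Send one multicast datagram with TTL > 1, as in the VLC test."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(iface))
    sock.sendto(payload, (group, port))
    sock.close()
```

To test across two real hosts, drop the `IP_MULTICAST_IF` override (or point it at the wired interface) and run `open_receiver()` on one box and `send_once(b"hello")` on the other; if the datagram never arrives, suspect IGMP snooping without an mrouter, exactly as above.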

Like many engineers and network managers I’m finding more and more clients are connecting via our 802.11a/b/g wireless network than ever before. While some of the wireless clients are corporate devices which connect to the corporate network, a large number of wireless devices are connecting to the public guest network which connects to the public Internet. At our largest facility we have some 1,500 corporate devices connecting via wireless. However, we can have upwards of 2,000 public devices connecting to our public guest network at any one time. All those smartphones, tablets and computers put out an immense amount of broadcast and multicast traffic which can adversely impact a wireless network.

I originally calculated that broadcast and multicast traffic was accounting for between 40 Kbps and 60 Kbps of traffic on our wireless network. However, looking at the traffic graphs right after the change, I was shocked at the delta. I performed the change just before noon, and you can see a delta of Mbps, not Kbps. I would estimate that the changes are saving us 5 Mbps of traffic to/from our wireless network.

That’s a lot of needless background noise. It ultimately leads to airtime contention, which in turn results in retransmissions, delayed packets, jitter and packet loss that can severely impact application performance.

Over the past few weeks I’ve been working to deploy some filters on our Motorola RFS7000 Wireless LAN Switches (v4.4.2), so I thought I would share them as a best practice for any medium to large scale wireless deployment. If you only have 10 APs then you probably don’t need to worry about filtering broadcast and multicast traffic. If you have 500 APs then you definitely need to be paying attention to all the needless noise being generated on your wireless network. I also took the opportunity to block IPv6 frames, since we’re still utilizing only IPv4 on our wireless networks.

There was yet another question recently on the discussion forums (I almost never have to search too hard for ideas to write about) concerning how to configure PIM-SM on the Avaya Ethernet Routing Switch 5000 series. While I’ve written in the past about DVMRP and PIM-SM on the Ethernet Routing Switch 8600, I’ve never written about running PIM-SM on any of the stackable Ethernet Routing Switches (the 4500 or 5000 series). It honestly took me longer to figure out how to configure VLC (with all the changes it’s gone through) than it took to configure the Ethernet Routing Switch 5520 or set up the two Windows XP clients. I downloaded VLC v1.1.10 and configured one Windows XP desktop (192.168.200.10) to act as the streaming Multicast server, while the other Windows XP laptop (192.168.100.10) would act as the Multicast receiver. I utilized a Multicast address of 239.255.1.1 for this test and made sure to set the TTL for the UDP stream greater than 1.

While running through the initial configuration I realized that you must have an Advanced License to enable PIM-SM on the Ethernet Routing Switch 5000 series. Since I don’t have any “spare” Advanced Licenses I downloaded the evaluation license from Avaya’s support website and loaded it on my test switch.

With PIM-SM configured, I set up VLC on the Windows XP desktop (192.168.200.10) to Multicast the video stream to 239.255.1.1. I then set up the Windows XP laptop (192.168.100.10) to receive the Multicast stream on udp://@239.255.1.1:1234. It took me a few minutes to work through some of the new menus in VLC but I eventually got it working.
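The GUI steps can also be done from the command line, which is easier to reproduce. The commands below are approximate for the VLC 1.1.x era (the streaming syntax has changed across releases, so treat the sender options as a sketch rather than gospel); the file name is an assumption:

```
# Sender (192.168.200.10): stream a file as MPEG-TS over UDP multicast, TTL > 1
vlc video.mpg --sout "#standard{access=udp,mux=ts,dst=239.255.1.1:1234}" --ttl 5

# Receiver (192.168.100.10): subscribe to the multicast group
vlc udp://@239.255.1.1:1234
```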

I was able to confirm everything was working properly with the “show ip pim mroute” command.