Monday, August 17, 2009

Just in time to join the big summer sci-fi blockbusters is a bigger-than-life techno-drama for mobile operators: the 3D, 4G Mesh. Unfortunately it’s not entertainment, not even mildly entertaining: tackling sticky QoS issues is a serious dilemma for providers rolling out WiMAX & LTE backhaul. In a previous post I outlined how the move to intelligent, self-organizing networks (SONs) has created unprecedented performance challenges for 4G mobile backhaul. Towers that communicate directly with each other to coordinate roaming hand-offs and to deliver and optimize user traffic have created an adaptive, mesh-based network where the intelligence has been delegated to “empowered towers”.

However operators choose to connect their cell sites together, whether through a direct mesh or traditional hub-and-spoke design, it’s the tower-to-tower latency, jitter, packet loss and prioritization that count as users roam between cells while watching District 9. From the user-experience perspective, the network is a mesh regardless of how the data gets moved around. And this is where the mind-bending fun begins.

Enter the 3rd Dimension

The word exponential is not common in backhaul networking. We’re much more comfortable thinking about tidy point-to-point circuits, or even 2D “clouds” with data in, data out. But packet-based applications have gone beyond this to the third dimension: quality of service tiers (service classes) stack up on the network. Priority traffic associated with real-time applications like VoIP and video is latency- and jitter-sensitive, and needs special handling so calls don’t go robotic. And control-plane traffic is just as critical as we roam on the highway and our conversations jump tower-to-tower within milliseconds. Stack up to 8 classes of service on the mesh interconnectivity of 4G backhaul and you’ve got a really interesting mess – in fact an exponential mesh mess (strictly speaking, the flow count grows with the square of the number of sites, which is bad enough).

To illustrate, this simple diagram shows only 4 towers and a Mobile Switching Center (MSC), connected through an Evolved Packet Core (EPC) to PSTN and Internet gateways. The most basic configuration would be 3 classes of service between each site (control plane, real-time applications & best effort). The result? 54 unique service flows to maintain (27 flows in each direction). Now take a more realistic scenario: 100 towers talking to each other while homing to an MSC, and 5 classes of service. The damage? 49,510 unique flows (I’ll let you verify the math)!
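
For readers who do want to check the arithmetic, here is a minimal Python sketch. It assumes a counting model the post does not spell out: every pair of towers is meshed, some number of paths lead to the MSC (`msc_paths`, a hypothetical parameter), and each path carries one flow per class of service in each direction. Under that assumption, one MSC path reproduces the 100-tower figure and three MSC paths reproduce the 4-tower figure; the topology in the diagram may well differ.

```python
def count_flows(towers: int, classes: int, msc_paths: int = 1) -> int:
    """Count unidirectional service flows under the assumed model.

    Assumptions (not stated in the post): towers are fully meshed,
    `msc_paths` extra paths connect the mesh to the MSC, and every
    path carries one flow per class of service in each direction.
    """
    tower_pairs = towers * (towers - 1) // 2  # full mesh between towers
    paths = tower_pairs + msc_paths           # plus assumed paths to the MSC
    return paths * classes * 2                # per class, per direction


print(count_flows(towers=100, classes=5, msc_paths=1))  # 49510
print(count_flows(towers=4, classes=3, msc_paths=3))    # 54
```

The `towers * (towers - 1) // 2` term is what makes the count balloon: doubling the number of towers roughly quadruples the number of flows.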

In these 49,510 flows, at least 40% (19,804) will be high-priority streams that are particularly QoS sensitive. They’ll need to be monitored for latency and jitter, packet loss, throughput and availability in real time. Not monitoring is not an option: if something went wrong, how would you even know where to start troubleshooting when you’ve got almost 20,000 flows to sift through? And the other 30,000 or so? They also need to be monitored, at the very least for packet loss and continuity – because you want to know if the whole pipe went down or just one service.
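
To put rough numbers on that split, the sketch below assumes that 2 of the 5 classes of service are high priority (which is one way to arrive at 40%) and attaches to each group the metrics named above. The class split, the function name and the profile names are illustrative assumptions, not any vendor’s configuration.

```python
TOTAL_FLOWS = 49_510
CLASSES = 5
HIGH_PRIORITY_CLASSES = 2  # assumption: e.g. control plane and real-time traffic

# Metrics taken from the paragraph above; the grouping is illustrative.
FULL_METRICS = ("latency", "jitter", "packet_loss", "throughput", "availability")
BASIC_METRICS = ("packet_loss", "continuity")


def monitoring_plan(total_flows: int, classes: int, high_priority_classes: int) -> dict:
    """Split flows evenly across classes and assign a metric set to each group."""
    flows_per_class = total_flows // classes
    high = flows_per_class * high_priority_classes
    return {
        "high_priority": {"flows": high, "metrics": FULL_METRICS},
        "other": {"flows": total_flows - high, "metrics": BASIC_METRICS},
    }


plan = monitoring_plan(TOTAL_FLOWS, CLASSES, HIGH_PRIORITY_CLASSES)
print(plan["high_priority"]["flows"])  # 19804
print(plan["other"]["flows"])          # 29706
```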

So you’re the operations guy (who definitely is watching a different kind of widescreen content in the NOC). Where do you start? The approach most operators are using clones the mesh itself with a service assurance overlay. Network Interface Devices (NIDs) capable of monitoring up to 100 flows each in a full-mesh setup are installed at each cell site and the MSC. Automation gets them all talking and watching each flow, and a centralized monitoring system crunches mountains of per-second data, boiling it off into a dashboard view that makes sense of this 3D, 4G world.

Sometimes it’s interesting to know what’s happening behind the scenes: the making of one of the most amazing networking stories of our time.