There's more to WAN performance than just the size of the pipe between offices

Ten years ago, the WAN was the exclusive domain of frame-relay communication and leased lines. Today, a WAN may use anything from IPSec connections and cable modems to MPLS (multiprotocol label switching) tunneled over multimegabit networks. The methods may have changed, but the challenge remains the same: How do you make a WAN seem like one big LAN?

Simply throwing more bandwidth at the problem won’t solve it. MPLS, as described last January in Paul Venezia’s “Supercharge Your WAN”, can go a long way toward improving WAN performance, but the root cause of the problem lies well below the MPLS level.

Other forces conspire to rob your WAN of performance and response time: latency, congestion, chatty applications, and traffic contention all affect how the WAN responds at any given time. These are the dirty secrets of WAN performance that are usually swept under the rug, if they’re even detected at all. Most of the time, the focus is on the size of the pipe, not on how the pipe is being used.

Size doesn’t matter

In the world of the WAN, the size (that is, bandwidth) of the link often makes little difference in overall performance, particularly when the link is a long one (“long” being more than a few hundred miles). Part of the problem is that TCP and other protocols weren’t intended to function beyond the local-network edge. “The reason why long-distance networks don’t work is that the protocols weren’t designed to do that,” explains Dick Pierce, CEO of Orbital Data, which sells WAN-optimization appliances. “They work pretty well on a local basis, and in some cases even short distances. But wide-area networks don’t. The whole history of how this market segment [WAN optimization] developed was on that basis.”

The problem is that the protocols’ efficiency suffers as latency increases. Latency is based on the speed of light and the overall length of the WAN link, something we have little control over. Don’t think speed of light is a factor? Just experience the latency in a satellite link. (A few years back, one could have argued that routers and switches added significant latency to WAN links, but most backbone equipment today works in the sub-millisecond range.)
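
To put rough numbers on the speed-of-light point, here is a minimal Python sketch that computes the lowest round-trip time distance alone allows. The 4,000 km fiber path and the two-thirds velocity factor for light in glass are illustrative assumptions, not figures from the vendors quoted here; real links add routing detours, queuing, and processing delay on top of this floor.

```python
# Lower bound on round-trip time imposed by distance alone.
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3          # assumption: light in fiber travels at roughly 2/3 c
GEO_ALTITUDE_KM = 35_786      # altitude of a geostationary satellite, km


def min_rtt_ms(one_way_km: float, velocity_factor: float = 1.0) -> float:
    """Return the minimum round-trip time in milliseconds for a one-way path."""
    speed_km_s = C_VACUUM_KM_S * velocity_factor
    return 2 * one_way_km / speed_km_s * 1000


# Hypothetical 4,000 km terrestrial fiber route.
fiber_rtt = min_rtt_ms(4_000, FIBER_FACTOR)

# One way over satellite means up to the satellite and back down again.
sat_rtt = min_rtt_ms(2 * GEO_ALTITUDE_KM)

print(f"fiber floor:     {fiber_rtt:.0f} ms")
print(f"satellite floor: {sat_rtt:.0f} ms")
```

Even before any equipment touches the packet, the satellite hop costs close to half a second per round trip, and no amount of extra bandwidth changes that.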

Latency affects network protocols in various ways. TCP, for example, uses ACK (acknowledgement) packets to help provide reliability. By receiving an ACK from the receiving endpoint, the sending system knows the packet made it without any errors. But on high-latency links, waiting for ACKs chokes throughput.

Thus, latency is one of the biggest killers of WAN performance, if not the biggest, both in response time and overall throughput. Long fat networks (LFNs) run at T1 speeds and higher, but suffer greatly from the inherent latency of the link. For most U.S. terrestrial links, the average round-trip time is approximately 150 ms, with satellite links averaging approximately 800 ms. Global links vary greatly, but it isn’t uncommon to see 200 ms to 400 ms or higher RTTs (round-trip times). And increasing the bandwidth doesn’t help.
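
A back-of-the-envelope calculation shows why extra bandwidth goes unused. A TCP sender can have at most one window of unacknowledged data in flight, so a single connection tops out at roughly window size divided by round-trip time. The sketch below assumes one connection, a fixed 65,535-byte window (the largest possible without the TCP window-scaling option), and no packet loss; the function names are made up for illustration.

```python
# Window-limited TCP throughput: at most one window of data per round trip.
WINDOW_BYTES = 65_535    # assumption: classic maximum window, no window scaling
T1_BPS = 1_544_000       # T1 line rate
T3_BPS = 44_736_000      # T3 line rate


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on a single connection's throughput, in bits per second."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


for label, rtt_s in [("terrestrial", 0.150), ("global", 0.400), ("satellite", 0.800)]:
    mbps = max_throughput_bps(WINDOW_BYTES, rtt_s) / 1e6
    print(f"{label:12s} RTT {rtt_s * 1000:4.0f} ms -> at most {mbps:.2f} Mbps")

print(f"window needed to fill a T3 at 150 ms: {window_needed_bytes(T3_BPS, 0.150):,.0f} bytes")
```

Under these assumptions, a connection over a 150 ms link cannot exceed about 3.5 Mbps whether the pipe is a T3 or something far larger, and at 800 ms it cannot even fill a T1.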

In fact, due to latency, LFNs are largely underutilized. “The reason people built long-distance pipes that turned out to be empty was they were trying to get predictable application performance by overprovisioning,” Orbital’s Pierce says. “Yet the inherent design of the networks — that they weren’t designed for long distance — was the problem.”

Rush hour

Congestion also affects WAN performance, of course. Congestion occurs when no bandwidth-allocation policy has been applied to traffic on the WAN. Traffic flows can be bursty, such as when one user tries to retrieve a large e-mail attachment while another user logs in to a CRM portal. With no bandwidth management, the download can bring the smaller link to a grinding halt.

P.G. Narayanan, CEO of Allot Communications, believes that much of the congestion problem can be solved by applying QoS to the traffic. “The problem most of these networks have, though, is temporary … that second, or that minute it’s congested, you can get away with just prioritizing applications. So what you can do is put a gigabit box at the central site to prioritize those applications, the critical applications, on a temporary basis, and you can avoid the congestion, and all other times you’re OK anyway,” says Narayanan.

Prioritizing application flows is an important part of managing your WAN traffic, but it isn’t going to solve TCP’s inherent limitations when latency creeps in. On shorter links where latency isn’t an issue, simply preallocating your bandwidth will help keep important packets moving, regardless of what else is in the pipe. But on LFNs, latency, not congestion, is the culprit.
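
As a rough illustration of prioritization, the Python sketch below models a link queue that always sends the most important waiting packet first. It is a strict-priority toy with made-up class and packet names, not how any vendor’s appliance works; real QoS devices typically use weighted or class-based schedulers so that low-priority traffic is not starved.

```python
import heapq
from itertools import count


class PriorityLink:
    """Toy strict-priority queue: lower priority number is sent first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker keeps FIFO order within a priority

    def enqueue(self, priority: int, packet: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


link = PriorityLink()
# A bulk e-mail attachment arrives first and fills the queue...
for n in range(3):
    link.enqueue(2, f"attachment-chunk-{n}")
# ...then a CRM login packet arrives behind it.
link.enqueue(0, "crm-login")

send_order = [link.dequeue() for _ in range(len(link))]
print(send_order)
```

The login packet jumps ahead of the queued download, which is the effect a bandwidth-management policy is after during a burst.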

Talk, talk, talk

From the end-user point of view, latency gets less tolerable as the back-and-forth communication required for some action increases. And layer 7 protocols, where applications live, are chatty, requiring an absurd number of round-trips to complete a single task. Much like TCP, protocols such as CIFS and MAPI (Messaging Application Programming Interface) were designed to run inside the LAN, not over the WAN.

The chattiness reaches a crescendo when users map drive letters over the WAN using CIFS (used in Windows networks). Any user who has had to open, edit, and save a Microsoft Word or Excel document from a remote file server knows how long this simple task can take, even over a fat WAN connection. By the same token, users of Microsoft Outlook and Exchange 2000 suffer when they open an e-mail with an attachment over a WAN link. The message appears to be in their inbox, but in reality it is still on the server waiting to be retrieved.
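
A simple model makes the cost of chattiness concrete: total time is roughly the number of round trips multiplied by the RTT, plus the time to push the bytes through the pipe. In the Python sketch below, the 1,000 round trips and the 1 MB document are illustrative assumptions, not measurements of CIFS or MAPI.

```python
def transfer_time_s(round_trips: int, rtt_s: float,
                    payload_bytes: int, link_bps: float) -> float:
    """Rough task time: protocol round trips plus raw transmission time."""
    return round_trips * rtt_s + payload_bytes * 8 / link_bps


ROUND_TRIPS = 1_000     # assumption: a chatty open-and-save exchange
DOC_BYTES = 1_000_000   # assumption: a 1 MB document

t1_s = transfer_time_s(ROUND_TRIPS, 0.150, DOC_BYTES, 1_544_000)     # T1 at 150 ms
t3_s = transfer_time_s(ROUND_TRIPS, 0.150, DOC_BYTES, 45_000_000)    # roughly T3 at 150 ms
lan_s = transfer_time_s(ROUND_TRIPS, 0.001, DOC_BYTES, 100_000_000)  # 100 Mbps LAN at 1 ms

print(f"T1 WAN: {t1_s:6.1f} s")
print(f"T3 WAN: {t3_s:6.1f} s")
print(f"LAN:    {lan_s:6.1f} s")
```

With these numbers, moving from a T1 to a link nearly 30 times faster saves only about five seconds out of more than two and a half minutes. Nearly all of the wait comes from the round trips.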

Microsoft Exchange Server 2003 was designed to mask this problem by downloading messages and attachments in the background (cached Exchange mode). Although this is great for the end-user, it adds traffic to the WAN. For example, Outlook now downloads all attachments to your inbox, regardless of whether you were going to open them in the first place. This places an additional, avoidable load on the WAN link.

Out with the old…

Traditionally, WAN performance was attacked at the packet level. Back in 1998, Expand Networks was one of the leaders in WAN compression. Liad Ofek, vice president of technical services at Expand Networks, says that, at the time, the goal was to “squeeze as much data as possible” into existing links.

Expand used a series of compression algorithms to reduce the number of packets on the wire. Other vendors, most notably Packeteer, also used highly advanced compression schemes and began adding QoS to further allocate and manage WAN traffic flows.

File-caching provides yet another way to reduce traffic by storing a copy of recently accessed files on an appliance near requesting users. As with a browser cache, files and objects are kept closer to the remote user, helping to overcome latency and prevent excessive, redundant requests over the WAN. This is typically a “full file” cache and not made up of smaller data segments. Full-file caching isn’t nearly as effective as newer segment-caching methods, because the chance of a second or third user requesting the same file is slim. Also, if the file on the file server is renamed or changed, then it won’t match the file already in cache and must be transferred again anyway.
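
The difference between the two caching styles can be sketched in a few lines of Python. This is a toy model under stated assumptions: it uses fixed 4 KB segments keyed by SHA-256 digest, ignores the small cost of sending the digests themselves, and does not represent any particular vendor’s segmentation scheme. The class names and file names are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumption: fixed-size segments, chosen for illustration


class FullFileCache:
    """Caches whole files by name; any rename or change forces a full resend."""

    def __init__(self):
        self._files = {}

    def bytes_over_wan(self, name: str, data: bytes) -> int:
        if self._files.get(name) == data:
            return 0
        self._files[name] = data
        return len(data)


class SegmentCache:
    """Caches segments by content digest; only unseen segments cross the WAN."""

    def __init__(self):
        self._segments = {}

    def bytes_over_wan(self, data: bytes) -> int:
        sent = 0
        for start in range(0, len(data), SEGMENT_SIZE):
            segment = data[start:start + SEGMENT_SIZE]
            digest = hashlib.sha256(segment).digest()
            if digest not in self._segments:
                self._segments[digest] = segment
                sent += len(segment)
        return sent


# Ten distinct segments, then an edited copy that differs only in the last one.
original = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(10))
edited = original[:-SEGMENT_SIZE] + bytes([99]) * SEGMENT_SIZE

full, seg = FullFileCache(), SegmentCache()
full_first = full.bytes_over_wan("report.doc", original)
full_second = full.bytes_over_wan("report_v2.doc", edited)  # renamed and changed
seg_first = seg.bytes_over_wan(original)
seg_second = seg.bytes_over_wan(edited)

print(f"full-file cache: {full_first} then {full_second} bytes")
print(f"segment cache:   {seg_first} then {seg_second} bytes")
```

The renamed, lightly edited file costs the full-file cache a complete retransmission, while the segment cache sends only the one segment that changed. Fixed-size segments are the simplest choice but lose their alignment when bytes are inserted mid-file, which is one reason a production design would likely segment differently.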

In with the new

In recent years, TCP acceleration has taken center stage as one way to improve performance by reducing ACKs and playing games with the TCP window size. Vendors such as Swan Labs, Peribit (now owned by Juniper Networks), Expand Networks, and Riverbed Technology have all developed solutions based on improving TCP’s performance.

One of the most effective methods is to handle TCP ACKs locally, using an appliance. The appliance bundles multiple ACKs into a single request, thereby reducing the delays caused by high latency. The application requesting the data receives an ACK just as it expects to, except that the ACK comes from the local WAN appliance and not from the far side of the WAN.

The next step beyond TCP tricks is application-specific acceleration. Some WAN optimization vendors use plug-ins in their appliances to help improve application response. Applications such as DNS, Exchange, FTP, Citrix, Notes, and CIFS/NFS can all benefit from reduced chatter on the wire. The plug-ins work much like the TCP ACK optimization in that they handle redundant requests locally instead of sending each one.

There is no quilt

The WAN optimization and acceleration space is heading toward a convergence of sorts. In the past some vendors specialized in a single technology solution, but now they are adding other technologies to solve additional pieces of the WAN problem. Orbital’s Pierce sees the multiple approaches to solving WAN problems as “patches, in the context of patches and a quilt. In the end, it’s about the quilt; it’s not about the patches themselves. Customers buy patches today because there is no quilt.” The trend is for vendors to move away from “point” solutions to a more comprehensive managed system.

Several WAN appliances include compression and TCP acceleration along with file-caching and application-specific acceleration. But not all vendors agree that such consolidation is wise. “I think more customers are more worried about just the visibility into the network,” says Allot’s Narayanan. “They want a good traffic-management company with the ability to decode any application layer properly, not falsify it.”

Other vendors, such as Swan Labs, Riverbed, Disksites, and Juniper Networks, are banking on single-box solutions. Tom Tansy, vice president of marketing at Swan Labs, sees a further consolidation of technology. He believes many customers are suffering from a “box proliferation problem” and will want to roll out a single appliance instead of many disparate solutions.

Either way, when it comes to speeding up WANs, everyone agrees that more bandwidth alone is not the answer. As long as TCP remains unchanged (and for now it has to) and the speed of light governs latency, boosting WAN performance will require tricks at the protocol level, combined with traffic-flow prioritization and application-specific packet reduction. WAN acceleration solutions will continue to evolve to include multiple techniques for getting the most out of your link, at least until we find a way to send data faster than the speed of light.
