Bulletproof IP telephony deployment series

Part 3: Read about how to troubleshoot the OSI network layers in an IPT deployment.

Quality of service The IETF has specified two major QoS mechanisms that can be applied in routers. The Resource ReSerVation Protocol (RSVP) was patterned after a circuit-switched network. RSVP allows devices to request specific qualities of service from the network for particular application data streams or flows. The routers in the network respond by either allocating the requested QoS if the network can provide it, or by denying the request if requested resources are not available. Once a flow has been reserved, RSVP may deny requests from other hosts if no additional bandwidth or resources are available for additional flows. Since RSVP knows if the end-to-end network can support a call before it allows it, congestion is controlled very well. Nonetheless, RSVP has not been widely adopted in enterprises or other internetworks. One reason is that it requires that the state of each flow be maintained in each router along the path. The scalability of such an approach on large intranets has kept most network administrators from deploying RSVP.

The IETF's other Layer 3 QoS mechanism, Differentiated Services, has been widely deployed in production networks. Instead of being concerned about each flow, as is RSVP, DiffServ aggregates packets into groups, identified by DiffServ Code Points (DSCP). Routers along the path are configured to inspect the codepoint in the IP header and forward the packet based on the class of service to which the packet belongs. Such a response is called a per-hop behavior (PHB), as each router can provide a different forwarding treatment for each packet.

Generally, voice bearer traffic should be delivered with low delay and jitter, while signaling packets should be guaranteed to be delivered without loss. DiffServ specifies an Expedited Forwarding (EF) PHB to build a low-loss, low-latency, low-jitter end-to-end service through a network. In an EF queue, voice packets are processed as quickly as possible to keep the conversation flowing. DiffServ also specifies an Assured Forwarding (AF) PHB in RFC 2597, in which signaling packets are sent with low drop precedence. In an AF queue, voice signaling packets should "just get there." Some delay is tolerable, but no loss is acceptable, since lost signaling packets are more service-affecting than lost voice bearer packets. On an outbound router interface, sending both EF and AF packets out of a low-latency queue provides the best results.
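As a rough illustration of how these codepoints map onto the IP header, the sketch below computes the EF and AF values and sets one on a UDP socket. The function names are hypothetical, the choice of AF31 for signaling is only an example (the article says low drop precedence, not which AF class), and the socket call assumes a platform that honors the `IP_TOS` option, such as Linux.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding codepoint (binary 101110)


def af_dscp(af_class, drop_precedence):
    """Return the AFxy codepoint defined in RFC 2597 (class 1-4, drop 1-3)."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp):
    """The DSCP is the upper six bits of the former IP ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_udp_socket(dscp):
    """Hypothetical helper: a UDP socket whose packets carry the given DSCP.

    Assumes the operating system honors IP_TOS; some platforms ignore it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


# Example choice only: AF31 (class 3, low drop precedence) for signaling.
AF31_DSCP = af_dscp(3, 1)
```

With these definitions, `dscp_to_tos(EF_DSCP)` is 184 and `AF31_DSCP` is 26. Routers along the path still have to be configured to trust and act on the marking.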

Bandwidth engineering and call admission control RSVP has built-in call admission control: if the initial end-to-end test of the network finds any link that can't provide the bandwidth and performance requested, then the call is refused. DiffServ, however, has no such inherent characteristic. Therefore, careful engineering must be done to make sure that a link is not oversubscribed. Usually if too many calls are active through an oversubscribed link, all of the calls will lose packets as the high-priority queue fills up with too much high-priority traffic. Call Admission Control settings on the telephony system can limit the number of calls between devices in different geographical or logical network regions. This not only maintains call quality, but also can keep the network from melting down from uncontrollable overload situations. Queue depths on router interfaces should match the bandwidth (based on the number of calls) admitted in the call admission control settings, as in Figure 7.
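The arithmetic behind matching the priority queue to the call admission limit can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the link speed, per-call bandwidth, and priority-queue share are illustrative inputs rather than recommended values.

```python
def max_admitted_calls(link_kbps, per_call_kbps, priority_share):
    """Number of calls that fit in the priority queue's share of a link.

    priority_share is the fraction of the link reserved for voice
    (an engineering choice, assumed here).
    """
    if not 0 < priority_share <= 1:
        raise ValueError("priority_share must be in (0, 1]")
    if per_call_kbps <= 0:
        raise ValueError("per_call_kbps must be positive")
    priority_kbps = link_kbps * priority_share
    return int(priority_kbps // per_call_kbps)


def admit(active_calls, call_limit):
    """Admit a new call only while the link is below its engineered limit."""
    return active_calls < call_limit


# Illustrative numbers: a T1 (1544 kbps) with a third of the link reserved
# for voice, and an assumed 80 kbps per G.711 call.
limit = max_admitted_calls(1544, 80, 0.33)
```

Here `limit` is 6, so a seventh simultaneous call across that link would be refused (or rerouted) by the telephony system instead of degrading the six calls already in progress.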

Redundancy Network redundancy typically means providing multiple routes from a source to a destination. While this is certainly important for voice and data packets, it can cause some subtle misconfigurations that might impact voice quality. Redundancy at Layer 3 is generally simpler than it is with spanning tree at Layer 2. However, two important engineering practices still need to be ensured:

A. Paths between source and destination should be symmetrical. If a voice packet is sent from a phone to a gateway through devices A->B->C, and packets in the reverse direction, from the gateway to the phone, are sent through C->D->A, then the route is asymmetrical. If link speed, congestion, or performance differs between B and D, then delay, jitter, or other impairments may make communicating more difficult for the callers. In addition, troubleshooting is much simpler with symmetric routes.

B. Load balancing should be done on a per-flow (or per-session) basis, rather than on a per-packet basis. If per-packet load balancing is used, jitter may result as packets travel from the source along different network paths and arrive at the destination out of sequence. Figure 8 illustrates the improper, then the desired, results.
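The difference between the two load-balancing modes can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and real routers use their own hashing schemes rather than CRC-32 over a text key.

```python
import zlib
from itertools import cycle


def per_flow_path(flow, paths):
    """Pick a path by hashing the flow's 5-tuple.

    Every packet of the same flow hashes to the same path, so packets
    of one call cannot overtake each other on a different route.
    """
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]


def per_packet_paths(n_packets, paths):
    """Round-robin each packet across the paths, regardless of flow."""
    rotation = cycle(paths)
    return [next(rotation) for _ in range(n_packets)]


# Hypothetical RTP flow: (source IP, destination IP, protocol, src port, dst port)
flow = ("10.1.1.10", "10.2.2.20", 17, 16384, 16390)
paths = ["B", "D"]
```

Calling `per_flow_path(flow, paths)` repeatedly always returns the same next hop, while `per_packet_paths(4, paths)` alternates `B, D, B, D`, which is the pattern that lets packets of a single call arrive out of sequence when the two paths differ in delay.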

Security While security for IPT is worthy of all of the articles written about it, an in-depth analysis is not the focus of this article. However, there are a few Layer 3 security-related points to make in the context of troubleshooting.

It is well known that NAT causes problems for IPT, especially many-to-one NAT. Before setting user expectations about work-at-home phones, be aware that H.323 and SIP messages carry embedded IP addresses. These addresses are part of the IP payload, encapsulated behind the IP header. NAT translates the IP address in the IP header, but not the embedded addresses in the H.323/SIP messages. When the corresponding IPT device looks in the payload, it sees the original address from the H.323 message instead of the NAT address, and it tries to respond to that original (private) address. As a result, the packet being sent back to the H.323 device can be dropped by a router, since the untranslated private address usually isn't known outside the NAT. While there are vendor-specific workarounds, the IETF is working to solve NAT issues with standards.
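When troubleshooting, the mismatch can be spotted by comparing the address embedded in the signaling payload with the source address seen in the IP header. The sketch below does this for the SDP `c=` (connection) line that a SIP message body carries. The function names are hypothetical, and it assumes literal IPv4 addresses in the `c=` line. H.323 embeds addresses in a binary encoding that this example does not parse.

```python
import ipaddress
import re

# Matches SDP connection lines such as "c=IN IP4 192.168.1.10".
SDP_CONNECTION = re.compile(r"^c=IN IP4 (\S+)", re.MULTILINE)


def embedded_media_addresses(message):
    """Return the IPv4 addresses embedded in the SDP c= lines."""
    return SDP_CONNECTION.findall(message)


def nat_mismatches(message, observed_source):
    """List embedded addresses that differ from the IP header's source.

    A non-empty result suggests the payload still carries the pre-NAT
    address, so replies sent to it will not reach the device.
    Assumes every c= line holds a literal IPv4 address, not a hostname.
    """
    seen = ipaddress.ip_address(observed_source)
    return [
        addr
        for addr in embedded_media_addresses(message)
        if ipaddress.ip_address(addr) != seen
    ]


# Hypothetical message body from a home phone behind many-to-one NAT.
sdp_body = (
    "v=0\r\n"
    "o=- 1 1 IN IP4 192.168.1.10\r\n"
    "s=call\r\n"
    "c=IN IP4 192.168.1.10\r\n"
    "m=audio 16384 RTP/AVP 0\r\n"
)
```

If the packet arrives with a translated source of 203.0.113.5, `nat_mismatches(sdp_body, "203.0.113.5")` returns `['192.168.1.10']`, the private address that the far end would wrongly try to reach.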

Occasionally, firewalls block IPT signaling packets, which keeps IP phones or gateways from communicating properly with their media servers. The proper firewall ports need to be opened for signaling and, if necessary, media traversal. Some firewalls have wizards or other macros that are supposed to allow H.323, for instance, to pass, but such configuration shortcuts can lead to unexpected results. Instead of allowing H.323 to pass, some macros actually block the standard ports (usually 1719-1720), so manually configuring the ports for your particular H.323 implementation works best.

Layer 4: Transport Layer

The moral at Layer 4 isn't "what to do" as much as it is "what not to do." Prioritizing using TCP and UDP port numbers is not an efficient way to handle QoS, for three main reasons:

1. Since multiple applications (e.g., voice and H.323 video) use the same open range of UDP ports, providing priority based only on port numbers doesn't allow differentiation between multimedia applications.

2. Since Access Control Lists will inspect each packet (voice and data) all the way up the OSI stack to Layer 4, the delay and router processor utilization will increase.

3. UDP port ranges can be set in the media server, governing the number of ports that are to be used by the entire voice system. In routers, matching port numbers need to be configured in the ACL for proper inspection and prioritization. If this range changes, the ACL has to be edited to reflect the change.

Therefore, it is more efficient to use Differentiated Services as opposed to RTP port numbers. While RTP/UDP port numbers have been seen to work on small-scale deployments, most network managers that employ them end up migrating to DiffServ as they scale up their multimedia deployments.
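The first of the three reasons can be illustrated with a toy classifier. Everything here is hypothetical: the `Packet` type, the port range, and the markings (EF for voice, AF41 for video) are assumptions chosen for the example, not settings from any particular system.

```python
from dataclasses import dataclass

EF_DSCP = 46    # Expedited Forwarding
AF41_DSCP = 34  # example marking assumed for video


@dataclass(frozen=True)
class Packet:
    application: str
    udp_dst_port: int
    dscp: int


def is_priority_by_port(packet, low_port, high_port):
    """Layer 4 approach: prioritize anything inside the UDP port range."""
    return low_port <= packet.udp_dst_port <= high_port


def is_priority_by_dscp(packet):
    """Layer 3 approach: prioritize only packets marked EF."""
    return packet.dscp == EF_DSCP


# Assumed shared media port range for the example.
RTP_PORTS = (16384, 32767)

voice = Packet("voice", 16500, EF_DSCP)
video = Packet("video", 17000, AF41_DSCP)
```

Matching on the port range puts both `voice` and `video` in the priority queue, because they share the same open range. Matching on the codepoint prioritizes only `voice`, and nothing has to be edited in the router if the media server's port range changes later.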

Layers 5-7 (generally): The application Layers
The upper Layers of the stack usually involve specific telephony system configurations. For instance, Call Admission Control, as mentioned earlier, should be implemented to keep too many calls from overloading a network link. While each system has its own features and engineering requirements, there are some commonalities worth mentioning.

Codec selection Codecs should be carefully selected to balance the tradeoff between the need for call quality and the need to save bandwidth. When calculating bandwidth utilization, keep in mind the overhead of the lower Layers, which can add a significant amount to the base compression of the codec.

Consider using the lowest-bit-rate codec during trials in the initial rollout. Even if there is enough bandwidth on a MAN or a WAN to support full-quality G.711 now (64 kbps base rate per call, about 80 kbps on the wire), you may not have the luxury of adequate bandwidth in the future. Instead of implementing G.711 at first and then possibly annoying users later as the codec is scaled back to G.729 to save bandwidth, implementing G.729 at first will set the worst-case expectation from the outset.
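The jump from a 64 kbps base rate to about 80 kbps comes from the IP, UDP, and RTP headers added to every voice packet. A minimal sketch of that calculation follows, assuming 20 ms of audio per packet and uncompressed headers. The function name and the optional Layer 2 parameter are hypothetical.

```python
# IPv4 (20) + UDP (8) + RTP (12) header bytes added to every voice packet.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def call_bandwidth_kbps(codec_kbps, packet_ms=20, l2_overhead_bytes=0):
    """One-way bandwidth of a call, including per-packet header overhead.

    packet_ms is the amount of audio carried in each packet (assumed 20 ms).
    l2_overhead_bytes can add link-layer framing, which varies by medium.
    """
    payload_bytes = codec_kbps * packet_ms / 8  # kbps x ms = bits per packet
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    packets_per_second = 1000 / packet_ms
    return packet_bytes * 8 * packets_per_second / 1000


g711 = call_bandwidth_kbps(64)  # 160-byte payload + 40-byte headers
g729 = call_bandwidth_kbps(8)   # 20-byte payload + 40-byte headers
```

Under these assumptions `g711` is 80.0 kbps and `g729` is 24.0 kbps. The headers are a fixed cost per packet, so they make up a much larger share of a low-bit-rate call than the codec's base rate alone suggests.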

Silence suppression This bandwidth-saving algorithm, which keeps packets from being sent during the silent periods of a conversation, should be tested for its acceptability with users. While you can reduce the bandwidth used by 50% or more, the tradeoff may be poorer voice quality. In general, silence suppression can lead to clipping of the first (and less often, last) syllables of a speech burst. Phone and gateway DSPs simply cannot sense when someone is about to start speaking, and take milliseconds to react. Using one of the low bit rate codecs that perform relatively well in noise, such as G.726, rather than silence suppression, is more effective at saving bandwidth without sacrificing quality.
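The clipping effect can be reproduced with a toy model of a voice activity detector. This is only a sketch: the function names, the level threshold, and the idea that the gate opens after a fixed number of loud frames are simplifying assumptions, not a description of how any particular DSP works.

```python
def frames_sent(frame_levels, threshold, attack_frames):
    """Return one boolean per frame: True if the frame is transmitted."""
    loud_run = 0
    sent = []
    for level in frame_levels:
        loud_run = loud_run + 1 if level >= threshold else 0
        # The gate opens only after `attack_frames` loud frames have been
        # observed, so those first loud frames are never transmitted.
        sent.append(loud_run > attack_frames)
    return sent


def clipped_frames(frame_levels, threshold, attack_frames):
    """Count speech frames that were lost while the detector reacted."""
    sent = frames_sent(frame_levels, threshold, attack_frames)
    return sum(
        1
        for level, was_sent in zip(frame_levels, sent)
        if level >= threshold and not was_sent
    )


# Illustrative frame levels: silence, a three-frame speech burst, silence.
levels = [0, 0, 5, 6, 7, 0, 0]
```

With a threshold of 3 and a one-frame reaction time, `frames_sent(levels, 3, 1)` transmits only two of the seven frames, which is where the bandwidth saving comes from, but `clipped_frames(levels, 3, 1)` shows that the first frame of the speech burst is lost.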

Quality monitoring systems The most common flaw in IP telephony deployments to date is that many systems are deployed without the appropriate tools to manage and monitor this demanding application. Even if the system is deployed flawlessly and users are pleased with the voice quality and stability, there are likely to be issues over time. When a user calls the helpdesk to report a voice quality problem, the helpdesk should have the training and tools to validate the problem, then triage it to either the voice or the data network experts. Tools like Real-time Transport Control Protocol (RTCP) analyzers can be invaluable. These tools take updates from phones and gateways in real time, and can plot the key indicators of VoIP quality (delay, packet loss, and jitter) versus time. Such tools objectively establish whether a user complaint about voice quality is related to the network. If the user is on the phone with the helpdesk at the time they're experiencing quality issues, the helpdesk should be able to view the RTCP stats to see if the issue is due to the network. RTCP operation is outlined in Figure 9.
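To make the jitter figure in those reports less opaque, the sketch below implements the interarrival jitter estimator defined in RFC 3550, which is the value receivers report in RTCP. The function name and the sample packet lists are hypothetical. Arrival times and RTP timestamps are assumed to be in the same units (for example, 8 kHz timestamp ticks for G.711).

```python
def interarrival_jitter(packets):
    """Running jitter estimate per RFC 3550.

    packets is an iterable of (arrival_time, rtp_timestamp) pairs in the
    same units. For each consecutive pair, D is the change in transit
    time, and the estimate moves 1/16 of the way toward |D|.
    """
    jitter = 0.0
    previous_transit = None
    for arrival_time, rtp_timestamp in packets:
        transit = arrival_time - rtp_timestamp
        if previous_transit is not None:
            d = abs(transit - previous_transit)
            jitter += (d - jitter) / 16.0
        previous_transit = transit
    return jitter


# Packets sent every 160 ticks (20 ms at 8 kHz).
steady = [(0, 0), (160, 160), (320, 320)]   # constant transit time
one_late = [(0, 0), (320, 160)]             # second packet 160 ticks late
```

A stream with constant transit time, such as `steady`, yields a jitter of 0.0, while the single late packet in `one_late` moves the estimate to 10.0 ticks. The 1/16 gain smooths out isolated spikes, so a sustained rise in the plotted value points to a persistent network condition and not a one-off event.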

To fully understand how to read the data, use the analyzers in a lab environment first. Use a delay simulator like NIST Net and watch the graphs move as you inject more delay into the lab. Most telephony systems are SNMP compatible and should be consistently monitored for availability and performance. Continually pinging the interfaces for availability is not recommended. Just as importantly, the network topology itself should be mapped and monitored. Then, if all phones at a remote site are reported down, for instance, helpdesk personnel can check whether a WAN link itself is down, or whether the LAN switch that connects the telephony servers has crashed. After validating that the data network looks good, they can triage the issue to the voice team.

On a final, non-technical note, the process built for troubleshooting IPT problems is just as important as the tools used. While it is just another application, voice is the most demanding application that most networks have carried to date.

Summary While there may appear to be a plethora of potential pitfalls to an IPT deployment, be assured that this comprehensive list has been developed over hundreds of deployments in as many unique network situations. Most implementations go very smoothly when following these general steps:

Assess, simplify, and optimize the network. Eliminate unnecessary complexity. The fewer protocols running and the fewer hops, the better. Nail the speed and duplex settings, and set the optimal root and secondary bridges of spanning trees.

Develop a Quality of Service strategy, considering not only how to prioritize voice, but how critical data applications may behave when they're sent to the back of the queue. Involve the business and get buy-in before "de-prioritizing" the applications that make your company money.

Simulate a full load of calls across the network during the busy hour of the day. This should give you a worst case scenario. Monitor the quality of the simulated calls and the performance of the data applications. If necessary, make changes to optimize voice or data performance.

Pilot the system with an IP phone next to the server/gateway, and use a sniffer to baseline proper signaling and voice traffic. Get familiar with the performance management tools.

Set user expectations for quality, and explain the benefits of the IP telephony deployment. Typically the most difficult problems aren't technical, and when some users see a new phone on their desk they'll likely be critical of it. Making sure that your users accept change for the good of the company will make case management easier.

Change the IT support processes to incorporate voice troubleshooting. Train helpdesk reps to qualify and triage trouble to the appropriate expert.

Then deploy the full production system, and start enjoying the benefits of convergence!

About the author: Christian Stegh is currently Avaya's IP Telephony Practice Leader for the North American Region. Among other responsibilities, he acts as a liaison between Avaya's customers/sales teams and Avaya's development teams, influencing the direction of Avaya solutions based on customer input. Previously, he was a Managing Consultant within Avaya Global Services, designing, optimizing, and implementing hundreds of converged networks for customers. He began his IT career as a network engineer for a Fortune 200 manufacturing firm, where he managed worldwide L2/L3 networks. His interests include not only multimedia network performance, but also SIP, converged security, and business continuity. Feel free to provide your feedback to him.
