The Challenge of VoIP Security

Voice over Internet Protocol (VoIP) carries voice calls over an Internet Protocol (IP) network, which requires the voice stream to be digitized and broken into packets. Security practitioners need to better understand VoIP technologies, protocols and standards, and to develop a policy that addresses VoIP security, so that these technologies do not become the “gaps” exploited by hackers.

The Challenge

Voice communications must happen in real time, so high performance is critical. Users notice delays of well under a second (ITU-T Recommendation G.114 suggests keeping one-way delay below about 150 ms for most applications), and a delay of a few seconds renders a VoIP system unacceptable. Both performance and security are significant challenges that must be addressed when planning a VoIP infrastructure.

VoIP Components and Protocols

The components of VoIP include call processors/call managers, gateways, routers and firewalls, along with specialized protocols and specialized end-user equipment. VoIP systems typically support signaling standards such as H.323, the Session Initiation Protocol (SIP), the Media Gateway Control Protocol (MGCP) and Megaco/H.248. The voice packets themselves are carried by the Real-time Transport Protocol (RTP).

H.323

H.323 is the International Telecommunication Union (ITU) recommendation for audio and video communication across packet-based networks. It is an umbrella for a suite of ITU protocols, including H.225 (call signaling and registration) and H.245 (capability exchange and media channel control), each with a specific role in the call setup process. In addition to its endpoints, an H.323 network typically includes a gateway and possibly a gatekeeper, a multipoint control unit (MCU) and a back-end service (BES).

The gateway serves as a bridge between the H.323 network and external non-H.323 networks, such as SIP networks or the traditional public switched telephone network (PSTN). The MCU is an optional component that supports multipoint conferencing and other communication among more than two endpoints. Gatekeepers are also optional; when present, a gatekeeper manages the endpoints in its zone, handling address translation (for example, resolving a phone number or alias to a network address), admission control and bandwidth control. If a gatekeeper is present, a BES may also exist to maintain data about endpoints, including permissions, services and configuration.

Apart from a few well-known ports (TCP 1720 for H.225 call signaling and UDP 1719 for gatekeeper registration, admission and status), H.323 traffic is carried on dynamically assigned ports. This is especially challenging for stateless firewalls, which cannot interpret H.323 traffic to learn which ports a call will use. Organizations therefore need a stateful firewall that supports VoIP, and H.323 in particular. Network address translation (NAT) is another serious issue: NAT rewrites the addresses and ports in packet headers, but H.323 messages also carry IP addresses and port numbers inside their payload, and those embedded values still refer to the endpoint's internal address rather than the translated one. Security practitioners need to make sure that H.323 traffic is read and modified by authorized systems, so that the correct addresses and port numbers are sent to the endpoints establishing a call.

Session Initiation Protocol (SIP)

SIP provides functionality similar to that of H.323. It is specified by the Internet Engineering Task Force (IETF), in RFC 3261, for initiating a two-way VoIP communication session. SIP is a text-based protocol, whereas H.323 messages are binary-encoded using ASN.1. SIP is an application-layer protocol that can run over UDP or TCP.
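Because SIP is text-based, a request can be read and written as plain lines of text, much like HTTP. The sketch below assembles a minimal INVITE carrying the headers RFC 3261 requires, with no SDP body. The function name, the addresses (taken from documentation ranges such as 192.0.2.0/24) and the tag, Call-ID and branch values are hypothetical placeholders; a real user agent would generate random identifiers and usually attach an SDP offer.

```python
def build_invite(caller: str, callee: str, local_host: str, call_id: str,
                 from_tag: str, branch: str, local_port: int = 5060) -> str:
    """Build a minimal SIP INVITE request: headers only, no SDP body."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line: method, Request-URI, version
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={from_tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}:{local_port}>",
        "Content-Length: 0",
    ]
    # Lines end with CRLF, and an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical values; the branch starts with the RFC 3261 magic cookie "z9hG4bK".
    print(build_invite("alice@example.com", "bob@example.org", "192.0.2.10",
                       call_id="a84b4c76e66710", from_tag="1928301774",
                       branch="z9hG4bK776asdhds"))
```

Since every field is readable text, a firewall or ALG can inspect SIP with ordinary string handling, which is harder with the ASN.1-encoded messages of H.323.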

A SIP network consists of endpoints (user agents), a proxy or redirect server, a location server and a registrar. Each user first reports its current location to a registrar, which may be integrated into a proxy or redirect server, and this information is stored in a separate location server. Messages from endpoints are routed through a proxy or redirect server. A proxy server forwards each request toward its destination on the sender's behalf. A redirect server instead obtains the destination's actual address from the location server and returns it to the original sender, which then sends the message directly to that address.
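The registrar, location server and redirect server roles can be illustrated with a toy in-memory model. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, each address-of-record has a single contact, and real registrars also handle binding expiry, multiple contacts and authentication. The 200, 302 and 404 values are the SIP response codes such servers would return.

```python
from typing import Optional


class LocationService:
    """Toy location service mapping an address-of-record (AOR) to a contact URI."""

    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}

    def store(self, aor: str, contact: str) -> None:
        self._bindings[aor] = contact

    def lookup(self, aor: str) -> Optional[str]:
        return self._bindings.get(aor)


class Registrar:
    """Accepts REGISTER requests and records bindings in the location service."""

    def __init__(self, location: LocationService) -> None:
        self.location = location

    def register(self, aor: str, contact: str) -> int:
        self.location.store(aor, contact)
        return 200  # 200 OK


class RedirectServer:
    """Answers an INVITE with the callee's current contact instead of forwarding it."""

    def __init__(self, location: LocationService) -> None:
        self.location = location

    def handle_invite(self, request_uri: str) -> tuple[int, Optional[str]]:
        contact = self.location.lookup(request_uri)
        if contact is None:
            return 404, None   # 404 Not Found: no binding for this user
        return 302, contact    # 302 Moved Temporarily: caller retries at contact


if __name__ == "__main__":
    location = LocationService()
    Registrar(location).register("sip:bob@example.org", "sip:bob@192.0.2.20:5060")
    print(RedirectServer(location).handle_invite("sip:bob@example.org"))
```

Running it prints `(302, 'sip:bob@192.0.2.20:5060')`: the caller learns Bob's current address and sends the INVITE there itself. A proxy server would instead use the same lookup to forward the INVITE on the caller's behalf.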

Media Gateway Control Protocol (MGCP)

Decomposed VoIP gateways consist of media gateways (MGs) and media gateway controllers (MGCs). They appear to the outside as a single VoIP gateway. MGCP is used to communicate between the separate components of a decomposed VoIP gateway. MGs handle the audio signal translation function, performing conversion between the audio signals carried on telephone circuits and data packets carried over the Internet. The MGC handles the signaling data between the MGs and the other network components, such as the H.323 gatekeeper or the SIP server. A single MGC can control multiple MGs.
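MGCP (RFC 3435) is also text-based: the MGC, which RFC 3435 calls the call agent, sends commands such as CreateConnection (CRCX) or DeleteConnection (DLCX) to an endpoint on a media gateway, and each command is a command line followed by parameter lines. The helper below is a hypothetical sketch that only formats such a command; the endpoint name and parameter values are illustrative, modeled on the style of the RFC 3435 examples, and the transport (UDP, by default port 2427 on the gateway) is not shown.

```python
def mgcp_command(verb: str, transaction_id: int, endpoint: str,
                 params: dict[str, str]) -> str:
    """Format an MGCP command from a call agent (MGC) to a media gateway."""
    # Command line: verb, transaction ID, endpoint name, protocol version.
    header = f"{verb} {transaction_id} {endpoint} MGCP 1.0"
    # Each parameter line is "<code>: <value>".
    param_lines = [f"{code}: {value}" for code, value in params.items()]
    return "\r\n".join([header, *param_lines]) + "\r\n"


if __name__ == "__main__":
    # CreateConnection (CRCX): C = call ID, L = local connection options, M = connection mode.
    print(mgcp_command("CRCX", 1204, "aaln/1@gw1.example.net",
                       {"C": "A3C47F21456789F0", "L": "p:10, a:PCMU", "M": "recvonly"}))
```

Because one MGC issues these commands to many gateway endpoints, the transaction ID lets it match each gateway response to the command that caused it.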

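Real-time Transport Protocol (RTP)

The Real-time Transport Protocol (RTP) is used to transport voice packets over IP networks. RTP packets are encapsulated in UDP datagrams. The RTP header carries fields, including a payload type, a sequence number, a timestamp and a synchronization source (SSRC) identifier, that the receiver uses to reassemble the packets into a voice signal in the correct order and with the correct timing.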
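To make these fields concrete, the following sketch decodes the 12-byte fixed header defined in RFC 3550 using Python's `struct` module. It ignores the optional CSRC list and header extension, which a full parser would skip over using `csrc_count` and the extension bit; the class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Two single bytes, a 16-bit sequence number, 32-bit timestamp and 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # top 2 bits; 2 for current RTP
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,       # identifies the codec, e.g. 0 = PCMU
        sequence=seq,                 # used to restore packet order
        timestamp=ts,                 # sampling instant, used for playout timing
        ssrc=ssrc,                    # identifies the stream's source
    )


if __name__ == "__main__":
    pkt = struct.pack("!BBHII", 0x80, 0x00, 4242, 160, 0x12345678) + bytes(160)
    print(parse_rtp_header(pkt))
```

In the usage example, payload type 0 is G.711 μ-law (PCMU) in the RFC 3551 audio/video profile, and 160 bytes is 20 ms of that codec at 8,000 samples per second.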

How Does It Work?

With VoIP, the user dials a phone number, and that number must be mapped to an IP address. Several protocols may be used to determine the IP address that corresponds to the phone number. Once the call has been established and the called party answers, the voice must be digitized and turned into a stream of packets for transmission. The process starts with an analog-to-digital converter, which converts the analog voice signal into digital form.
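One standardized example of such a mapping protocol is ENUM (RFC 6116), which turns a telephone number into a DNS name whose NAPTR records can point to a SIP URI. The sketch below shows only the number-to-domain step and assumes the number is already in full international (E.164) form; the DNS query itself is omitted, and the function name is hypothetical.

```python
def e164_to_enum_domain(number: str) -> str:
    """Map an E.164 telephone number to its ENUM domain name (RFC 6116)."""
    if not number.startswith("+"):
        raise ValueError("E.164 numbers are written with a leading '+'")
    digits = [c for c in number if c in "0123456789"]  # drop '+', spaces, dashes
    if not digits:
        raise ValueError("no digits found")
    # Reverse the digits, separate them with dots and append the ENUM apex domain.
    return ".".join(reversed(digits)) + ".e164.arpa"


if __name__ == "__main__":
    print(e164_to_enum_domain("+44 1632 960083"))
```

This prints `3.8.0.0.6.9.2.3.6.1.4.4.e164.arpa`. Reversing the digits places the country code nearest the root of the DNS tree, so delegation can follow the numbering plan.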

Because digitized voice generates a large number of bits, a compression algorithm is used to reduce the number of bits transmitted. The voice packets are then sent using RTP over UDP. When the packets reach the destination, they are put back into the correct sequence, and the digitized voice data is extracted from them and decompressed. Finally, a digital-to-analog converter turns the digitized voice back into an analog signal, which is delivered to the phone.
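Putting packets back in sequence is normally the job of a playout (jitter) buffer keyed on the RTP sequence number. The sketch below is a minimal, hypothetical reorder buffer: it holds back up to `depth` packets, releases payloads in sequence order, discards packets that arrive after their slot has been played, and treats a gap as a lost packet once the buffer is full. It deliberately ignores timing (RTP timestamps), jitter estimation and the wrap-around of the 16-bit sequence number, all of which a real receiver must handle.

```python
import heapq
from typing import Optional


class ReorderBuffer:
    """Release RTP payloads in sequence-number order (no wrap-around handling)."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                        # packets held back to absorb reordering
        self._heap: list[tuple[int, bytes]] = []  # min-heap ordered by sequence number
        self._next_seq: Optional[int] = None      # next sequence number expected for playout

    def push(self, seq: int, payload: bytes) -> list[bytes]:
        """Add a packet; return the payloads that are now ready for playout, in order."""
        if self._next_seq is not None and seq < self._next_seq:
            return []  # arrived too late: its slot has already been played
        heapq.heappush(self._heap, (seq, payload))
        ready = []
        # Release the head if it is the expected packet, or if the buffer is over depth
        # (in which case any missing earlier packets are treated as lost).
        while self._heap and (self._heap[0][0] == self._next_seq
                              or len(self._heap) > self.depth):
            seq_out, data = heapq.heappop(self._heap)
            ready.append(data)
            self._next_seq = seq_out + 1
        return ready
```

With `depth=2`, pushing packets 1, 3 and then 2 returns nothing, nothing, and then `[b"a", b"b", b"c"]` in order. A deeper buffer tolerates more reordering at the cost of extra delay, which is the same trade-off between quality and latency described earlier.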

VoIP Network Design

Separate DHCP servers should be considered for voice and data. Firewall systems must be designed for VoIP traffic, through either application-level gateways (ALGs) or firewall control proxies. For example, in a SIP-based VoIP network, firewalls must be stateful and must monitor SIP traffic to determine which RTP ports to open and to which addresses.
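In a SIP call, the RTP ports do not appear in the SIP headers but in the Session Description Protocol (SDP) body of the INVITE and its answer: the `c=` line gives the connection address and each `m=` line gives a media type and port (RFC 4566). The sketch below shows, under simplifying assumptions, how a stateful firewall or ALG might extract those values. The function name is hypothetical, and it does not cover every SDP feature (for example, multiple `c=` lines per stream or the `a=rtcp` attribute).

```python
from typing import Optional


def media_pinholes(sdp: str) -> list[tuple[Optional[str], int]]:
    """Return (address, RTP port) for each active media stream in an SDP body.

    A media-level c= line overrides the session-level one; port 0 marks a disabled stream.
    """
    session_addr: Optional[str] = None
    streams: list[dict] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # "IN IP4 192.0.2.10" -> "192.0.2.10" (drop any "/ttl" suffix)
            addr = line[2:].split()[2].split("/")[0]
            if streams:
                streams[-1]["addr"] = addr      # media-level connection line
            else:
                session_addr = addr             # session-level connection line
        elif line.startswith("m="):
            _media, port, *_rest = line[2:].split()  # "audio 49170 RTP/AVP 0"
            streams.append({"port": int(port.split("/")[0]), "addr": None})
    return [(s["addr"] or session_addr, s["port"]) for s in streams if s["port"] != 0]


if __name__ == "__main__":
    offer = ("v=0\r\no=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\ns=-\r\n"
             "c=IN IP4 192.0.2.10\r\nt=0 0\r\nm=audio 49170 RTP/AVP 0\r\n")
    print(media_pinholes(offer))
```

For this offer it returns `[('192.0.2.10', 49170)]`, so the firewall would open UDP port 49170 toward 192.0.2.10 and, by the usual RTP/RTCP convention, port 49171 for RTCP, only for the lifetime of the call.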

Further, IPsec or Secure Shell (SSH) should be used for remote management and audit access. A detailed analysis of voice and network components should also be conducted periodically, including a thorough review of voice gateways, remote-access devices, firewalls, intrusion detection systems and routers.

VoIP-Based Firewall Systems

Security practitioners need to understand firewall performance in terms of how fast the firewall can process VoIP packets. Most VoIP traffic is UDP-based, and because numerous RTP ports (which are dynamic UDP ports) may be open at any time, it is recommended that all PC-based phones be placed behind a stateful firewall that monitors VoIP media traffic. A firewall that cannot keep up with this traffic degrades quality of service (QoS).

The large number of small RTP packets also affects firewall performance in a VoIP environment. The firewall has to inspect every packet, and as the packet rate increases, so does the load on the firewall CPU. NAT compounds the problem, because the firewall must also track address translations for each media stream and rewrite the addresses embedded in VoIP signaling.
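Some rough arithmetic shows why packet rate, and not only bandwidth, stresses a firewall. The sketch assumes G.711 (64 kbit/s) with 20 ms of audio per packet, RTP over UDP over IPv4 without header compression, and it ignores RTCP, silence suppression and link-layer overhead; the function name and defaults are illustrative.

```python
def voip_load(calls: int, codec_bps: int = 64_000, ptime_ms: int = 20) -> tuple[int, int]:
    """Return (packets/s, bits/s at the IP layer) through a firewall for `calls` two-way calls."""
    pps_per_stream = 1000 // ptime_ms                     # 20 ms per packet -> 50 packets/s
    payload_bytes = codec_bps * ptime_ms // (8 * 1000)    # 64 kbit/s for 20 ms -> 160 bytes
    header_bytes = 12 + 8 + 20                            # RTP + UDP + IPv4 headers
    streams = calls * 2                                   # one media stream per direction
    pps = streams * pps_per_stream
    bps = pps * (payload_bytes + header_bytes) * 8
    return pps, bps


if __name__ == "__main__":
    print(voip_load(1000))
```

Under these assumptions each call produces 100 packets per second (50 in each direction) of 200 bytes each, of which 40 bytes are headers. One thousand concurrent calls therefore mean 100,000 small packets per second for the firewall to inspect, carrying about 160 Mbit/s at the IP layer.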

Security practitioners must closely review the firewall architecture and NAT configuration and the impact both have on VoIP QoS. Firewalls that act as application-level gateways are well suited to VoIP: they can parse H.323 or SIP signaling and dynamically open and close only the ports each call needs.
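The dynamic opening and closing of ports can be sketched as a small state table: ports learned from the SDP in call signaling are opened per call, refreshed while media flows, and removed when the call ends (for SIP, on a BYE) or goes idle. This is a hypothetical illustration, not any vendor's ALG; it checks only the destination address and port of a media packet, whereas a production firewall would also match the remote source, protocol and direction, and would rewrite SDP addresses when NAT is in use.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class _CallState:
    holes: set = field(default_factory=set)  # (address, port) pairs currently open
    last_seen: float = 0.0                   # time of last signaling or media activity


class PinholeTable:
    """Per-call UDP pinholes opened from SDP, closed on BYE or after an idle timeout."""

    def __init__(self, idle_timeout: float = 60.0) -> None:
        self.idle_timeout = idle_timeout
        self._calls: dict[str, _CallState] = {}

    def open(self, call_id: str, addr: str, rtp_port: int, now: Optional[float] = None) -> None:
        """Called when an SDP offer or answer is seen for call_id."""
        now = time.monotonic() if now is None else now
        state = self._calls.setdefault(call_id, _CallState())
        state.holes.update({(addr, rtp_port), (addr, rtp_port + 1)})  # RTP and RTCP
        state.last_seen = now

    def close(self, call_id: str) -> None:
        """Called when the call ends, e.g. on a SIP BYE."""
        self._calls.pop(call_id, None)

    def allows(self, addr: str, port: int, now: Optional[float] = None) -> bool:
        """Decide whether a UDP media packet to addr:port may pass."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        for state in self._calls.values():
            if (addr, port) in state.holes:
                state.last_seen = now  # media traffic keeps the pinhole alive
                return True
        return False

    def _expire(self, now: float) -> None:
        stale = [cid for cid, s in self._calls.items()
                 if now - s.last_seen > self.idle_timeout]
        for cid in stale:
            del self._calls[cid]
```

Keeping pinholes tied to a specific call, instead of leaving a wide range of UDP ports permanently open, is what lets an ALG-style firewall admit RTP without exposing the internal network.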

Summary

Potential increases in productivity, mobility and resilience will lead to more deployments of VoIP networks. The challenge is to ensure that the IP telephony infrastructure is secure and protected from dis