The Voice over IP for the Cisco 3600 Series Software Configuration Guide shows you how to configure your Cisco 3600 series router to support voice transmission. Cisco's voice support is implemented using voice packet technology, in which voice signals are packetized and transported in compliance with ITU-T specification H.323, the ITU-T specification for transmitting multimedia (voice, video, and data) across packet networks such as local-area networks.

The Software Configuration Guide Overview describes the chapter contents in the Voice over IP for the Cisco 3600 Series. The Voice Primer section provides supplementary information for those users unfamiliar with voice telephony.

Voice over IP enables a Cisco 3600 series router to carry voice traffic (for example, telephone calls and faxes) over an IP network. In Voice over IP, the digital signal processor (DSP) segments the voice signal into frames, which are then grouped in pairs and placed in voice packets. These voice packets are transported using IP in compliance with ITU-T specification H.323. Because voice is a delay-sensitive application, you need a well-engineered end-to-end network to use Voice over IP successfully. Fine-tuning your network to support Voice over IP adequately involves a series of protocols and features geared toward quality of service (QoS). You must also take traffic shaping into account to ensure the reliability of the voice connection.

Voice over IP is primarily a software feature; however, to use it on a Cisco 3600 series router, you must install a Voice Network Module (VNM). A VNM holds either one or two Voice Interface Cards (VICs), for a total of two or four voice ports. Each VIC is specific to a particular signaling type associated with a voice port.

The Voice over IP for the Cisco 3600 Series Configuration Guide is divided into three parts:

Configure Frame Relay for Voice over IPDiscusses the issues around transporting packetized voice over Frame Relay.

Configure Number ExpansionDescribes how to configure number expansion if your telephone network is configured so that you can reach a destination by dialing only a portion (an extension number) of the full E.164 telephone number.

Configure Dial PeersDescribes how to configure POTS and VoIP dial peers. Each dial peer defines the characteristics associated with a particular call leg.

The key to understanding Cisco's voice implementation is to understand the use of dial peers. Dial peers describe the entities to or from which a call is established. All of the voice technologies use dial peers to define the characteristics associated with a call leg. A call leg is a discrete segment of a call connection that lies between two points in the connection, as shown in Figure 1 and Figure 2. An end-to-end call consists of four call legs: two from the perspective of the source router, as shown in Figure 1, and two from the perspective of the destination router, as shown in Figure 2. You use dial peers to apply specific attributes to call legs and to identify call origin and destination. Attributes applied to a call leg include QoS, compression/decompression (CODEC), voice activity detection (VAD), and fax rate.
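To make the four-leg picture concrete, the following Python sketch enumerates the call legs of one end-to-end call. It is only a conceptual model: the `CallLeg` type, the `call_legs` function, and the router names are hypothetical and are not part of Cisco IOS. It assumes the usual arrangement in which each router sees one leg on the POTS side and one leg on the IP network side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallLeg:
    router: str     # router from whose perspective the leg is seen (hypothetical name)
    peer_type: str  # "pots" or "voip": the kind of dial peer that describes the leg
    direction: str  # "inbound" or "outbound" relative to that router


def call_legs(source: str, destination: str) -> list[CallLeg]:
    """Return the four call legs of one end-to-end call (simplified model)."""
    return [
        # Source router: the caller's telephone arrives on a voice port ...
        CallLeg(source, "pots", "inbound"),
        # ... and the call leaves toward the IP network.
        CallLeg(source, "voip", "outbound"),
        # Destination router: the call arrives from the IP network ...
        CallLeg(destination, "voip", "inbound"),
        # ... and leaves on a voice port toward the called telephone.
        CallLeg(destination, "pots", "outbound"),
    ]


for leg in call_legs("router-A", "router-B"):
    print(leg)
```

Each of these legs is where a dial peer's attributes (CODEC, VAD, QoS, fax rate) would be applied.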

Each voice implementation uses two kinds of dial peers:

POTSDial peer describing the characteristics of a traditional telephony network connection. POTS peers point to a particular voice port on a voice network device.

When configuring POTS dial peers, the key commands that must be configured are the port and destination-pattern commands. The destination-pattern command defines the telephone number associated with the POTS dial peer. The port command associates the POTS dial peer with a specific logical dial interface, normally the voice port connecting the Cisco 3600 series to the local POTS network.

When configuring Voice over IP on the Cisco 3600 series, direct inward dial can be configured on a POTS dial peer. In this case, the key commands that must be configured are the destination-pattern and direct-inward-dial commands.

Voice-NetworkDial peer describing the characteristics of a packet network connection; for example, in the case of Voice over IP, this is an IP network. Voice-network peers point to specific voice-network devices.

When configuring voice-network dial peers, the key commands that must be configured are the destination-pattern and session-target commands. The destination-pattern command defines the telephone number associated with the voice-network dial peer. The session-target command specifies a destination address for the voice-network peer.
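The following Python sketch illustrates the division of labor described above: the destination-pattern identifies which dial peer handles a dialed number, and that peer's port (POTS) or session target (voice-network) says where the call goes. It is a toy model, not the actual Cisco IOS matching algorithm. The `DialPeer` type and `select_peer` function are hypothetical, patterns are assumed to be fixed-length digit strings in which "." matches any single digit, and the most specific match (most explicit digits) wins. The address 192.0.2.10 is a documentation example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialPeer:
    tag: int
    kind: str                 # "pots" or "voip"
    destination_pattern: str  # digits; "." matches any single digit (assumed subset of the syntax)
    target: str               # voice port for POTS, session target address for VoIP


def pattern_matches(pattern: str, number: str) -> bool:
    """True if the pattern and number have equal length and every position matches."""
    if len(pattern) != len(number):
        return False
    return all(p == "." or p == d for p, d in zip(pattern, number))


def explicit_digits(pattern: str) -> int:
    """Number of non-wildcard positions: a rough measure of specificity."""
    return sum(1 for p in pattern if p != ".")


def select_peer(peers: list[DialPeer], number: str) -> Optional[DialPeer]:
    """Pick the matching peer with the most explicit digits, or None if nothing matches."""
    candidates = [p for p in peers if pattern_matches(p.destination_pattern, number)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: explicit_digits(p.destination_pattern))


peers = [
    DialPeer(1, "pots", "5551234", "1/0/0"),      # local telephone on a voice port
    DialPeer(2, "voip", "555....", "192.0.2.10"),  # every other 555 number, via a remote router
]
print(select_peer(peers, "5551234"))  # the POTS peer: more explicit digits
print(select_peer(peers, "5559876"))  # the VoIP peer
```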

Voice port commands on the Cisco 3600 series define the characteristics associated with a particular voice-port signaling type. Voice ports on Cisco 3600 series routers support three basic voice signaling formats:

FXOForeign Exchange Office interface. The FXO interface is an RJ-11 connector that allows a connection to be directed at the public switched telephone network's (PSTN's) central office (or to a standard PBX interface, if the local telecommunications authority permits). This interface is of value for off-premise extension applications.

E&MThe "Ear and Mouth" interface (or "RecEive and TransMit" interface) is an RJ-48 connector that allows connection for PBX trunk lines (tie lines). It is a signaling technique for 2-wire and 4-wire telephone and trunk interfaces.

The Cisco 3600 series currently provides only analog voice ports for its implementation of Voice over IP. The type of signaling associated with these analog voice ports depends on the interface module installed into the device.

To understand Cisco's voice implementations, it helps to have some understanding of analog and digital transmission and signaling. This section provides some very basic, abbreviated voice telephony information as background to help you configure Voice over IP, Voice over Frame Relay, Voice over ATM, and Voice over HDLC. It covers PSTN numbering, analog and digital transmission, CODECs and speech quality, delay and jitter, echo, and signaling.

The standard PSTN is basically a large, circuit-switched network. It uses a specific numbering scheme that complies with the ITU-T E.164 recommendation. For example, North America uses the North American Numbering Plan (NANP), which consists of an area code, an office code, and a station code. Area codes are assigned geographically, office codes are assigned to specific switches, and station codes identify a specific port on that switch. The format in North America is 1Nxx-Nxx-xxxx, where N is a digit from 2 through 9 and x is a digit from 0 through 9. Internationally, each country is assigned a one- to three-digit country code, and the country's dialing plan follows the country code. In Cisco's voice implementations, numbering schemes are configured using the destination-pattern command.
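The NANP format above (1Nxx-Nxx-xxxx) is regular enough to check mechanically. The following Python sketch splits a dialed North American number into its area, office, and station codes and forms the full E.164 number with country code 1. The function names are hypothetical, and the check is a simplification: it enforces only the N (2 through 9) and x (0 through 9) digit rules, not the additional reserved-code rules of the real plan.

```python
import re

# NANP: optional leading 1, then Nxx (area code), Nxx (office code), xxxx (station code)
NANP_RE = re.compile(r"^1?([2-9]\d{2})([2-9]\d{2})(\d{4})$")


def parse_nanp(dialed: str) -> tuple[str, str, str]:
    """Split a dialed NANP number into (area code, office code, station code)."""
    digits = re.sub(r"\D", "", dialed)  # drop dashes, spaces, and parentheses
    m = NANP_RE.match(digits)
    if m is None:
        raise ValueError(f"not a valid NANP number: {dialed!r}")
    return m.group(1), m.group(2), m.group(3)


def to_e164(dialed: str) -> str:
    """Prefix country code 1 to form the full E.164 number."""
    area, office, station = parse_nanp(dialed)
    return "+1" + area + office + station


print(parse_nanp("1-408-526-4000"))
print(to_e164("(212) 555-0123"))
```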

Until recently, the telephone network was based on an analog infrastructure. Analog transmission is not particularly robust or efficient at recovering from line noise. Because analog signals degrade over distance, they need to be periodically amplified; this amplification boosts both the voice signal and ambient line noise, resulting in degradation of the quality of the transmitted sound.

In response to the limitations of analog transmission, the telephony network migrated to digital transmission using pulse code modulation (PCM) or adaptive differential pulse code modulation (ADPCM). In both cases, analog sound is converted into digital form by sampling the analog sound 8000 times per second and converting each sample into a numeric code.
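The arithmetic behind these rates follows directly from the 8000-samples-per-second figure: bit rate equals bits per sample times samples per second. The following Python sketch shows that 8 bits per sample gives the 64-kbps PCM rate, and that the G.726 ADPCM rates of 40, 32, 24, and 16 kbps correspond to 5, 4, 3, and 2 bits per sample at the same sampling rate. The `sample_tone` helper is a hypothetical illustration that uses simple linear quantization; G.711 itself uses logarithmic companding rather than the linear codes shown here.

```python
import math

SAMPLE_RATE_HZ = 8000  # samples per second, as described above


def bit_rate_kbps(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Bit rate of a stream of fixed-size sample codes."""
    return bits_per_sample * sample_rate_hz / 1000


def sample_tone(freq_hz: float, duration_s: float, bits: int = 8) -> list[int]:
    """Sample a sine tone and quantize each sample linearly to a signed integer code."""
    levels = 2 ** (bits - 1) - 1  # largest code magnitude, e.g. 127 for 8 bits
    n = int(round(SAMPLE_RATE_HZ * duration_s))
    return [
        round(levels * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(n)
    ]


print(bit_rate_kbps(8))                             # PCM at 8 bits per sample
print([bit_rate_kbps(b) for b in (5, 4, 3, 2)])     # G.726 ADPCM rates
print(sample_tone(1000, 0.001))                     # 1 ms of a 1-kHz tone: 8 codes
```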

PCM and ADPCM are examples of "waveform" CODEC techniques. Waveform CODECs are compression techniques that exploit the redundant characteristics of the waveform itself. In addition to waveform CODECs, there are source CODECs that compress speech by sending only simplified parametric information about voice transmission; these CODECs require less bandwidth. Source CODECs include linear predictive coding (LPC), code-excited linear prediction (CELP), and multi-pulse, multi-level quantization (MP-MLQ).

Coding techniques are standardized by the ITU-T in its G-series recommendations. The most popular coding standards for telephony and packet voice are:

G.711Describes the 64-kbps PCM voice coding technique. In G.711, encoded voice is already in the correct format for digital voice delivery in the public switched telephone network (PSTN) or through PBXes.

G.723.1Describes a compression technique that can be used for compressing speech or audio signal components at a very low bit rate as part of the H.324 family of standards. This CODEC has two bit rates associated with it: 5.3 and 6.3 kbps. The higher bit rate is based on ML-MLQ technology and provides a somewhat higher quality of sound. The lower bit rate is based on CELP and provides system designers with additional flexibility.

G.726Describes ADPCM coding at 40, 32, 24, and 16 kbps. ADPCM-encoded voice can be interchanged between packet voice, PSTN, and PBX networks if the PBX networks are configured to support ADPCM.

G.728Describes a 16-kbps low-delay variation of CELP voice compression. CELP voice coding must be translated into a public telephony format for delivery to or through the PSTN.

G.729Describes CELP compression where voice is coded into 8-kbps streams. There are two variations of this standard (G.729 and G.729 Annex A) that differ mainly in computational complexity; both provide speech quality similar to 32-kbps ADPCM.

In Cisco's voice implementations, compression schemes are configured using the codec command.
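G.711 defines two companding laws: mu-law, used mainly in North America and Japan, and A-law, used in most other countries. Both map each sample to an 8-bit code on a roughly logarithmic scale, so quiet sounds get finer resolution than loud ones. The following Python sketch encodes one 16-bit linear sample as a mu-law code using the common bias-and-segment method; `linear_to_ulaw` is an illustrative name, not a Cisco or library API, and the sketch covers encoding only.

```python
BIAS = 0x84   # 132: added so that every magnitude falls into one of eight segments
CLIP = 32635  # largest magnitude that still fits in 15 bits after the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear PCM sample as an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent) is the index of the highest set bit, minus 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # The four mantissa bits follow the leading 1 within the segment.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law transmits the bitwise complement of sign | exponent | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


for s in (0, -1, 1000, 32767, -32768):
    print(s, hex(linear_to_ulaw(s)))
```

Because each 8-bit code is sent 8000 times per second, the result is the 64-kbps stream that G.711 describes.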

Each CODEC provides a certain quality of speech. The quality of transmitted speech is a subjective response of the listener. A common benchmark used to determine the quality of sound produced by specific CODECs is the mean opinion score (MOS). With MOS, a wide range of listeners judge the quality of a voice sample (corresponding to a particular CODEC) on a scale of 1 (bad) to 5 (excellent). The scores are averaged to provide the mean opinion score for that sample. Table 1 shows the relationship between CODECs and MOS scores.
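Computing a MOS is a simple average once the listener ratings are collected. The following Python sketch validates ratings against the 1-to-5 scale and averages them; the function name and the sample ratings are hypothetical and do not come from any real listening test.

```python
from statistics import fmean


def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return fmean(ratings)


# Hypothetical ratings from five listeners for one voice sample
print(mean_opinion_score([4, 4, 5, 3, 4]))
```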

Although it might seem logical from a financial standpoint to convert all calls to low-bit-rate CODECs to save on infrastructure costs, you should exercise additional care when designing voice networks with low-bit-rate compression. Compressing voice has drawbacks. One of the main drawbacks is signal distortion due to multiple encodings (called tandem encodings). For example, when a G.729 voice signal is tandem encoded three times, the MOS drops from 3.92 (very good) to 2.68 (unacceptable). Another drawback is the delay that low-bit-rate CODECs introduce.

One of the most important design considerations in implementing voice is minimizing one-way, end-to-end delay. Voice traffic is real-time traffic; if voice packet delivery is delayed too long, speech becomes unrecognizable. Delay is inherent in voice networking and is caused by a number of different factors. An acceptable one-way delay is less than 200 milliseconds.

There are two kinds of delay inherent in today's telephony networks: propagation delay and handling delay. Propagation delay is caused by the finite speed at which signals travel through fiber-optic or copper media. Handling delay (sometimes called serialization delay) is caused by the devices that handle voice information. Handling delays have a significant impact on voice quality in a packetized network.
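Propagation delay can be estimated from path length alone. The following Python sketch assumes that light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), or 200 km per millisecond; that constant and the function name are assumptions for illustration, and real paths are longer than straight-line distances.

```python
# Assumption: signals in optical fiber travel at roughly 200,000 km/s,
# which is 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def propagation_delay_ms(distance_km: float, km_per_ms: float = FIBER_KM_PER_MS) -> float:
    """One-way propagation delay over a path of the given length."""
    return distance_km / km_per_ms


print(propagation_delay_ms(4000))  # a 4000-km fiber path
```

Under this assumption, a 4000-km path contributes about 20 milliseconds of one-way delay, a fixed cost that no amount of network tuning removes.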

CODEC-induced delays are considered a handling delay. Table 2 shows the delay introduced by different CODECs.

Another handling delay is the time it takes to generate a voice packet. In Voice over IP, the DSP generates a frame every 10 milliseconds. Two of these frames are then placed within one voice packet; the packet delay is therefore 20 milliseconds.
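The same frame and packet sizes also fix how many payload bytes each packet carries and how much bandwidth a call consumes. The following Python sketch reproduces the 20-millisecond packetization delay from the text and then, as an added assumption, counts 40 bytes of uncompressed IPv4 (20), UDP (8), and RTP (12) headers per packet, ignoring layer-2 overhead. The function names are hypothetical.

```python
FRAME_MS = 10                    # DSP frame interval, from the text
FRAMES_PER_PACKET = 2            # frames placed in each voice packet, from the text
IP_UDP_RTP_BYTES = 20 + 8 + 12   # assumption: uncompressed IPv4 + UDP + RTP headers


def packetization_delay_ms(frame_ms: int = FRAME_MS, frames: int = FRAMES_PER_PACKET) -> int:
    """Time needed to fill one voice packet."""
    return frame_ms * frames


def payload_bytes(codec_kbps: float, packet_ms: int) -> float:
    """Voice payload per packet: kbps multiplied by ms gives bits; divide by 8 for bytes."""
    return codec_kbps * packet_ms / 8


def ip_bandwidth_kbps(codec_kbps: float, packet_ms: int) -> float:
    """IP-level bandwidth for one direction of a call, excluding layer-2 overhead."""
    packets_per_s = 1000 / packet_ms
    bits_per_packet = (payload_bytes(codec_kbps, packet_ms) + IP_UDP_RTP_BYTES) * 8
    return packets_per_s * bits_per_packet / 1000


delay = packetization_delay_ms()
for name, kbps in (("G.711", 64), ("G.729", 8)):
    print(name, delay, payload_bytes(kbps, delay), ip_bandwidth_kbps(kbps, delay))
```

With these assumptions, the fixed header cost is what makes an 8-kbps G.729 stream consume roughly three times its coded rate on the wire.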

Another source of handling delay is the time it takes to move the packet to the output queue. Cisco IOS software expedites the process of determining packet destination and getting the packet to the output queue. The actual delay at the output queue is another source of handling delay and should be kept under 10 milliseconds whenever possible by using whatever queuing methods are optimal for your network. Output queue delays are a QoS issue in Voice over IP for the Cisco 3600 series and are discussed in the "Configure IP Networks for Real-Time Voice Traffic" section.

In Voice over Frame Relay, you need to make sure that voice traffic is not crowded out by data traffic. Strategies on how to manage Voice over Frame Relay voice traffic are discussed in "Configuring Voice over Frame Relay."

Jitter is another factor that affects delay. Jitter occurs when there is a variation between when a voice packet is expected to be received and when it actually is received, causing a discontinuity in the real-time voice stream. Voice devices such as the Cisco 3600 series and the Cisco MC3810 compensate for jitter by setting up a playout buffer to play back voice smoothly. Playout control is handled through RTP encapsulation by selecting either adaptive or nonadaptive playout-delay mode. In either mode, the default value for nominal delay is sufficient.
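The idea behind a nonadaptive playout buffer can be sketched in a few lines. In the following Python model, each packet is scheduled for playout at its send offset from the first packet plus a constant nominal delay measured from the first packet's arrival; a packet that arrives after its scheduled time is counted as late and dropped. This is an assumed, simplified model, not the Cisco IOS implementation: the `Packet` type, the `schedule_playout` function, and the timing values are hypothetical, and an adaptive mode would additionally adjust the delay as observed jitter changes.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    sent_ms: float     # sender timestamp, in milliseconds
    arrived_ms: float  # local arrival time, in milliseconds


def schedule_playout(
    packets: list[Packet], nominal_delay_ms: float
) -> tuple[list[tuple[int, float]], list[int]]:
    """Fixed (nonadaptive) playout: return (played (seq, time) pairs, late seqs)."""
    if not packets:
        return [], []
    ordered = sorted(packets, key=lambda p: p.seq)
    first = ordered[0]
    base = first.arrived_ms + nominal_delay_ms
    played, late = [], []
    for p in ordered:
        # Keep the sender's spacing: each packet plays at the same offset it was sent with.
        playout_ms = base + (p.sent_ms - first.sent_ms)
        if p.arrived_ms <= playout_ms:
            played.append((p.seq, playout_ms))
        else:
            late.append(p.seq)  # arrived too late to be played in order
    return played, late


# Packets sent every 20 ms; network jitter delays packet 2 beyond its playout time
stream = [Packet(0, 0, 50), Packet(1, 20, 75), Packet(2, 40, 140), Packet(3, 60, 110)]
print(schedule_playout(stream, nominal_delay_ms=40))
```

A larger nominal delay absorbs more jitter but adds directly to the end-to-end delay budget, which is the trade-off the playout-delay settings control.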

Calculating the end-to-end delay is straightforward if you know the end-to-end signal and data paths, the CODEC, and the payload size of the packets. The end-to-end delay for the connection is the sum of the delay from each endpoint to its CODEC (at both ends), the encoder delay (5 milliseconds for the G.711 and G.726 CODECs and 10 milliseconds for the G.729 CODEC), the packetization delay, and the fixed portion of the network delay.
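The sum described above can be written as a small budget calculator. The following Python sketch uses the encoder delays and the 200-millisecond one-way target given in this section; the function name and the example endpoint and network delays are hypothetical values for illustration, not measurements.

```python
ENCODER_DELAY_MS = {"g711": 5, "g726": 5, "g729": 10}  # encoder delays given in the text
DELAY_BUDGET_MS = 200                                  # acceptable one-way delay, from the text


def end_to_end_delay_ms(
    codec: str,
    endpoint_to_codec_ms: tuple[float, float],
    packetization_ms: float,
    network_fixed_ms: float,
) -> float:
    """One-way delay: endpoint-to-CODEC delay at both ends plus encoder, packetization, and network delay."""
    return (
        sum(endpoint_to_codec_ms)
        + ENCODER_DELAY_MS[codec]
        + packetization_ms
        + network_fixed_ms
    )


# Hypothetical G.729 call: 2 ms at each end, 20-ms packets, 60 ms of fixed network delay
total = end_to_end_delay_ms("g729", (2, 2), 20, 60)
print(total, total < DELAY_BUDGET_MS)
```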

Echo is hearing your own voice in the telephone receiver while you are talking. When timed properly, echo is reassuring to the speaker; if the echo exceeds approximately 25 milliseconds, it can be distracting and cause breaks in the conversation. In a traditional telephony network, echo is normally caused by an impedance mismatch where the 4-wire network switch is converted to the 2-wire local loop, and it is controlled by echo cancellers. In voice packet-based networks, echo cancellers are built into the low-bit-rate CODECs and operate on each DSP. By design, echo cancellers are limited in the total amount of time they will wait for the reflected speech to be received; this interval is known as the echo trail. The echo trail is normally 32 milliseconds.

In Cisco's voice implementations, echo cancellers are enabled using the echo-cancel enable command, and the echo trail is configured using the echo-cancel-coverage command. For example, Voice over IP supports configurable echo trails of 16, 24, and 32 milliseconds.
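Because an echo canceller can only remove echo that returns within its echo trail, the trail must be at least as long as the echo path delay. The following Python sketch picks the shortest of the Voice over IP echo trails listed above that still covers a given echo delay. Choosing the shortest adequate trail is an assumed heuristic for illustration, not a Cisco recommendation, and the function name is hypothetical.

```python
SUPPORTED_COVERAGE_MS = (16, 24, 32)  # Voice over IP echo trails, from the text


def choose_echo_coverage(echo_delay_ms: float) -> int:
    """Shortest supported echo trail that still covers the measured echo path delay."""
    for coverage in SUPPORTED_COVERAGE_MS:
        if echo_delay_ms <= coverage:
            return coverage
    raise ValueError(f"echo delay {echo_delay_ms} ms exceeds the longest supported trail")


print(choose_echo_coverage(10), choose_echo_coverage(20), choose_echo_coverage(32))
```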

Although various types of signaling are used in telecommunications today, this document describes only those with direct applicability to Cisco's voice implementations. The first is access signaling, which determines when a line has gone off-hook or on-hook (and therefore, for example, when to provide dial tone). FXO and FXS are types of access signaling. There are two common methods of providing this basic signal:

Loop start is the most common technique for access signaling in a standard PSTN local-loop network. When a handset is picked up (goes off-hook), this action closes the circuit that draws current from the telephone company's central office (CO), indicating a change in status. This change in status signals the CO to provide dial tone. The CO signals an incoming call to the handset by sending a ringing signal in a standard on/off pattern, which causes the telephone to ring.

Ground start is another access signaling method used to indicate on-hook/off-hook status to the CO, but it is used primarily on trunk lines or tie lines between PBXs. Ground start signaling works by using ground and current detectors, which allows the network to indicate off-hook, or seizure of an incoming call, independent of the ringing signal.

In Cisco's voice implementations, access signaling is configured using the signal command.
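The loop-start sequence described above can be modeled as a small state machine. The following Python sketch follows the line from the CO's point of view: closing the loop from idle produces dial tone, an incoming call produces ringing, going off-hook while ringing answers the call, and opening the loop returns the line to idle. It is a deliberately simplified model with hypothetical state and event names; it omits details such as the caller abandoning a call during ringing, and it does not cover ground start.

```python
from enum import Enum


class LineState(Enum):
    ON_HOOK = "on-hook"
    RINGING = "ringing"
    DIAL_TONE = "dial tone"
    CONNECTED = "connected"


def loop_start(state: LineState, event: str) -> LineState:
    """Simplified loop-start transitions from the CO's point of view."""
    if state is LineState.ON_HOOK and event == "off_hook":
        return LineState.DIAL_TONE  # loop closed, current flows, CO returns dial tone
    if state is LineState.ON_HOOK and event == "incoming_call":
        return LineState.RINGING    # CO sends ringing in an on/off pattern
    if state is LineState.RINGING and event == "off_hook":
        return LineState.CONNECTED  # the telephone answers by closing the loop
    if event == "on_hook":
        return LineState.ON_HOOK    # opening the loop ends the call
    return state                    # other events leave the state unchanged


state = LineState.ON_HOOK
for event in ("incoming_call", "off_hook", "on_hook"):
    state = loop_start(state, event)
    print(event, "->", state.value)
```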

Another signaling technique used mainly between PBXes or other network-to-network telephony switches is known as E&M. There are five types of E&M signaling, as well as two different wiring methods. Cisco's voice implementation supports E&M types I, II, III, and V, using both 2-wire and 4-wire implementations. In Cisco's voice implementations, E&M signal types are configured using the type command.