Testing Challenges in Packet Telephony

Bill Douskalis offers a perspective on the issues facing deployment and testing engineers before a network infrastructure supporting IP telephony can be declared functional. Testing challenges of real-time situations are discussed as well.

Testing communications equipment and networks has never been an easy task.
Even a moderately complex data network topology can produce a myriad of
test-scenario permutations across the protocols and the applications that
they serve. To appreciate the level of complexity involved in the general
case, you must start from the bottom of the protocol stacks supported by the
network and move upward, toward the applications themselves. A network topology
can consist of switches, routers, and Integrated Access Devices (IADs). In turn,
those devices can employ OC-n, SONET, and Ethernet interfaces of various
flavors, in addition to physical access interfaces such as DSL, cable, or simple
twisted-pair wire. The permutations of protocols at layers 2 and 3 make the test
scenarios exciting, and the applications complete the scene with their own
requirements and quirks in using the underlying protocols.

The tricky part is identifying and enumerating the test scenarios that must
be executed to ensure complete coverage of functionality and performance
testing. In functionality testing, we must make sure that we include failover
(rainy-day) scenarios; in performance testing, we must ascertain that the
system's performance characteristics remain invariant when failover
conditions occur.
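
As a rough sketch of what enumerating scenarios can look like, the Python below crosses a few system load states with a few failure events, so that every failover case is run against every operating condition. The load levels, the failure names, and the function name are hypothetical placeholders, not values from any product specification; a real test plan would take them from the system requirements.

```python
from itertools import product

# Hypothetical test dimensions, chosen only for illustration.
STABLE_CALLS = [0, 100, 1000]          # calls already established
CALLS_IN_SETUP = [1, 36]               # calls being completed at failure time
FAILURES = ["link_down", "cpu_failover", "gateway_restart"]


def enumerate_scenarios(stable_calls, calls_in_setup, failures):
    """Return every (load state, failure) combination as a dict."""
    return [
        {"stable_calls": s, "calls_in_setup": c, "failure": f}
        for s, c, f in product(stable_calls, calls_in_setup, failures)
    ]


scenarios = enumerate_scenarios(STABLE_CALLS, CALLS_IN_SETUP, FAILURES)
print(len(scenarios))  # 3 * 2 * 3 = 18 scenarios
```

Even this toy matrix yields 18 cases, which suggests how quickly the count grows once interfaces, signaling protocols, and features are added as further dimensions.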

Enumerating all the possible operating conditions that a system must reach
before we "hit" it with a failover scenario is tedious but necessary. What does
this mean? Consider this example: A system that is completing a single call
while no other calls are active will most likely not behave the same way as a
system that carries hundreds or thousands of stable calls and is completing a
few dozen more when the failure occurs and the switch must undertake recovery
procedures. Simply put, you must make sure that the "state" of the switch is
preserved and that it continues to operate "normally," in whatever sense the
product specifications define "normally" under a failure scenario.

Designers of switching systems will always be looking for newer and better
test tools to ascertain complete conformance at the functional and performance
levels of any product before it leaves the lab. Functional and design problems
usually do not manifest themselves as hardware issues in the system test phase,
except in cases of VLSI functionality, which might not have been fully tested in
the unit test phase. Life gets interesting when platform and architectural
issues surface in integration testing:

The wrong CPU platform was picked to meet the specified performance in
calls per second and call capacity.

A distributed switch architecture reaches a plateau in performance and
stops scaling.

The custom VLSI devices either have bugs or are not up to par with
specifications.

Prevention is the best defense against such issues: exercise due diligence on
the system in the design phase, and test thoroughly in the unit test phase with
a comprehensive test suite that spans all aspects of the design except those
that need other modules to ensure complete system functionality. Architectural
problems can be virtually impossible to solve after the fact and can doom a
system while it is still in the lab. Due diligence on an architecture is very
hard to do while it is still on paper or in someone's mind. That's why it is a
good idea to construct detailed scenarios to drill into the operation of the
system before pen meets paper to start the design phase. Simulation is a good
approach to get a comfortable feeling at this stage, but simulation needs to be
fed the correct protocol behavior(s), the complete set of protocols that will be
supported, the permutations of protocol interactions that will be encountered
during call setup, the feature call flows, the queuing behavior and
stability of the system under heavy load, and the impact of the underlying
operating system on performance.

The last part, the OS, is tricky to account for. This is because its impact
can be hard to simulate unless there is a good understanding of the
incremental impact of the OS on system performance, such as disk
operations in the middle of call setup, which might or might not be related to
the call setup itself. Missed performance expectations are often caused by
"other" things that are running on the platform (such as billing, FCAPS,
database operations, and so on), which must be accounted for when setting
expectations in advance. Therefore, an accurate assessment involves
the complete understanding of everything that is running on the system while it
performs call processing.
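
The effect of those "other" tasks can be illustrated with a back-of-the-envelope budget. The sketch below assumes a simple linear model in which one CPU offers 1000 ms of processing time per second, each call setup costs a fixed number of milliseconds, and background tasks consume a fixed share. All of the figures and names are invented for illustration; real systems rarely behave this linearly.

```python
def call_capacity(cpu_budget_ms_per_s, per_call_cost_ms, background_load_ms_per_s):
    """Calls per second that fit after background work is subtracted.

    background_load_ms_per_s maps a task name to the CPU time (ms) it
    consumes in each second of wall-clock time.
    """
    available = cpu_budget_ms_per_s - sum(background_load_ms_per_s.values())
    if available <= 0:
        return 0.0  # background work alone saturates the CPU
    return available / per_call_cost_ms


# Hypothetical numbers: 5 ms of CPU per call setup on a single CPU.
background = {"billing": 100.0, "fcaps": 50.0, "database": 150.0}

naive = call_capacity(1000.0, 5.0, {})              # ignores background tasks
realistic = call_capacity(1000.0, 5.0, background)  # accounts for them

print(naive, realistic)  # 200.0 140.0
```

Under these assumed numbers, an expectation of 200 calls per second that ignores billing, FCAPS, and database work would be missed by 30 percent.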

Establishing system stability early on in the architecture and design process
is vital because a well-designed system must never crash under traffic
conditions of heavy load or unexpected events. Much of this groundwork can
be done during simulation by feeding excessive traffic into the various parts of
the platform, confirming that the system continues to operate, and documenting
its expected behavior. For example, if certain packet-discard policies have been
designed into the system, the only way to ensure that the system is functional
is to cause the conditions that will invoke them. If the requirement is for the
system to accept a 911 call regardless of current system load, then load up the
system with 100% traffic capacity and send a 911 call through it to see what
happens. Successful simulation gives the architect and project managers the
confidence that they need to proceed with the next step: the design phase.
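
The 911 check described above can be mocked up in a few lines. The model below is a toy, not a real switch: it assumes that the system meets the requirement by preempting the most recently admitted ordinary call, which is only one possible policy (reserving capacity for emergency calls is another). The class and method names are hypothetical.

```python
class SwitchModel:
    """Toy call-admission model with a fixed call capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = []  # list of (call_id, is_emergency) tuples

    def admit(self, call_id, is_emergency=False):
        """Try to admit a call; return True if it was accepted."""
        if len(self.active) < self.capacity:
            self.active.append((call_id, is_emergency))
            return True
        if is_emergency:
            # Assumed policy: drop the newest ordinary call to make room.
            for i in range(len(self.active) - 1, -1, -1):
                if not self.active[i][1]:
                    del self.active[i]
                    self.active.append((call_id, True))
                    return True
        # Full, and either the call is ordinary or every active call
        # is already an emergency call.
        return False


# Load the model to 100% capacity, then send one more call of each kind.
switch = SwitchModel(capacity=100)
for n in range(100):
    switch.admit(f"call-{n}")

ordinary_ok = switch.admit("call-100")
emergency_ok = switch.admit("911-call", is_emergency=True)
print(ordinary_ok, emergency_ok, len(switch.active))  # False True 100
```

The point of the exercise is the test shape rather than the policy: drive the system to full load first, then inject the event whose handling the requirement guarantees.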

After such in-advance testing has been done, via simulation first and in the
lab later, your customer will get an equally warm feeling about the system.

Unit Testing

In unit testing, in which the various system components are tested for
functionality and performance at an early (and mostly standalone) phase,
it is important to cover as much of the functionality as possible before
proceeding with system integration. For example, if a switch consists of a CPU
platform and a variety of gateways, with all sorts of physical interfaces and
capacity specifications, a thorough test of a single type of interface with all
the supported signaling protocols and media transport methods will reveal
whether the subsequent integration test will be successful and quick or whether
trouble should be expected. Furthermore, if sufficient test resources are
available (such as time and personnel to write simulated call processing scripts
on the real CPU), protocol interworking can be checked out well before
connecting to a gateway, via call arrival and processing over simulated physical
interfaces.

If firewalls are to be used, it wouldn't be a bad idea at this time to
check the viability of the performance specification with a single firewall, a
single physical interface, and as many protocol interworking scenarios as
possible in this stage. If this stage shows weaknesses in a centralized
configuration, system integration in a distributed environment is certain to
reveal even more problems. The rule of thumb is that the more coverage is
achieved in unit test, the fewer headaches will be caused during system
integration.
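
One way to put a number on "coverage" at this stage is to list the protocol interworking pairs that must be exercised and track which ones have actually been run. The sketch below uses SIP, H.323, and MGCP purely as example protocol names; the article does not say which protocols a given switch supports, and the helper names are hypothetical.

```python
from itertools import combinations_with_replacement

# Illustrative protocol list; substitute whatever the product supports.
PROTOCOLS = ["SIP", "H.323", "MGCP"]


def interworking_pairs(protocols):
    """All unordered protocol pairs, including same-protocol calls."""
    return list(combinations_with_replacement(protocols, 2))


def coverage(executed, protocols):
    """Fraction of required interworking pairs that have been executed."""
    required = set(interworking_pairs(protocols))
    # Normalize each executed pair to the ordering used in `required`.
    done = {tuple(sorted(pair, key=protocols.index)) for pair in executed}
    return len(done & required) / len(required)


executed = [("H.323", "SIP"), ("SIP", "SIP"), ("MGCP", "MGCP")]
print(len(interworking_pairs(PROTOCOLS)))  # 6 pairs for three protocols
print(coverage(executed, PROTOCOLS))       # 0.5
```

Treating a pair as unordered is itself an assumption; if call direction matters for a given pair, `itertools.product` would give the ordered list instead.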