A distributed network performance framework

It is one thing to blindly push packets onto the Internet, and quite another to monitor, record, and act on packet performance. For instance, say that available network bandwidth is of interest and you would like to act on a sudden constriction of a connection's available bandwidth. Identifying a sustained drop in bandwidth would let you terminate lower-priority streaming video connections or transfer the connection to a different server. In this article, I present a simple framework that can support this behavior and more. The framework is flexible enough to serve as a kind of network Swiss Army knife: it can fit into your network to help diagnose and resolve network events as they occur.

This framework, which I call "PerfScout," is based on a message-queuing system in which messages are dispatched between different blocks of responsibility. Each block operates independently and dispatches messages in a fire-and-forget manner. What's more, the PerfScout framework is fairly OS agnostic and can be recompiled for a large number of supported operating systems with minimal work (over 32 different flavors, including Mac OS X, most UNIX variants, and Windows). The complete source code for PerfScout is available electronically; see "Resource Center," page 3.

Knowing Your Performance Needs

Say that your digital video processing site has certain expectations for the quality of your network connection to specific destinations. And suppose that bandwidth per connection is an important quantity for customers trying to upload videos to your web site for processing. Low bandwidth makes it more likely that a customer will give up trying to upload data, which translates to a lost opportunity. That is one scenario; other important network performance scenarios could include Voice over IP and latency, streaming video and available bandwidth, file copying and packet loss, and router congestion and low-priority data transmission.

In situations such as these, you would want to monitor network performance and act on poorly performing connections. Appropriate actions for poor connections might be to move the client to a less loaded server, to limit the number of connections to a server, to delay sending data, to notify network engineers or automated route control systems, or simply to record your network's performance. Heck, recording your network's performance may even be a legal requirement (or you may want to independently verify your ISP's records).

PerfScout allows for extensions in three areas:

Sources. How data destinations of interest are to be identified.

Tools. What network performance is measured.

Actions. How to respond to network events.

PerfScout is built on a framework that provides for concurrent processing at each of these stages of the pipeline. The provided implementation contains two types of network performance tests: a network bandwidth estimator and a latency tool. Other sources, tools, and actions can easily be plugged in to provide a more complete picture of your network's connections.

Efficient Processing of Results

A key aspect of the PerfScout design is its flexibility. This model can be split into a distributed set of components. PerfScout encapsulates communication between each block of work and allows the engineer to focus on just two details:

How to efficiently manage work that is requested of me.

How to perform the work that is being requested in a timely manner.

Figure 1 shows the processing block layout of PerfScout, which has three basic blocks. The "glue" between the blocks is the messaging framework, which provides the conduit for dispatching and receiving messages. This glue can be upgraded for various distributed implementations (such as TCP/UDP communication or interprocess communication) and is highly portable (through the use of the ACE library; see the accompanying text box entitled "The ACE Library"). It is certainly possible to mix transport mechanisms between these blocks. The relative order of processing is from left to right.

A fully distributed PerfScout would be pretty neat. Potential uses could be:

Coordination of multiple streaming servers at various locations.

Notifications of users accessing streams.

Shutdown of streams if bandwidth limitations are reached.

Figure 2 shows a distributed PerfScout ensconced in a hypothetical network where the web server dispatches the destinations that are of interest (in this case, newly connected clients) to be measured. The results are then recorded in a database, and an action request (such as dropping low-priority clients when bandwidth drops) is sent back to the web server.

PerfScout pipelines messages through each stage of processing (ownership of a message is the responsibility of the input queue's owner). Each stage must manage its input queue of work intelligently and in a manner that avoids queue overflow. Relying on this messaging interface cleanly encapsulates the specific processing details for each component, and concretely defines the input/output relationship for each processing block. Correct management of each block's input queue results in a highly scalable framework. It is worth noting that this type of design is used in several commercial products, such as Microsoft Message Queuing (MSMQ) or IBM's MQSeries. For our needs, these canned messaging systems are a bit of overkill. What the task at hand needs is a thin, streamlined implementation that is flexible, efficient, and portable across a wide variety of platforms.

The job of software designers within this framework is to optimize the processing of messages within each processing block. This work is generally algorithm specific and encapsulated within the component, but involves specific workflow rules. This design scales well and lets you add more processing requirements at any step, provided that queues are serviced in a timely manner (so that queue overflow is avoided). Finally, it is worth noting that PerfScout could support additional components without requiring a recompilation of code, given some small modifications to the current design (perhaps a dynamically loaded processing block).

PerfScout Design

Figure 3 shows the object diagram for PerfScout (the Source, Tool, and Action blocks are shown here in relation to Figure 1). The Tool and Action blocks each require a manager to handle incoming messages effectively. These messages are subsequently dispatched internally for processing. The Source processing block has no input queue and, therefore, no manager; sources operate independently with no coordination. Operations within the Tool and Action blocks are coordinated through their respective Managers. These blocks contain processing clients that perform the real work within PerfScout. These clients derive from their respective base classes (ToolBase and ActionBase), which provide a common interface for initialization and communication.

Each entity that communicates using a message queue within PerfScout derives from the Task base class (Listing One). Task, in turn, derives from the ACE base class Task_Base. The Task class is a simple wrapper around the ACE library that lets PerfScout support a message notification system. Task provides two basic services: a worker thread and a message queue. Task::put() and Task::get() are used to access the message queue, while the run() method services the worker thread. Task::put() and Task::get() are protected by a mutex to prevent concurrent access, and Task::get() blocks when the message queue is empty. Exiting Task::run() terminates execution of the thread, which should not occur unless a Task-derived object is to be restarted or shut down. Task is templatized on the message type, which, for PerfScout, is always the Test object. Derived classes are required to implement their own run() method. Pretty straightforward stuff. Any modifications specific to the message queue behavior, or to the transport of messages, need only be made within the Task base class.

The common communication token dispatched between objects that derive from the Task base class is the Test object (Listing Two). The Test object contains the data used to identify the target destination, the requested tests, the current active test, the time of the test request and of collection, and finally the test results (in the Result object). The current active test, m_curTest, is required so that the requested test types can be performed sequentially on a Test object within the Tool Manager. The accessor Test::getActiveTest() returns the value of m_curTest. Encapsulating the current test means that the Tool Manager doesn't need to track the progress of individual Tests. When all tests are complete, Test::getActiveTest() returns kEnd, signaling the completion of processing for that Test object. The Test object wraps an additional object, Result, which maps tests to their results. The Result object is held as a map collection within the Test object and is templatized on the data stored.
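
The sequencing described above can be illustrated with a minimal sketch. This is not Listing Two: the enum values kLatency and kBandwidth, the complete() method, and the choice to store m_curTest as an index into the requested tests are all assumptions made for illustration, and the timestamps are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical test identifiers; kEnd means "no tests left to run".
enum TestType { kLatency, kBandwidth, kEnd };

// Maps each completed test to its measured value, templatized on the data stored.
template <typename T>
class Result {
public:
    void set(TestType type, const T& value) { m_values[type] = value; }
    bool has(TestType type) const { return m_values.count(type) != 0; }
    const T& get(TestType type) const { return m_values.at(type); }

private:
    std::map<TestType, T> m_values;
};

class Test {
public:
    Test(const std::string& destination, const std::vector<TestType>& requested)
        : m_destination(destination), m_requested(requested), m_curTest(0) {}

    // Returns the test that should run next, or kEnd once all have run.
    TestType getActiveTest() const {
        return m_curTest < m_requested.size() ? m_requested[m_curTest] : kEnd;
    }

    // Assumed helper: stores the active test's result and advances to the next test.
    void complete(double value) {
        assert(getActiveTest() != kEnd);
        m_results.set(getActiveTest(), value);
        ++m_curTest;
    }

    const std::string& destination() const { return m_destination; }
    const Result<double>& results() const { return m_results; }

private:
    std::string m_destination;          // target to be measured
    std::vector<TestType> m_requested;  // tests to run, in order
    std::size_t m_curTest;              // index of the active test (assumption)
    Result<double> m_results;
};
```

Because the Test advances itself, a manager in this sketch only has to ask getActiveTest() and compare the answer with kEnd.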

There are many ways to handle input queue processing. The initial implementation of PerfScout uses a rather simple approach in which the maximum size of the queue is defined at instantiation; any puts into a queue that would exceed this limit are silently dropped. Therefore, all objects that derive from Task need to service their queues quickly and efficiently, ensuring that messages do not back up. The Tool Manager and Action Manager dispatch messages to their clients quickly; therefore, the real work at the client level is to efficiently process the tests within each client's queue.
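
As a rough illustration of this drop-on-overflow policy, here is a sketch of a bounded queue with a non-blocking put() and a blocking get(). It uses the C++ standard library rather than ACE, so it is a stand-in and not the Task class from Listing One; the name BoundedQueue and the size() accessor are hypothetical. One deliberate difference: put() returns false on a drop so that a caller could count losses, whereas the text describes the drop as silent.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Illustrative stand-in for the queue half of a Task-like class.
template <typename Msg>
class BoundedQueue {
public:
    // The maximum size is fixed when the queue is created.
    explicit BoundedQueue(std::size_t maxSize) : m_maxSize(maxSize) {
        assert(maxSize > 0);
    }

    // Enqueues msg, or drops it when the queue is already full.
    // Never blocks the sender (fire and forget).
    bool put(Msg msg) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_queue.size() >= m_maxSize) {
                return false;  // dropped
            }
            m_queue.push_back(std::move(msg));
        }
        m_notEmpty.notify_one();
        return true;
    }

    // Blocks until a message is available, then removes and returns it.
    Msg get() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return !m_queue.empty(); });
        Msg msg = std::move(m_queue.front());
        m_queue.pop_front();
        return msg;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_queue.size();
    }

private:
    const std::size_t m_maxSize;
    mutable std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::deque<Msg> m_queue;
};
```

With a limit of two, a third put() is rejected until a get() frees a slot, which is why a slow consumer loses messages instead of stalling its producer.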

There are two source implementations included within PerfScout. TestSource dispatches a fixed set of tests at five-minute intervals and contains the locations and types of tests to be performed. ExternalSource contains a UDP connection that lets requests be dispatched remotely to this component. Other sources can be built and inserted at this level, such as integration with a web server, router, route-control device, and so on.

The second processing block, known as the "tool block," contains the ToolManager (Listing Three), which is derived from the Task base class. Incoming Test requests are received by the ToolManager and dispatched to the tool clients contained within the Tool block. These clients derive from ToolBase, which provides a basic framework for performing specific network tests. The run() method in ToolManager receives messages and dispatches each request to the tool specified by the Test object. ToolManager also encapsulates a completion thread (the CompleteTest class). The CompleteTest::run() method receives completed tests and either circulates the Test to its next test or pushes the finished Test out to the next processing block (the Action block). The Test object itself determines when it has finished processing within this block. For tool clients, it often makes sense to process multiple tests simultaneously across multiple worker threads, as is the case for the two provided tool implementations.
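
The dispatch-and-complete cycle can be sketched in a single thread. The real ToolManager uses message queues and a separate completion thread, so the following is a deliberate simplification and not Listing Three; ToolManagerSketch, addTool(), process(), and the trimmed-down Test struct are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum TestType { kLatency, kBandwidth, kEnd };

// Trimmed-down stand-in for the Test message.
struct Test {
    std::string destination;
    std::vector<TestType> requested;
    std::size_t next = 0;                // index of the active test
    std::map<TestType, double> results;

    TestType getActiveTest() const {
        return next < requested.size() ? requested[next] : kEnd;
    }
};

// A tool measures one quantity for a destination.
using Tool = std::function<double(const std::string& destination)>;

class ToolManagerSketch {
public:
    void addTool(TestType type, Tool tool) { m_tools[type] = std::move(tool); }

    // Runs each requested test in turn, then forwards the finished Test
    // (in PerfScout, this hand-off goes to the Action block).
    void process(Test test, const std::function<void(const Test&)>& forward) {
        assert(forward);
        while (test.getActiveTest() != kEnd) {
            const TestType active = test.getActiveTest();
            const auto it = m_tools.find(active);
            if (it != m_tools.end()) {
                test.results[active] = it->second(test.destination);
            }
            ++test.next;  // the Test, not the manager, tracks progress
        }
        forward(test);
    }

private:
    std::map<TestType, Tool> m_tools;
};
```

The loop condition is the only place the manager inspects progress, which mirrors the point that the Test object decides when it is finished.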

Now to the final processing block, the Action block, which contains the ActionManager. Test results are interpreted here, and actions can be performed based on these results. The ActionManager receives a single Test message containing the test results for a specific destination. As in the Tool block, there can be multiple clients in this block. Each client within the Action block receives its own copy of the Test object, which is handed out in ActionManager::run(). In PerfScout, there are two client implementations: one is a notification action to drop poorly performing connections, and the other is an interface for writing the data to a SQL database. All kinds of useful action modules are possible at this level, such as a module that shuts down a stream if the bandwidth falls below a certain limit, an e-mail notification system for certain network events, a recording/reporting module that lets you capture network health over a long period of time, and so on.
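
The fan-out to action clients might look like the following sketch. Only the ActionBase name comes from the article; the handle() method, the two client classes, the "bandwidth" result key, and the threshold are assumptions, and the database client is replaced by an in-memory log.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Trimmed-down stand-in for the Test message.
struct Test {
    std::string destination;
    std::map<std::string, double> results;  // e.g. "bandwidth" in bytes/ms
};

class ActionBase {
public:
    virtual ~ActionBase() = default;
    // Taking Test by value gives every client its own copy.
    virtual void handle(Test test) = 0;
};

// Hypothetical client: remembers destinations whose bandwidth is too low.
class DropSlowConnections : public ActionBase {
public:
    explicit DropSlowConnections(double minBytesPerMs) : m_min(minBytesPerMs) {}

    void handle(Test test) override {
        const auto it = test.results.find("bandwidth");
        if (it != test.results.end() && it->second < m_min) {
            m_dropped.push_back(test.destination);
        }
    }

    const std::vector<std::string>& dropped() const { return m_dropped; }

private:
    double m_min;
    std::vector<std::string> m_dropped;
};

// Hypothetical client: stands in for the SQL writer by keeping results in memory.
class RecordResults : public ActionBase {
public:
    void handle(Test test) override { m_log.push_back(std::move(test)); }
    const std::vector<Test>& log() const { return m_log; }

private:
    std::vector<Test> m_log;
};

class ActionManagerSketch {
public:
    void addClient(std::shared_ptr<ActionBase> client) {
        assert(client);
        m_clients.push_back(std::move(client));
    }

    // Hands a copy of the Test to every registered client.
    void dispatch(const Test& test) {
        for (const auto& client : m_clients) {
            client->handle(test);
        }
    }

private:
    std::vector<std::shared_ptr<ActionBase>> m_clients;
};
```

Since each client works on a private copy, one client can consume or modify its Test without affecting what the others see.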

Bandwidth Detection and Latency

Two tool clients are provided within PerfScout: a bandwidth detection tool and a latency tool. Latency is a fairly straightforward quantity: it measures the round-trip delay in the transport of a packet, and it matters most to delay-sensitive applications such as Voice over IP. There are several ways to capture latency. This tool uses UDP probes, which means that latency can only be captured from hosts that respond to TTL-expired packets (other techniques, such as HTTP SYN/ACK or ICMP probes, can avoid this limitation, but have limitations of their own).

It is also worth noting that latency can be collected passively, for example from the HTTP SYN/ACK exchange triggered by a 1-pixel GIF embedded in a web page. In that case, the Source processing block effectively collects the test data itself and can push the Test directly to the Action processing block, bypassing the Tool block.

Available bandwidth cannot be measured directly; it must be inferred from at least two different latency measurements. The algorithm used in this implementation is a rather simple one (remember, it is a simple matter to plug in different versions as they are developed).

Bandwidth is inferred from latency values received for packets of differing sizes. The theory is that a larger packet will encounter more latency due to longer send/receive queues at the Layer 2 level (Ethernet), where the original data packet is broken down into smaller chunks for transmission over the wire. If you draw a line through two measurements made with differing packet sizes, the bandwidth is the inverse of that line's slope: Bandwidth=(bytes2-bytes1)/(lat2-lat1) (bytes/ms), where lat1 and lat2 are the results of two latency measurements in milliseconds, and bytes1 and bytes2 are the byte counts of the packets used in those measurements. For the PerfScout implementation, the first packet is 40 bytes and the second packet is 1300 bytes in size. The assumption of a linear relationship between latency and packet size does not always hold, for reasons such as router input queues, congestion, and so on. Therefore, multiple tests help identify and remove outliers in computing this value.
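
A worked example may help. The sketch below takes bandwidth in bytes per millisecond as the packet-size difference divided by the latency difference, and uses a median over repeated estimates as one simple way to blunt outliers. The function names, the choice of the median, and the convention of returning 0 when the larger packet was not slower are assumptions, not PerfScout's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Estimates bandwidth in bytes/ms from two latency probes of different sizes.
// Returns 0 when the larger packet was not slower, since the linear model
// gives no usable estimate in that case (an assumed convention).
double estimateBandwidth(double bytes1, double lat1Ms,
                         double bytes2, double lat2Ms) {
    assert(bytes2 > bytes1);
    const double deltaMs = lat2Ms - lat1Ms;
    if (deltaMs <= 0.0) {
        return 0.0;
    }
    return (bytes2 - bytes1) / deltaMs;
}

// Median of repeated estimates; a single wild sample cannot move it far.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

For instance, if the 40-byte probe takes 20 ms and the 1300-byte probe takes 30 ms, the estimate is (1300 - 40) / (30 - 20) = 126 bytes/ms. Given three estimates of 126, 120, and 900 bytes/ms, the median is 126, so the 900 outlier is ignored.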

Conclusion

PerfScout can be customized and expanded in many directions to suit the needs of your network. New tools, sources, and actions can be added, and processing blocks can be redistributed across your network. A flexible monitoring tool is a must-have for performance-critical networks. PerfScout can fulfill this role by providing a customizable framework that eases integration into a network environment. Once integrated, PerfScout can monitor, record, and act on events within your network.
