We use a dumbbell topology of dummynet routers where each end point
consists of a set of Dell Linux servers dedicated to high-speed TCP
variant flows and background traffic. Background traffic is generated
using Iperf and a modified version of the web-traffic generator
Surge [1]. We modified the traffic generator to generate a wider
range of flow sizes in order to increase the variability in cross
traffic, because medium-size flows tend to fully execute slow start
and thus increase the variability in available bandwidth. The RTT of
each background flow is drawn from the exponential distribution found
in [4]. The maximum bandwidth of the bottleneck router is set to
400 Mbps. The same amount of background traffic is pushed into the
forward and backward directions of the dumbbell. We use drop-tail
queuing at the bottleneck router.
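
As an illustration of how the background flow RTTs are assigned, the
following minimal Python sketch draws one RTT per flow from an
exponential distribution; the 100 ms mean is a hypothetical
placeholder, since the actual distribution parameters come from [4]:

    import random

    def background_rtts(num_flows, mean_rtt_ms=100.0):
        # Draw one RTT per background flow from an exponential
        # distribution (random.expovariate takes the rate, 1/mean).
        # The mean here is a placeholder; the real parameters are
        # taken from the distribution in [4].
        return [random.expovariate(1.0 / mean_rtt_ms)
                for _ in range(num_flows)]

    print(background_rtts(5))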

2. Background traffic

We use three types of background traffic with different degrees of
rate variation and congestion. The first two traffic types consume
about 70 Mbps when they run without any other flows, representing
congestion-free network conditions. These two traffic types differ
only in the amount of variation in available bandwidth (the first
one varies less). By varying the distribution of flow sizes, we vary
the CoV of transmission rates, which in turn varies the available
bandwidth usable by other competing flows. The figure above shows the
cumulative rate distribution of the two cross traffic types, one
extremely varying and the other moderately so (the CoVs of available
bandwidth are 0.15 and 0.05, respectively). The third traffic type
emulates a congested network environment, created by adding a few
tens of long-lived TCP-SACK flows on top of the first traffic type.
In most experiments we use the first traffic type; we later use the
more varying one to observe the effect of increased traffic
variability. With background traffic running, we run multiple flows
of the protocol being examined between the two end points and measure
the performance parameters at the bottleneck router.

3. Is CoV a good metric?

Rate fluctuations induce fluctuations in router queue sizes and
frequent queue overflows. These overflows may cause loss
synchronization across many co-existing flows and severe
under-utilization of link capacity. When under-utilization due to
loss synchronization occurs very often, we say that the network is
``unstable''. Therefore, the window adjustment policies of protocols
have a great impact on network stability.
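
To make the notion of loss synchronization concrete, the following
toy Python simulation (our own illustration, not the testbed code)
models AIMD flows sharing a drop-tail bottleneck; when every flow
halves its window at the same overflow event, the aggregate rate dips
far below capacity:

    import random

    def simulate(n_flows=10, capacity=1000, sync_prob=1.0, rounds=2000):
        # Toy AIMD model: each round every flow adds one segment to
        # its window; when the sum exceeds the pipe capacity (queue
        # overflow), each flow is hit by a loss with probability
        # sync_prob and halves its window. sync_prob near 1 emulates
        # drop-tail loss synchronization.
        random.seed(1)
        w = [random.randint(1, capacity // n_flows) for _ in range(n_flows)]
        utils = []
        for _ in range(rounds):
            w = [x + 1 for x in w]                 # additive increase
            if sum(w) > capacity:                  # queue overflow
                w = [x / 2 if random.random() < sync_prob else x
                     for x in w]
            utils.append(min(sum(w), capacity) / capacity)
        return sum(utils) / len(utils)             # mean link utilization

    print(simulate(sync_prob=1.0))  # synchronized losses: deep dips
    print(simulate(sync_prob=0.3))  # staggered losses: higher utilization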

A realistic experimental setting always contains multiple flows of
the protocol being observed and a significant amount of cross traffic
whose rate varies over time. In this environment, we can consider two
possible metrics for rate variance. One uses aggregate flow
statistics: the CoV of the aggregate transmission rate samples of all
the flows of that protocol (CoV-PTA). The other uses per-flow
statistics: the CoV of per-flow transmission rate samples (CoV-PTP).
Both are measured at the bottleneck router at a fixed interval of one
second.
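
Concretely, the two metrics can be computed from the per-second rate
samples as in the following sketch (summarizing the per-flow CoVs by
their mean into a single CoV-PTP number is our assumption; the text
above defines only the per-flow samples):

    import numpy as np

    def cov(samples):
        # Coefficient of variation: std / mean of a rate sample series.
        samples = np.asarray(samples, dtype=float)
        return samples.std() / samples.mean()

    def cov_pta(per_flow_rates):
        # CoV of the aggregate rate of all flows of the protocol.
        # per_flow_rates: shape (num_flows, num_seconds), one sample
        # per flow per one-second interval at the bottleneck router.
        return cov(per_flow_rates.sum(axis=0))

    def cov_ptp(per_flow_rates):
        # Per-flow statistic: CoV of each flow's own rate samples,
        # summarized here by the mean across flows (an assumption).
        return float(np.mean([cov(f) for f in per_flow_rates]))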

We first measure how well these metrics represent the degree of
network instability. This correlation is important because if rate
variance does not significantly affect network stability (i.e., if
these metrics poorly reflect network stability), then even though we
can measure the relative ordering of protocols by rate variance, our
theoretical results would not have much practical value.

Our experiments are run in two different network environments. In the
first, high-speed protocol flows are dominant: we run only the cross
traffic with a CoV of available bandwidth of 0.05 and a varying
number of high-speed flows. In this experiment, cross traffic
consumes about 10 to 15% of the total bandwidth. The second
environment contains a large number of long-lived TCP-SACK flows on
top of the same cross traffic as the first one. With this TCP traffic
alone, the network reaches about 70-90% utilization. In this
environment, we run only two to four flows of high-speed protocols.
This environment represents a congested network where high-speed
flows are not dominant.

[Figures: correlation coefficients between rate fluctuations and
network stability, under moderate congestion and under heavy
congestion.]

The figures above plot the correlation coefficients between rate
fluctuations and network stability in the two environments described
above. We measure network stability by three metrics: link
utilization, the CoV of queue size variations, and the CoV of link
utilization variations. The correlation coefficients are obtained
from data collected from over 1500 experimental runs, irrespective of
the protocols being tested. In these tests, we vary RTTs, buffer
sizes, and the number of high-speed flows. In both environments, we
find that CoV-PTA has a stronger correlation with network stability
than CoV-PTP. When the network traffic is dominated by high-speed
flows, we find stronger correlations between CoV-PTA and network
stability and also between CoV-PTP and network stability. When the
network is congested, protocol rate fluctuations rarely result in low
link utilization; this is why both CoV-PTP and CoV-PTA show lower
correlations there. But even in this case, we find that some
high-speed protocols cause many loss synchronizations and thus low
utilization. Although weaker than in the uncongested case, the
correlation between CoV-PTA and network stability under congestion is
still much higher than that of CoV-PTP.
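
The correlation coefficients themselves are plain Pearson
coefficients over the per-run measurements; a sketch of the
computation, on synthetic stand-in data rather than the real runs,
looks like this:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for per-run measurements (the real values
    # come from the ~1500 testbed runs): higher rate variance tends
    # to lower utilization, plus noise.
    cov_pta = rng.uniform(0.0, 0.5, size=1500)
    utilization = 1.0 - 0.8 * cov_pta + rng.normal(0.0, 0.05, size=1500)

    # Pearson correlation coefficient between the rate-variance
    # metric and one of the stability metrics.
    r = np.corrcoef(cov_pta, utilization)[0, 1]
    print(f"corr(CoV-PTA, utilization) = {r:.2f}")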

We explain this phenomenon as follows. In a network environment with
a significant amount of cross traffic, protocol flows are constantly
adapting to the time-varying available bandwidth, so there is always
some inherent variation in per-flow transmission rates due to this
adaptation. However, not all of these adaptive variations in protocol
rates cause loss synchronization and fluctuation in link utilization;
this depends strongly on the aggressiveness (or stability) of the
protocol. The more aggressive a protocol is, the more often these
variations are translated into utilization loss. Per-flow statistics,
therefore, have no way to filter out these benign fluctuations and
thus may not faithfully represent the degree of inherent protocol
stability. Aggregate flow statistics, on the other hand, have this
filtering effect and tend to better capture the rate fluctuations
that affect global network stability. The high correlation between
CoV-PTA and the CoV of link utilization is strong evidence for this.
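
A toy example makes the filtering effect clear: two hypothetical
flows that trade bandwidth while adapting to cross traffic each show
a large per-flow CoV, yet their aggregate rate is nearly constant, so
the aggregate statistic correctly reports the fluctuations as benign:

    import numpy as np

    t = np.arange(100)
    # Two hypothetical flows trading bandwidth as they adapt: each
    # one fluctuates strongly, but their sum stays fixed at 100 Mbps.
    flow_a = 50 + 30 * np.sin(0.3 * t)
    flow_b = 50 - 30 * np.sin(0.3 * t)

    def cov(x):
        return x.std() / x.mean()

    print(cov(flow_a), cov(flow_b))   # high per-flow CoV (about 0.42)
    print(cov(flow_a + flow_b))       # aggregate CoV ~ 0: the benign
                                      # fluctuations cancel out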

We claim that CoV-PTA is also a more faithful representation of
protocol stability, i.e., the protocol rate variance captured by our
theoretical analysis. Our theory assumes that a protocol flow does
not affect the loss process. In reality, there is always some amount
of loss self-induced by a protocol flow, because high-speed flows
tend to have large window sizes, and per-flow statistics factor in
this effect. Aggregate flow rate statistics, on the other hand, tend
to ameliorate this effect, and thus represent the protocol window
variance assumed by the theoretical results, independent of
self-induced losses, more faithfully than per-flow statistics. We use
CoV-PTA throughout this paper as the main metric of protocol window
size fluctuations under ideal conditions.

To illustrate the degree of rate fluctuation, the figure above shows
the cumulative distribution of the total number of bits observed at
the bottleneck link in each second, for each protocol, during an
experimental run with 320 ms RTT. The total includes the bits from
all sources, including the background traffic. HTCP has the widest
distribution, indicating high fluctuation in utilization over the
course of the experiment, while BIC shows the steepest distribution,
implying very stable performance.
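
The distribution in that figure is simply the empirical CDF of the
per-second bit counts; a minimal sketch of how such a curve is
produced:

    import numpy as np

    def empirical_cdf(bits_per_second):
        # Sorted per-second totals vs. their cumulative probability.
        x = np.sort(np.asarray(bits_per_second, dtype=float))
        y = np.arange(1, len(x) + 1) / len(x)
        return x, y

    # A wide, slowly rising CDF means the per-second totals vary a
    # lot (HTCP above); a steep, near-vertical CDF means they cluster
    # around one value, i.e., stable utilization (BIC above).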