Rate Based Pacing for TCP

Vikram Visweswaraiah and
John Heidemann

Abstract:

TCP's congestion avoidance mechanisms are not tuned for
request-response traffic like HTTP. Prior work on HTTP performance
has shown that enhancements to HTTP (P-HTTP) can result in poorer
performance than expected. This suggests that certain changes may need to
be made to TCP to obtain the expected performance. The increasing use of
the World Wide Web and the use of HTTP in areas
other than the Web make it important to understand why these changes
are needed and what problems arise without them.
One such problem is that some TCP implementations force slow-start in
the middle of a connection after it has been idle for some time, even
when no packets have been lost. Other TCP implementations do not treat
idle time as a special case and send data using the prior value of the
congestion window. Both
extremes lead to poor performance of P-HTTP over TCP. This document
describes the motivation and implementation of rate based pacing for
TCP, which provides a good compromise between the two extremes.

The infrastructures for information exchange have evolved rapidly in
the recent past. Changes in application behavior have resulted in
different network dynamics, driving the networking community to
tune the underlying protocols for optimal performance. The World Wide
Web, which uses HTTP, is one such application. The increasing use of
the Web and the use of HTTP in applications outside the Web domain
emphasize the need to enhance the performance of HTTP. One such
enhancement, only recently being standardized in HTTP/1.1, is
P-HTTP, an implementation of HTTP that avoids opening a new TCP
connection for each transaction to the same
server [1]. However, P-HTTP interacts with current TCP
implementations in a manner that degrades performance
[2]. One such interaction, involving TCP's congestion avoidance
mechanisms, is examined in this document.

We describe the problem, termed ``slow-start restart,'' and propose a
possible solution. We then describe our implementation of the solution
and discuss the new behavior in contrast with existing TCP
implementations. Finally, we describe the current status of our work
and discuss future goals.

TCP is not optimized for multiple request-response exchanges over a
single connection, which is the common case with HTTP/1.1. When a new
exchange begins after the connection has been idle,
how should TCP on the server behave? Some TCP implementations force
slow-start
again (for example 4.4 BSD and Linux 2.x). Other implementations
(SunOS) do not even detect this idle
time and thus use the old value of the congestion window. The latter
approach can overrun queues at intermediate routers, leading to packet
loss. Though restarting with slow-start avoids this risk, it adds the
delay of slow-starting back to steady state after every idle period.
This can degrade the performance of protocols layered above TCP, with
P-HTTP being a prime example.

Prior work has suggested that this ``slow-start restart'' problem is a
contributor to poor performance of P-HTTP over TCP [2].
One way of solving the problem is to send segments at a certain
``pace'' until we get the ACK clock running again. This pace or rate
should be based on a fraction of the prior estimate of the data
transfer rate, since that is the closest estimate of available
bandwidth we have; lacking any way of knowing the exact bandwidth
available at the end of the idle period, the prior rate is the most
reasonable guide. We believe that this modification, called Rate Based
Pacing (RBP), will give better performance under the circumstances
described above.
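For illustration only (the one-half fraction here is an example, not a
value taken from our implementation): if the connection had been
achieving about 32 KB/s before going idle and we pace at half that rate
with 512-byte segments, a paced segment would be released roughly every
512/16384 s, or about 31 ms, until returning ACKs restore the normal
self-clocking.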

Implementing RBP requires two pieces: calculating the window of data we
expect to send in RBP and the spacing between segments in that window,
and a mechanism that clocks out the segments sent in RBP.

Some TCP implementations (4.4 BSD, Linux 2.x) already detect idle
time. Instead of forcing slow-start when idle time is
detected, we modify the behavior to perform RBP.
TCP Vegas gives us a method for bandwidth estimation [3].
We borrowed USC's port of Vegas for our implementation
[4]. The RBP window and the timing between segments in that
window are based on functions of the estimated bandwidth and the
RTT.
The segments are clocked by a custom RBP timer that runs
only while RBP is in effect.
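
As a rough illustration of these two pieces, the sketch below shows how
an RBP window and inter-segment interval might be derived from the
estimated bandwidth and RTT, and how a timer could clock the segments
out. All names, the one-half pacing fraction, and the one-RTT window
are assumptions for illustration; this is not our kernel code.

    /*
     * Hypothetical sketch of rate-based pacing after an idle period.
     * Names and constants are illustrative, not the actual SunOS patch.
     */
    #include <stddef.h>

    struct rbp_state {
        double est_bandwidth;   /* bytes/sec, e.g. from a Vegas-style estimator */
        double srtt;            /* smoothed round-trip time, seconds */
        size_t mss;             /* maximum segment size, bytes */

        size_t rbp_window;      /* bytes we allow ourselves to pace out */
        double rbp_interval;    /* seconds between paced segments */
        size_t rbp_unsent;      /* bytes still to be paced */
    };

    /* Stand-ins for the real TCP output routine and kernel timer. */
    static void tcp_output_one_segment(void) { }
    static void rbp_timer_arm(double seconds) { (void)seconds; }

    /* Called when a send follows an idle period longer than the RTO. */
    static void rbp_start(struct rbp_state *s, size_t queued_bytes)
    {
        /* Pace at a fraction of the previously observed rate
           (assumption: one half). */
        double pace_rate = s->est_bandwidth / 2.0;

        /* Allow roughly one paced RTT's worth of data before normal
           ACK clocking takes over again (assumption). */
        s->rbp_window = (size_t)(pace_rate * s->srtt);
        if (s->rbp_window < s->mss)
            s->rbp_window = s->mss;
        if (s->rbp_window > queued_bytes)
            s->rbp_window = queued_bytes;

        /* One MSS-sized segment every mss / pace_rate seconds. */
        s->rbp_interval = (double)s->mss / pace_rate;
        s->rbp_unsent = s->rbp_window;

        rbp_timer_arm(s->rbp_interval);
    }

    /* RBP timer callback: send one segment, re-arm until the window is done. */
    static void rbp_timer_fire(struct rbp_state *s)
    {
        tcp_output_one_segment();
        if (s->rbp_unsent > s->mss) {
            s->rbp_unsent -= s->mss;
            rbp_timer_arm(s->rbp_interval);
        } else {
            s->rbp_unsent = 0;  /* ACKs for paced segments now clock further sends */
        }
    }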

The implementation does not change the default behavior of the
kernel. RBP mode is available on a connection only via the
setsockopt interface. We modified the interface to be able to
select RBP and slow-start restart. Using the setsockopt
interface avoids the need to recompile the kernel to test each case.
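
For concreteness, a user-level call might look like the sketch below.
TCP_RBP is the option added by our modified kernel; the numeric value
and the IPPROTO_TCP option level shown here are assumptions for
illustration.

    /* Hypothetical sketch: selecting RBP on a connection via setsockopt(). */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef TCP_RBP
    #define TCP_RBP 0x40        /* placeholder value, for illustration only */
    #endif

    int enable_rbp(int sock)
    {
        int on = 1;

        if (setsockopt(sock, IPPROTO_TCP, TCP_RBP, &on, sizeof(on)) < 0) {
            perror("setsockopt(TCP_RBP)");
            return -1;
        }
        return 0;
    }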

The goal of the experiments we conducted was to verify that RBP mode
works. A simple way to do this is
to send a block of data from an RBP-enabled machine, pause for longer
than the retransmission timeout interval, and then send a second block
of data. RBP behavior should be observable at the beginning of the
second transfer.

A program capable of doing such data transfer is Stevens'
sock program [5]. To compare RBP behavior with
the other two extremes (slow-start restart and no slow-start restart),
we modified sock to issue the corresponding
setsockopt calls for all three cases. This included adding
a command-line switch that takes the following values:

TCP_RENO_RESTART: for slow-start restart.

TCP_RBP: for rate based pacing.

With an unspecified switch, the behavior is that of SunOS 4.1.3, which
is no slow-start restart.

Using sock, we ran tests from a Sun SPARC
20/71 at ISI West, running the modified SunOS 4.1.3 kernel, to a
machine on the east coast (metro). Typical network conditions were 12
hops, an average RTT of 200 ms, and approximately 32 KB/s of bandwidth.

Each test sends two 512 KB chunks to metro with a
20-second pause in between; metro runs sock
as a data sink. This creates the idle period we need, letting us
observe what each flavor of TCP does when a data transfer begins
midstream.
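
The sender side of this test follows a simple pattern, sketched below.
This is only an illustration of the traffic pattern; the actual
experiments used the modified sock program, and the helper names here
are hypothetical.

    /* Hypothetical sketch of the two-burst test pattern: send 512 KB,
     * idle for 20 s (well past the retransmission timeout), then send
     * another 512 KB on the same connection. */
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define CHUNK_BYTES (512 * 1024)

    static int send_chunk(int sock)
    {
        char buf[4096];
        size_t sent = 0;

        memset(buf, 'x', sizeof(buf));
        while (sent < CHUNK_BYTES) {
            ssize_t n = write(sock, buf, sizeof(buf));
            if (n < 0)
                return -1;
            sent += (size_t)n;
        }
        return 0;
    }

    static int run_test(int sock)
    {
        if (send_chunk(sock) < 0)   /* first burst: builds up the congestion window */
            return -1;
        sleep(20);                  /* idle long enough to trigger restart behavior */
        return send_chunk(sock);    /* second burst: observe RBP (or restart) here */
    }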

Using tcpdump on the sending side and programs to graph
tcpdump output, we obtained sequence number versus time plots for each
case.

No slow-start restart:

With no idle time detection (as in SunOS 4.1.3), data is dumped all at
once using the prior value of the
congestion window, as Figure 1 shows. In
networks where bandwidth is dynamically allocated, such as the
Internet, this behavior can be aggressive, leading to router queue
overflows and subsequent packet loss. Such losses will cause
TCP to slow start again, leading to increased delay and low
throughput. This bursty behavior also harms other users behind the
same router; with drop-tail queuing, the burst can crowd out packets
from other connections.

Slow-start restart:

Flavors like 4.4 BSD and Linux 2.x force
slow-start; Figure 2 illustrates this behavior. Slow-start
restart avoids the problems associated with sending back-to-back
packets: building up from slow start makes packet loss less likely than
sending a burst and is fairer to other users. Being conservative is
good behavior in the Internet, and this option has been adopted by many
implementations. However, slow-start restart adds the extra delay of
getting back to steady state each time a data transfer is initiated
midstream.

Rate based pacing:

Our implementation's behavior is shown in Figure
3. Here, the initial five segments are sent at a pace
calculated from the rate estimated by Vegas. This is an
excellent compromise between the two extremes of dumping segments back
to back and restarting with slow-start. We believe that this
implementation will give much better performance, at least for the
situations mentioned in [2].

Restarting with slow-start in the middle of a connection can lead to
poor performance. At the same time, dumping segments all at once can
mean overrunning intermediate router queues, leading to a drop in
throughput. Rate based pacing gives a good compromise between the
two extremes and solves the slow-start restart problem.

We are currently conducting experiments to examine the impact of RBP
on HTTP throughput.