Abstract Source-level Modeling

Abstract source-level modeling provides a method to describe the
workload of a TCP connection at the source level in a manner that
is not tied to the specifics of individual applications.
The starting point of this method is the observation that at the transport
level, a TCP endpoint is doing nothing more than sending and receiving data.
Each application (e.g., web browsing or file sharing) employs its own set
of data units for carrying application-level control messages,
files, and other information.
The actual meaning of the data is irrelevant to TCP,
which is only responsible for delivering data in a reliable,
ordered, and congestion-responsive manner.
As a consequence, we can describe the workload of TCP in terms of the
demands by upper layers of the protocol stack
for sending and receiving Application Data Units (ADUs).
This workload characterization captures only the sizes of the units of data
that TCP is responsible for delivering,
and abstracts away the details of each application (e.g., the meaning of its ADUs,
the size of the socket reads and writes, etc.).
The approach makes it feasible to model the entire range of
TCP workloads, and not just those that derive from a few well-understood
applications as is the case today.
This provides a way to overcome the inherent scalability problem of
application-level modeling.

While the work of a TCP endpoint is to
send and receive data units, its lifetime
is dictated not only by the time these operations take, but also by quiet
times in which the TCP connection remains idle, waiting for upper layers
to make new demands.
TCP is only affected by the duration of these periods of inactivity
and not by the cause of these quiet times, which depends on the dynamics
of each application (e.g., waiting for user input, processing a file, etc.).
Longer lifetimes matter because the endpoint
resources needed to hold TCP state must remain reserved
for a longer period of time.
Furthermore, the window mechanism in TCP tends to aggregate
the data of those ADUs that are sent within a short period of time, reducing
the number of segments that have to travel from source to destination.
This is only possible when TCP receives a number of back-to-back requests to
send data. If these requests are separated by significant quiet times,
no aggregation occurs and the data is sent using
at least as many segments as ADUs.
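The effect of aggregation on segment counts can be made concrete with a small calculation. The following is a minimal Python sketch under idealized assumptions of our own. It assumes a fixed maximum segment size (MSS). It assumes that back-to-back ADUs are packed into full-sized segments. It ignores the congestion and receiver windows, Nagle's algorithm, and header overheads. The function name `segments_needed` is hypothetical.

```python
import math
from typing import List


def segments_needed(adu_sizes: List[int], mss: int, back_to_back: bool) -> int:
    """Minimum number of data segments needed to carry the given ADUs.

    back_to_back=True: the ADUs are handed to TCP within a short period,
    so (idealized) their bytes can be packed together into full segments.
    back_to_back=False: the ADUs are separated by long quiet times, so each
    ADU is segmented on its own and needs at least one segment.
    """
    if back_to_back:
        return math.ceil(sum(adu_sizes) / mss)
    return sum(math.ceil(size / mss) for size in adu_sizes)


adus = [300, 200, 500]  # three small ADUs, 1000 bytes in total
print(segments_needed(adus, 1460, back_to_back=True))   # 1
print(segments_needed(adus, 1460, back_to_back=False))  # 3
```

The same 1000 bytes fit in one segment when sent back to back. The same bytes need three segments when quiet times separate the ADUs.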

We have formalized these ideas into the a-b-t model, which
describes TCP connections as sets of ADU exchanges and quiet times.
The term a-b-t is descriptive of the basic building blocks of this model:
a-type ADUs (a's),
which are sent from the connection initiator to
the connection acceptor, b-type ADUs (b's),
which flow in the opposite direction, and quiet times (t's), during which no
data segments are exchanged.
We will make use of these terms to describe the source-level behavior of
TCP connections throughout this dissertation.
The a-b-t model has two different flavors depending on whether ADU
interleaving is sequential or concurrent.
The sequential a-b-t model
is used for modeling connections in which only one
ADU is being sent from one endpoint to the other at any given point in time.
This means that the two endpoints engage in an orderly conversation in which
one endpoint will not send a new ADU until it has completely received
the previous ADU from the other endpoint. In contrast,
the concurrent a-b-t model
is used for modeling connections in which both
endpoints send and receive ADUs simultaneously.
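To make the two flavors concrete, one possible in-memory representation of an a-b-t connection vector is sketched below in Python. This is only an illustration under our own assumptions, not the representation used elsewhere in this work. The class names `Epoch`, `SequentialConnection` and `ConcurrentConnection` are hypothetical.

In the sequential flavor, the connection is a single ordered list of exchanges, each holding an a, a b and the following t. In the concurrent flavor, each direction carries its own independent sequence of ADU sizes and quiet times.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical representation of a-b-t connection vectors; the class names
# are illustrative only.


@dataclass
class Epoch:
    a: int    # bytes in the a-type ADU (initiator -> acceptor), 0 if absent
    b: int    # bytes in the b-type ADU (acceptor -> initiator), 0 if absent
    t: float  # quiet time in seconds that follows this exchange


@dataclass
class SequentialConnection:
    # Exchanges happen strictly one after another: a1, b1, t1, a2, b2, t2, ...
    epochs: List[Epoch] = field(default_factory=list)

    def total_bytes(self) -> int:
        return sum(e.a + e.b for e in self.epochs)

    def total_quiet_time(self) -> float:
        return sum(e.t for e in self.epochs)


@dataclass
class ConcurrentConnection:
    # Each direction is an independent list of (ADU size, quiet time) pairs,
    # since both endpoints may be sending at the same time.
    a_side: List[Tuple[int, float]] = field(default_factory=list)
    b_side: List[Tuple[int, float]] = field(default_factory=list)

    def total_bytes(self) -> int:
        return sum(size for size, _ in self.a_side + self.b_side)


# A request/response conversation with two exchanges and a 2-second pause.
web = SequentialConnection([Epoch(300, 5000, 2.0), Epoch(350, 12000, 0.0)])
print(web.total_bytes())       # 17650
print(web.total_quiet_time())  # 2.0

# Both sides send independently, e.g. an interactive session.
chat = ConcurrentConnection(a_side=[(40, 1.5), (60, 0.0)], b_side=[(100, 0.5)])
print(chat.total_bytes())      # 200
```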

The a-b-t model not only provides a reasonable description of the workload of
TCP at the source level, but it is also simple enough to be populated from
measurement. Control data contained in TCP headers provide enough
information to determine the number and sizes of the ADUs in a TCP connection
and the durations of the quiet times between these ADUs.
This makes it possible to convert an arbitrary trace of
segment headers into a set of a-b-t connection vectors, in which each
vector describes one of the TCP connections in the trace.
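The basic idea of turning segment headers into ADUs can be illustrated with a deliberately simplified Python sketch. It is not the set of analysis techniques presented later in this chapter. It assumes the following:

- The trace has already been reduced to in-order data segments with no retransmissions or losses. Each record is given as `(timestamp, direction, payload_bytes)`.
- The connection follows the sequential flavor.

A change of direction marks an ADU boundary. A gap larger than a hypothetical `idle_threshold` also starts a new ADU from the same endpoint. Quiet times are measured naively as the gap between the last segment of one ADU and the first segment of the next, as seen at the monitoring point. Because of this, they include network delays as well as application think time. The function name `to_adus` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical input record: (timestamp in seconds, "a" or "b", payload bytes).
Segment = Tuple[float, str, int]


def to_adus(segments: List[Segment],
            idle_threshold: float = 0.5) -> List[Tuple[str, int, float]]:
    """Group data segments into ADUs; returns (direction, size, quiet_after)."""
    adus = []  # each entry: [direction, size, first_ts, last_ts]
    for ts, direction, length in sorted(segments):
        if length == 0:
            continue  # pure ACKs carry no application data
        same_adu = (adus and adus[-1][0] == direction
                    and ts - adus[-1][3] <= idle_threshold)
        if same_adu:
            adus[-1][1] += length  # extend the current ADU
            adus[-1][3] = ts
        else:
            adus.append([direction, length, ts, ts])  # start a new ADU
    result = []
    for i, (direction, size, _first, last) in enumerate(adus):
        # Naive quiet time: gap until the next ADU starts (0 for the last one).
        quiet = adus[i + 1][2] - last if i + 1 < len(adus) else 0.0
        result.append((direction, size, quiet))
    return result


trace = [(0.0, "a", 300), (0.25, "b", 1460), (0.3, "a", 0),
         (0.375, "b", 1460), (0.5, "b", 80), (2.5, "a", 350), (2.75, "b", 1000)]
print(to_adus(trace))
# [('a', 300, 0.25), ('b', 3000, 2.0), ('a', 350, 0.25), ('b', 1000, 0.0)]
```

Grouping by direction changes works only when the conversation is sequential. For concurrent connections, data flows in both directions at once, and direction changes no longer delimit ADUs.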
As long as this process is accurate,
this approach provides realistic characterizations of TCP workloads, in the
sense that they can be empirically derived from measurements of real Internet
links.

In this chapter, we describe the a-b-t model and its two flavors in detail.
For each flavor, we first discuss a number of sample connections that
illustrate the power of the a-b-t model to describe TCP connections driven by
different applications, and point out some limitations of this approach.
We then present a set of techniques for analyzing segment headers
in order to construct a-b-t connection vectors and provide
a validation of these techniques using traces from synthetic applications.
We finally examine the characteristics of a set of real traces from the point
of view of the a-b-t model, providing a source-level view of the workload of TCP.