Introduction

Surveyor and PingER both make end-to-end active performance measurements of
the Internet. As can be seen below, they should
be regarded as complementary to one another,
Surveyor being more detailed and PingER being more lightweight.

The Surveyor (and RIPE) monitoring project relies on a dedicated
PC running Unix being placed at each monitoring site. Each PC in turn relies
on a Global Positioning System (GPS) device to obtain accurate time
and to synchronize time between the monitors. The monitors send packets
to each other at Poisson randomized time intervals and use these packets to
gather one-way end-to-end delay and loss measurements. Surveyor also makes
concurrent traceroutes, which provide route history information.
Surveyor is more accurate and better for short term measurement, especially
for sites which have good connectivity.
Surveyor currently provides daily snapshots of performance.
The community for Surveyor is Internet 2,
though there are monitors at non Internet 2 sites,
and in particular at 3 High Energy Physics (HEP) sites
(CERN, FNAL and SLAC) that are also PingER monitor sites.
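
The Poisson randomized sending described above can be illustrated with a
minimal Python sketch. It assumes exponentially distributed gaps between
packets, which is what makes the send times a Poisson process. The function
name, the rate of 2 packets per second and the 60 second duration are
illustrative assumptions, not Surveyor's actual code or settings.

```python
import random


def poisson_send_times(rate_per_sec, duration_sec, seed=None):
    """Return sorted send times (seconds) of a Poisson process on [0, duration_sec)."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    rng = random.Random(seed)
    times = []
    # Exponentially distributed gaps between packets give Poisson arrivals.
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times


# Hypothetical settings: on average 2 packets per second for one minute.
send_times = poisson_send_times(rate_per_sec=2.0, duration_sec=60.0, seed=1)
```

Randomizing the gaps in this way keeps the probes from locking onto any
periodic behavior in the network, which fixed-interval probes could
systematically hit or miss.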

PingER uses the ICMP echo facility (ping) and thus only makes
round trip measurements. PingER uses an
existing host with no special software installed at the monitored site and does not
require a GPS system.
PingER is a more lightweight solution, requires less management, uses less
bandwidth, requires less storage, and
nothing needs to be installed at the remotely monitored sites.
PingER is good for remote sites with poor connectivity.
PingER, today, has more reports available for showing long term trends.
The community of interest for PingER is ESnet, High Energy and Nuclear Physics
(HENP) sites and the Cross Industry Working Team (XIWT). More general
information comparing Surveyor and PingER can be found in
Comparison of some Internet Active End-to-end Performance
Measurement projects.

Comparing ping with Surveyor

We made some high statistics (~250K samples)
long term measurements with ping from SLAC to
CERN from May 9 thru May 12, 1999. The pings were made using the standard
ping utility with 100 data bytes
(including the 8 ICMP bytes but not the IP header), were made at one
second intervals and had a timeout of 20 seconds. The host (ping client)
issuing the ping echo requests was an IBM RS/6000 250/80 running AIX 4.1.5.
It is the same host (minos) that is used for the PingER monitoring at SLAC.
The host echoing the pings (ping server) at CERN was the same host that
is monitored by PingER (ping.cern.ch).

The distribution of these pings (see the magenta squares in the chart
to the right or above)
indicates a sharp peak (95% of the Round Trip Times (RTTs) are contained in
a 9.5 msec. window) centered around 220 msec. There is both a high and a low RTT
tail. The figure also shows the Surveyor delay frequency distributions (green
and blue triangles) for
the same time period. The Surveyor distributions also show sharp peaks with a
high RTT tail. The medians of the two delay distributions (113 msec. SLAC to CERN
and 105 msec. CERN to SLAC) add up to roughly
the RTT seen by ping (221 msec.). Note, they are not expected to be exactly
equal since the packet sizes are different.
The SLAC to CERN delay distribution (the blue dots)
also exhibits a low RTT tail similar to that seen in the ping distribution.
During this period Surveyor observed packet losses of 0.71% from CERN to SLAC,
0.68% from SLAC to CERN and the pings observed 1.04% for the round trip.
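
A distribution like the one above might be summarized by its median and by
the width of the interval holding the central 95% of the samples, as in the
minimal Python sketch below. The 2.5 and 97.5 percentile bounds, the
interpolation rule and the function names are assumptions; the text does not
say how the 9.5 msec. window was defined, and the sample values are made up.

```python
import statistics


def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def summarize(rtts_ms):
    """Return the median and the width of the central 95% of the samples."""
    s = sorted(rtts_ms)
    return {
        "median": statistics.median(s),
        "width95": percentile(s, 97.5) - percentile(s, 2.5),
    }


# Made-up RTT samples in msec., with one value in the high tail.
sample_rtts_ms = [218.0, 219.5, 220.0, 220.5, 221.0, 223.0, 305.0]
summary = summarize(sample_rtts_ms)
```

Both statistics are insensitive to a few extreme values, so the high and low
RTT tails barely move them.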

We then investigated the causes of the low and high RTT tails. The time
distribution of pings with a high RTT (> 260 msec.) is shown below. For
Tuesday May 11th, several clusters of high RTT are apparent.
The cluster around 18:00 hours UTC is seen below. A route change occurred
(seen both from SLAC to CERN and in the reverse direction) at about 18:10
hours UTC causing traffic to take a shorter but more congested route (note
the increase in lost packets). See
Ping high statistics results for more
details. As can be seen, this change in RTT performance is also evident in the
Surveyor reports for the same period. Comparing the Surveyor graph with
the ping graph above, it is also evident that the ping clusters at about
01:00, 07:00 and 14:00 hours also show up in the Surveyor data.

We also binned the Surveyor and ping data into 1 minute bins with the contents
of each time bin being the average Surveyor one-way delay or ping RTT for
that minute. We also added together the Surveyor one-way delays from each
direction for each minute to create a Surveyor round trip delay for
each minute. This data is
shown in the chart below or to the left.
The magenta and black dots (the bottom and next to
bottom sets of points) show the Surveyor one-way
delays, the green dots show the Surveyor round trip delays, and the blue dots
(the top set of dots)
show the ping RTT. Note that the left hand y axis is for the SLAC to CERN Surveyor delay and
the Surveyor round trip delay, and the right hand y axis is for the CERN to
SLAC Surveyor delay and the ping RTT. The use of 2 separate y axes enables us to display
the points so they do not overlap and hide one another. Careful examination of
this chart reveals that the green and blue dots track one another very well,
reproducing all the peaks and flat periods.
Scatter plotting Surveyor round
trip delays for each minute vs the ping RTT for the same minute yields the
chart below or to the right. It should be noted that the timestamps of the
pings were adjusted (see below) to the nearest minute to account
for the lack of an accurate record of time correlation between the
clocks of the
hosts making the measurements (i.e. between surveyor and minos) at the time the
measurements were made.
It is seen that the points roughly follow a straight line
with an R² of 0.918, indicating a strong correlation
between the two sets of measurements.
The slope may differ from one partly because of the
difference in packet sizes used by Surveyor and these pings.
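
The one minute binning and the pairing of points for the scatter plot can be
sketched in Python as follows. This is an illustration under assumptions: the
input layout (lists of Unix time in seconds and a value in msec.), the
function names and the sample values are hypothetical, not the format or code
actually used.

```python
from collections import defaultdict


def minute_means(samples):
    """Average (unix_time_sec, value_ms) samples into 1 minute bins.

    Returns {minute_index: mean_value_ms}, where minute_index = time // 60.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in samples:
        minute = int(t // 60)
        sums[minute] += value
        counts[minute] += 1
    return {m: sums[m] / counts[m] for m in sums}


def surveyor_round_trip(forward, reverse):
    """Add the two one-way minute means; keep minutes seen in both directions."""
    return {m: forward[m] + reverse[m] for m in forward.keys() & reverse.keys()}


def paired_points(surveyor_rt, ping_rtt):
    """Return (ping RTT, Surveyor round trip) pairs for minutes present in both."""
    common = sorted(surveyor_rt.keys() & ping_rtt.keys())
    return [(ping_rtt[m], surveyor_rt[m]) for m in common]


# Made-up samples: (seconds since start, msec.).
slac_to_cern = [(0, 112.0), (30, 114.0), (60, 113.0)]
cern_to_slac = [(10, 105.0), (70, 106.0)]
ping_samples = [(5, 220.0), (65, 222.0), (130, 221.0)]

round_trip = surveyor_round_trip(minute_means(slac_to_cern),
                                 minute_means(cern_to_slac))
points = paired_points(round_trip, minute_means(ping_samples))
```

Minutes for which either direction or the ping data is missing are dropped
rather than filled in, so every plotted point compares measurements from the
same minute.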

We optimized the adjustment of the ping timestamps mentioned above
by varying the adjustment from -60
minutes to +60 minutes and calculating the correlation coefficient R between the
timestamp adjusted ping RTTs and the Surveyor round-trip delays. The results are shown
to the left or below. It is seen that there is a sharp peak at an adjustment of +2
minutes with a width (IQR) of about 5.5 minutes. By the time the adjustment is off
by 30 minutes or more in either direction, the correlation has disappeared.
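
The scan over timestamp adjustments can be sketched in Python as below. The
sketch assumes both series are already binned per minute and that the
adjustment is varied in whole minute steps; the function names and the
synthetic data (a ping clock set 2 minutes behind the Surveyor clock) are
illustrative assumptions, not the procedure actually used.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient, or None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def scan_shifts(ping_by_minute, surveyor_by_minute, max_shift=60, min_pairs=3):
    """Return {shift_minutes: R} for each shift added to the ping timestamps."""
    results = {}
    for shift in range(-max_shift, max_shift + 1):
        xs, ys = [], []
        for minute, rtt in ping_by_minute.items():
            other = surveyor_by_minute.get(minute + shift)
            if other is not None:
                xs.append(rtt)
                ys.append(other)
        if len(xs) >= min_pairs:
            r = pearson_r(xs, ys)
            if r is not None:
                results[shift] = r
    return results


# Synthetic data: the ping timestamps lag the Surveyor timestamps by 2 minutes.
rng = random.Random(0)
surveyor = {m: 218.0 + 20.0 * rng.random() for m in range(300)}
ping = {m - 2: v + 3.0 for m, v in surveyor.items()}

r_by_shift = scan_shifts(ping, surveyor)
best_shift = max(r_by_shift, key=r_by_shift.get)
```

Plotting `r_by_shift` against the shift gives the kind of peaked curve
described above, with the peak position estimating the clock offset.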

Comparing PingER and Surveyor

To enable us to compare the Surveyor data with the PingER data,
Matt Zekauskas of the Surveyor project kindly made available to us
Surveyor data for the six pairs between CERN, FNAL and SLAC from November
1998 thru May 1999. We aggregated the Surveyor data to match the
time "ticks" used in PingER (hourly, daily, monthly).
Then we reformatted the data into PingER format and made it available via the
PingER tools.
We then exported the data from PingER to Excel and
added the delays and losses from site a to site b and
b to a to create an RTT between
a and b (see Tutorial
on Internet Monitoring and PingER at SLAC for how to combine
the one way results to come up with the round trip results).
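
The aggregation to ticks and the combination of the two directions can be
sketched in Python as follows. The tick label formats and function names are
hypothetical and do not represent the actual PingER format. Losses are simply
added, as described above; the example values are the one-way medians and
losses quoted earlier for SLAC and CERN.

```python
from datetime import datetime, timezone

# Hypothetical labels for the three tick sizes (not the real PingER format).
TICK_FORMATS = {
    "hourly": "%Y-%m-%d %H",
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
}


def tick_label(unix_time_sec, tick):
    """Map a UTC timestamp to the label of the tick that contains it."""
    moment = datetime.fromtimestamp(unix_time_sec, tz=timezone.utc)
    return moment.strftime(TICK_FORMATS[tick])


def round_trip(a_to_b, b_to_a):
    """Add one-way (delay_ms, loss_fraction) pairs to get round trip values."""
    return (a_to_b[0] + b_to_a[0], a_to_b[1] + b_to_a[1])


# One-way values quoted in the text: SLAC to CERN, then CERN to SLAC.
rtt_ms, loss = round_trip((113.0, 0.0068), (105.0, 0.0071))
```

If losses in the two directions are independent, the exact round trip loss is
1 - (1 - p1)(1 - p2); for losses below one percent this differs from the
simple sum only by the very small product p1 * p2.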

Long term - monthly

To compare the long term data (i.e. one point per month) we scatter
plotted the monthly Surveyor round trip delays (derived as described
above) against the monthly PingER round trip delays for the 3 sites
to yield the plot below. The line is a linear least squares fit
with the parameters shown, and the correlation coefficient
R² (see Microsoft Excel User's Guide, Microsoft
Corporation, for how the correlation coefficient is defined)
indicates that there is a strong correlation between
the two sets of data.
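
A linear least squares fit and its R² can be computed as in the minimal Python
sketch below. It uses the standard result that, for a straight line fit with
an intercept, R² equals the square of the Pearson correlation coefficient. The
function name and the sample values are illustrative, not the measured data.

```python
def linear_fit(xs, ys):
    """Least squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Made-up monthly round trip delays in msec. (PingER on x, Surveyor on y).
pinger_ms = [60.0, 75.0, 180.0, 195.0, 221.0]
surveyor_ms = [57.0, 70.0, 172.0, 188.0, 214.0]
slope, intercept, r2 = linear_fit(pinger_ms, surveyor_ms)
```

An R² close to 1 means the fitted line accounts for nearly all of the variance
in the y values.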

Medium term - daily

We scatter plotted the daily Surveyor data versus the daily PingER
data to yield the scatter plot below. The straight line fit
probably has a slope of < 1 since the Surveyor packets are shorter
by about a factor of two compared to the PingER packets. Again the
R² indicates a strong correlation.

Short term - hourly

Finally we repeated the above for hourly ticks to yield the results below
for a PingER monitor at SLAC monitoring 2 hosts at CERN and vice versa.

Summary

Surveyor has more detailed measurements of the performance both in the
frequency at which the measurements are made and also in the fact that
it has one way measurements. Surveyor relies on dedicated platforms with
strong central management.

PingER is more parsimonious with resources (bandwidth, disk space and CPU)
and does not require a dedicated host and GPS aerial to be installed at
every site.
This makes it attractive for sites that have limited bandwidth,
or are unwilling to install a dedicated host and GPS aerial. It has
also turned out to be attractive to groups such as the
XIWT
that have limited resources to gather and analyze the data.
Though PingER is less accurate, especially at short time scales (less than an hour),
it is very good for looking at long term trends and groupings of sites where
limited statistics are less of a problem.

The strong correlation, both visual and statistical,
between the Surveyor and PingER data for RTT and the Surveyor and ping
RTT data (on which PingER is built) indicates that
the results from both projects can be used together in complementary ways.