Introduction

There were about 40 attendees, mainly from Research &
Education sites and in particular Internet 2 sites. Most of the
invitees run various Internet monitoring activities, or fund
them.

NSF Measurement Perspective - Bill Decker

Programs like NSF's are all about demonstrated results; success
helps future awards, so measurement is very important to the
funding agencies. Interest comes from many quarters, including:
Congress, which has bipartisan support for NGI; the President's
Information Technology Advisory Committee (PITAC); the
expectations of universities (constituents); and the network
research community. There is a tension between populism and
elitism (e.g. not enough money to support everybody, so fund the
more successful universities), and between putting money into
infrastructure versus research.

The motivations for measurements include advancing the state
of measurements themselves, and measurements as a tool for
demonstrating return on investment (ROI), e.g. how much of a
difference has plowing $100M into vBNS made.

For research he identified development of tools and
information, problem avoidance & anticipation, identification
of behaviors and norms, how to scale for growth, and adaptive
access.

For ROI demonstration: how does it differ from the
commodity Internet, what is being done, who is doing it, and
what is happening in endpoints, in LANs, in CANs (Campus Area
Networks), and in the WANs connecting them.

Other issues include educating the people asking the
questions, preserving anonymity, non-intrusiveness, responding to
multiple constituencies, multi-provider dynamics, involvement of
other disciplines (e.g. statisticians).

They want us to report the number of testbed desktops and
sites. In the 1999 review, the maximum end-to-end
performance reported to PITAC was about 10 MBytes/sec for
IP and 0.6 MBytes/sec for TCP; they are disappointed
with this and hope for substantial progress as the
initiative is implemented. Where they got these numbers
from is unclear.

What do we do, e.g.: start tabulating testing the 100 sites;
set up a measurement mesh between NGI sites; build an acceptable
traffic report; keep track of top speeds/apps; greater focus on
the campus and end-systems. We need a response in the next 6
months, i.e. by end of year. It would be doable to create and
configure on the order of 20 AMPs (~$1.5K each) or Surveyors
(~$4.5K each, with extra cost for the GPS equipment) to place at
NGI sites. Need to define the sites, define the mesh, and
populate the sites with machines.
Still need to define the questions as to what the mesh should
answer. Other things could be to include stats from OC3MONs, and
also TCP analyses from Matt Mathis and provide information back
to the users as to how they need to improve their apps/stacks
etc. It would also be useful to show improvements in the
Internet, such as PingER long term plots with additions for the
NGI sites starting ASAP.

Just for background: the PITAC has very few network experts, and
though the people on the sub-committee were smart, none of them
were network experts.

Responding to the PITAC request

Need a list of 100 (or more) sites with >= 100Mbps
connectivity. This list could include the Point of Contact (POC),
lat/long, test address, nets (DREN, Abilene, vBNS, ESnet etc.),
speed of Internet connection, what monitoring they have (Surveyor,
NIMI, AMP, Netperf ...), and the short name of the site.

FOIA - Bill Decker

He shared information on a proposal from the Federal Register
from OMB to "require Federal awarding agencies to ensure that
all research results, including underlying research data, funded
by the Federal Government are made available to the public
through the procedures established under the Freedom of
Information Act". This could give rise to problems with
sanitizing data to ensure privacy (e.g. removing addresses from
traces). "The amended Circular shall apply to all federally
funded research, regardless of the level of funding or whether
the award recipient is also using non-Federal funds". A
discussion ensued: what if the unsanitized data is owned by
someone else (e.g. MCI, XIWT) but made available to the federally
funded research, and that owner sanitizes it and makes their own
copy? UCSD is taking up the issue at the presidential level,
since they believe it undermines any scientific research. It is
not just the measurement community; it is all research.

Internet 2 - Matt Zekauskas

Goals are to support advanced applications: high bandwidth,
low latency, low loss, multicast, QoS. How to answer the question
"my application has a problem": is it the end-host or the
network?

Focus was on NGI inter-exchanges. The West coast exchange is
NASA-Ames, the middle is STAR-TAP, and the East coast is to be
College Park, MD. There are 11 exchange points today.

NGI needs from performance & measurement: Need to
tabulate the 100 sites; how does one choose the 100? NSF has
connected 150, and other agencies have connected sites. Need to
test and verify the sites exist, then set up a measurement mesh
between the sites. Want to verify that sites do have > 100
Mbps. PITAC requested an Internet traffic report for the layman;
even though most research folks feel such reports are bogus, we
need to come up with something with the same ease of use and
understanding that is not bogus. Also want to better track top
speeds/applications, with a greater focus on the campus and end
systems.

Important messages: window size is very important (e.g.
coast to coast, the typical default window size of 8 KBytes
only allows < 2 Mbps, while a 64 KB window gives 11 Mbps);
latency matters (routes; may need better ways to choose a route
with low latency); loss rate matters; MTU matters (need to use
MTU discovery, and need a path from every host to the DMZ that
does not reduce the MTU to 1500 Bytes, i.e. preserve the MTU);
disable slow interfaces; campus infrastructure matters.
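The window-size numbers above follow from the fact that a TCP sender can have at most one window of data in flight per round trip. A minimal sketch, where the 48 ms RTT is an assumed coast-to-coast figure chosen to be consistent with the 64 KB -> ~11 Mbps number quoted above:

```python
# Throughput ceiling imposed by the TCP window: at most one window per RTT,
# so throughput <= window / RTT.
def window_limit_mbps(window_bytes, rtt_s):
    return window_bytes * 8 / (rtt_s * 1e6)

rtt = 0.048  # assumed coast-to-coast RTT in seconds (not from the notes)
print(window_limit_mbps(8 * 1024, rtt))   # default 8 KB window: under 2 Mbps
print(window_limit_mbps(64 * 1024, rtt))  # 64 KB window: about 11 Mbps
```

With the default 8 KB window the ceiling is about 1.4 Mbps regardless of link speed, which is why window tuning dominates wide-area performance.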

Software has to deal with more than 4 orders of magnitude of
bandwidth (i.e. from 56 kbps through Gbps). Most traffic on the
Internet is TCP, and most of TCP is HTTP; a typical HTTP
"session" is about 12 packets, which does not allow TCP
to get through start-up.
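The 12-packet observation can be sketched with a toy slow-start model, assuming an initial congestion window of 1 segment that doubles each RTT with no loss (a simplification of real TCP behavior):

```python
# Rough slow-start sketch: count the RTTs needed to send a short transfer
# when the congestion window starts at 1 segment and doubles per RTT.
def slow_start_rtts(total_packets):
    sent, cwnd, rtts = 0, 1, 0
    while sent < total_packets:
        sent += cwnd   # one window's worth of packets per RTT
        cwnd *= 2      # slow-start doubling
        rtts += 1
    return rtts

print(slow_start_rtts(12))  # a 12-packet HTTP "session" finishes in 4 RTTs
```

The transfer completes while the window is still tiny, so short HTTP sessions never reach the steady-state rates the path could support.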

Define a campus DMZ measurement machine to make RTT and
throughput measurements against. Also need a test suite, and to
create an NGI traffic/status report. Concerning burstiness, are
5 minute samples good enough? It would be nice to have a
no-brainer tool that tells you what your network performance is
like (e.g. like a security scanner, but for performance). What is
the relationship between utilization and packet loss (RED
increases link utilization but also increases packet loss; does
individual flow performance go down)? Is there a wave phenomenon
in the Internet (i.e. periodicity in frequency space, analogous
to reflections in an audio network)?

Other issues/discussion topics:

site-to-site vs. end-to-end performance

Measurement & Analysis in vBNS - Kevin Thompson

They monitor the SNMP MIBs in switches, routers etc. The Cisco
routers were found to be reporting incorrect counter values, which
means the data is only correct from the Fore switches. The RTT
measurements are now surpassed by the NLANR AMP results. For
performance tests gmiller@mci.net uses ttcp, mping and treno
between Sun Ultras at all terminals and DEC Alphas at the SCCs
(Super Computer Centers). The commercial Internet is dominated by
Web traffic; games also show up in the top 3 protocols, followed
by NNTP. vBNS shows HTTP and FTP about equal. Port 514 shows up
and is heavily used by USGS & NCAR for large transfers of
geographic data. vBNS traffic is more bursty, with fewer
aggregated flows, much larger flows, larger packet sizes, and a
higher percentage of UDP flows. They make a lot of data available
to the research community. All SNMP data is available, the normal
flows data is available, and packet & cell traces may be made
available (but concern over sensitivity may require node
identification filtering).

Relationships with NLANR (NCNE, DAST, MOAT), CAIDA.

They are looking at how to support QoS. They can't run WRED
(Weighted Random Early Detection) due to a Cisco code release
problem. They are looking at a proof of concept of putting an
OCxMON box in-line to intercept all packets, check the
precedence, and remap the traffic to VCs.

Abilene NOC Tools - Matt Zekauskas

10 Cisco 12008 GSRs, OC48 coast to coast, OC12 links too, POS,
access at OC12 and OC3. Can't use OC12MON yet (an OC12 POS
version is coming), there is no NetFlow, and the OC48 counters
never worked right but are now fixed; there is lots & lots
of data. Now getting utilization, routing & external
performance. They have NEMO (network monitor), the Abilene
weather map, and RPG (router proxy gateway). They also use
"What's Up" and MRTG. NEMO monitors all the interfaces
on all routers every 3 seconds. The data is stored in binary form
and they are working on a standard export summary format. The
weather map shows traffic between the Abilene backbone nodes
with colored arrows; clicking a link brings up an MRTG plot, and
clicking on a node shows all the router links available. The
plot is for data aggregated over 5 minutes. It also has animated
replay. See http://www.abilene.iu.edu/

The router proxy gateway (RPG) allows you to execute Cisco
commands. It checks each command to ensure it is OK, rate
limits, provides access to a limited number of Cisco commands
(e.g. show ip *, show interface *), and limits output to 300
lines. See http://hydra.uits.iu.edu/~abilene/proxy/
for more information, or contact Steve Wallace at ssw@ui.edu.
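The RPG-style checks (command allowlist, output cap) can be sketched as below. This is a hypothetical illustration, not the actual RPG code; the allowlist contents and the `execute` callable are assumptions, with only the "show ip *, show interface *" commands and 300-line cap taken from the notes:

```python
# Hypothetical proxy check in the style of RPG: allow only a fixed set of
# read-only command prefixes and truncate the router's output at 300 lines.
ALLOWED_PREFIXES = ("show ip ", "show interface ")  # assumed allowlist
MAX_LINES = 300

def run_proxied(command, execute):
    """execute is a callable that actually runs the command on the router."""
    if not command.startswith(ALLOWED_PREFIXES):
        raise PermissionError("command not in allowlist")
    output = execute(command)
    return "\n".join(output.splitlines()[:MAX_LINES])  # cap the output
```

Checking the command before it ever reaches the router, rather than filtering afterwards, is what makes this safe to expose to outside users.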

Surveyor: IP Performance Measurements (NGI) - Matt Zekauskas

This reported on Surveyor measurements. The Internet is complex;
even providers don't understand how it works in great detail,
and applications are getting more demanding. Surveyor uses the
IETF/IPPM framework, in particular one-way loss & delay.

The minimum of the delay gives transmission/propagation delay;
the variability gives queuing. They use the measurements for
problem determination, engineering (trends, lofas), monitoring
QoS, network research, feedback to advanced applications etc.
They have a centralized database to store data, and web based
reporting tools. Will move from BSDI to FreeBSD.

They make active one-way delay and loss measurements; test
packets are time-stamped with GPS time. Back-to-back calibration:
95% of measurements within 100 us, => 10 us soon. Measurements
are centrally managed. They also make concurrent routing
measurements. They are QoS ready (via the EF DS byte). Probes
are Poisson scheduled at an average rate of 2/second. They use a
random port number per session (a session stops if they lose GPS
lock, or on software or hardware problems) to stop routers
treating them specially.
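Poisson scheduling means exponentially distributed gaps between probes, which avoids synchronizing with periodic network behavior. A minimal sketch of such a schedule at the 2/second rate mentioned above (the duration and seed are illustrative):

```python
# Sketch of IPPM-style Poisson probe scheduling: exponentially distributed
# inter-probe gaps whose mean gives the desired average rate.
import random

def poisson_send_times(rate_per_s, duration_s, seed=42):
    random.seed(seed)  # fixed seed only to make the sketch repeatable
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate_per_s)  # exponential inter-arrival gap
        if t >= duration_s:
            return times
        times.append(t)

probes = poisson_send_times(2.0, 60.0)
# roughly rate * duration probes, i.e. about 120 in a minute
```

Because the exponential distribution is memoryless, the probes sample the network "at random times" in a statistically well-defined sense.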

Modified traceroutes randomly scheduled every 10 minutes.

They use a flat file database (used to be Oracle), collect via
ssh. Now have 55 machines with 1883 paths at Universities,
tele-immersion Labs, National Labs, APAN, NZ, Canada.

They have daily summary reports, integration with route
measurements, a Java applet to dynamically graph data almost
ready, and are developing asynchronous notification of problems.
They have added an indication of when routes were measured, and
have an animated display of the last month's data, day by day.

They observe that routing is asymmetric, and even when routing
is symmetric, queuing may be asymmetric; they can detect level 2
changes (SONET failover or ATM routing). They have observed low
delay with bad loss, and high delay without loss. HPC connections
provide low latency and low loss.

Want to look more at correlation of the measurements and
applications. They are concerned about the N**2 problem of full
mesh measurements.
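The N**2 concern can be made concrete: a full mesh needs a directed measurement path for every ordered pair of monitors, so the path count grows quadratically. A small sketch:

```python
# Directed path count for a full measurement mesh of n monitors: one path
# per ordered pair of hosts, so growth is quadratic in n.
def full_mesh_paths(n):
    return n * (n - 1)

# For Surveyor's 55 machines a full mesh would be 2970 directed paths;
# the 1883 paths quoted above show the deployed mesh is not complete.
print(full_mesh_paths(55))  # -> 2970
```

Doubling the number of monitors roughly quadruples the measurement load, which is why path-reduction strategies come up repeatedly in these notes.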

NLANR's Network Analysis Infrastructure - Hans Werner Braun

NLANR AMP Activities - Tony McGregor

AMP is a commodity PC running FreeBSD, used to actively
monitor at all or most HPC sites; they hope to deploy at every
single campus, and have 65 monitors so far. Site-to-site
measurements between NSF funded HPC sites complement the
pop-to-pop measurements taken by the HPNSPs. There is no GPS
requirement, which makes things simpler. The consistent machine
at each site simplifies things by removing unknowns specific to
a particular machine. They measure RTT, loss, topology and
throughput between about 65 sites with a full mesh. NLANR
provided hardware and administration. They monitor a few extra
machines which are not monitors (e.g. OCxMONs). They have two
analysis machines which each store the data, for redundancy, and
are willing to ship raw data to people who ask for it. They
sample once/minute. The throughput measurements (see
http://moat.nlanr.net/QActmon/)
are not made regularly, but rather on demand via a Web page. One
can choose the mechanism (e.g. treno). They can schedule
throughput tests on demand from users. Traceroutes are done
every 10 minutes. They do nothing with BGP at the moment, but
are interested in cross-correlation between BGP &
traceroute. They have not done much work on the relation between
ping loss, RTT and throughput. Tony thinks there will not be a
strong correlation in general, since there are so many things
going on; it may depend on the observation period.

They are working on event detection using process charts (a
simple statistical method used in process management: one takes
a window, computes the percentiles in the window, then reports
when n samples in a row fall outside a percentile) &
heuristics; and on localization: path decomposition and IPMP (a
measurement-friendly protocol that is lightweight and can be
implemented in the routers, but does not have to be; if it is,
it is as easy to implement in the router as just forwarding a
packet, so it is a good indicator of router performance and does
not cost extra, so ISPs will not limit it; it can also run at
end nodes). See http://moat.nlanr.net/AMP/IPMP/
for more details on IPMP.
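The process-chart detection described above can be sketched as follows. The window size, percentile, and run length are assumed parameters, not values from the talk:

```python
# Minimal process-chart event detector: track a high percentile over a
# sliding window of recent samples, and flag an event when n consecutive
# samples fall above that percentile.
from collections import deque

def detect_events(samples, window=50, pct=0.95, n=3):
    hist, run, events = deque(maxlen=window), 0, []
    for i, x in enumerate(samples):
        if len(hist) == window:
            threshold = sorted(hist)[int(pct * (window - 1))]
            run = run + 1 if x > threshold else 0
            if run == n:
                events.append(i)  # n samples in a row outside the percentile
        hist.append(x)
    return events

# e.g. a flat RTT series with a sustained step up gets flagged,
# while the flat series alone does not:
rtts = [10] * 60 + [50] * 5
```

Requiring n consecutive outliers, rather than one, is what suppresses alarms on isolated spikes.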

NREN NGI Activities - Phil Dykstra

They have Surveyors at MC, CA, AK & HI. They see steps in
performance due to load balancing on routes. They were looking at
reachability to 300 HPCMP PIs, with locations from business email
addresses & lat/long from phone numbers. Approximately 200 of
the 300 were on DREN & 20 of the 100 were on vBNS. Phil has
a map of where the N. American exchange points are located. He
pointed out the need for: IP address to ASN mapping (in Europe,
RIPE will revamp and concentrate on ASN origin authentication,
and it may be enforced); IP address to lat/long & country
(the RIPE database will give the country by prefix and by AS);
ASN to name mapping (RIPE has a database which provides this); a
web site you can connect to that tells you about your system;
portable Surveyors for use on campus to measure to the DMZ at
times (the RIPE machine could be made portable, but has to be up
for a couple of hours in order to get better than millisecond
accuracy); and a test machine at every campus DMZ.

PITAC NGI Review

This is important for future funding of NGI programs. Their
goal for the testbeds is to provide the necessary bandwidth, low
latency, QoS & security. They are having a problem learning
how well the NGI testbeds are operating, and want systematic
measurement of bandwidth, latency etc. to the desktop. It would
be helpful if agencies reported daily averages and peak-minute
measurements of these metrics for many links (e.g.
http://www.internettrafficreport.com/). In the 1999 review the
maximum performance was ~10 MBytes/sec for IP and 0.6 MBytes/sec
for TCP; these are disappointing. They hope for substantial
progress.

Internet statistics - Graduate student at UCLA

The Hurst parameter is not an accurate indicator of queuing
performance. They are looking at bps throughput for TCP, using
tcptrace from Shawn Ostermann. An example of the reports is
to show all flows with over 1 Mbps for FTP and port 80
(HTTP). At UCLA they have a netperf client that can access
netperf.ucla.edu to run a netperf test back to the client
machine. Performance problems are usually caused by the on and
off ramps to the backbones, or by the clients.

WAND group - John Cleary U of Waikato

They have built some bidirectional measurement hardware that
is deployed at many sites. DAG is the generic name for a series
of boards that they have built. They use an FPGA to get the time
from the GPS pulse-per-second signal, and an ARM processor that
provides the GPS time to the PC. The physical interfaces working
are 25 Mbps, E3, DS3, OC3, OC12, and OC48 (in 2 months); they
are thinking about OC192. For the higher speeds there are
difficulties in how to capture the information. They also
support POS (Packet Over SONET) and 10/100 Ethernet, and are
looking at Gbps Ethernet. The FPGAs now have up to 1M gate
equivalents running at 100 MHz, with an image downloaded from
the PC. They support Linux & FreeBSD drivers and they also
support CORAL.

The GPS is accurate to ±0.25 us relative to UTC (1-second
pulses, corrected for drift offline (oscillators drift by a few
parts per million); this will be done online soon). They are
looking at temperature compensated oscillators.
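A quick scale check shows why the per-second GPS correction matters: a few parts per million of oscillator drift swamps the ±0.25 us GPS bound within seconds. The 2 ppm figure below is illustrative, within the "few ppm" range quoted above:

```python
# Accumulated timestamp error of a free-running oscillator:
# 1 ppm of frequency error = 1 microsecond of error per elapsed second.
def drift_us(ppm, seconds):
    return ppm * seconds

print(drift_us(2, 1))     # 2 us of error after one second at 2 ppm
print(drift_us(2, 3600))  # 7200 us (7.2 ms) after an hour uncorrected
```

Without correction against the 1-second pulses, an hour of free running would be four orders of magnitude worse than the GPS-locked accuracy.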

They have recorded many ATM traces and can do full cell
capture of ATM (at OC12), but usually run out of disk space, so
tend to capture the timestamp and CRC header for ATM; for IP
they capture the timestamp, CRC and the first 56 bytes, which
gets most of the header.

For measurements they are looking at traffic statistics, and
can measure switch delays down to SONET frame jitter. Current
sites are: Auckland, NZIX, vBNS/NLANR, U of Calgary, and Monash.
They do not run continuously; they run for five minutes twice a
day (this is long enough to capture many flows).

What can go wrong: the OS has bugs which are version
dependent, single/multi user etc. So need to validate the
measurements, run dual configurations to check setup. The
hardware solution removes the OS, so is likely to be more
accurate. The more expensive hardware system can be used to
calibrate other more lightweight systems.

VoIP measurements between the US and NZ show a minimum delay of
about 90 msec, with good consistency to the US (e.g. < 120
msec max). US to NZ is much more varied: 120 msec to 350 msec.
They also show little difference between VoIP and non-VoIP
traffic. They are looking at how to gather long term trends for
VoIP by detecting H.323 and other protocols (n.b. UDP uses
non-fixed ports). For TCP they want to extract individual
sessions and estimate latency, queuing delays, server delays
etc.

They would like to make the data available, but where to store
it (NZ pays for transit of data across Pacific). They are working
on validation and careful recording of OS etc., they are also
concerned about security and hiding IP addresses.

More validation of traces is a big issue, need to ensure
timestamps are in order, check that merged bidirectional data is
properly merged.

For statistics they are using wavelet transforms in
collaboration with a group in Melbourne, mixture of exponentials
(can heavy tails look like self similarity), non parametric
weighted models.

There will be a PAM (Passive and Active Measurement) 2000
conference at Waikato, NZ, April 3-4, just after the IETF in
Adelaide, Australia.

CAIDA Tools - David Moore

They have a Perl and C interface to their program libraries, so
one can write initially in Perl and then port to C later. They
have tcpdump support, so they can read tcpdump files (instead of
getting data from an OCxMON) into their tools (and so can be
used instead of libpcap), and they support ATM (which libpcap
does not). They provide analysis examples, e.g. how to count
bytes per protocol. They have continuous data collection for
operational situations, with a lot of examples for generating
reports. They are working on security modules to filter things
(e.g. start doing a full trace on seeing some activity such as a
portmapper request), and are cleaning up the drivers for better
integration.

Active Testing of Internet - Andy Germain NASA

The test process runs a script hourly that runs traceroute
(from the # of hops & changes in it they get congestion
analysis), throughput (runs for 30 secs to keep the socket
buffer full) and pings (100 of them). They have a lot of
collaboration sites on vBNS, so they are interested in it. He
showed plots of min/max/median throughput and compared vBNS with
SprintLink. He showed the correlation of loss, RTT &
throughput; they track one another when plotted hourly for
several days.

After the talk I got together with Andy to look in detail at
some of his data to see how it might be used to indicate the
correlation between thruput, loss and RTT, and a formula for TCP
bandwidth (BW < 1460/(RTT*sqrt(loss))). The agreement looks
encouraging for data between GSFC and LANL.
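The formula quoted above is the familiar MSS/(RTT*sqrt(loss)) TCP throughput approximation, with the 1460-byte MSS from the notes. A small sketch; the 70 ms RTT and 0.1% loss below are illustrative assumptions, not values from Andy's data:

```python
# TCP bandwidth bound quoted in the notes: BW < MSS / (RTT * sqrt(loss)).
import math

def tcp_bw_mbps(mss_bytes, rtt_s, loss):
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss) * 1e6)

# illustrative numbers: 1460-byte MSS, 70 ms RTT, 0.1% packet loss
print(tcp_bw_mbps(1460, 0.070, 0.001))  # roughly 5.3 Mbps
```

The inverse dependence on both RTT and sqrt(loss) is why the hourly loss and RTT measurements track throughput so closely in the plots.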

There is a new link to STAR-TAP from APAN (AU, JP, KR, SG, US)
and they are making measurements on it. The DOE has NGI
applications in combustion, global climate modeling, particle
physics, tele-immersion and X-ray crystallography; they are
identifying the application characteristics and instrumenting
the network accordingly. A third interest is Globus, which needs
middleware to enable advanced distributed computing; they use
Netperf to make measurements.

STAR-TAP Engineering

Interchange for many international networks, the NGIX and
several universities and research sites (e.g. FNAL). Will install
an OC3MON, Surveyor and an AMP at STAR-TAP. Looking at having a
second AMP mesh for International sites and then link at the
STAR-TAP to the US mesh. Would like to see more Surveyors
supported outside the US. Prefer third party measurements
rather than self measurements.

RIPE-NCC Test Traffic Project - Henk Uijterwaal, RIPE

Active one-way delay, loss, and routes, following the relevant
(IPPM) RFCs; similar to Surveyor, but an independent platform
aimed at inter-ISP traffic. They run FreeBSD, have GPS, and get
time to a few us. They have 43 test boxes: 37 in Europe, 5 in
the US, 1 in Israel; 33 are taking data. They provide daily
plots and access to a database with routing vectors. Routing
vectors show paths from one host to another that the user can
query; changes in routing can explain changes in delays. They
are working on delay & loss alarms and a weather map: look
at the distributions to see multiple peaks, constantly calculate
percentiles and compare with previous values looking for a
significant difference, then send an alarm to the sites running
the boxes. They are concerned about false alarms, and may let
each site set its own thresholds.

They are also looking at long term trends. They will
parametrize losses/delays following the relevant RFCs, plot them
over time, and look for effects; this is intended as a tool for
planning, with work starting in August.

They want to create a one-page summary of the quality of the
network, both short & long term, explain what is in the
table, be scientifically defensible, define an acceptable use
policy, and define which metrics are needed. Need to discuss
what people want to see in a report such as this.

Further analysis projects: delay vs. packet sizes, bandwidth
vs. throughput, a more detailed study of packet losses, the
relation between delays and routing changes, and the N**2
problem & modeling paths (maybe dynamically reduce the
number of paths, with as few assumptions about the network as
possible).

Will continue to expand with more machines, sites will have to
pay for boxes ($2.5K/box), want to improve the clocks, use for
external networks only (basic support for "private"
experiments), watch RIPE email list tt-wg@ripe.net for details
(majordomo list).

NIMI - Andrew Adams

NIMI is a software system to facilitate widespread deployment
of measurement platforms. It is designed to be scalable (>
1000 nodes should be no problem) and dynamic, so it can run any
active measurement tool. It is secure (messages, both data &
configuration, are encrypted) and self-configuring: the nimid
downloads configuration information from a configuration POC. It
supports a wide range of policies; each nimid has its own ACL
table.

They have 15 sites that have been running for a long time
(including SLAC, CERN, FNAL, LBNL, NASA, PSC ...). Current
measurements: traceroute, treno, mtrace and zing (one way
measurements without a GPS clock). Platforms are NetBSD &
FreeBSD; Linux is on the way.

To do: they want to use public key serving; GPS hooks, else
absolute time; tool uploading (beta) automatically when a tool
is updated; and multicast for measurement coordination and
disseminating results.

I talked to Andrew afterwards, he said he is hoping in the
next couple of months to provide some reports on the analysis of
the data they have gathered so far. One of the collaborators on
this project (Jamshid Mahdavi) has left PSC to join Novell. The
main person still working on NIMI is Andrew with guidance from
Vern Paxson.

Collaborations requirements meeting

Introduction

This meeting was held on the third day (Thursday July 1, 1999)
and was a smaller meeting (about 20 attendees). The attendance
was not a subset of the previous 2 days, in particular there were
new attendees for the 3rd day including luminaries such as Guy
Almes, Matt Mathis, and Craig Labovitz. The objectives were to
identify strategies & opportunities deriving from high
performance environments. This includes new applications, new
technologies & protocols. There was some overlap, but also a
lot of new (typically more technical) information introduced.

Bill Decker NSF

NSF needs to report on results from the investment in NGI. The
NGI implementation plan has 3 goals: research, testbeds and
applications. Bill feels the network enables the NGI goals, and
wants to cite the results from network measurement activities.
The testbed goal has 2 sub-goals: 100 sites at 100 times speed,
and 10 at 1000 times speed. The first (goal 2.1) is more in the
NSF area (the latter is more DARPA's). The 3rd goal
(applications enabled by the network) requires input more from
the applications folks than the network measurement folks. One
important thing is to be able to cite how scientists get better
performance as a result of the measurements and research, e.g.
the tuning etc. of protocols that improves performance to the
desktop.

Since the agencies have to do this evaluation and reporting
Bill wants to see: plans for use of results both near (next year)
and long term. These should report on testbeds and how they are
behaving and what results the people are getting. Longer term
create a more sustainable measurement activity, and address
things which cannot get into the short term. Secondly, Bill
wants to see something a bit less self-serving to funding
agencies, i.e. what should be the nature of funding-agency
support for activities in this area, what kind of dollar levels,
and how to tie it to other activities.

A recent workshop on NGI suggested removing the testbeds as a
goal since they enable the other things, i.e. testbeds are not a
goal in themselves but rather are part of supporting an
application.

One question raised was: if the community goes to an NSP and
sets up an SLA to provide for its needs, then it can use the
commodity Internet and the R&E networks no longer need to
be specialized. However, this requires a whole suite of
measurements and architecture etc., both to write the SLA and to
check for conformance. The problem is typically not with the
individual NSPs but rather with interconnecting the clouds run
by each NSP, and this is much easier to do within the limited
R&E community than for the commercial Internet, due to the
competition between the cloud owners.

I2/Abilene Perspectives - Guy Almes

There is a close feedback loop between engineering and
applications, applications motivate engineering and engineering
enables applications. What makes it hard is the need for high
bandwidth, low latency, wide area, bursty apps, multicast, QoS
and the need for measurements, plus the introduction of new apps
and new users, the exponential growth etc.

I2 has 150+ campus LANs and about 35 GigaPoPs; OC-48 from
Qwest, Cisco GSRs, connections at OC12 and OC3. Connections on
order or in progress will double the number of connections. Most
of the current 14 Abilene backbone connections are multi-homed
to vBNS & Abilene and currently prefer vBNS. Exchange points
are NASA Ames and Chicago. Traffic on Abilene now is low and
performance is good. In 3 or 4 months almost all GigaPoPs will
have Abilene connections.

Current loads understate the need; there is a lot of pent-up
need, and new paradigms will increase the load in unanticipated
ways. Part of the problem is that all bugs in the infrastructure
(BER, ATM switch problems, mis-tuning, poor stacks, router bugs)
result in less than expected performance. Any one cause is easy
to fix, but it is hard to know where the problems are. The
folklore is that vBNS usage is less than expected, implying that
users don't need the bandwidth. This is not true: the users need
bandwidth but can't get it, since they can't easily identify the
problems that are causing the poor performance and by-pass
them.

NREN NGI Perspective - Phil Dykstra

Javad Boroumand has a nice map of the interconnects between the
JET nets. Need to enumerate sites that have 100 Mbps
connections, then verify that they can actually get that
performance. Need large MTUs (> the 1500 byte Ethernet
limit). With a 64 KB window size one can only get 11 Mbps coast
to coast. They have concerns about the denial of ICMP and its
impact. May need test machines identified at each site.

One suggestion from Mathis was that priming of caches etc.
may require > 1 initial ping (in case the ping is lost).
Another remark is that simply turning on ICMP limiting affects
the code path in the routers and so affects the performance. It
would be nice to get some measurements of commodity Internet
performance compared to that of the R&E networks.

NLANR - Hans Werner Braun

They see the challenges as: making data more available;
formulating questions that can realistically be answered, and
getting better answers and results; improving and more tightly
integrating analysis and visualization tools (e.g. fly-throughs
of parts of the Internet, such as the vBNS map with packets
flowing etc.); aggregation of various data sets for comparisons;
and how to make the most of the existing data (make it
available, provide reports).

Thinking of sending a short shockwave from say AMP machines
into the network to see the effect on routers etc.

Other things that might be done with AMP: they want to deploy
AMP machines back into the campuses; they are looking at a
portable AMP; regular throughput tests, and if so how many
(beware of network impact); triggered throughput measurements or
some other more detailed measurement following the occurrence of
an event (but this may make the problem worse); they have
reservations about the effort required to progress IPMP (they
need a serious expectation that it is worth proceeding with and
that it would be adopted before putting the effort in).

On the validation front: is AMP OK, is it producing valid
numbers, how precise is it, is kernel-level timestamping
necessary and how much does one care? Is round trip enough, or
does one also need one-way measurements, or do the routes in the
2 directions identify things well enough?

One interest is how recovery from high load works. It may be
like a potential barrier (i.e. with hysteresis). How do things
work at high frequency, e.g. rapid onset of problems and slow
recovery; how does one characterize this? Also of interest is
the integration of active and passive data (e.g. traceroute with
BGP).

What TCP tells us about the network - Matt Mathis

One can "sniff" flows at the source, destination or
somewhere in the middle. If one passively measures the loss,
window & RTT at any one of the 3 tap points, one can check
whether a flow is application or network limited. Knowing this,
one can decide where to tackle the problem. One can also tell
whether it is upstream or downstream limited. Looking at the
destination or the middle, one can see receiver mistuning, and
whether there is a cpu or application bottleneck.

The percentage of duplicate data at the receiver indicates how
well TCP is coping; it rises monotonically towards the sender,
and the differential rise gives the quality of any hop. One can
even tell which of two ISPs is causing a problem.

The measurement infrastructure required is sniffers at the
right places. One problem might be the privacy issues of
passively monitoring data. Matt wants to develop his tools to
work with tcpdump and OC3MONs.

Another thought is to develop a TCP MIB that can be
interrogated. One problem is what happens if the sender runs out
of cpu capacity.

Needs for university network researchers - Arne Nilsson

Traffic measurements at North Carolina State University. They
have been measuring the NC-REN at various points.

The major results were that the traffic is bursty &
non-Poisson. In the late 80s they did some modeling with a Markov
modulated Poisson process. Later modeled results from VISTANET
with a session level on/off model, packet train models. They also
measured NCIH looking at transfer of medical images. Major
findings were that the data was extremely bursty, but well
behaved (did not hurt network) for most of the tiem. There were
some bursts of ATM cell losses associated with a time constant
(time for source to dramatically alter behavior, e.g. rate of
generation of cells) of 20 to 50 msec. They were using UDP.

Recent work deals with how, where & when to make
measurements, characterization (long-range dependency, stationary
versus non-stationary), etc. The stochastic processes have been
assumed to be stationary, but are they? They looked at the
Bellcore data sets, applying tests for stationarity
(Dickey-Fuller, covariance stationarity), and the result was that
there is high confidence that the data is non-stationary. They have
characterized the non-stationary behavior mathematically and
get good agreement with the number of arrivals in a time interval.
The variability is captured in a precise manner and can be used for
synthetic generation. They have ongoing measurement activities
with OCxMONs. He has good access to statisticians.
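The intuition behind such stationarity checks can be sketched crudely: split a series into windows and see whether the window means and variances stay put. This is only an illustration of the idea; the NCSU work used formal tests such as Dickey-Fuller, and the series below is synthetic, not Bellcore data.

```python
# Crude, hedged sketch of a covariance-stationarity check: compare the
# mean and variance of successive windows of a time series. A formal test
# (e.g. Dickey-Fuller) would be used in practice.
import statistics

def window_stats(series, nwin=4):
    """Return (mean, variance) for each of nwin equal windows."""
    w = len(series) // nwin
    return [(statistics.mean(series[i*w:(i+1)*w]),
             statistics.pvariance(series[i*w:(i+1)*w]))
            for i in range(nwin)]

# A series with a drifting mean looks non-stationary:
drifting = [i * 0.1 for i in range(400)]
means = [m for m, _ in window_stats(drifting)]
print(means[-1] - means[0] > 1)  # window means drift upward -> True
```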

We could provide access to our data; they could come up with
how to do statistical experiments to optimize results. He has no
foot in the religious wars over whether the data is
self-similar/multi-fractal versus other theories. They are using
home-grown tools, SPlus and SAS. He will provide a pointer to
more information for the proceedings.

Ongoing needs in routing measurement and analysis - Craig
Labovitz

Challenges: as the Internet grows, how does the core scale, how
do core routing protocols work, what are the limitations on
growth, how far can BGP go in terms of convergence, what is the
impact of new policy changes (e.g. moving away from hot potato
routing), how does one deal with debugging, how do we provide
highly available, fault-tolerant Internet service, and what is
the impact of dampening, policy filtering and authentication?

Multi-homing does not provide near-instantaneous fail-over:
withdrawals travel slower than announcements, and the time to
converge after a failure increases exponentially with the size of
the mesh. They have been making measurements, injecting faults
into BGP to see the effect.

Growth of topological state: CIDR controls the growth in prefix
entries (slowly growing) but not the growth in routes, and this
growth affects convergence. The cumulative frequency of failure
versus number of days shows that within 30 days 80% of the routes
have experienced a failure. The sources of failures from a
published case study (start at http://www.merit.edu/ipma/)
are mainly maintenance (16%; can't do an upgrade without
killing peering sessions), power outages (16%), fiber
cuts/circuit/carrier problems (15%), unknown (12%), hardware
failures (9%)
... Given a failure, most (50%) are repaired within 30 minutes;
the distribution is heavy tailed. Distance vector algorithms (BGP)
require multiple passes for convergence; the majority of changes
converge in 5 mins, but some take much longer. Withdrawals and
route fail-over may induce iterations of BGP processing during
convergence, and the rate of convergence grows exponentially
with the meshiness of the network. Convergence for 50%
of routes after injecting a fault takes about 100 secs, and 5%
take 3 minutes or more. Convergence of announcements is
90% done in 50 secs, while for withdrawals it is about 120 secs.
During convergence one gets packet loss.
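The claimed exponential growth of convergence with meshiness can be illustrated by counting the alternate paths a path-vector protocol like BGP might explore after a withdrawal. The sketch below (an illustration, not Merit's methodology) counts the simple paths between two nodes in a full mesh of n ASes:

```python
# Hedged sketch: in a full mesh K_n, the number of simple paths between
# two fixed nodes grows explosively with n, which is the exploration space
# a distance-vector/path-vector protocol may iterate through during
# withdrawal convergence.
from math import factorial

def simple_paths_in_clique(n):
    """Number of simple paths between two fixed nodes in K_n (n >= 2).

    A path may pass through any k of the other n-2 nodes, in any order.
    """
    return sum(factorial(n - 2) // factorial(n - 2 - k) for k in range(n - 1))

for n in (3, 4, 5, 6):
    print(n, simple_paths_in_clique(n))  # prints 2, 5, 16, 65 paths
```

This is why reducing redundant links or peering ("meshiness"), as discussed below, helps convergence.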

The challenge is how to provide highly available Internet
services. How does one do maintenance, e.g. hot-swappable
components, protocol-aware swapping etc.? Can we reduce the
meshiness, e.g. reduce redundant links to help convergence,
reduce peering; could providers make their nets less meshy?

See http://www.merit.edu/ipma/
They would like to deploy their tools (IRR & Route Tracker)
on the commodity Internet, CAIRN, I2 ...

CA*net view - Rene Halson

They are concerned with monitoring at the higher speeds, e.g.
OC48MON, OC192MON etc.

If people have comments or position statements relevant to the
working group then they should be sent in by email by Wednesday
7th July in order to appear in the proceedings.

Gtrace is a new Java tool being developed by ram@caida.org
to provide a visual traceroute that displays on a map
and also provides extra information on the nodes passed through.
It does a 5-step search to find the location of each node, looking
in the DNS record, CAIDA databases and NDG, analyzing the name
for .net nodes to uncover a city or airport code (he has
configuration files for 12 nets now showing how the location
information may be available) and then looking these up. In
addition to the map output it provides tabular information
telling how authoritative the location information is likely to
be, the loss on the 3 traceroutes, and the RTT. You can click on a
node for whois information. He will add the route distance. It is
written in Java using the Swing graphics utilities. It will be
released in October 1999.
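The hostname-analysis step can be sketched as follows. The code table and hostname pattern here are illustrative assumptions, not Gtrace's actual configuration files, but they show the idea of mining a router name for an embedded city/airport code:

```python
# Hedged sketch of hostname analysis for router location: split the name
# on common separators and look for a known airport code. The code table
# below is a tiny illustrative sample, not Gtrace's real database.
import re

AIRPORT_CODES = {"ord": "Chicago", "iad": "Washington DC", "sjc": "San Jose"}

def guess_location(hostname):
    """Return a city name if a known airport code appears in the hostname."""
    for token in re.split(r"[.\-]", hostname.lower()):
        if token in AIRPORT_CODES:
            return AIRPORT_CODES[token]
    return None

print(guess_location("core1-ord.backbone.example.net"))  # Chicago
```

Because such heuristics can misfire, reporting how authoritative each guess is (as Gtrace's tabular output does) matters.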

A second tool demonstrated by ram@caida.org, called Geoplot,
enables one to develop and display web maps of nodes with links
between them, apply color and thickness to the links, and make
the links and nodes clickable. The nodes and links are described
by a simple file which is accessed from an html file that provides
the static configuration information. The idea is that the simple
file would be constantly updated from a Perl script, and he is
also going to provide a Perl API to create the simple file. Apart
from the Perl API this should be available by the 3rd week in July.

I met with Tony McGregor. He is very interested in us
correlating the AMP data with PingER and Ping, and he will
make the AMP data available to us. To get started we can begin
pinging between several mutually monitored site-pairs (e.g.
representative sites of interest to us such as FNAL, UMD, CMU,
ColoState, Waikato ...) for a period. AMP makes measurements
once per minute and does not advertise the names of the sites, so
he will provide the IP addresses. I will need to decide on the
ping interval. AMP can also, on demand, make throughput
measurements, so we can make some checks on how and when
throughput correlates with loss & RTT.
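One natural way to check whether measured throughput tracks loss & RTT is to compare it against the steady-state TCP estimate of Mathis et al., BW ~ (MSS/RTT) * C / sqrt(p). The sketch below uses illustrative sample numbers, not AMP/PingER data:

```python
# Hedged sketch: the Mathis et al. steady-state TCP throughput estimate,
#   BW ~ (MSS / RTT) * C / sqrt(p),  with C ~ 1.22 for periodic loss.
# Useful as a sanity check when correlating measured throughput with the
# loss and RTT seen by pings.
from math import sqrt

def mathis_bw(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Rough upper bound on TCP throughput in bytes/sec."""
    return (mss_bytes / rtt_s) * c / sqrt(loss_rate)

# e.g. 1460-byte MSS, 100 ms RTT, 1% loss:
print(round(mathis_bw(1460, 0.100, 0.01)))  # ~178120 bytes/sec
```

Large deviations of measured throughput from this estimate would suggest an application, host or window limitation rather than a network one.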