Overview

The publicly available network simulator
ns has become a popular and widely
used simulator for research in telecommunications networks. However,
the design of ns is such that simulation of very large networks is difficult,
if not impossible, due to excessive memory and CPU time requirements. The PADS research group at
Georgia Tech has developed extensions and enhancements to the ns simulator to
allow a network simulation to be run in a parallel and distributed fashion,
on a network of workstations.

Objectives

We set out to provide a means for ns users to distribute their simulation on
several (e.g. 8-16) workstations connected either via a Myrinet network,
or a standard Ethernet network using the TCP/IP protocol stack. By
distributing the network model on several machines, the memory requirements
on any single system can be substantially smaller than the memory
used in a single-workstation simulation. The overall execution time of the
simulation should be at least as fast as the original single-workstation
simulation, and can be several times faster. In all cases we can support
proportionally larger network models by distributing the model on multiple
systems.

A key goal was to minimize the number of modifications required to the
released ns source code and to ensure that all existing ns simulations would
still run properly when used with our modified ns. Minimizing the number of
changes to ns allows the parallel simulator to readily take advantage of new,
improved versions of ns as they become available. Any new or revised ns
syntax should be directly related to the parallelization of the simulation,
and should not affect ns users who are not using PDNS.

Approach

In order to achieve the goal of limited modifications to the base ns
software, we chose to use a federated simulation approach where separate
instantiations of ns modeling different subnetworks execute on
different processors. PDNS uses a conservative (blocking-based) approach to
synchronization. No federate in the parallel simulation will ever process an event that would later have to be undone due to receiving messages in the
simulated past. This avoids the need to implement state saving in the existing ns code.

The PADS research
group at Georgia Tech has previously developed an extensive library of
support software for implementing parallel and distributed simulations (see
libSynk and
RTIKIT). The
software has support for global virtual time management, group data
communications, and message buffer management. It has
support for a variety of communication interconnects, including
shared memory, Myrinet and TCP/IP networks, and runs on a variety of
platforms. By using this synchronization software for the parallelization
of ns, we
were able to rapidly modify the main event processing loop of ns to support
the distributed time management functions needed to ensure that no unsafe
event is ever processed by any federate.

The modifications needed to ns can be broadly classified in two major
categories, the modifications to the ns event processing infrastructure, and
extensions to the ns TCL script syntax for describing simulations. Each of
those categories is described in detail below.

Modifications to ns event processing

The standard ns release has several variants of the main event processing
loop which can be specified by the ns user as follows:

$ns use-scheduler Heap

which specifies that the heap-based scheduler should be used. We developed a
new event scheduler known as Scheduler/RTI, which is specified by the ns user
as follows:

$ns use-scheduler RTI

The Scheduler/RTI uses the time management functions of the libSynk/RTIKIT
to ensure
that local simulation time advances do not allow for the processing of unsafe
events. The Scheduler/RTI also contains code to process events received by a
federate which were generated by another federate, and places those new events
in the proper location in the event list.

Also related to event scheduling is the transmission of events from one
federate to another in the distributed simulation. This is handled by a newly
created ns agent called Agent/RTI. The Agent/RTI is responsible for
determining that a received event is destined for a remote federate,
preparing a message containing the complete event information, and forwarding
that to the remote federate using the RTI MCAST functions. The RTI Agents
are automatically inserted on any ns node that has a link to a remote
simulator.

Modifications to ns TCL syntax

In addition to the event scheduling modifications mentioned above, the way
that a network topology and network data flows are defined by ns needs to be
enhanced to allow a federated ns simulation. Consider the simple topology
shown below. If this simple eight-node simulation were run on a single instance of ns,
then nodes R0 and R2 and their connecting link are simply defined as:

set r0 [$ns node]

set r2 [$ns node]

$ns duplex-link $r0 $r2 1.5mb 10ms DropTail

But when we decide to run the simulation in a distributed fashion on
Simulators A and B as shown below, the definition of the duplex-link is
problematic. In simulator A there is no notion of node r2, and in simulator B
there is no notion of node r0. We solve this problem by extending the ns
syntax to include the specification of a remote link, called an rlink. With
our extended ns syntax, a simulated link that crosses federates is defined
using an rlink command, which specifies only the local endpoint and
identifies it with an IP address. The other end of the simulated link is
defined in simulator B, and is also assigned an IP address. At runtime,
remote links with matching network addresses are logically connected, and
simulated packets which leave simulator A on the rlink are delivered to the
corresponding rlink in simulator B. Details on how this is done can be seen
in the example script linked below.
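
For example, the r0-r2 link above could be split across the two simulators using the rlink syntax described later in this document (a sketch; the addresses are illustrative). On simulator A:

set r0 [$ns node]

$r0 rlink 1.5mb 10ms DropTail 10.0.1.1 255.255.255.0

and on simulator B:

set r2 [$ns node]

$r2 rlink 1.5mb 10ms DropTail 10.0.1.2 255.255.255.0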

Data flows in ns are defined similarly, by naming the two endpoints of
the data flow. We have the same problem when the remote endpoint is
located on another federate, and we solve it in a similar fashion,
allowing a remote connection, called an ip-connect. With the
ip-connect command, the remote endpoint of a data connection is
specified by IP Address and port number rather than by ns node and
agent name.
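
For example, where a single-simulator script would use the standard connect command, a federated script names the remote endpoint by address and port (a sketch; the address and port are illustrative):

$ns connect $tcp_agent $sink_agent

becomes

$ns ip-connect $tcp_agent 10.1.2.1 80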

PDNS Interface and Syntax

Assigning IP Addresses to Links

As shown in the examples, when using pdns you must assign an IP address to
any node that will be referenced remotely. In keeping with normal
networking practice, the IP addresses are actually assigned to the links, not
the nodes. The syntax for assigning an IP address is:

$linkname set-ipaddr address mask

Example:

[$ns link $n1 $n2] set-ipaddr 10.1.1.1 255.255.255.0

[$ns link $n2 $n1] set-ipaddr 10.1.1.2 255.255.255.0

linkname is the ns object for the link being assigned

address is the address being assigned, either in dotted
notation, or a 32 bit value.

mask is the corresponding netmask, either in dotted
notation or a 32 bit value. The netmask value is not used in
the current PDNS implementation, but is included for completeness.

Creating Remote Links

When using pdns, a Remote Link is an ns link where one node endpoint is
defined on one instance of pdns, and the second node endpoint is on
a different instance of pdns. Note that one end of the remote link
must have an IP address ending in ".1". For example, if you create a
remote link 192.168.1.1, the other end of the remote link on a
different PDNS instance must be 192.168.1.2 (assuming the netmask is
255.255.255.0). This is how PDNS matches remote links between two
different PDNS instances. The syntax to create a remote link is:

$nodename rlink linkspeed linkdelay qtype ipaddr addrmask

Example:

$n2 rlink 100Mb 5ms DropTail 192.168.1.1 255.255.255.0

nodename is the ns object for the node where the remote
link is being defined

linkspeed is the speed of the link

linkdelay is the propagation delay for the link

qtype is the associated queuing discipline for the link

ipaddr is the IP address of the local end of the link. The
other end of the link must have an IP address with an identical network
address (the IP address AND'ed with the network mask). Also, one end of the
rlink must have a host value of ".1".

addrmask is the mask value for the network portion of
the IP address.
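
For example, the matching far end of the rlink above would be defined on the other PDNS instance as follows (a sketch; $n5 is a hypothetical node on that instance):

$n5 rlink 100Mb 5ms DropTail 192.168.1.2 255.255.255.0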

Defining Remote Routes

When using pdns, since only a portion of the global topology is known to
each simulator, there are times when choosing the correct route between
nodes defined on different simulators is problematic. In the sample topology
above, router R0 has insufficient information to determine which of
the two defined remote links is the shortest path to host H0. We solve this
problem by explicitly specifying routes for each remote link, as follows:
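
The following sketch shows the two route specifications discussed below; the command name add-route is an assumption inferred from the parameter list that follows:

$ns add-route router-node rlink-ip dest-ip dest-mask src-ip src-mask

Example:

$ns add-route $r0 192.168.1.1 10.1.2.0 255.255.255.0 10.1.1.0 255.255.255.0

$ns add-route $r0 192.168.1.2 10.1.3.0 255.255.255.0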

In the above example, the first line would route data originating from
subnet 10.1.1.X through the rlink 192.168.1.1 if the destination IP
address is in the subnet 10.1.2.X. The second line would direct PDNS to
route all traffic originating from its instance through
the rlink 192.168.1.2 if the destination IP is in the subnet 10.1.3.X.
As you can see, the source IP and mask are optional; if they are not
specified, then src-ip and src-mask both default to 0.0.0.0.

Note: If you have agents that produce traffic destined for
other agents within the same PDNS instance, but the traffic must travel
through remote routes because no local path exists between the two agents,
you must specify this route path explicitly. See the "Sample Scripts"
section below for a simple script that performs local->remote->local
routing.

ns is the simulator object

router-node is the node containing the rlink-ip

rlink-ip is the IP address of the rlink through which the
specified traffic will be routed

dest-ip is the target IP address

dest-mask is the target IP mask

src-ip is the source IP address (optional)

src-mask is the source IP mask (optional)

Binding Port Numbers to Agents

As shown in the examples, when using pdns you must assign a port number to
any agent that will be referenced remotely. Both ends of a TCP connection
must have a port number. The syntax for assigning a port number is:

$nodename bind $agent portnum

Example:

$n1 bind $tcp_agent 80

nodename is the ns object for the node where the agent is
being bound

agent is the ns object for the agent being
bound

portnum is the port number desired. If zero is specified,
then the next available port number is assigned.

Defining Remote and Local Connections

In order for PDNS to correctly send data from one end-host to another,
the usual ns "connect" command will not suffice. Instead, the
following new directive allows PDNS to route traffic destined
for remote simulators through the proper local border routers. However,
note that the following command can be used for both remote and
local connects.

$ns ip-connect $agent dest-ip port

Example:

$ns ip-connect $tcp_agent 192.168.3.4 80

ns is the simulator object

agent is the local source agent reference

dest-ip is the target IP address to which data should be sent (can
be a remote or a local IP address)

port is the target's port number

Creating Dynamic TCP Endpoints

Sometimes it is necessary to have dynamic TCP connections created
at runtime. PDNS now includes a TCP Listener Agent which dynamically
allocates connections on the fly as the simulator is running, instead
of determining all connections at initialization. TCP Listener closely
models real TCP connections (e.g. multiple TCP connections per
destination port, a TCP connection table). Of course, the TCP Listener
Agent can accept pre-determined connections at initialization time. The
underlying TCP protocol behavior is inherited from tcp-full. Once
the three-way handshake is complete, TCP Listener promotes the
connection to tcp-full status by instantiating tcp-full agents and
connecting the source to it automatically. TCP Listener handles all
data de-multiplexing. TCP Listeners must be used with agents
that have IP addresses associated with them.

set tcpl [new Agent/TCP/Listener]

Example:

set tcpl [new Agent/TCP/Listener]

$ns attach-agent $n2 $tcpl

$n2 bind $tcpl 80

$tcpl set-application MyAgent/WebServer 0 callback 1 1

You can send TCP data to this agent from any PDNS instance via
ip-connect. If you need to attach an application (similar to
the WebServer above) to the listener, use the set-application
command shown above; its parameters are described below. Note that the
sending tcp-full agent (initiator of the
TCP connection) should set close_on_empty_ true so completed
connections can be removed from the connection table (active
connection teardown):

$tcp_agent set close_on_empty_ true

Defaults:

max_synack_retries_ 0, number of SYN+ACK retries to attempt.
Note that this is set to 0 to maintain the same behavior as tcp-full.
In the real world, this is usually non-zero.

table_size_ 256, number of entries in the TCP connection
table.

entry_expire_ 6.0, seconds TCP Listener should wait before
timing out an entry in the connection table. Note that this is only
enabled if max_synack_retries_ is set to 0 because the
tcp-full timers and retry mechanisms are used to remove failed
three-way handshake attempts instead.

prebinding_ 0, enables or disables tcp-full prebinding
support. If enabled, you can add tcp-full agents to a list of
available agents TCP Listener can use during runtime (if TCP Listener
runs out of available agents, it will create them on the fly). This is
primarily a performance optimization.

The set-application arguments that follow the application class name are:

params is the parameter passed to the agent application at
instantiation (e.g. during the "new" call). Set to 0 to disable.

callback is the tcl procedure to call back when the
application is set. The callback procedure should be defined in the
application class and should accept a single parameter, the spawned
tcp-full agent name (see the sketch after this list). If you do not
need a callback, set this parameter to 0.

recv is either 0 or 1. Set to 1 to enable recv processing in
your application class.

reuse is either 0 or 1. Set to 1 to allow PDNS to reuse the
application agent for any incoming connection that has completed
the three-way handshake. If set to 0, PDNS spawns a new
application agent for each tcp-full agent it creates.
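
A minimal callback procedure might look like the following sketch (the class name MyAgent/WebServer and the procedure name callback follow the example above; the body is illustrative):

MyAgent/WebServer instproc callback { tcpagent } {
    # tcpagent is the name of the spawned tcp-full agent
    # that now carries the promoted connection
    puts "connection promoted to $tcpagent"
}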

Accessing rlink Queues

You may need to access a remote link queue (e.g. for statistics), just as
with any link in ns. The method for obtaining the queue object is:

$ns rqueue $node ipaddr

Example:

set qo [$ns rqueue $n2 192.168.1.1]

ns is the simulator object

node is the node containing the remote link

ipaddr is the IP Address of the remote link

PDNS Simulations

The current version of PDNS has been tested on as many as 136 processors
simulating a 600,000+ node network topology. We have run on Myrinet
interconnected systems, Ethernet TCP/IP network connected systems, and
using shared memory on multiprocessor systems. The overall speedup
achieved by running a distributed simulation is dependent on many factors,
so it cannot accurately be predicted for any given simulation. In our
testing environment, the speedup achieved when running on eight systems
varies from about eight (near perfect speedup) to less than one.

Download and Install PDNS

All of the modifications to existing ns code work only with the
2.27 release. You will also need the libSynk libraries, which can
be compiled for Intel Linux, Intel Solaris, Sparc Solaris, SGI Irix, HP
UX, Macintosh OS X, and Microsoft Windows systems. These instructions
assume you have loaded the ns source directory at
~/ns-allinone-2.27, and libSynk at ~/libsynk.
If either one is located elsewhere, be sure to modify these instructions
accordingly. Please do not download the PDNS source with Internet
Explorer!

Source Files

Download either pdns-2.27-v1a or pdns-gtemu-2.27-v1a.
pdns-2.27-v1a is the most compatible version and can be compiled using
Intel's icc/ecc or gcc-3.2 (or previous) series compilers. If you need
GTemulator capability, download pdns-gtemu-2.27-v1a; however, please be
aware that this version is compatible only with Linux systems and gcc.

Building libSynk

Decompress libSynk:

cd $HOME

gunzip -c libsynk-current.tar.Z | tar -xvpf -

cd libsynk

If you are using Myrinet, you must modify the appropriate makefiles (for
instance, the Myrinet settings in the Linux makefile) before compiling.

For improved performance, you may want to specify -O2,
-O3, or other optimization parameters for your
architecture.

Now create the libraries:

make

cd fdkcompat

make

Building ns-2 and PDNS

Decompress the baseline ns software:

cd $HOME

gunzip -c ns-allinone-2.27.tar.gz | tar -xvf -

Move the PDNS patch file (pdns_2.27_v1a.gz)
to the ~/ns-allinone-2.27/ns-2.27 directory and patch the
stock ns:

cd $HOME/ns-allinone-2.27/ns-2.27

gunzip -c pdns_2.27_v1a.gz | patch -p3

Next, edit the ~/ns-allinone-2.27/ns-2.27/Makefile.in
file:

Edit the KITHOME macro on line 64 to
reflect the correct directory of libSynk.

If you are using Myrinet, you will need to add
-lgm on line 103 of the Makefile.in
file.

Message Compression is enabled by default. To disable (not
recommended), remove the define -DUSE_COMPRESSION on line
68.

Again, you may want to add optimization parameters for your
architecture to the CCOPT macro in the Makefile.

Install ns normally:

cd $HOME/ns-allinone-2.27

./install

The resulting binary for Parallel/Distributed ns will be
~/ns-allinone-2.27/ns-2.27/pdns.

Running PDNS

The pdns software uses libSynk to coordinate the startup
synchronization of the distributed simulation. The startup uses two
environment variables, NODEINFO and (usually, for TCP
communications) FMTCP_SESSIONNAME, to
coordinate the creation of the inter-process communication channels
(sockets, shared memory, etc.). These environment variables are described in
detail in the libSynk usage
documentation.

Note that you may specify the initial TSO queue heap size and its increment
at run time through the environment variables BRTI_HEAPSIZE
and BRTI_HEAPINCR. The default initial heap size is
1000 events and the default increment is 10000 events.
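
For example, to raise both values before launching pdns (bash syntax; the values are illustrative):

export BRTI_HEAPSIZE=5000

export BRTI_HEAPINCR=20000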

Sample Scripts

test.tcl requires 3 pdns instances. To run, modify
run_test.sh to reflect the correct hostname(s) to be
used. Note the use of export for bash-style shells;
modify appropriately for csh, tcsh, etc. You do not need 3 machines to
run 3 pdns instances for this small script. You can place the same
hostname 3 times in the NODEINFO string (and the
script will ssh 3 times to the same host to spawn 3 pdns instances
accordingly). The file out.0 should have 5 dlink hops, 3
rlink hops, 8 total hops. The file out.1 should have 5
dlink hops, 2 rlink hops, 7 total hops. The file out.2
should have 0 dlink hops, 0 rlink hops, 0 total hops. dlink hops
stands for duplex-link hops or local hops within a pdns instance,
while rlink hops stands for remote-link hops or hops that span two
pdns instances. Note that rlink hops invoke libSynk's communication
primitives, which use one of, or a combination of, shared memory, TCP/IP,
and Myrinet.

Network Emulation using Veil

This release of pdns also includes partial emulation support for routing
real application traffic via pdns networks (designed and developed
by Kalyan Perumalla). This
is achieved using the Veil emulation framework, which captures the
application data at the socket API layer. The data capture portion is
included in the veil overloading library, while the data
injection/emitting code is included in pdns as the
LiveApplication Agent in
rti/liveapp.cc. For additional information, see the Veil homepage.