Getting a slave pvmd started is a messy task
with no good solution.
The goal is to
get a process running on the new host,
with enough
identity
to let it be fully configured and added
as a peer.

Ideally,
the mechanism used
should be
widely available, secure, and fast,
while
leaving the system easy to install.
We'd like to avoid having to type passwords all the time,
but we don't want to put them in a file where they can be stolen.
No one system meets all of these criteria.
Using
inetd
or
connecting to an already-running pvmd or pvmd server at a
reserved port
would allow fast,
reliable startup,
but would require that a system
administrator install PVM on each host.
Starting the pvmd
via
rlogin or telnet
with a chat script
would allow access even to
IP-connected hosts behind firewall machines
and
would require no special privilege
to install;
the main drawbacks are speed
and the
effort needed to get the chat program working
reliably.

Two widely available systems are
rsh and rexec();
we
use both to cover the cases where a password
does and does not
need to be typed.
A manual startup
option allows the user to take the place of a
chat program,
starting the pvmd by hand and typing in the configuration.
rsh is a privileged program that can be
used to run commands on another host without a password,
provided the destination host
can be made to trust the source host.
This can be done either
by making the two hosts equivalent (which requires a system administrator)
or by creating a .rhosts file on the destination host
(this isn't a great idea).
The alternative,
rexec(), is a function compiled into the pvmd.
Unlike rsh,
which doesn't take a password,
rexec() requires the user to supply one at run time,
either by typing it in
or by placing it in a .netrc file (this is a really bad idea).
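
The choice among the three mechanisms can be sketched as a small decision function. This is only an illustration of the rule stated above, not the pvmd's code: the type and function names (start_method, host_opts, choose_start_method) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the pvmd's real identifiers. */
enum start_method { START_RSH, START_REXEC, START_MANUAL };

struct host_opts {
    bool manual;          /* the user starts the pvmd by hand */
    bool needs_password;  /* the destination does not trust the source host */
};

/* Pick a startup mechanism: manual if requested, rexec() when a
   password must be supplied, rsh otherwise. */
enum start_method choose_start_method(const struct host_opts *o)
{
    if (o->manual)
        return START_MANUAL;
    if (o->needs_password)
        return START_REXEC;   /* password typed in or read at run time */
    return START_RSH;         /* relies on host trust, no password */
}

/* Printable name for a method, e.g. for log messages. */
const char *start_method_name(enum start_method m)
{
    switch (m) {
    case START_RSH:    return "rsh";
    case START_REXEC:  return "rexec";
    case START_MANUAL: return "manual";
    }
    return "unknown";
}
```

Checking the manual flag first reflects the assumption that a user who asks for manual startup wants it regardless of whether a password would be needed.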

Figure: Timeline of addhost operation

The figure
shows
a host being added to the virtual machine.
A task calls pvm_addhosts()
to
send a request to its pvmd,
which in turn sends a DM_ADD message to the master
(possibly itself).
The master pvmd
creates a new host table entry for each
host
requested,
looks up the IP addresses,
and sets the options
from host file entries
or defaults.
The host descriptors are kept in a waitc_add structure
(attached to a wait context)
and not yet added to the host table.
The master
forks the pvmd'
to do the dirty work,
passing it a list of hosts and commands to execute
(an SM_STHOST message).
The pvmd' uses rsh, rexec() or manual startup
to start each pvmd,
pass it parameters,
and
get a line of configuration data back.
The configuration dialog between pvmd'
and a new slave is as follows:

The
addresses of the master and slave pvmds
are passed on the command line.
The slave writes its configuration on standard output,
then waits for
an EOF from the pvmd'
and disconnects.
It runs in
probationary status
(runstate = PVMDSTARTUP)
until it
receives the rest of its configuration
from the master pvmd.
If it isn't configured within five minutes
(parameter DDBAILTIME),
it assumes there is some problem with the master
and quits.
The protocol revision
(DDPROTOCOL)
of the slave pvmd must match that of the master.
This number is incremented whenever a change in the protocol
makes it incompatible with the previous
version.
When several hosts are added at once,
startup is done in parallel.
The pvmd' sends the data (or errors)
in a DM_STARTACK message
to the
master pvmd,
which completes the host descriptors
held in the wait context.
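
The two checks a probationary slave is subject to, the bail-out timer and the protocol revision match, can be sketched as follows. This is a simplified model under stated assumptions: the struct layout and function names are hypothetical, PVMDNORMAL is an assumed name for the configured state, and the DDPROTOCOL value is a placeholder rather than a real revision number. Only the five-minute limit comes from the text.

```c
#include <stdbool.h>
#include <time.h>

#define DDBAILTIME 300   /* seconds; the text gives five minutes */
#define DDPROTOCOL 1     /* placeholder, not the real revision number */

/* PVMDSTARTUP is the probationary state named in the text;
   PVMDNORMAL is an assumed name for the fully configured state. */
enum runstate { PVMDSTARTUP, PVMDNORMAL };

struct slave {
    enum runstate state;
    time_t started;      /* when the slave entered probation */
};

/* True if a slave still in probation has waited DDBAILTIME seconds
   or more and should give up on the master. */
bool slave_should_bail(const struct slave *s, time_t now)
{
    return s->state == PVMDSTARTUP
        && difftime(now, s->started) >= (double)DDBAILTIME;
}

/* True if the slave's protocol revision equals the master's. */
bool protocol_matches(int master_rev, int slave_rev)
{
    return master_rev == slave_rev;
}
```

A slave that has received its configuration leaves PVMDSTARTUP, so the timer no longer applies to it.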

If a special task
called a hoster
is registered with the master pvmd when it receives
the DM_ADD message,
the pvmd' is not used.
Instead,
the SM_STHOST message
is sent to the hoster,
which
starts the remote processes as described above
using any mechanism it wants,
then
sends a SM_STHOSTACK message (same format as DM_STARTACK)
back to the master pvmd.
Thus,
the method of starting slave pvmds is dynamically replaceable,
but the hoster does not have to understand the configuration
protocol.
If the hoster task fails during an add operation,
the pvmd uses the wait context to recover.
It assumes none of the slaves were started
and sends a DM_ADDACK message indicating a system error.
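
A minimal sketch of the two decisions described here: who receives the start request, and how pending hosts are written off if the hoster dies. All names (pick_starter, pending_host, fail_pending_hosts) are hypothetical, and SKETCH_SYSERR is a stand-in value chosen for illustration, not a documented error code.

```c
#include <stddef.h>

#define SKETCH_SYSERR (-14)  /* stand-in for a system-error code (assumed) */

enum starter { STARTER_PVMD_PRIME, STARTER_HOSTER };

/* A registered hoster replaces the pvmd' as the recipient of the
   start request. */
enum starter pick_starter(int hoster_registered)
{
    return hoster_registered ? STARTER_HOSTER : STARTER_PVMD_PRIME;
}

/* One host descriptor held in the wait context (simplified). */
struct pending_host {
    const char *name;
    int started;   /* nonzero once a start acknowledgement covered it */
    int result;    /* 0 = not yet known, negative = error code */
};

/* On hoster failure, assume no slave was started: mark every pending
   host with the system error. Returns the number of hosts marked. */
size_t fail_pending_hosts(struct pending_host *hosts, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        hosts[i].started = 0;
        hosts[i].result = SKETCH_SYSERR;
    }
    return n;
}
```

Marking every host as failed, even one that may actually have started, mirrors the conservative assumption in the text.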

After the slaves are started,
the master
sends each a DM_SLCONF message
to set parameters not included in the startup protocol.
It then
broadcasts a DM_HTUPD message
to all new and existing slaves.
Upon receiving this message,
each slave knows the configuration of
the new virtual machine.
The master waits for an acknowledging DM_HTUPDACK message
from every slave,
then broadcasts
a DM_HTCOMMIT message,
shifting all pvmds to the new host table.
Two phases are needed so that new hosts are not advertised
(e.g., by pvm_config()) until all pvmds know the new
configuration.
Finally,
the master
sends a DM_ADDACK reply to the original request,
giving the new host IDs.
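
The two-phase update can be modeled from the master's side as below. This is a simplified sketch, not the pvmd's implementation: the struct, the function names, and the use of an integer serial to stand for a host table are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLAVES 16   /* arbitrary bound for this sketch */

struct master {
    int current_table;        /* serial of the host table in use */
    int pending_table;        /* serial proposed in phase 1, or -1 */
    size_t nslaves;           /* must not exceed MAX_SLAVES */
    bool acked[MAX_SLAVES];   /* DM_HTUPDACK seen from slave i */
};

/* Phase 1: propose a new table to every slave (models the DM_HTUPD
   broadcast) and clear the acknowledgement flags. */
void begin_update(struct master *m, int new_table)
{
    size_t i;
    m->pending_table = new_table;
    for (i = 0; i < m->nslaves; i++)
        m->acked[i] = false;
}

/* Record one DM_HTUPDACK. Phase 2: once every slave has acknowledged,
   switch to the new table (the commit broadcast would be sent here).
   Returns true only on the call that triggers the commit. */
bool receive_ack(struct master *m, size_t slave)
{
    size_t i;
    if (m->pending_table < 0 || slave >= m->nslaves)
        return false;
    m->acked[slave] = true;
    for (i = 0; i < m->nslaves; i++)
        if (!m->acked[i])
            return false;
    m->current_table = m->pending_table;
    m->pending_table = -1;
    return true;
}
```

Until the last acknowledgement arrives, current_table is unchanged, which is the property that keeps new hosts from being advertised early.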

Note:
Recent experience suggests it would be cleaner to
manage the pvmd' through
the task interface
instead of the host interface.
This approach would allow
multiple starters to run at once
(at present, parallel startup is implemented explicitly
in a single pvmd' process).