Copyright Notice

This text is copyright by CMP Media, LLC, and is used with
their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in
SysAdmin/PerformanceComputing/UnixReview magazine.
However, the version you are reading here is as the author
originally submitted the article for publication, not after their
editors applied their creativity.

As a Unix system administrator, I'm often faced with those little
mundane tasks that seem so trivial to me but so important to the
community I'm supporting. Little things like ``hey, is that host up
and responding to pings?'' Such tasks generally have a very
repetitive nature to them, and scripting them seems to be the only way
to have time to concentrate on the tasks that really need my
attention.

Let's look at the specific task of pinging a number of hosts on a
subnet. Now, there are tools to do this quickly (like nmap), and
there are even Perl modules to perform the ping (as in
Net::Ping), but I wanted to focus on something familiar that can be
launched from Perl as an external process, and the system ping
command seems mighty appropriate for that.
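A sketch of such a subroutine might look like this (the flags and the output text it matches are those of the ping on my system, so treat the details as illustrative):

```perl
sub ping_a_host {
  my $host = shift;
  ## backquotes fork a shell, so we can toss ping's stderr there
  my $output = `ping -i 1 -c 1 $host 2>/dev/null`;
  ## "0 packets rec" present => no reply came back => bad ping
  return $output =~ /0 packets rec/ ? 0 : 1;
}
```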

Here, I'm firing up a subshell to execute the ping -i 1 -c 1
command, which on my system requests ping have a 1-second timeout,
and selects (as Sean Connery's character said in The Hunt for Red
October so eloquently) ``one ping only''. Your ping parameters may
vary: check your manpage.

The output is scanned for the string 0 packets rec, which if absent
means we got a good ping. So if the match is found, we return 0 (the
ping was bad), otherwise we'll return 1. The ping command spits
out some diagnostics on standard error, which we'll toss using
Bourne-shell syntax.

Note that the value of $host is not checked here for sanity. We
certainly wouldn't want to accept a random command-line parameter or
(gasp) a web form value here without some serious validation.
However, as we use this in our program, all of the values will be
internally generated, so we've got some degree of safety.

So to scan a particular subnet, looking for hosts that are alive, we
would add to that subroutine something like:
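Perhaps something like this serial scan (a sketch, assuming the ping_a_host routine just shown and my 10.0.1.x subnet):

```perl
## a serial scan of the whole subnet
for my $host (map "10.0.1.$_", "001" .. "254") {
  print "$host is alive\n" if ping_a_host($host);  # ping_a_host from above
}
```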

Now, this routine completes very quickly for hosts that are alive, but
is slow as molasses for hosts that aren't present, because ping must
give the host its full timeout's worth of chances to respond before
giving up.

So how can we speed that up? This is not a CPU-intensive loop:
nearly all of the elapsed time is spent waiting for some remote host
to respond. We'll leave the ping_a_host subroutine alone, because
that's not where we have a problem: it's doing its job as fast as it
can. What we need is to run more of them at a time.

A first approach is to fork a separate process for each host we want
to ping. We'll then sit back in a wait loop. As each child process
completes, we'll note its exit status, and when there are no more
kids, we'll spit out a report.

So, first, we'll define the host list for the task:

my @hosts = map "10.0.1.$_", "001".."010";

The numbers here are padded to three digits so that they sort as
strings in a numeric sequence, a cheap but effective trick. Note also
that I'm only selecting the first 10 hosts this time. I'll explain
that shortly.

Next, we'll want a hash to keep track of the kids:

my %pid_to_host;

The keys of this hash will be the child process ID (PID), and the
value will be the corresponding host that the child is processing.
Next, we'll want to loop over the host list, firing up a child for
each:
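Something like this (a sketch; ping_a_host is the routine from earlier, and @hosts and %pid_to_host were set up above):

```perl
for (@hosts) {
  if (my $pid = fork) {
    ## parent: remember which host this kid is checking
    $pid_to_host{$pid} = $_;
    warn "$pid is pinging $_\n";        # diagnostic only
  } elsif (defined $pid) {
    ## child: exit status 0 means the ping was good
    exit(ping_a_host($_) ? 0 : 1);
  } else {
    die "fork failed: $!";
  }
}
```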

As each host is placed into $_, we'll fork. The result of fork is
a child process running in parallel with the parent process. These
processes are distinguished only by the return value of fork, which
is 0 in the child, but the child's PID in the parent. So, if we get
back a non-zero value, we're the parent, and we'll store the PID into
the hash, along with the host that particular child is processing. If
we're the child, then we'll call the ping_a_host routine, and
arrange for our exit status to be good (0) if that routine gives a
thumbs up.

The warn in the loop is merely for diagnostic purposes so that you
can see what's happening. In a production program, I'd certainly
remove that.

At the end of this loop, we'll have a number of processes. Far too
many, in fact. For each host to check, we'll have two processes
running: the shell forked by the backquotes, and the ping process
itself. Perl has to fork a shell because I needed that child to have
its standard error output redirected. If I could have gotten the
redirection out of those backquotes somehow, we'd have only one child
process per host, not two.

Launching 20 processes to check 10 hosts will start pushing us up
against the typical per-user process limit. And now you can see why I
didn't do all 254 hosts at once!

Now it's time to wait for the results. A simple ``wait'' loop will
reap the children as fast as they complete their task. First, we'll
declare a hash to hold the results:

my %host_result;

The key will be the host, and the value will be 1 if the child said
it was pingable, otherwise 0.
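The wait loop itself might look something like this (a sketch matching the description that follows):

```perl
while (%pid_to_host) {                # still have kids outstanding?
  my $pid = wait;
  if ($pid < 0) {                     # unexpected: hash isn't empty yet
    warn "no more kids?!";
    last;
  }
  my $host = delete $pid_to_host{$pid};
  next unless defined $host;          # reaped a kid that wasn't ours
  $host_result{$host} = $? ? 0 : 1;   # exit status 0 => pingable
}
```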

As long as we've got kids (indicated by the ever-decreasing size of
the %pid_to_host hash), we'll wait for them. The child process ID
comes back from wait, which we'll stick into $pid. At this
point, the exit status of that particular child is in $?. If the
return value of wait is negative, then we don't have any more kids.
This is an unexpected result, which we could check later by noticing
that %pid_to_host is not yet empty, or we could have simply died
here.

Next, we'll use the %pid_to_host hash to map the PID back to the host
that child was processing. Again, we might have accidentally reaped
a completed child which wasn't one of ours, so defensive programming
requires checking for that. This won't happen unless other parts of
this program are also forking children somehow, but I'm a cautious
programmer most of the time.

Finally, we'll take the exit status in $?, and map it into the
appropriate good/bad value for the result hash.

When this loop completes, we have no more kids performing tasks, and
it's time to show the result:
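Perhaps like this (a sketch, using the %host_result hash filled in by the wait loop):

```perl
for my $host (sort keys %host_result) {
  print "$host is ", ($host_result{$host} ? "good" : "bad"), "\n";
}
```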

For each key of the result table, we'll say whether the result was
good or bad.

Putting this all together makes a nice little demo of forking 20 kids
to check 10 hosts, but it won't scale to 254 hosts, because that would
require more process slots than we typically have (or want to use,
actually). What we need to do is perform the forking gradually, so
that we never have more than 20 kids at a time. One naive approach
is to chunk the data into bite-size bits:
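A sketch of that chunking approach, reusing the pieces from above:

```perl
## process the master list 10 hosts at a time
while (my @chunk = splice @hosts, 0, 10) {
  for (@chunk) {                      # fork one kid per host in the chunk
    if (my $pid = fork) { $pid_to_host{$pid} = $_ }
    elsif (defined $pid) { exit(ping_a_host($_) ? 0 : 1) }
    else { die "fork failed: $!" }
  }
  while (%pid_to_host) {              # reap the whole chunk before moving on
    my $pid = wait;
    last if $pid < 0;
    my $host = delete $pid_to_host{$pid};
    next unless defined $host;
    $host_result{$host} = $? ? 0 : 1;
  }
}
```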

Here, most of the code above gets wrapped into an outer loop which
hands 10 hosts at a time to be processed, using splice to peel them
off of the master list. While this strategy certainly solves the ``no
more than 10 at a time'' condition, each batch of 10 has to wait for
the slowest of the 10 to complete.

A better way would be to fork until we hit the limit of active
children, then wait for any one child to finish before we need to fork
again. First, we'll need to factor out ``waiting for a kid'' into a
subroutine so we can call it in two different places: while forking a
new task, and at the end to reap all the remaining children:
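Here's a sketch of that subroutine:

```perl
## assumes these are declared earlier, before this definition:
##   my %pid_to_host;  my %host_result;

sub wait_for_a_kid {
  my $pid = wait;
  return 0 if $pid < 0;                  # no kids remain at all
  my $host = delete $pid_to_host{$pid};
  return 0 unless defined $host;         # reaped a kid that wasn't ours
  $host_result{$host} = $? ? 0 : 1;      # exit status 0 => pingable
  return 1;
}
```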

Note that we're accessing %pid_to_host and %host_result directly
here, so those variables must be in scope before the subroutine
definition. The subroutine now returns 1 if a kid was reaped, and 0
otherwise. The final reap loop now becomes:

## final reap:
1 while wait_for_a_kid();

At this point, the program functions identically to the prior one,
except that I've refactored the kid reaping. The magic happens next.
We'll put wait_for_a_kid in the middle of the forking loop as well,
just before we're about to fork, conditionally if the number of kids is
already at the maximum we chose:

for (@hosts) {
  wait_for_a_kid() if keys %pid_to_host >= 10;
  ...

Ahh. That does it. We can now crank @hosts back up to our 254
items. As we fire off the first 10, this new statement has no effect.
But when it comes time for the 11th, we'll wait for at least one of
the other 10 to complete first. So, at no time do we have more than
10 hosts active (using 20 child processes for reasons explained
earlier). The entire program is given here in case you want to see
it all in context:
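Putting the pieces together, the whole program looks something like this (a reconstruction sketch; the ping invocation and output text are still specific to my system):

```perl
#!/usr/bin/perl -w
use strict;

my @hosts = map "10.0.1.$_", "001" .. "254";

my %pid_to_host;   # child PID => host that child is checking
my %host_result;   # host => 1 if pingable, else 0

for (@hosts) {
  ## throttle: never more than 10 outstanding kids
  wait_for_a_kid() if keys %pid_to_host >= 10;
  if (my $pid = fork) {                  # parent
    $pid_to_host{$pid} = $_;
  } elsif (defined $pid) {               # child
    exit(ping_a_host($_) ? 0 : 1);
  } else {
    die "fork failed: $!";
  }
}

## final reap:
1 while wait_for_a_kid();

for my $host (sort keys %host_result) {
  print "$host is ", ($host_result{$host} ? "good" : "bad"), "\n";
}

sub wait_for_a_kid {
  my $pid = wait;
  return 0 if $pid < 0;                  # no kids remain at all
  my $host = delete $pid_to_host{$pid};
  return 0 unless defined $host;         # not one of ours
  $host_result{$host} = $? ? 0 : 1;
  return 1;
}

sub ping_a_host {
  my $host = shift;
  my $output = `ping -i 1 -c 1 $host 2>/dev/null`;
  return $output =~ /0 packets rec/ ? 0 : 1;
}
```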

As a working program, this works pretty well, although it could be made
a bit more robust, and is very specific to the particular ping
program on my machine. If you don't want to write this pattern of
code into each program that wants to do parallel things, look at
Parallel::ForkManager in the CPAN, which does pretty much the same
thing with a friendly interface.
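For instance, the same throttled scan might look roughly like this using Parallel::ForkManager (an untested sketch; check the module's documentation for the current interface):

```perl
use strict;
use Parallel::ForkManager;

my @hosts = map "10.0.1.$_", "001" .. "254";
my %host_result;

my $pm = Parallel::ForkManager->new(10);    # at most 10 kids at once

## collect each child's exit status as it is reaped
$pm->run_on_finish(sub {
  my ($pid, $exit_code, $host) = @_;
  $host_result{$host} = $exit_code ? 0 : 1;
});

for my $host (@hosts) {
  $pm->start($host) and next;               # parent keeps looping
  ## child: same ping-and-check as before
  my $ok = `ping -i 1 -c 1 $host 2>/dev/null` !~ /0 packets rec/;
  $pm->finish($ok ? 0 : 1);                 # child exits here
}
$pm->wait_all_children;

print "$_ is ", ($host_result{$_} ? "good" : "bad"), "\n"
  for sort keys %host_result;
```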

One improvement to this program might be to pre-fork and re-use the
children, using some sort of IPC (pipes or sockets) to communicate
additional tasks to perform as each task completes, but I've run out
of space to talk about that here. Until next time, enjoy!