
SIGALRM Timers and Stdin Analysis

It's not hard to create functions to ensure that your script
doesn't run forever. But what if you want portions to be timed while others
can take as long as they need? Not so fast, Dave explains in his latest
Work
the Shell.

In an earlier article, I started building out a skeleton script that would have the
basic functions needed for any decent shell script you might want to create.
I started with command-line argument processing with getopts, then
explored syslog and status logging in scripts. Finally, I ended that column
by talking about how to capture signals like Ctrl-C and invoke functions
that can clean up temp files and so on before actually giving up control of your
shell script.

This time, I want to explore a different facet of signal management in a
shell script: having built-in timers that let you specify an allowable
quantum of time for a specific function or command to complete with explicit
consequences if it hangs.

When does a command hang? Often when you're tapping into a network
resource. For example, you might have a script that looks up definitions by
handing a query to Google via curl. If everything's running
fine, it'll complete in a second or two, and you're on your way.

But if the network's off-line or Google's having a problem or any of
the million other reasons that a network query can fail, what happens to
your script? Does it just hang forever, relying on the curl
program to have its own timeout feature? That's not good.

Alarm Timers

One of the most common alarm timer approaches is to give the entire script a
specific amount of time within which it has to finish by spawning a
subshell that waits that quantum, then kills its parent. Yeah, kinda
Oedipal, but at least we're not poking any eyes out in this script!
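In outline, the approach might look like the following sketch. The 2-second limit and the stand-in workload are my own illustrative choices, scaled down so the self-destruct happens quickly; the doomed script is wrapped in an inner sh -c so the demonstration cleans up after itself:

```shell
#!/bin/sh
# a sketch of the self-destruct timer, scaled to a 2-second limit

sh -c '
  ( sleep 2 ; kill $$ ) &   # trailing & pushes the timer subshell into the background
  echo "work starting"
  sleep 5                   # stand-in for work that takes too long
  echo "never reached"
'
echo "runaway inner script was stopped by its timer"
```

A real script would of course use a far larger number than 2 in the sleep.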

There's no "trap" involved—easy enough. Notice especially that the closing
parenthesis has a trailing ampersand to ensure that the subshell is pushed into
the background and runs without blocking the parent script from proceeding.

A smarter, cleaner way to do this would be for the timer child subshell to
send the appropriate SIGALRM signal to the parent—a small tweak:
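The tweak is just a matter of naming the signal in the kill invocation. A sketch, again scaled down to 2 seconds and wrapped in an inner shell so it's safe to run as a demo:

```shell
#!/bin/sh
# the same timer, but sending SIGALRM instead of the default SIGTERM

sh -c '
  ( sleep 2 ; kill -ALRM $$ ) &   # signal the parent with SIGALRM
  sleep 5                         # stand-in for overlong work
  echo "never reached"
'
echo "inner script was terminated by SIGALRM"
```

With no trap in place, SIGALRM's default action still terminates the process, so the behavior so far is unchanged.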

If you do that, however, what do you need in the parent script to capture the
SIGALRM? Let's add that, and let's set up a few functions along the way
to continue the theme of useful generic additions to your scripts:
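Here's one way the parent side might look; the function name and the messages are my own placeholders, and the limit is again scaled to 2 seconds. Note that most shells wait for the current foreground command to finish before running the trap:

```shell
#!/bin/sh
# sketch: trap SIGALRM in the parent and clean up; names are illustrative

sh -c '
  sig_alrm() {
    echo "timed portion exceeded its allowance" >&2
    exit 1                        # clean up temp files here, then quit
  }
  trap sig_alrm ALRM              # capture SIGALRM in the parent

  ( sleep 2 ; kill -ALRM $$ ) &   # the timer subshell

  while : ; do                    # stand-in for the timed portion
    sleep 1
  done
'
echo "inner script exited through its SIGALRM handler"
```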

Note that both scripts have debugging output that's probably not needed
for actual production code. It's easily commented out, but running it as is will
help you understand how things interact and work together.

The problem is, what happens if the script finishes in less time than
allotted? The subshell is still out there, waiting, and it eventually sends
the signal to a nonexistent process, causing the following sloppy error
message to show up:

sigtest.sh: line 7: kill: (10532) - No such process

There are two ways to fix this: either kill the subshell when the parent
shell exits, or have the subshell test for the existence of the parent
shell just before it sends the signal.

Let's do the latter. It's easier, and having the subshell float
around for a few seconds in a sleep is certainly not going to be a waste of
computing resources.

The easiest way to test for the existence of a specified process is to use
ps and check the return code, like this:

ps $$ >/dev/null ; echo $?

If the process exists, the return code will be 0. If it's gone, the
return code will be nonzero. This suggests a simple test:

if [ ! $(ps $$ > /dev/null) ]

But that won't work, because ps communicates through its return
code, and the command substitution hands the shell its (empty) output
instead. The solution? Invoke the ps command first, then have the test
expression examine the return code:
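A sketch of the sending side, packaged as a function so the two steps are explicit (the name signal_parent is my own invention):

```shell
#!/bin/sh
# sketch: test for the parent's existence, then signal it

signal_parent() {
  ps $1 > /dev/null        # sets the return code: 0 if PID $1 exists
  if [ $? -ne 0 ] ; then
    return                 # parent already gone: quietly do nothing
  fi
  kill -ALRM $1            # parent still alive: send the alarm
}

# used from within the timer subshell, where $$ still expands to the
# parent script's PID:
#    ( sleep $ALLOWED ; signal_parent $$ ) &
```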

This situation arises when a script has more than one timed block and the
second one starts before the first block's timer has run out. Imagine that
you've allocated 100 seconds for the first timed block and it finishes in
90 seconds. Regular code takes five seconds, then you're in block two. Ten
seconds after block one finished, the first timer's sleep expires and its
signal fires, so block two is cut off after only five seconds rather than
the 100 it was allocated. Not good.

This is admittedly a bit of a corner case, but to fix it, let's
reverse the decision about having child processes test for the existence of
the parent before sending the signal and instead have the parent script kill
all child subshells upon completion of the timed portion. It's a bit tricky to
build, because it requires the use of ps, which picks up more processes than
just that subshell, so you not only need to screen out your own process, you
also want to get rid of any processes that aren't actually subshells of the
script itself.

I use the following:

ps -g $$ | grep $myname | cut -f1 -d\ | grep -v $$

This generates a list of process IDs (pids) for all the subshells running,
which you then can feed to kill:

pids=$(ps -g $$ | grep $myname | cut -f1 -d\ | grep -v $$)
kill $pids

The problem is that not all of those processes are still around by the time
they're handed to the kill program. The solution? Ignore any
errors generated when a PID is not found:
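The fix is a standard-error redirect on the kill. A sketch, assuming $myname holds the script's own name (set here via basename $0, which is my assumption rather than the column's):

```shell
#!/bin/sh
# sketch: kill leftover timer subshells, discarding complaints
# about PIDs that have already exited

myname=$(basename $0)          # assumed to hold the script's name

pids=$(ps -g $$ | grep $myname | cut -f1 -d\  | grep -v $$)

if [ ! -z "$pids" ] ; then
  kill $pids 2> /dev/null      # "No such process" errors vanish here
fi
```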

If you're thinking "holy cow, multiple timers in the same script is
a bit of a mess", you're right. At the point where you need something
of this nature, it's quite possible that a different solution would be a
smarter path.

Further, I'm sure there are other ways to address this, in which
case I'd be most interested in hearing from readers about whether
you've encountered a situation where you need to have multiple timed
portions of your code, and if so, how you managed it! Send e-mail via
http://www.linuxjournal.com/contact.

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a
really long time. He's the author of Learning Unix for Mac OS
X and Wicked Cool Shell Scripts. You can find him on Twitter
as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.