How do I wait in a bash script for several subprocesses spawned from that script to finish, and then return exit code !=0 when any of the subprocesses ends with code !=0?

Simple script:

#!/bin/bash
for i in `seq 0 9`; do
    doCalculations $i &
done
wait

The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any subprocess ends with code !=0?

Is there any better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?

17 Answers

wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in background.
Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.
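A minimal sketch of that approach (doCalculations as in the question; the array name pids is my own choice):

#!/bin/bash
pids=()
for i in `seq 0 9`; do
    doCalculations $i &
    pids+=($!)            # remember each child's PID
done
rc=0
for pid in "${pids[@]}"; do
    wait "$pid" || rc=1   # wait returns that child's exit status
done
exit $rc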

Well, since you are going to wait for all the processes, it doesn't matter if e.g. you are waiting on the first one while the second has already finished (the 2nd will be picked up at the next iteration anyway). It's the same approach that you'd use in C with wait(2).
– Luca Tettamanti, Dec 10 '08 at 14:41


Ah, I see - different interpretation :) I read the question as meaning "return exit code 1 immediately when any subprocess exits".
– Alnitak, Dec 10 '08 at 14:51


PID may be reused indeed, but you cannot wait for a process that is not a child of the current process (wait fails in that case).
– tkokoszka, Dec 10 '08 at 15:27


You can also use %n to refer to the nth backgrounded job, and %% to refer to the most recent one.
– conny, Aug 12 '10 at 11:13


@Nils_M: You're right, I'm sorry. So it would be something like: for i in $n_procs; do ./procs[${i}] & pids[${i}]=$!; done; for pid in ${pids[*]}; do wait $pid; done;, right?
– synack, May 27 '14 at 15:15

jobs -p gives the PIDs of subprocesses that are still executing. It will skip a process if that process finishes before jobs -p is called, so if any subprocess ends before jobs -p runs, that process's exit status will be lost.
– tkokoszka, Feb 8 '09 at 15:06


Wow, this answer is way better than the top rated one. :/
– e40, Mar 29 '12 at 0:03


@e40 and the answer below is probably even better. And even better would probably be to run each command with (cmd; echo "$?" >> "$tmpfile"), then wait, and then read the file for the failures. Also annotate-output. … or just use this script when you don't care that much.
– HoverHell, Mar 29 '12 at 10:18

This looks like a great tool, but I don't think the above works as-is in a Bash script where doCalculations is a function defined in that same script (although the OP wasn't clear about this requirement). When I try, parallel says /bin/bash: doCalculations: command not found (it says this 10 times for the seq 0 9 example above). See here for a workaround.
– nobar, May 28 '13 at 22:26

And if doCalculations relies on any other script-internal environment variables (custom PATH, etc.), they probably need to be explicitly exported before launching parallel.
– nobar, Jun 4 '13 at 1:35


@nobar The confusion is due to some packagers messing things up for their users. If you install using wget -O - pi.dk/3 | sh you will get no confusion. If your packager has messed things up for you, I encourage you to raise the issue with your packager. Variables and functions should be exported (export -f) for GNU Parallel to see them (see man parallel: gnu.org/software/parallel/…).
– Ole Tange, Jul 7 '13 at 14:21

If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.

Exporting functions and variables may or may not be necessary in any particular case.

You can set --max-procs based on how much parallelism you want (0 means "all at once"); see the sketch after these notes.

GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.

The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.

Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
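A minimal sketch of the GNU Parallel approach, assuming doCalculations is a function defined in the same script (hence the export -f noted above):

#!/bin/bash
export -f doCalculations    # required so parallel's subshells can see the function
seq 0 9 | parallel --max-procs 0 doCalculations
# GNU Parallel exits non-zero if any job failed
# (its exit status reflects the number of failed jobs)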

According to the wait man page, wait with multiple PIDs only returns the exit value of the last process waited for. So you do need an extra loop, waiting for each PID separately, as suggested in the accepted answer (in comments).
– Vlad Frolov, Jul 6 at 19:17

However there's no apparent way to get the child's exit status in the signal handler.

Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.

What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
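Note that bash 4.3 added wait -n, which does block until any one child exits and returns that child's status. A minimal sketch, assuming bash 4.3 or later:

#!/bin/bash
for i in `seq 0 9`; do
    doCalculations $i &
done
rc=0
for _ in `seq 0 9`; do
    wait -n || rc=1    # returns as soon as any child exits, with its status
done
exit $rc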

If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code", as well as task names and their PIDs. I have also built in a simple rate-limiting method, which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.

The script launches all tasks in the first loop and consumes the results in the second one.

This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.
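A minimal sketch of that structure, assuming bash 4.2+ (doCalculations stands in for the real tasks; the names pids, code, and max_jobs are my own):

#!/bin/bash
declare -A pids    # task name -> PID
declare -A code    # task name -> exit status
max_jobs=4

# first loop: launch all tasks, with a crude rate limit
for i in {0..9}; do
    while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
        sleep 0.1    # wait for a slot to free up
    done
    doCalculations "$i" &
    pids["task$i"]=$!
done

# second loop: consume the results
for name in "${!pids[@]}"; do
    wait "${pids[$name]}"
    code["$name"]=$?
done

rc=0
for name in "${!code[@]}"; do
    if [ "${code[$name]}" -ne 0 ]; then
        echo "$name failed with status ${code[$name]}"
        rc=1
    fi
done
exit $rc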

I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.
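A rough sketch of a trap-based checkpids along those lines (my reconstruction, assuming bash 4+ for the associative array, not necessarily the original answer's code):

#!/bin/bash
set -o monitor    # enable job control so SIGCHLD traps fire per child

declare -A pids   # PID -> task id
checkpids() {
    local pid
    for pid in "${!pids[@]}"; do
        if ! kill -0 "$pid" 2>/dev/null; then
            wait "$pid"    # bash remembers the status of already-reaped children
            echo "task ${pids[$pid]} (PID $pid) exited with status $?"
            unset "pids[$pid]"
        fi
    done
}
trap checkpids CHLD

for i in `seq 0 9`; do
    doCalculations $i &
    pids[$!]=$i
done

# each SIGCHLD interrupts wait, after which the trap runs;
# keep waiting until every child has been accounted for
while [ "${#pids[@]}" -gt 0 ]; do
    wait
done

A different answer takes a job-control approach instead: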

#!/bin/bash
set -m
for i in `seq 0 9`; do
    doCalculations $i &
done
while fg; do true; done

set -m allows you to use fg & bg in a script

fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds

while fg will stop looping when any fg exits with a non-zero exit status

Unfortunately this won't handle the case when a process in the background exits with a non-zero exit status. (The loop won't terminate immediately; it will wait for the previous processes to complete.)

I've just been modifying a script to background and parallelise a process.

I did some experimenting (on Solaris with both bash and ksh) and discovered that wait outputs the exit status if it's not zero, or a list of jobs that return non-zero exit status when no PID argument is provided.

#!/bin/bash
# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill the lock when all children are done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for;
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.'    # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1
    local rc=$2
    # do whatever, and here you know which one stopped;
    # but remember, you're called from a subshell,
    # so vars have their values at fork time
}

# care must be taken not to spawn more than one child per loop,
# e.g. don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for the locking subprocess to be killed
wait $pid
echo

From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non-zero rc, just kill the lock from save_status.

I'm thinking maybe run doCalculations; echo "$?" >> /tmp/acc in a subshell that is sent to the background, then the wait, and then /tmp/acc would contain the exit statuses, one per line. I don't know about any consequences of the multiple processes appending to the accumulator file, though.
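A minimal sketch of that idea (using /tmp/acc as named above; a mktemp file would be safer in practice):

#!/bin/bash
acc=/tmp/acc
: > "$acc"    # truncate the accumulator
for i in `seq 0 9`; do
    (doCalculations $i; echo "$?" >> "$acc") &
done
wait
# fail if any recorded status is non-zero
if grep -qv '^0$' "$acc"; then
    exit 1
fi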

There shouldn't be any issues with multiple appenders, though return values may be written out of order, so you don't know which process returned what...
– Luca Tettamanti, Dec 10 '08 at 15:22

You could just send identification info with the statuses. At any rate, OP only wanted to know if any of the subprocesses returned with status ≠ 0 without regard to which ones specifically.
– Nietzche-jou, Dec 10 '08 at 15:29

To get 20 results instead of 21, do for i in $(seq 1 20).
– qkrijger, Feb 27 at 10:50