Anonymous Monk has asked for the
wisdom of the Perl Monks concerning the following question:

This is sort of a follow-up on a previous post asking for help with forks. I've been trying to learn, but I'm still very confused. I *think* I understand the basic idea, but the implementation still escapes me.

RichardK suggested I might be better off describing what I want to achieve so someone might make a suggestion as to the approach or module to use. So here it is, in pseudo-kind-of-code:

my @array1 = (1..5); # N=5
my @array2 = qw/a b/;
for my $val1 (@array1) {
# Start N processes here that can run in parallel.
# Each process outputs data to its own separate file.
# I will need it in the future.
for my $val2 (@array2) {
# For each of the N processes, wait until it is done,
# then start 2 parallel processes which use
# the output data as input.
# Save the output of each process separately to 2N files
# (two files with N elements would be better, but as I
# couldn't figure that out, I just postprocess the data ;-)
}
}
# "waitallchildren" or equivalent
# (Postprocess step to reduce the final output to 2 files
# - that I can do)

And here is my second attempt, using another module; it still doesn't work. The outer part does, but as soon as I uncomment the inner part, my prompt doesn't return anymore. No idea what's going on.

This would run max 5 processes at the outer level, plus max 2*5 at the inner level, i.e. max 15 processes total (10 inner from the last round + 5 outer for the next round). I'm not entirely sure that's what you want, but it's at least something to play with...
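The code this reply refers to isn't reproduced in this excerpt. As a rough sketch of such a nested Parallel::ForkManager setup (the filenames and the output strings here are my own assumptions, not the original code):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @array1 = (1 .. 5);
my @array2 = qw/a b/;

# Outer manager: at most 5 first-stage children at once.
my $outer = Parallel::ForkManager->new(5);

for my $val1 (@array1) {
    $outer->start and next;    # parent continues the loop; child falls through

    # First-stage work: write this child's own output file.
    open my $out, '>', "step1_$val1.txt" or die "open: $!";
    print $out "Output of step 1 for $val1\n";
    close $out;

    # Inner manager, created *inside* the child: at most 2 second-stage
    # grandchildren per first-stage child.
    my $inner = Parallel::ForkManager->new(2);
    for my $val2 (@array2) {
        $inner->start and next;
        open my $in, '<', "step1_$val1.txt" or die "open: $!";
        my $data = <$in>;
        close $in;
        open my $out2, '>', "step2_${val1}_$val2.txt" or die "open: $!";
        print $out2 "step 2 ($val2) based on: $data";
        close $out2;
        $inner->finish;
    }
    $inner->wait_all_children;

    $outer->finish;
}
$outer->wait_all_children;
```

Note that the inner manager is a fresh instance constructed inside the child; calling start on the *outer* manager from within one of its own children is what triggers the "Cannot start another process while you are in the child process" error discussed below.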

P.S. If you uncomment the $0 = ... lines, you can grep for "processing" in the ps output to observe what's going on.

Thanks a lot! That's exactly what I was trying to write. I'm a little ashamed because it seems I should have been able to come up with it myself.

There is one funny thing, though: If I comment out the
"for my $val2 (@array2) {..." and the corresponding closing curly bracket (and change $val2 to a constant like "step2"), it breaks with Cannot start another process while you are in the child process at .../perl/lib/perl5/Parallel/ForkManager.pm line 463. Not that I would want that - in my case I need the two nested loops. It's just that I don't know what is happening there. I mean, in the nested loop, I would have assumed I was in the 1st child process, which is itself the parent of the 2nd child process, right?

You do "For each of the N processes, wait until it is done" N times, which means you wait for N*N processes when there are only N. Your pseudo code is flawed. As a result, I'm not sure what you are trying to do, but I think those loops shouldn't be nested.

I don't have Proc::Fork installed on my box, but using just ordinary fork (and swapping to Indirect Filehandles to avoid possible global collision issues) the following code generates 4 files that contain "Output of step 1" and 16 files that contain "Output of step 2". Assuming this is the intended result, this should hopefully be a helpful guide toward your real use case.

Update: It occurs to me that you likely don't want a blocking wait on your children, since blocking there means the workers don't actually run simultaneously. You have a potential race condition in that scenario, since generating the first set of files may take more time than is required to get to generating the second set. You can resolve this in classic style using flock. You can also just ignore the problem of reaping kids using local $SIG{CHLD} = 'IGNORE';. The following code includes a 1 second sleep in the first loop to demonstrate blocking, and generates 21 simultaneous processes at max:
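The code referred to above is not included in this excerpt. As a minimal illustration of the flock part (filename and record format are my own, assumed for the example), here five concurrent children append to one shared file under an exclusive lock, so their writes cannot interleave:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

my $log = 'shared.log';
unlink $log;

my @pids;
for my $n (1 .. 5) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: append one record while holding an exclusive lock,
        # so concurrent writers cannot produce partial/mixed lines.
        open my $fh, '>>', $log or die "open: $!";
        flock $fh, LOCK_EX or die "flock: $!";
        print $fh "record from child $n\n";
        close $fh;    # flushes the buffer and releases the lock
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;    # blocking reap; $SIG{CHLD}='IGNORE' would skip it
```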

To avoid race conditions I'd suggest to delegate the starting of the two postprocessing children to the child itself, i.e. make them grandchildren. Fork off the function that writes the file, and as the last step fork again twice to produce the two children that will consume the output.
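A sketch of that grandchildren approach with plain fork (reduced to 3 first-stage children for brevity; filenames and contents are illustrative assumptions):

```perl
use strict;
use warnings;

my @pids;
for my $val1 (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: write the first-stage output file.
        my $file = "stage1_$val1.txt";
        open my $out, '>', $file or die "open: $!";
        print $out "stage 1 output for $val1\n";
        close $out;

        # Last step: fork the two consumers as grandchildren, so they
        # cannot possibly start before the file they read is complete.
        my @kids;
        for my $val2 (qw/a b/) {
            my $gpid = fork();
            die "fork failed: $!" unless defined $gpid;
            if ($gpid == 0) {
                open my $in, '<', $file or die "open: $!";
                my $data = <$in>;
                close $in;
                open my $out2, '>', "stage2_${val1}_$val2.txt" or die "open: $!";
                print $out2 "stage 2 ($val2) consumed: $data";
                close $out2;
                exit 0;
            }
            push @kids, $gpid;
        }
        waitpid $_, 0 for @kids;
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;
```

Because each pair of consumers is forked by the producer itself, no locking is needed: the producer's file is closed before the consumers exist.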

This would also facilitate things if you wanted to eliminate the temporary files. If it's possible to stream data to the two grandchildren, you could just open a pipe to them with "open my $kid, '|-'" and write to that instead of the file that you're going to read again later.
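A sketch of that streaming variant, using the forking form of open (the output filenames and record format are assumptions for the example):

```perl
use strict;
use warnings;

# Producer forks two consumers and streams records to them through
# pipes, instead of writing a temporary file they re-read later.
my @kids;
for my $tag (qw/a b/) {
    my $pid = open my $kid, '|-';    # fork; the child's STDIN is the pipe
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Consumer: drop inherited write-ends of earlier pipes so the
        # other consumer can see EOF, then read our own pipe via STDIN.
        close $_->[1] for @kids;
        open my $out, '>', "consumed_$tag.txt" or die "open: $!";
        print $out "($tag) got: $_" while <STDIN>;
        close $out;
        exit 0;
    }
    push @kids, [ $pid, $kid ];
}

# Producer: stream the data to both consumers.
for my $rec (1 .. 3) {
    print { $_->[1] } "record $rec\n" for @kids;
}
close $_->[1] for @kids;            # EOF makes each consumer finish
waitpid $_->[0], 0 for @kids;
```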

Yes, “learn about pipes” because this would make your job a helluva lot simpler. Start a pool of child-processes that read from a pipe and do the work that has been given to them by means of that pipe. (When the pipe is closed by the writer, the children’s read-requests fail and when this happens they terminate themselves.)

Likewise, instead of “starting” the second-stage processes when the first stage has finished, have the first-stage processes write messages to a second pipe that is listened-to by the second-stage processes which are built using the same design. After the first-stage processes consume their work and die-off, the second-stage processes in turn consume their work and die, and so on, until the parent finally realizes that all of its children have died (as expected) and it then terminates.

Now, all of the processes (regardless of their role) do their initialization and termination only once, and perform their jobs as quickly as they are able, and the pipes take up the slack. You tune the behavior of the system for maximum throughput by tweaking the number of processes that you create, and they perform work at that constant rate no matter how full or how empty the pipes may be.
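A scaled-down sketch of such a pipeline, with one worker per stage rather than a pool (a real pool would need record-oriented reads so that multiple readers on one pipe don't split lines between them; names and the result file are assumptions for the example):

```perl
use strict;
use warnings;

pipe my $r1, my $w1 or die "pipe: $!";   # parent  -> stage 1
pipe my $r2, my $w2 or die "pipe: $!";   # stage 1 -> stage 2

my $pid1 = fork() // die "fork: $!";
if ($pid1 == 0) {                        # stage-1 worker
    close $w1; close $r2;
    while (my $job = <$r1>) {
        chomp $job;
        print $w2 "stage1($job)\n";      # pass the result downstream
    }
    close $w2;                           # EOF tells stage 2: no more work
    exit 0;
}

my $pid2 = fork() // die "fork: $!";
if ($pid2 == 0) {                        # stage-2 worker
    close $w1; close $r1; close $w2;
    open my $out, '>', 'pipeline_result.txt' or die "open: $!";
    while (my $item = <$r2>) {
        chomp $item;
        print $out "stage2($item)\n";
    }
    close $out;
    exit 0;
}

# Parent: feed work into the first pipe, then close it; the EOF
# cascades down the line and the workers terminate themselves.
close $r1; close $r2; close $w2;
print $w1 "job$_\n" for 1 .. 3;
close $w1;
waitpid $_, 0 for $pid1, $pid2;
```

Each process closes every pipe end it does not use; that is what makes the EOF cascade work when the writer ahead of it goes away.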

Think: production line.

Edit: Responding if I may to BrowserUK’s not-so-Anonymous reply (and his exceedingly discourteous but not-unexpected downvote) to the above ... kindly notice that most multiprogrammed systems are and always have been built around the notion of a limited (but variable) number of persistent worker processes that produce and consume work from a flexible queue of some kind. Even in the earliest days of computing, when hulking IBM mainframe computers barely had enough horsepower to get out of their own way, their batch-job processing engines and interactive systems (e.g. CICS) had and still do have this essential architecture. The reason why is quite simple: you can tune it readily (just by adjusting the number of workers and/or their handling of the queues), and it performs at a predictable sustained rate without over-committing itself. The queues absorb the slack. Such an arrangement naturally conforms itself to, for example, computing clusters, and it gracefully supports the adding and removing and re-deployment of computing resources.

“Over-committing” a system produces performance degradation that becomes exponential after a period of time in which it is linear, a harrowing phenomenon called (politely) “hitting the wall.” The curve has an elbow-shaped bend which goes straight up (to hell). For instance, I once worked at a school which needed to run computationally-expensive engineering packages on a too-small machine. If one instance was running, it took about 2 minutes; with five, about 4. But with seven, each one took about 18 minutes and it went downhill from there ... fast. A little math will tell you that the right way to get seven jobs done in 6 minutes (on average) is to allow no more than five to run at one time. It worked, much to the disappointment of the IBM hardware salesman. The rest sit in a queue, costing nothing for the entire time they sit there not-yet-started. Likewise, a queue-based architecture will consistently deliver x results-per-minute at a sustained rate even if there are larger-y pieces of work to be performed. A thread (or process) is not a unit-of-work.

Thanks. I hadn't noticed the package on CPAN because it's not common to find the package you want so far down the list. However, in this case I don't need to limit the number of processes, so I will forgo both Parallel::ForkManager and Forks::Super.

So in the end, I am opting for plain standard fork as in the first example from kennethk, but, as suggested by mbethke, with the inner loop and fork moved to the child of the outer fork (which is what happens in Eliya's example, I think).