To avoid race conditions, I'd suggest delegating the starting of the two postprocessing children to the child itself, i.e. making them grandchildren. Fork off the function that writes the file, and as its last step have it fork twice more to produce the two children that will consume the output.
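A minimal sketch of that grandchild arrangement, in case it helps. The file name, the file contents and the two consumers ("lines" and "chars") are hypothetical stand-ins for the real writer and postprocessing steps; only the fork structure is the point.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/val1.txt";          # hypothetical output file

# Hypothetical consumers: each turns the file's lines into one result line.
my %consumer = (
    lines => sub { scalar(@_) . "\n" },
    chars => sub { length(join '', @_) . "\n" },
);

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # Child: write the file completely first ...
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} "line $_\n" for 1 .. 3;
    close $out or die "Cannot close $file: $!";

    # ... and only then start the two consumers (the grandchildren),
    # so neither of them can ever see a half-written file.
    my @kids;
    for my $name (sort keys %consumer) {
        my $kid = fork() // die "fork failed: $!";
        if ($kid == 0) {
            open my $in, '<', $file or die "Cannot read $file: $!";
            my @lines = <$in>;
            close $in;
            open my $res, '>', "$file.$name" or die "Cannot write $file.$name: $!";
            print {$res} $consumer{$name}->(@lines);
            close $res or die "Cannot close $file.$name: $!";
            exit 0;
        }
        push @kids, $kid;
    }
    waitpid $_, 0 for @kids;         # reap the grandchildren
    exit 0;
}
waitpid $pid, 0;                     # parent waits for the whole subtree
```

The parent never has to guess when the file is ready, because the process that wrote it is the one that starts the readers.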

This would also facilitate things if you wanted to eliminate the temporary files. If it's possible to stream data to the two grandchildren, you could just open a pipe to each of them with open(my $kid, '|-') and write to that instead of to the file that you're going to read again later.
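Roughly like this, for a single consumer. The output path and the uppercasing are made up for illustration; the real postprocessing would go in the child's loop.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $result = "$dir/consumer.out";    # hypothetical place for the consumer's output

# The two-argument form with '|-' forks implicitly: it returns the child's
# PID in the parent, 0 in the child, and undef if the fork failed.
my $pid = open(my $kid, '|-') // die "Cannot fork: $!";

if ($pid == 0) {
    # Child: whatever the parent prints to $kid arrives here on STDIN.
    open my $out, '>', $result or die "Cannot write $result: $!";
    while (my $line = <STDIN>) {
        print {$out} uc $line;       # stand-in for the real postprocessing
    }
    close $out or die "Cannot close $result: $!";
    exit 0;
}

# Parent: write to the pipe instead of to a temporary file.
print {$kid} "record $_\n" for 1 .. 3;

# close() on a piped handle waits for the child and sets $?.
close $kid or warn "consumer exited with status $?\n";
```

One caveat when you open two such pipes in a row: the second child inherits the write end of the first pipe, so the first consumer will not see end-of-file until that inherited copy is closed as well.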

Yes, “learn about pipes” because this would make your job a helluva lot simpler. Start a pool of child-processes that read from a pipe and do the work that has been given to them by means of that pipe. (When the pipe is closed by the writer, the children’s read-requests fail and when this happens they terminate themselves.)
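Here is a small sketch of such a pool, under some loudly stated assumptions. To keep line-buffered reads safe it gives every worker its own pipe and hands out jobs round-robin, which is a simplification of the single shared pipe described above. The worker count, the job list (the numbers 1 to 9) and the "square the number" work are all invented for the example.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $workers = 3;                     # the tuning knob
my (@writers, @pids);

for my $id (1 .. $workers) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        # Worker: close every write end it inherited, or EOF never arrives.
        close $_ for @writers, $writer;
        open my $out, '>', "$dir/worker$id.out" or die "Cannot write output: $!";
        while (my $job = <$reader>) {
            chomp $job;
            print {$out} $job * $job, "\n";   # hypothetical unit of work
        }
        close $out or die "Cannot close output: $!";
        exit 0;                      # read failed: the writer closed the pipe
    }
    close $reader;                   # parent only writes
    $writer->autoflush(1);
    push @writers, $writer;
    push @pids,    $pid;
}

# Parent: deal the jobs out round-robin.
my $next = 0;
for my $job (1 .. 9) {
    print { $writers[$next++ % $workers] } "$job\n";
}

close $_ for @writers;               # signals "no more work" to every worker
waitpid $_, 0 for @pids;             # parent ends once all children have died
```

Each worker initializes once, works until its pipe runs dry, and exits on its own; changing $workers is the only thing needed to retune it.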

Likewise, instead of “starting” the second-stage processes when the first stage has finished, have the first-stage processes write messages to a second pipe that is listened-to by the second-stage processes which are built using the same design. After the first-stage processes consume their work and die-off, the second-stage processes in turn consume their work and die, and so on, until the parent finally realizes that all of its children have died (as expected) and it then terminates.

Now, all of the processes (regardless of their role) do their initialization and termination only once, and perform their jobs as quickly as they are able, and the pipes take up the slack. You tune the behavior of the system for maximum throughput by tweaking the number of processes that you create, and they perform work at that constant rate no matter how full or how empty the pipes may be.

Think: production line.

Edit: Responding if I may to BrowserUK’s not-so-Anonymous reply (and his exceedingly discourteous but not-unexpected downvote) to the above ... kindly notice that most multiprogrammed systems are and always have been built around the notion of a limited (but variable) number of persistent worker processes that produce and consume work from a flexible queue of some kind. Even in the earliest days of computing, when hulking IBM mainframe computers barely had enough horsepower to get out of their own way, their batch-job processing engines and interactive systems (e.g. CICS) had and still do have this essential architecture. The reason why is quite simple: you can tune it readily (just by adjusting the number of workers and/or their handling of the queues), and it performs at a predictable sustained rate without over-committing itself. The queues absorb the slack. Such an arrangement naturally conforms itself to, for example, computing clusters, and it gracefully supports the adding and removing and re-deployment of computing resources.

“Over-committing” a system produces performance degradation that becomes exponential after a period of time in which it is linear, a harrowing phenomenon called (politely) “hitting the wall.” The curve has an elbow-shaped bend which goes straight up (to hell). For instance, I once worked at a school which needed to run computationally-expensive engineering packages on a too-small machine. If one instance was running, it took about 2 minutes; with five, about 4. But with seven, each one took about 18 minutes and it went downhill from there ... fast. A little math will tell you that the right way to get seven jobs done in 6 minutes (on average) is to allow no more than five to run at one time. The rest sit in a queue, costing nothing for the entire time they sit there not-yet-started. It worked, much to the disappointment of the IBM hardware salesman. Likewise, a queue-based architecture will consistently deliver x results-per-minute at a sustained rate even if there are larger-y pieces of work to be performed. A thread (or process) is not a unit-of-work.

But that's not what I want. I need the files created by the outer loop (the "$val1.txt" ones) for later work (next week maybe, when I get around to programming it), and I need to know exactly which is which.

What I don't necessarily need is the output from the 2nd step. I would rather have just 2 files, but I don't know yet how to do it. I used to simply open two files and print to them accordingly, but the input was all jumbled, and I was told to use pipes instead. And here I am up to my knees in parents and children again :-/
