Re: [RFC]serialize the output of parallel make?

From: Paul Smith
Subject: Re: [RFC]serialize the output of parallel make?
Date: Fri, 30 Jul 2010 09:43:35 -0400
On Thu, 2010-07-29 at 22:44 -0700, Howard Chu wrote:
> The scheme that I grew up with on Alliant Concentrix was just to prefix each
> output line with its job number "|xx|blah blah blah". It obviously requires a
> pipe for each child process' output, so that lines can be read by the parent
> make and then the prefix attached.
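[The prefixing scheme described above amounts to something like the
following sketch; the job number 3 is purely illustrative, and of
course make itself does none of this today:]

```shell
# Each child's output goes through a pipe; the parent tags every
# line with the job's number before printing it.
JOBNUM=3
sh -c 'echo compiling foo.c; echo linking foo' 2>&1 | sed "s/^/|$JOBNUM|/"
# prints:
#   |3|compiling foo.c
#   |3|linking foo
```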
The resource issue is one thing for sure, but even more than that I'm
not sure that would work with make's current, single-threaded design.
Make doesn't really have any "central loop" where we could add a
select() or whatever to check which children had output ready to be
processed, so _where_ to add this is a big issue... if we don't read the
pipe fast enough then jobs will slow down as they hang on the write().
I think asking make to do this work will simply cause your builds to
slow down a lot, unless we introduced threads and had a separate thread
doing that work.
Or, we could implement the other idea you had for more reliable
jobservers (avoiding the RESTART issue), which had make fork a process
and then had that process fork the job: in that environment there's an
extra process that can be used to manage each child's output. Of course
this has its own drawbacks on systems with very high process creation
overhead, like Windows.
> And the "serialization" you mean is not the same as what I mean.
>
> I believe Paul and Edward fully understand what I mean.
I think Tim is saying the same thing: his solution will definitely work,
at least as well as having make do it. If make did the work then it
would invoke the command with stdout/stderr redirected to a temporary
file, then when the job was complete make would read and print those
files to stdout.
In Tim's solution, the command that make invokes (really, the shell make
invokes to run the command) saves its OWN output to a temporary file,
then when the command is done it gets a semaphore (to ensure
serialization) and dumps all that output.
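[As a rough illustration, Tim's scheme could look like the shell
wrapper below. The function name, the lock path, and the use of
flock(1) as the semaphore are my assumptions for the sketch, not
anything specified in this thread:]

```shell
#!/bin/sh
# Hypothetical per-job wrapper: the command saves its OWN output to a
# temp file, then takes a lock only while dumping it, so each job's
# output comes out as one unbroken group.
serialize_output() {
    tmp=$(mktemp) || return 1

    # Run the real command with stdout and stderr captured.
    "$@" >"$tmp" 2>&1
    status=$?

    # flock(1) stands in for whatever semaphore mechanism is used;
    # the lock is held only for the duration of the dump.
    flock /tmp/make-output.lock cat "$tmp"

    rm -f "$tmp"
    return $status
}

# Two concurrent jobs: their lines never interleave, though which
# group prints first is nondeterministic.
serialize_output sh -c 'echo one; echo two' &
serialize_output sh -c 'echo three; echo four' &
wait
```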
Actually I suspect that Tim's solution would be MORE efficient, because
if make is reading large output files and streaming them to stdout,
that's time it DOESN'T spend doing other, make-like things. If you have
the command itself doing it then you get the advantage of
multi-processing: each job dumps its own output in parallel, without
taking time away from make.
I certainly don't see how it could be SLOWER; if you want to enforce
serialization then at some point, someone is going to have to
wait--that's more or less the definition of serialization. I don't see
how the command waiting is any less efficient than make doing basically
the same thing.
This is all assuming that by serialization you mean ONLY that the output
from each command will be grouped together, without interspersing any
other command's output. If you mean something more, such as that the
output of the commands appears in some deterministic fashion (for
example, given the rule "a: b c d" that the output of the command to
build "b" would always come before "c" and that would always come before
"d") then that's much more difficult, and not what I was suggesting.
--
-------------------------------------------------------------------------------
Paul D. Smith <address@hidden>          Find some GNU make tips at:
http://www.gnu.org                      http://make.mad-scientist.net
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist