copyTree parallel

This is the parallel version of the directory copy function. Instead of
waiting for one file to finish copying before starting the next, it
processes multiple files at once. Both versions presented here run about
3x faster than their sequential counterparts.

In NodeJS

It might seem more obvious how to achieve parallelism in NodeJS than in
λanguage. With Node you pass the callback and know that execution of the
caller continues immediately at the next line, which opens the possibility
of using a plain loop. We must be careful, though, not to invoke the
callback until all the files have been processed. The parallel version in
NodeJS looks about as ugly as the sequential one:

The forEach loop starts copying all the files in a directory at once. The
callback (next) must check whether all files have been copied before
invoking the original callback. Additionally, we need to handle the case
of an empty directory, where we invoke the callback directly.

In λanguage

To write the parallel version in λanguage we need a primitive function.
That's because our wrappers around the asynchronous Node API will only
invoke their callback after the work is done, so by default our program
can only run sequentially.

I wrote a parallel primitive which takes a function and calls it
with a single argument that I'll name pcall. pcall
receives a function argument and calls it “asynchronously”, that is, it
resumes the caller program immediately while that function continues
running. Finally, the parallel() call returns only after all
pcall-s in its body have finished. If that sounds complicated,
seeing the usage might be enlightening:
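
The λanguage listing isn't reproduced here; a sketch of how the parallel copyTree could look follows, assuming the Node API wrappers (mkdir, readdir, isDirectory, copyFile) and a forEach helper as used in the sequential version:

```
copyTree = λ(srcdir, destdir) {
  mkdir(destdir);
  let (files = readdir(srcdir))
    parallel(λ(pcall)
      forEach(files, λ(f)
        pcall(λ()
          let (fullname = srcdir + "/" + f,
               dest = destdir + "/" + f)
            if isDirectory(fullname)
              then copyTree(fullname, dest)
              else copyFile(fullname, dest))));
};
```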

So the change we had to make is trivial: wrap the forEach
loop in a call to parallel, and use pcall to process each
file. We no longer need to count how many files there are and invoke a
callback in carefully chosen places; everything is handled by the
primitive. Our code still looks sequential, but runs in parallel!

A primitive that combines parallel and
forEach would allow for even better-looking code, and it wouldn't be
hard to write; but parallel seems more general.
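
For illustration, such a combined primitive could even be defined in λanguage itself on top of parallel; the name parallelForEach is hypothetical:

```
parallelForEach = λ(list, f)
  parallel(λ(pcall)
    forEach(list, λ(x) pcall(λ() f(x))));
```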

One might argue that our λanguage code is beautiful because of our
parallel abstraction, and had we done something similar in NodeJS
we'd end up with readable JavaScript code. This is somewhat true, as
we'll see, but even with a helper function the NodeJS code won't get as
clear and concise as above.
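
The primitive's listing isn't reproduced here; below is a standalone sketch in plain JavaScript, written in the same continuation-passing style the evaluator uses (in the interpreter it would be registered as a primitive in the global environment). It assumes the chunks are asynchronous, as discussed below:

```javascript
function parallel(k, f) {
    var n = 0;          // number of pcall-s still running
    var results = [];   // result of each chunk, in call order
    f(function(val){
        // continuation of f's body: if no pcall was ever made,
        // resume parallel's own continuation with a false result
        if (n == 0) k(false);
    }, function pcall(kpcall, chunk) {
        var index = n++;
        chunk(function(val){
            results[index] = val;
            // last chunk finished: resume parallel's original
            // continuation, passing the collected results
            if (--n == 0) k(results);
        });
        // resume the caller immediately (with false: no meaningful
        // result yet), so chunk runs "in parallel" with the rest of
        // the program
        kpcall(false);
    });
}
```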

k is the continuation of the parallel call itself, and
f is the function to invoke with the pcall argument.
f is passed a continuation which does if (n == 0)
k(false); this is necessary in case no pcall was invoked in the
body of f, in which case we just call the original continuation
with a false result.

pcall is a well-behaved function: it takes a continuation
(kpcall) and a function to run (chunk). It calls
chunk but immediately invokes its own continuation (with
false, since we don't have a meaningful result at this point). So, if
chunk is asynchronous, it will run in parallel with the rest
of the program (represented by kpcall).

pcall does some housekeeping to keep track of how many times it
was invoked. On the last invocation (where n == 0) it executes
the original continuation of parallel, passing to it an array with
the result values (this is not needed for our copyTree case, but
it might be useful in general).

Wanna play with it? It's defined on this page. The dostuff function
below is “asynchronous”: it just prints something after a timeout and
returns the text. Check the difference between using it with and without
parallel below.
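
The interactive listing isn't reproduced here; the sketch below is consistent with the timings discussed next, assuming sleep, println and a print-time timing helper are available as primitives:

```
dostuff = λ(name, time) {
  sleep(time);
  println(name + " done");
  name;
};

# sequential: roughly 200 + 500 + 1000 = 1700ms
print-time(λ(){
  dostuff("A", 200);
  dostuff("B", 500);
  dostuff("C", 1000);
});

# parallel: roughly the longest task, 1000ms
print-time(λ(){
  parallel(λ(pcall){
    pcall(λ() dostuff("A", 200));
    pcall(λ() dostuff("B", 500));
    pcall(λ() dostuff("C", 1000));
  });
});
```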

Notice that the time for the non-parallel calls amounts to the total time
spent in all tasks, that is, 1700ms, while the time for the parallel one
is roughly the time spent in the longest task (1000ms). The results are
collected in an array and returned after all pcall-s have
finished, in the proper order.

Problems of copyTree

If you try to run it on a directory containing thousands of files, you
might be surprised by an “EMFILE” exception. That occurs because, by
processing multiple files at once, we hit the operating system's limit on
open files. Fortunately, this is very easy to fix in λanguage by just
changing the primitive. Here's the new version:
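
The listing isn't reproduced here; below is a sketch of the throttled primitive, again as standalone JavaScript. The ACTIVE counter name is an assumption; the idea is to defer a chunk whenever PCALLS chunks are already in flight:

```javascript
var PCALLS = 1000;  // maximum number of chunks allowed to run at once
var ACTIVE = 0;     // chunks currently in flight (name assumed)

function parallel(k, f) {
    var n = 0, results = [];
    f(function(val){
        if (n == 0) k(false);
    }, function pcall(kpcall, chunk) {
        var index = n++;
        (function run(){
            if (ACTIVE >= PCALLS) {
                // at the limit: schedule a retry after 5 milliseconds
                setTimeout(run, 5);
                return;
            }
            ACTIVE++;
            chunk(function(val){
                ACTIVE--;
                results[index] = val;
                if (--n == 0) k(results);
            });
        })();
        kpcall(false);
    });
}
```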

A global PCALLS variable holds the maximum number of parallel
calls allowed at once. When that maximum is reached, we schedule a retry
after 5 milliseconds. I found 1000 to be a good value for PCALLS (if you
make it too low, the program will easily starve and run much slower than
the synchronous version).

That's it. Our copyTree in λanguage remains intact; we only had
to make the primitive throttle the pcall-s when more than 1000 are running.
Our program remains beautiful and sequential-looking, yet it's fast, runs
in parallel, and can handle directories of any size.