Tuesday, April 24, 2007

Java process management arghhs

Pretend you'd like to write a Java program that launches and lightly manages
some operating system processes. Know how to do this? You'll want to use one of the
flavors of the Runtime exec() methods.
If you happen to be running on a J2SE 1.5 or greater JVM, you can use the new
ProcessBuilder class, but there seems to be not much benefit to doing so, and of course,
your code won't run on a J2SE 1.4 or earlier JVM if you do. Both Runtime.exec() and
ProcessBuilder end up returning an instance of the Process class, which models the
launched operating system process (the child).
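
As a rough sketch of the two launch styles: the class and method names below are hypothetical, and the command (the running JVM's own java launcher with -version) is only an assumption chosen because it should exist wherever this code runs.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not from the original post.
class LaunchSketch {
    // Assumption: the java launcher of the running JVM is a handy child process.
    static List<String> sampleCommand() {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return Arrays.asList(java, "-version");
    }

    // The older style: Runtime.exec() with the command as a String array.
    static Process launchWithRuntime(List<String> command) throws IOException {
        return Runtime.getRuntime().exec(command.toArray(new String[0]));
    }

    // The J2SE 1.5 style: ProcessBuilder.start().
    static Process launchWithBuilder(List<String> command) throws IOException {
        return new ProcessBuilder(command).start();
    }

    public static void main(String[] args) throws Exception {
        Process viaRuntime = launchWithRuntime(sampleCommand());
        Process viaBuilder = launchWithBuilder(sampleCommand());
        // Both calls hand back a Process; the output here is small enough
        // that leaving the pipes unread should not block the child.
        System.out.println("Runtime.exec exit code: " + viaRuntime.waitFor());
        System.out.println("ProcessBuilder exit code: " + viaBuilder.waitFor());
    }
}
```

Either way you end up holding a Process, so everything that follows applies to both.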

One of the things I learned the hard way with the Process class, many years ago, was that you really need
to process the stdout and stderr output streams of the child process, because if you
don't, and your child process writes more than a certain amount
(operating system dependent) of data on those streams, it'll block on the write, and thus
'freeze'. So, you need to get the stdout and stderr streams via the
(confusingly named) getInputStream() and (logically named) getErrorStream()
methods of the
Process class.

The simplest thing to do to handle this is to launch two threads, each reading from
these input streams, to keep the pipes from getting clogged. Do whatever you need to do
with the data being output by your child process; I'm sure you want to do something
interesting with it.
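
A minimal sketch of the two-thread approach, assuming you just want to collect the output in memory. StreamDrainer is a hypothetical name, and the child command is again the running JVM's java launcher, picked only for convenience.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: one thread that reads a stream until end-of-stream.
class StreamDrainer extends Thread {
    private final InputStream in;
    private final ByteArrayOutputStream collected = new ByteArrayOutputStream();

    StreamDrainer(InputStream in) {
        this.in = in;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int count;
            // read() blocks until data arrives and returns -1 at end of stream.
            while ((count = in.read(buffer)) != -1) {
                collected.write(buffer, 0, count);
            }
        } catch (IOException e) {
            // The pipe broke or was closed; nothing more to read.
        }
    }

    // Everything read so far, decoded with the platform default charset.
    String text() {
        return collected.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the running JVM's own launcher is a convenient child.
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        Process child = Runtime.getRuntime().exec(new String[] { java, "-version" });

        // One drainer per pipe, so neither stdout nor stderr can fill up.
        StreamDrainer out = new StreamDrainer(child.getInputStream());
        StreamDrainer err = new StreamDrainer(child.getErrorStream());
        out.start();
        err.start();

        int exit = child.waitFor();
        out.join();
        err.join();
        System.out.println("exit=" + exit);
        System.out.println("stdout: " + out.text());
        System.out.println("stderr: " + err.text());
    }
}
```

Collecting into a ByteArrayOutputStream is just the simplest thing to show; a real program would more likely hand each chunk to a logger or parser as it arrives.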

I've not really had a chance to work with the
java.nio
package before, so I happened to think that this would be a good chance to play; let's see if we
can get from two threads handling the child process's output, down to one,
by using the class
Selector,
which would seem to be the moral equivalent of the *nix function
select(),
which can be used to determine the readiness of i/o operations of multiple handles at
the same time.

Looking at Selector, you can tell right from the top of the doc, that this class deals with
SelectableChannel objects. Now, you need to figure out how to get from an
InputStream (returned by Process.get[Input|Error]Stream()) to
a SelectableChannel. Except, you can't. From any InputStream,
you can call Channels.newChannel() to get a ReadableByteChannel,
but a ReadableByteChannel is not a SelectableChannel.
(ReadableByteChannel is an interface, and SelectableChannel
is a class.)
This is not the end of the world; it may
just be that the object returned by Channels.newChannel() is actually
an instance of SelectableChannel (or a subclass thereof). Never know.
So, here's a little experiment for an Eclipse scrapbook page:
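
A minimal sketch of that kind of check, written as a plain class instead of a scrapbook snippet. The class and method names are hypothetical, and a ByteArrayInputStream stands in for the child's stdout; pass the stream from Process.getInputStream() to repeat it against a real child process.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;

// Hypothetical example class; not from the original post.
class ChannelProbe {
    // Wraps the stream in a channel and reports whether a Selector could use it.
    static boolean isSelectable(InputStream in) {
        ReadableByteChannel channel = Channels.newChannel(in);
        return channel instanceof SelectableChannel;
    }

    // The concrete class that Channels.newChannel() hands back on this JRE.
    static String channelClassName(InputStream in) {
        return Channels.newChannel(in).getClass().getName();
    }

    public static void main(String[] args) {
        // Assumption: an in-memory stream is a stand-in for a process stream.
        InputStream in = new ByteArrayInputStream(new byte[0]);
        System.out.println("channel class: " + channelClassName(in));
        System.out.println("selectable:    " + isSelectable(in));
    }
}
```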

Bad news; it's not a SelectableChannel, and would appear to be an instance of an inner class
of the Channels class itself. Poking into that inner class, it turns out to be a subclass of
java.nio.channels.spi.AbstractInterruptibleChannel. Not selectable at all. All this
inner class business is for the default 1.5 JRE I use on my mac. Other implementations
might well be different (and better), but that doesn't help me on the mac.

So, unfortunately, you can't reduce the two threads that process a child process's stdout and
stderr down to one using this select technique.
This might not sound so bad, but what if you wanted to be able to handle
a lot of processes? To handle N simultaneous processes, you'll need N * 2 threads to
process all the stdout and stderr streams. If you could have used the select logic, you'd only
need 1 thread.

Note that you could also try polling these streams, by calling
available() on them, to determine if they have anything to read. Polling isn't
very elegant, and is obviously going to be somewhat cpu intensive. But for me, I got
burned on available() a long time ago, don't trust it, and never use it.

Bummer.

But wait, it gets better!

Another thing you're going to want to do with these child processes is to determine
when they're done. There are two ways of doing this. You can call
Process.exitValue() which returns the exit value of the process. Unless the
process hasn't actually exited, in which case it throws an
IllegalThreadStateException. The other way to determine when the process
is done is to call Process.waitFor(); this method will block until the
process has exited.

Neither of these is very nice. If you use waitFor(), you'll burn
a thread while it blocks (now up to N * 3 threads for N processes!).
If you use exitValue(), you can poll, but
every check of the process, while it's not complete, is guaranteed to throw
an exception, which is going to burn even more cpu.
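
To make the polling option concrete, here is a small sketch that wraps exitValue() so the caller gets null instead of an exception while the child is still running. ExitCheck and exitCodeOrNull are hypothetical names, the 50 ms sleep is an arbitrary choice, and the child command is the running JVM's own java launcher, assumed only for convenience.

```java
import java.io.File;

// Hypothetical example class; not from the original post.
class ExitCheck {
    // Returns the child's exit code, or null if it has not exited yet.
    static Integer exitCodeOrNull(Process child) {
        try {
            return Integer.valueOf(child.exitValue());
        } catch (IllegalThreadStateException stillRunning) {
            // exitValue() throws this whenever the process is still alive.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the running JVM's own launcher is a convenient child.
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        Process child = new ProcessBuilder(java, "-version").start();

        // Poll without dedicating a thread to waitFor(); every miss costs
        // one thrown-and-caught exception inside exitCodeOrNull().
        Integer code;
        while ((code = exitCodeOrNull(child)) == null) {
            Thread.sleep(50);
        }
        System.out.println("exit code: " + code);
    }
}
```

The wrapper hides the exception but doesn't avoid it, so the cost described above is still paid on every unsuccessful check.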

8 comments:

Found your post via google after running into the ReadableByteChannel/SelectableChannel dilemma. I can't believe that the java.nio implementation in JDK1.5 doesn't handle the stdout/stderr incantation that would surely be its most common use case, possibly after the ServerSocket multiplexing that is the only other example one ever sees. I would love to know a solution if there was one, or a rationale for why there isn't one otherwise.

You can reduce this down to one thread - you call ProcessBuilder.redirectErrorStream(true) which merges stdout and stderr into one InputStream. After that, it's just a matter of reading the InputStream until the stream closes, i.e. when the process has terminated.

John, that's true, that can be done, but most of the time I actually want them separate - external processes I'm dealing with don't guarantee that their output to stdout and stderr are distinguishable. If I mix them up then I can't always tell if there was any error logged for example.

A real pity, the N*3 constraint (3 threads needed per monitored process). I just tried to "update" my ProcessShell class by using a Selector and ended up on this page... Is there something new (and better) regarding this subject in Java 1.6?

Found your page by googling for a solution to the same problem you outline. I'm still finding it hard to believe there's no way to multiplex the output of multiple sub-processes (yes, Process.getInputStream is confusingly named) using the Selector framework -- seems like an obvious use of Selector. It does seem like you can get away with N*2 threads instead of N*3 by having your 2 threads reading stdout and stderr of the sub-process watch for read() returning -1 indicating end of stream -- at that point I would guess it's safe to assume the process has exited. Just a guess. Thanks for the original post.