Be careful when redirecting both a process’s stdin and stdout to pipes, for you can easily deadlock

A common problem when people create a process and redirect
both stdin and stdout to pipes is that they fail to keep the pipes flowing.
Once a pipe clogs, the disturbance propagates backward until everything
clogs up.

You see code like this
all over the place.
I want to generate some input to a program and capture the output,
so I pump the input as the process's stdin and read the output
from the process's stdout.
What could possibly go wrong?
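As an illustration, the pattern looks roughly like the sketch below. It is written in Python rather than Win32 to keep it short, and both the helper (a made-up one-line filter that upper-cases its input) and the name `run_naively` are assumptions for the example, not anything from a real program.

```python
import subprocess
import sys

# Hypothetical helper: copies stdin to stdout, upper-casing each line.
HELPER = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line.upper())\n",
]

def run_naively(data: bytes) -> bytes:
    """Write everything first, read afterwards. Fine for small inputs only."""
    proc = subprocess.Popen(HELPER, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.stdin.write(data)       # step 1: push all the input
    proc.stdin.close()           # flushes, then signals end of input
    output = proc.stdout.read()  # step 2: only now start reading the output
    proc.wait()
    return output

# A tiny input goes through without trouble.
small = run_naively(b"ab\n")
```

With a few bytes of input this returns promptly. Nothing in it reads the helper's output while the input is still being written, and that is the weakness.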

This problem is well known to Unix programmers,
but it seems that the knowledge hasn't migrated to Win32 programmers.
(Or to .NET programmers, who also encounter this problem.)

Recall how anonymous pipes work.
(Actually, this description is oversimplified,
but it gets the point across.)
A pipe is a marketplace
for a single commodity: Bytes in the pipe.
If there is somebody selling bytes (WriteFile),
the seller waits until there is a buyer (ReadFile).
If there is somebody looking to buy bytes,
then the buyer waits until there is a seller.

In other words,
when somebody writes to a pipe,
the call to WriteFile waits until
somebody issues a ReadFile.
Conversely, when somebody reads from a pipe,
the call to ReadFile waits until somebody
calls WriteFile.
When there is a matching read and write, the bytes are
transferred from the writer's buffer to the reader's buffer.
If the reader asks for fewer bytes than the writer provided,
then the writer continues waiting until all the bytes have been read.
(On the other hand, if the writer provides fewer bytes than the
reader requested, the reader is given a partial read.
Yes, there's asymmetry there.)
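The read-side half of that asymmetry is easy to see with a small sketch. This one uses Python's `os.pipe` as a stand-in for an anonymous pipe. It assumes the operating system buffers the two-byte write, so the write returns at once instead of waiting for a reader as in the simplified model above.

```python
import os

read_end, write_end = os.pipe()

# The "seller" offers two bytes.
os.write(write_end, b"AB")

# A "buyer" asking for fewer bytes than were written takes only what it
# asked for; the remainder stays in the pipe.
first = os.read(read_end, 1)

# A "buyer" asking for more bytes than are available gets a partial read
# rather than waiting for the full amount.
rest = os.read(read_end, 100)

os.close(read_end)
os.close(write_end)
```

Here `first` holds the single byte `A`, and `rest` holds just `B` even though 100 bytes were requested.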

Okay, so where's the deadlock in the above code fragment?
We write some data into one pipe (connected to a process's stdin)
and then read from another pipe (connected to a process's stdout).
For example, the program might take some input, do some transformation
on it, and print the result to stdout.
Consider:

| You | Helper |
|---|---|
| WriteFile(stdin, "AB") | |
| (waits for reader) | |
| | ReadFile(stdin, ch) |
| | reads A |
| (still waiting since not all data read) | |
| | encounters errors |
| | WriteFile(stdout, "Error: Widget unavailable\r\n") |
| | (waits for reader) |

And now we're deadlocked.
Your process is waiting for the helper process to finish reading
all the data you wrote
(specifically, waiting for it to read B),
and the helper process is waiting for your process to finish
reading the data it wrote to its stdout (specifically,
waiting for you to read the error message).

There's a feature of pipes that can mask this problem for a long time:
Buffering.

The pipe manager might decide that when somebody offers
some bytes for sale,
instead of making the writer wait for a reader to arrive,
the pipe manager will be a market-maker and buy the bytes himself.
The writer is then unblocked and permitted to continue execution.
Meanwhile, when a reader finally arrives, the request is
satisfied from the stash of bytes the pipe manager had previously
bought.
(But the pipe manager doesn't take a 10% cut.)

Therefore,
the error case above happens to work, because the buffering
has masked the problem:

| You | Helper |
|---|---|
| WriteFile(stdin, "AB") | |
| pipe manager accepts the write | |
| ReadFile(stdout, result) | |
| (waits for read) | |
| | ReadFile(stdin, ch) |
| | reads A |
| | encounters errors |
| | WriteFile(stdout, "Error: Widget unavailable\r\n") |
| Read completes | |

As long as the amount of unread data in the pipe is
within the budget of the pipe manager,
the deadlock is temporarily avoided.
Of course, that just means it will
show up later under harder-to-debug situations.
(For example, if the program you are driving prints a prompt
for each line of input,
then the problem won't show up until you give the program a large
input data set: For small data sets, all the prompts will fit in
the pipe buffer, but once you hit the magic number,
the program hangs because the pipe is waiting for you to drain
all those prompts.)
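To get a feel for how small that budget can be, here is a rough sketch that fills a pipe nobody is reading and counts how many bytes it accepts before a writer would have to wait. It assumes a POSIX-like system where `os.set_blocking` works on pipe handles, and the function name `measure_pipe_capacity` is made up for the example. The number it reports varies by platform and is not something to rely on.

```python
import os

def measure_pipe_capacity(chunk_size: int = 1024) -> int:
    """Count how many bytes an unread pipe accepts before a write would block."""
    read_end, write_end = os.pipe()
    # Non-blocking mode turns "the writer waits" into an exception we can catch.
    os.set_blocking(write_end, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_end, chunk)
    except BlockingIOError:
        pass  # the pipe is full; a blocking writer would be stuck here
    finally:
        os.close(read_end)
        os.close(write_end)
    return total

capacity = measure_pipe_capacity()
print("pipe accepted", capacity, "bytes before it would block")
```

Any program that leaves more than that many unread bytes sitting in a pipe is one write away from the hang described above.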

To avoid this problem,
your program needs to keep reading from stdout
while it's writing to stdin,
so that neither will block the other.
The easiest way to do this is to perform the two operations
on separate threads.
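A minimal sketch of the two-thread approach, again in Python rather than Win32: one thread feeds the helper's stdin while the main thread drains its stdout. The helper (an upper-casing filter) and the name `run_filter` are assumptions for the example.

```python
import subprocess
import sys
import threading

# Hypothetical helper: copies stdin to stdout, upper-casing each line.
HELPER = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line.upper())\n",
]

def run_filter(data: bytes) -> bytes:
    proc = subprocess.Popen(HELPER, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed() -> None:
        # Runs on its own thread, so a full stdin pipe stalls only this thread.
        try:
            proc.stdin.write(data)
            proc.stdin.close()  # tells the helper there is no more input
        except BrokenPipeError:
            pass  # the helper exited without reading everything

    writer = threading.Thread(target=feed)
    writer.start()
    output = proc.stdout.read()  # keeps stdout flowing while the writer works
    writer.join()
    proc.wait()
    return output

# Large enough that neither direction fits in a typical pipe buffer.
payload = b"widget\n" * 200_000
result = run_filter(payload)
```

Because the reader never stops to wait for the writer, neither pipe stays full for long, and the helper always makes progress. In Python specifically, `Popen.communicate` takes care of this interleaving for you; the explicit threads are shown only to make the idea visible.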

Next time, another common problem with pipes.

Exercise:
A customer reported that this function would sometimes hang
waiting for the process to exit.
Discuss.

Simultaneously reading from a child process's stdout and stderr without the possibility of deadlock is tricky. One option is to use multiple threads. Another option is to use a named pipe opened with FILE_FLAG_OVERLAPPED (overlapped operations are not supported by anonymous pipes) and use asynchronous ReadFile calls along with WaitForMultipleObjects. Neither option is trivial.
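A rough sketch of the multiple-threads option, in Python rather than Win32: stderr is drained on a background thread while the main thread drains stdout, and the wait for process exit happens only after both pipes reach end of file. The chatty child and the names `drain` and `run_and_capture` are made up for the example.

```python
import subprocess
import sys
import threading

# Hypothetical child that writes a lot to both stdout and stderr.
NOISY_CHILD = [
    sys.executable, "-c",
    "import sys\n"
    "for i in range(20000):\n"
    "    sys.stdout.buffer.write(b'out %d\\n' % i)\n"
    "    sys.stderr.buffer.write(b'err %d\\n' % i)\n",
]

def drain(stream, sink):
    """Read a pipe until end of file and stash the bytes in sink."""
    sink.append(stream.read())

def run_and_capture(argv):
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_parts, err_parts = [], []
    # stderr gets its own thread so that it keeps flowing while we read stdout.
    err_thread = threading.Thread(target=drain, args=(proc.stderr, err_parts))
    err_thread.start()
    drain(proc.stdout, out_parts)
    err_thread.join()
    exit_code = proc.wait()  # safe now: both pipes have been read to the end
    return exit_code, out_parts[0], err_parts[0]

exit_code, out_data, err_data = run_and_capture(NOISY_CHILD)
```

Waiting for the process to exit before draining the pipes, or draining one pipe to completion before touching the other, is what produces the hang.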

You should be able to avoid a deadlock without threads by using PeekNamedPipe, which according to MSDN allows a peek into anonymous pipes as well (despite the name). I haven't tested it yet, though. However, in combination with writing to the process's stdin, it is quite tricky to get things right without a deadlock (if not impossible, as you cannot peek into the pipe's buffer as far as I know). I am currently dealing with a situation like this in Axapta 3.0, which doesn't support sane multi-threading, making things – well – interesting to say the least…

Exercise 1 – it's hanging because the process is waiting for input. And since you've redirected stdout/err, you never know it is waiting… you just get a blinky cursor. So you should see what is actually being generated and process it accordingly.

-customer attempts to follow the advice-

Exercise 2 – The bytes in stdout are emptied and the bytes in stderr are emptied before the condition occurs that requires attention. All you've done is, well, nothing. And you fall back into the same condition in Exercise 1.

-customer realizes they're going about the problem the wrong way and cleans their act up-

Exercise #1: If the program being created writes enough data to the standard output/error to exceed the pipe buffers then there will be a deadlock. The helper program will be waiting for the spawning program to read the data, while the spawning program will be waiting for the helper program to exit.

Exercise #2: MSDN points out this specific problem. If the helper program writes enough to the standard error stream to exceed the pipe buffers then a deadlock will occur. The helper program will be stuck waiting for the spawning program to read from standard error, but the spawning program will be waiting for the program to write to standard output or close the pipes/exit.

MSDN points out that you can use the asynchronous methods BeginOutputReadLine or BeginErrorReadLine of the Process class to solve this issue, by asynchronously reading one stream and synchronously reading the other. (You can also asynchronously read both, or synchronously read both on separate threads.)

If you use a separate event (or other synchronization object), you might as well use that to signal when it's time to do a normal read of the pipe.

If you don't, you end up polling, which is terrible.

It really should be possible to pass pipe handles to the wait functions. Overlapped I/O is ridiculously difficult to get right if you work through all the possible cases (as Raymond has discussed in the past), unless you cheat and only read one character at a time or something.

@Medinoc: Actually, it's the other way around. According to the docs for CreatePipe, "Anonymous pipes are implemented using a named pipe with a unique name. Therefore, you can often pass a handle to an anonymous pipe to a function that requires a handle to a named pipe."

And yes, it's incredibly annoying that the blog software eats comments if you don't submit them within X seconds of loading the page, for some absurdly small value of X. I sincerely hope somebody's informed the MSDN blogs folks about this.

The list of objects that MSDN says WaitForSingleObject can wait on is not an exhaustive list. For example, file handles are waitable yet not on the list. So it's quite likely that named pipes are perfectly acceptable to wait on too.

As for pipes, if they even are waitable handles, I would not use them in a wait call without knowing what their wait semantics are (i.e. when they become signalled/unsignalled). You'd have to make an awful lot of assumptions to use them, and it'd be behaviour which could change.