Tarik here. When you think of scripts, you don’t usually think of threads, synchronization or deadlocks. However, that was precisely the curious bug I had to investigate a few weeks ago. The script I was trying to write had a simple goal: parse the standard and error outputs from another existing command-line tool (a console application), and write the processed (modified) text to the command prompt.

The repro case

My console application tool was fairly complex, but I was able to simulate the hang condition even with the following simple program:

To help isolate the problem further, I went ahead and removed all the parsing logic that my Visual Basic script contained. Even with this simplified configuration where the output was redirected as-is, I continued to get the hang behavior, as shown in the following output:

C:\ScriptHang>cscript test.vbs consoleapp.exe TRACE: --- Output begin --- This is an output sent to stdout. This is an output sent to stdout. This is an output sent to stdout. ... This is an outpu^C

Here is the simplified script that I was using for this test experiment. It sure looked innocent enough!

If execObj.ProcessID = 0 And execObj.Status = 1 Then ' Error launching the executable. Run = -1 Exit Function End If

Do ' Process stdout. Do Until execObj.StdOut.AtEndOfStream WScript.StdOut.Write(execObj.StdOut.Read(1)) Loop ' Process stderr. Do Until execObj.StdErr.AtEndOfStream WScript.StdErr.Write(execObj.StdErr.Read(1)) Loop Loop Until (execObj.Status <> 0 And execObj.StdOut.AtEndOfStream And execObj.StdErr.AtEndOfStream)

' Return the exit code Run = execObj.ExitCode End Function

Choosing the debugging tool/environment

I must admit that I was both surprised and stumped at first. Why would a fairly simple script like this one hang? My immediate reaction was to attach the WinDbg user-mode debugger to the script host process (cscript.exe) and see what it was waiting on. Unfortunately, that wasn’t terribly useful. All I could see was a call to ReadFile, presumably related to the script’s call in progress to read from the console application’s stdout stream (as evidenced by the truncated message coming out of the script when it stopped printing to the console). However, there was no telling what was preventing this call from completing:

Attaching WinDbg to the consoleapp.exe test tool showed a similar call stack, except the main thread in the process was stuck inside a call to the WriteFile Win32 API, which was in turn invoked by one of the fprintf calls in the C++ test program:

At this point, I knew I had to use a kernel debugger so that I could peek at the kernel side of these two threads and see precisely what file objects they were waiting on. Because I didn’t have a second machine to use as a host kernel mode debugger, I went for the next best thing: local (live) kernel debugging.

Digging deeper with the debugger

After enabling kernel debugging on the machine (and rebooting), I re-ran the scenario once again and used local kernel debugging to examine the previous two call stacks. I started with the call in the script host process:

It turns out the thread is waiting on the internal event of a named pipe file object to get signaled after the I/O read request was posted. This can also be verified by comparing the event object stored in the named pipe object structure with the notification event encountered earlier in the thread’s call stack:

Because the named pipe is currently empty, the thread blocks and waits for new bytes to be written to the named pipe stream.

Note The blocking behavior in the named pipes case (NPFS file system) isn’t really surprising. However, you should also remember that even in the case of NTFS file handles, synchronous I/O read or write requests are handled similarly and involve a two-phase process. First, the request is queued up and the thread blocks, and then when the I/O is actually completed (hopefully very shortly thereafter in the case of NTFS), the internal event handle of the file object is signaled to unblock the requesting thread, which then completes the call and returns to user mode.

The question now is why consoleapp.exe isn’t writing any new data to its stdout stream (keeping the script waiting for new input). For that, let’s look at the state of its main thread (which we had already seen was blocked in a WriteFile call). It’s easy again to see that we have a very similar situation, where WriteFile is “pended” but cannot complete because the named pipe became full.

Everything starts to make sense now. It appears that the stderr pipe of the console process is full (this happens when the internal pipe buffer fills up with unprocessed data), and so the test tool isn’t able to write any more characters into the pipe until a client has pumped data off of it. However, cscript.exe hasn’t gotten around to reading from stderr just yet (the second loop in test.vbs). At the same time, cscript.exe has read all the characters from the stdout stream of the tool and is waiting on the test tool to emit more data into the empty pipe. This results in a deadlock!

Potential mitigations

Clearly, the console application tool isn’t at fault here. It is simply writing messages to its stdout and stderr streams. The problem lies within the script stream object implementation. The AtEndOfStream property was unfortunately designed as a blocking call when it could arguably have been designed to first peek the pipe (using for instance PeekNamedPipe) to see if there was any data in the pipe’s buffer before reading those available bytes. This issue has in fact been documented in a Microsoft KB article; see http://support.microsoft.com/kb/960246.

If you were writing your own client program (C++ or C#, for example) to parse the output of the console program, you could easily do just that (peek the pipe first before reading). But when using VBScript or JScript, you are unfortunately stuck with the built-in implementation of the scripting engine stream object (unless you want to implement and deploy your own COM scripting object to all the client machines).

A simple approach that works around this problem in the scripting engine’s stream object is to redirect the data from the stdout and stderr streams of the console application to temporary files on the file system. Once the files are written to disk, the test script could then parse and modify the data from those files as needed. This solution essentially buffers the streams manually instead of relying on the scripting object to redirect the standard and error output streams from the child process on the fly.

Conclusion

This case study demonstrated several important steps in a typical debugging investigation:

1.Isolate a good repro case: When I started noticing the deadlock behavior in the script from this case study, it wasn’t actually occurring every time I ran the script, but it was depending on the specific command output from the console tool that I was passing to the script. Simplifying the console application to continuously write interleaving stderr and stdout messages highlighted the conditions under which the hang occurred, making for a 100% repro rate as I increased the number of iterations in the console application.

2.Pick the debugging environment that you’ll use to examine the scenario: In this case, the script was hung in native code, outside the Visual Basic script code that I had written in Visual Studio. In fact, the hung script was waiting on an event in kernel mode! WinDbg is a great debugger to use when examining native code, and because I also wanted to dig deeper into what was happening on the kernel side of the fence, I ended up using live (local) kernel debugging. Even when debugging high-level code such as a script program, low-level debugging techniques can still sometimes prove very useful.

3.Understand the code paths that you’re dealing with: A debugger leads you to the door step of the failure, but it’s still up to you to unveil the mystery by understanding how the code you’re debugging is supposed to work. In this case study, it was important to understand several system architecture fundamentals such as how named pipes work internally, how I/O requests are processed by the WindowsOS, and many more aspects of synchronization. It was also useful to search the Internet for similar issues, which in this case led me right to a known KB article on the MSDN website.

4.Determine possible bug fixes: The final step once you’ve understood the nature of the bug that you’re dealing with is to think about potential solutions. The more you understand about the bug, of course, the easier it is going to be to come up with sound approaches, so this step usually runs in parallel with the previous one.