Erlang resumes running CouchDB when couchjs procs are terminated with extreme
prejudice. The hang still occurs after reverting fdmanana's COUCHDB-1334
commit. This could be a race condition during invalidation of the views, and
subsequent deletion of the related ddoc view directory prior to reindexing.
On Windows a filesystem object cannot be deleted if there are open handles
remaining.

As discussed with rnewson on IRC, this is the classic Windows vs. Unix file removal issue - not an attempt to remote a directory using file:delete/1.

UNIX allows you to delete a file that is still in use. The directory entry is removed, but the inode is still marked as in use until the last file handle is closed, at which time the disk space is freed. Windows doesn't allow removal of a file with open file handles until they are all closed, returning EACCES upon delete request instead.

A quick fix would be to progressively back off and retry the delete a few times, if this is just timing-related. If not, more care is going to have to be taken to avoid the race/etc.

Joan Touzet
added a comment - 24/Nov/11 21:39 As discussed with rnewson on IRC, this is the classic Windows vs. Unix file removal issue - not an attempt to remote a directory using file:delete/1 .
UNIX allows you to delete a file that is still in use. The directory entry is removed, but the inode is still marked as in use until the last file handle is closed, at which time the disk space is freed. Windows doesn't allow removal of a file with open file handles until they are all closed, returning EACCES upon delete request instead.
A quick fix would be to progressively back off and retry the delete a few times, if this is just timing-related. If not, more care is going to have to be taken to avoid the race/etc.

Atomic moves of filenames are allowed within the same filesystem
on Windows, even if the file is open.

Modifying nuke_dir to relocate all files into .delete & then remove
looks like a reasonable approach to resolving this cross-platform,
that is crash-safe and can use a backoff to clean up if reqd on
Window when view files are still held open.

Dave Cottlehuber
added a comment - 10/Jun/12 10:07 Sorry, lost this ticket by not assigning it to myself.
Atomic moves of filenames are allowed within the same filesystem
on Windows, even if the file is open.
Modifying nuke_dir to relocate all files into .delete & then remove
looks like a reasonable approach to resolving this cross-platform,
that is crash-safe and can use a backoff to clean up if reqd on
Window when view files are still held open.
Erlang R15B01 (erts-5.9.1) [source] [smp:2:2] [async-threads:0]
Eshell V5.9.1 (abort with ^G)
1> pwd(),
1> file:make_dir( "nested"),
1> file:make_dir( ".delete"),
1>
{ok, File}
= file:open( "nested/file", [append] ),
1> file:write(File , <<"fill me up!">>).
C:/tmp
ok
%% can't move a directory that has open handles
2> file:rename( "nested", ".delete nested").
{error,einval}
3> file:rename( "nested", ".delete/nested").{error,einval}
%% but we can move files that have open handles
4> file:rename( "nested/file", ".delete/file").
ok
5> file:write(File , <<"Moar Data">>).
ok
6>

I took a closer look at this ticket today and came up with the patch below to ensure that view index files are closed before we attempt to nuke the directory when executing a reset_indexes call. Dave Cottlehuber helped me load the patch on a test Windows instance at EC2, and we confirmed that the patch allows the "basics" entry in the test suite to run to completion when it hadn't before. So, yay for that.

Unfortunately, we're noticing frequent hangs in other portions of the test suite, including but not limited to the "design_options" test. My observations from an evening running tests:

The design_options test always passes with the log level set to "debug".

When the level is set to "info" the test often hangs, and the Erlang VM seems to hang as well.

The hang often occurs on the second invocation of the test.

The hang occurs with and without my patch.

That last point is rather crucial. Dave Cottlehuber indicated that the test had been passing, but I can't seem to make that happen even with a stock build of the HEAD of 1.3.x. My current recommendation is to review and apply this patch as it only improves matters during my testing. That being said, I'd love to understand the root cause of these hangs when debug logging is disabled.

Adam Kocoloski
added a comment - 02/Dec/12 04:53 - edited I took a closer look at this ticket today and came up with the patch below to ensure that view index files are closed before we attempt to nuke the directory when executing a reset_indexes call. Dave Cottlehuber helped me load the patch on a test Windows instance at EC2, and we confirmed that the patch allows the "basics" entry in the test suite to run to completion when it hadn't before. So, yay for that.
Unfortunately, we're noticing frequent hangs in other portions of the test suite, including but not limited to the "design_options" test. My observations from an evening running tests:
The design_options test always passes with the log level set to "debug".
When the level is set to "info" the test often hangs, and the Erlang VM seems to hang as well.
The hang often occurs on the second invocation of the test.
The hang occurs with and without my patch.
That last point is rather crucial. Dave Cottlehuber indicated that the test had been passing, but I can't seem to make that happen even with a stock build of the HEAD of 1.3.x. My current recommendation is to review and apply this patch as it only improves matters during my testing. That being said, I'd love to understand the root cause of these hangs when debug logging is disabled.
diff --git a/src/couch_index/src/couch_index_server.erl b/src/couch_index/src/couch_index_server.erl
index 48fa8e4..bc1fce7 100644
— a/src/couch_index/src/couch_index_server.erl
+++ b/src/couch_index/src/couch_index_server.erl
@@ -160,7 +160,9 @@ reset_indexes(DbName, Root) ->
% shutdown all the updaters and clear the files, the db got changed
Fun = fun({_, {DDocId, Sig}}) ->
[{_, Pid}] = ets:lookup(?BY_SIG,
{DbName, Sig}
),
couch_util:shutdown_sync(Pid),
+ MRef = erlang:monitor(process, Pid),
+ gen_server:cast(Pid, delete),
+ receive
{'DOWN', MRef, _, _, _}
-> ok end,
rem_from_ets(DbName, Sig, DDocId, Pid)
end,
lists:foreach(Fun, ets:lookup(?BY_DB, DbName)),

Thanks Adam for your help on this. The most important point above is that the VM is hanging. It's possible that the IO during debug logging is too fast for the VM to keep up with, and that we end up blocking other IO. I will re-do these tests with +A 4 (which is what the Windows daemon & also unix uses also) and see if this helps. I am +1 on moving ahead with this patch & the release, and to see if the OTP team have any suggestions on troubleshooting the VM hang. I should make some good progress on this today also.

Dave Cottlehuber
added a comment - 02/Dec/12 10:22 Thanks Adam for your help on this. The most important point above is that the VM is hanging. It's possible that the IO during debug logging is too fast for the VM to keep up with, and that we end up blocking other IO. I will re-do these tests with +A 4 (which is what the Windows daemon & also unix uses also) and see if this helps. I am +1 on moving ahead with this patch & the release, and to see if the OTP team have any suggestions on troubleshooting the VM hang. I should make some good progress on this today also.

I did some more reading and log-based debugging this morning. We're doing something rather screwy. It turns out the the individual view index processes are already monitoring their parent DB; when we delete the DB, the index is automatically closed:

The handler for that 'DOWN' message will invoke the gen_server's terminate function, which closes the file descriptor cleanly. So that's A Good Thing. Unfortunately, we don't always let that cleanup run to completion, because we've got this separate DB update notification listener:

The couch_index processes do not trap exits, so when the reset_indexes function calls shutdown_sync it terminates the couch_index process immediately, bypassing any additional cleanup that we wanted to do. The patch I wrote allows for the termination to finish cleanly.

It seems to me that even with the existing shutdown_sync invocation we'd eventually close all the file descriptors because of exit signal propagation, but at that point we may be racing the process that tries to delete them (and when we lose, we hang). The clean shutdown avoids that race.

Adam Kocoloski
added a comment - 02/Dec/12 14:03 I did some more reading and log-based debugging this morning. We're doing something rather screwy. It turns out the the individual view index processes are already monitoring their parent DB; when we delete the DB, the index is automatically closed:
https://github.com/apache/couchdb/blob/bde29b/src/couch_index/src/couch_index.erl#L80
https://github.com/apache/couchdb/blob/bde29b/src/couch_index/src/couch_index.erl#L296-L300
The handler for that 'DOWN' message will invoke the gen_server's terminate function, which closes the file descriptor cleanly. So that's A Good Thing. Unfortunately, we don't always let that cleanup run to completion, because we've got this separate DB update notification listener:
https://github.com/apache/couchdb/blob/bde29b/src/couch_index/src/couch_index_server.erl#L159-L168
The couch_index processes do not trap exits, so when the reset_indexes function calls shutdown_sync it terminates the couch_index process immediately, bypassing any additional cleanup that we wanted to do. The patch I wrote allows for the termination to finish cleanly.
It seems to me that even with the existing shutdown_sync invocation we'd eventually close all the file descriptors because of exit signal propagation, but at that point we may be racing the process that tries to delete them (and when we lose, we hang). The clean shutdown avoids that race.

Volker Mische
added a comment - 04/Dec/12 18:44 This issue sounds similar to one we had at Couchbase.
The Couchbase ticket: http://www.couchbase.com/issues/browse/MB-6957
Which lead to Filipe's OTP patch: https://github.com/erlang/otp/commit/7c9ed2ed55e297dd78e7434851fa20063aa365f2
And to a patch for the Couchbase-couchdb fork (which is still needed even when the OTP fix is in, see the ticket for more information): http://review.couchbase.org/22042

The VM isn't actually completely hung, just blocking on IO and the console is part of that. This seems not to change based on +A x settings. The VM is waiting on input from couchjs processes, and when these are killed, couch springs back into life. Reducing [os_query_servers] os_process_limit = 1 still sees a hang, although as a bonus you can still talk to the VM now.

Swapping in couchjs.exe from 1.2.0 shows no discernable difference, both hang.

Dave Cottlehuber
added a comment - 05/Dec/12 23:12 What are we testing off?
A fresh 1.3.x https://www.dropbox.com/sh/jeifcxpbtpo78ak/Jw-fm6LJrR/snapshots/20121202/setup-couchdb-1.3.0a-f1b2304-git_otp_R15B01.exe and a new .beam file with Adam's patch here https://www.dropbox.com/sh/jeifcxpbtpo78ak/L_6G_6HeeT/snapshots/20121202/couch_index_server.beam also that needs to be dropped into place.
What are we seeing?
The VM isn't actually completely hung, just blocking on IO and the console is part of that. This seems not to change based on +A x settings. The VM is waiting on input from couchjs processes, and when these are killed, couch springs back into life. Reducing [os_query_servers] os_process_limit = 1 still sees a hang, although as a bonus you can still talk to the VM now.
Swapping in couchjs.exe from 1.2.0 shows no discernable difference, both hang.

@dch - The fact that you can kill couchjs to free up the VM is quite intriguing. Is it possible to get a stack trace on Windows like you can with gdb on linux? If you can, having a stack trace of a couchjs that you then kill to free the VM could possibly be the key to figuring that out.

Paul Joseph Davis
added a comment - 05/Dec/12 23:41 Reviewed Adam's patch and I'm +1 on applying it.
@dch - The fact that you can kill couchjs to free up the VM is quite intriguing. Is it possible to get a stack trace on Windows like you can with gdb on linux? If you can, having a stack trace of a couchjs that you then kill to free the VM could possibly be the key to figuring that out.

Generally, there are repeated hangs during basic things like fputs/freads.
I see buffers in couchjs that appear to be missing a character such as
'\nnteger', & from what I read couchjs is not returning from the blocking
fputs/fgets OS call. I can't tell if the port is still open on the erlang
side or not, I assume so.

I think this is because we are doing open_port(... binary ...) in Erlang,
but opening the stream as text mode only in Windows. Thus the next line
will lose a character (explaining the '\nnteger' stuff I was seeing in
the debugger), and eventually we'll hit a point where we waitin on an
additional character that will never be sent.

I tried changing fopen to "rb" but I think we only use this for files like main.js and not stdin:

fputc('\n', stream); // dch: we never return from this
fflush(stream);
}

COUCHSPAWNKILLABLE ###

couchspawnkillable looks clean to me, its just waiting for the couchjs process to terminate:

// Wait for the process to terminate so we can reflect the exit code
// back to couch.
WaitForSingleObject(pi.hProcess, INFINITE);
if (!GetExitCodeProcess(pi.hProcess, &exitcode))
return 6;
return exitcode;
}

The code snippet you provided you marked fputc as the call we're blocked on but the traceback makes me think its the line after that with the fflush call.

Still, it looks as though we're blocked trying to flush out that buffer from Windows and assuming Erlang is just sitting around waiting for a message then I could see this being a classic dead lock (depending on how Erlang does its line buffering). I'll try and find the Erlang internals that manage this bit of code to see what might be possible.

Paul Joseph Davis
added a comment - 06/Dec/12 19:05 The code snippet you provided you marked fputc as the call we're blocked on but the traceback makes me think its the line after that with the fflush call.
Still, it looks as though we're blocked trying to flush out that buffer from Windows and assuming Erlang is just sitting around waiting for a message then I could see this being a classic dead lock (depending on how Erlang does its line buffering). I'll try and find the Erlang internals that manage this bit of code to see what might be possible.

Paul Joseph Davis
added a comment - 06/Dec/12 20:39 Dave Cottlehuber While I read through code here, one of the things you might want to try is changing that
{line, 4096}
option to
{line, 1024}
or even something really small like
{line, 64}
to see if my theory on mismatched buffer sizes is anywhere near the mark.

I've done a fair few runs now and we are blocking on return from fputc every time. Internally the NT API looks like it is unable to flush data to the pipe, and that's why we end up blocking. Not really sure how to peer inside this further.

Dave Cottlehuber
added a comment - 07/Dec/12 12:56
notes for dch ###
There are a lot of changes in emulator's sys.c
https://github.com/erlang/otp/commit/3985e56bfa82083b23978e06809b8ab0cf9d52d8
https://github.com/erlang/otp/commit/e11bbb37273ebd2408d0c83c62770f9ef023879d
https://github.com/erlang/otp/commit/fed20c731b91f1debb11a809cc5999d8b04dd293
https://github.com/erlang/otp/commit/2a400b88a44ea35a992b3b46acb84a632bd7d7e9
and some variations on how these are handled (via pierre) elsewhere, who has seen similar issues:
https://github.com/ghc/packages-process/blob/master/cbits/runProcess.c
https://github.com/php/php-src/commit/03b39896a6df71fb0ccf379a9148eba910f6eb84
https://github.com/php/php-src/commit/49666eece6b8d5de74f1bae2537e8cc3919203ee
https://github.com/php/php-src/commit/1cd82c9db23749b692b35fccd2ba17ee9d1f60c2
https://bugs.php.net/bug.php?id=51800
https://bugs.php.net/bug.php?id=44942
and how couch does it:
https://github.com/apache/couchdb/blob/1.3.x/src/couchdb/priv/spawnkillable/couchspawnkillable_win.c
reverting some changes e.g. https://github.com/apache/couchdb/commit/a851c6e5 has no impact.

This change makes the communication between the Erlang VM and
an external view server (couchjs for e.g.) more efficient by
writing a series of commands into the port and reading all the
responses from the external view server after doing all those
writes. This minimizes the amount of time each endpoint spends
blocked reading from the port.

Jan Lehnardt
added a comment - 10/Dec/12 22:34 There is always the possibility that I got something wrong, but three elaborate bisect sessions lead me to:
drumroll
a851c6e5150d14221ca018587d76214856c1555a is the first bad commit
commit a851c6e5150d14221ca018587d76214856c1555a
Author: Filipe David Borba Manana <fdmanana@apache.org>
Date: Sun Nov 6 14:25:04 2011 +0000
More efficient communication with the view server
This change makes the communication between the Erlang VM and
an external view server (couchjs for e.g.) more efficient by
writing a series of commands into the port and reading all the
responses from the external view server after doing all those
writes. This minimizes the amount of time each endpoint spends
blocked reading from the port.
COUCHDB-1334
:040000 040000 608c3712f9544111205f952d9c88456f41c1c9ed 13cbf3c017ab73fba15a4278ded73285ef0215aa M src

Jan Lehnardt
added a comment - 10/Dec/12 23:59 And as a point of confirmation, master with the patch reverted works as expected.
Our options:
A)
revert and release 1.3.0
understand the root cause and work around it if possible and re-land in 1.4 or later.
B)
revert and reintroduce with a OS-switch that disables the code on Win32.
understand the root cause and work around it if possible and re-enable in Win32 in 1.4 or later.
Given that we had zero reports of random lockups with 1.3.0 yet (would love to hear from Jason who has 1.3.x running on IrisCouch for a few weeks), I currently opt for B).
People who should look at this:
Jason
Paul
Dave
Anyone else who is interested.

This change makes the communication between the Erlang VM and
an external view server (couchjs for e.g.) more efficient by
writing a series of commands into the port and reading all the
responses from the external view server after doing all those
writes. This minimizes the amount of time each endpoint spends
blocked reading from the port.

Dave Cottlehuber
added a comment - 11/Dec/12 07:59 After a long night of bisectus interruptus, I get the same result:
erl@werl /relax/couchdb
$ git bisect good
a851c6e5150d14221ca018587d76214856c1555a is the first bad commit
commit a851c6e5150d14221ca018587d76214856c1555a
Author: Filipe David Borba Manana <fdmanana@apache.org>
Date: Sun Nov 6 14:25:04 2011 +0000
More efficient communication with the view server
This change makes the communication between the Erlang VM and
an external view server (couchjs for e.g.) more efficient by
writing a series of commands into the port and reading all the
responses from the external view server after doing all those
writes. This minimizes the amount of time each endpoint spends
blocked reading from the port.
COUCHDB-1334
& doing a clean build from 1.3.x just reverting this patch works on both my build boxes (this time :/).
+0 on A in general given that many of us have been using 1.3.x actively for months on other platforms without issue.
+1 for B.

Dave Cottlehuber
added a comment - 12/Dec/12 21:32 OK, closing this as we have resolved the nuke_dir situation now, and further discussion on the COUCHDB-1334 issue should take place on the re-opened ticket. Thanks everybody who has helped on this.
Adam's patch was applied to master + 1.3.x for the underlying issue; COUCHDB-1334 remains in place on master but reverted for 1.3.x.