Rudolfs Kreicbergs
added a comment - 2011 Sep 09 17:24 Thanks, alix, for spotting this. The correct solution is actually to call WaitForSingleObject() for the job - it is signaled when all the processes are done.
Could you please send a link to the article about the SetHandleInformation() call? I would like to get to the bottom of the problem, thanks!

Rudolfs Kreicbergs
added a comment - 2011 Sep 09 18:30 Thanks, I found the same one - I'll look more into it. It seems a bit weird that we need to call SetHandleInformation() on a handle, that we do not pass to the child process.

Rudolfs Kreicbergs
added a comment - 2011 Sep 12 13:01 Jobs are used to handle process trees - the initial child may exit before the whole tree has finished. We can add the whole process tree to a job and then monitor the job instance.
It looks like waiting for the job to finish will be not as easy as it seemed in the beginning - WaitForSingleObject() does not work (i.e. times out) in my tests as well opposed to what is written in MSDN: http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx#managing_job_objects . Let's see how to solve this....

I'm still not convinced about jobs. we're launching a single process and it won't exit until all subprocesses it starts are finished. at least my tests show that this approach works flawless, and I'm starting quite a number things piped together in my userparameters.
of course process can launch something 'detached' from itself, and we don't care about the fate of these detached childs and thus don't need to wait for them.

I heard there are plans to make simultaneous, not consecutive userparameter checks and maybe then jobs will be useful.

alix
added a comment - 2011 Sep 12 15:50 I'm still not convinced about jobs. we're launching a single process and it won't exit until all subprocesses it starts are finished. at least my tests show that this approach works flawless, and I'm starting quite a number things piped together in my userparameters.
of course process can launch something 'detached' from itself, and we don't care about the fate of these detached childs and thus don't need to wait for them.
I heard there are plans to make simultaneous, not consecutive userparameter checks and maybe then jobs will be useful.

In general, there is no guaranty that a child will not exit before it's children do - "detached" child process is a normal case. Moreover, even if we call a process that will not exit before it's children do, timeout may occur and we would still need other means of killing the process tree - that's why jobs were introduced in ZBX-3519.

Rudolfs Kreicbergs
added a comment - 2011 Sep 12 16:06 In general, there is no guaranty that a child will not exit before it's children do - "detached" child process is a normal case. Moreover, even if we call a process that will not exit before it's children do, timeout may occur and we would still need other means of killing the process tree - that's why jobs were introduced in ZBX-3519 .

it's pretty interesting, and is right, in theory. but I'd like to test what exactly happens when children of a parent (and thus parent) hits timeout value and we have to kill its parent process. I'll report what I've found.

alix
added a comment - 2011 Sep 12 16:47 it's pretty interesting, and is right, in theory. but I'd like to test what exactly happens when children of a parent (and thus parent) hits timeout value and we have to kill its parent process. I'll report what I've found.

MSDN suggests 2 solutions:
1) http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx
Use WaitForSingleObject() on job handle which should return on either timeout or when the job gets signaled after all associated processes have exited - that simply does not work at all. Found no additional info on the requirements on this setup, however, found a warning that the job might get signaled in other cases as well.
2) http://msdn.microsoft.com/en-us/library/aa365198%28v=vs.85%29.aspx
Use I/O completion ports. However, that requires a separate thread to be monitoring the completion port (otherwise, we might get a race condition when the job is completed before we start monitoring the port). Furthermore, MSDN states that "delivery of messages to the completion port is not guaranteed" which make this solution also unusable.

Thus we come to two "bad" options:
A) Wait for the main process - if the first child exits, we assume that all scripts should have finished and force-kill all other processes still associated with the job. This means that users should always keep the main process running until everything is finished.
B) Query jobs process count every N miliseconds till it is zero or a timeout occurs.

After talking to Sasha, decided to implement option A since it is rather reasonable and option B is really dirty. If there is a better solution, please let us know.

Rudolfs Kreicbergs
added a comment - 2011 Sep 13 14:07 - edited MSDN suggests 2 solutions:
1) http://msdn.microsoft.com/en-us/library/ms684161%28v=VS.85%29.aspx
Use WaitForSingleObject() on job handle which should return on either timeout or when the job gets signaled after all associated processes have exited - that simply does not work at all. Found no additional info on the requirements on this setup, however, found a warning that the job might get signaled in other cases as well.
2) http://msdn.microsoft.com/en-us/library/aa365198%28v=vs.85%29.aspx
Use I/O completion ports. However, that requires a separate thread to be monitoring the completion port (otherwise, we might get a race condition when the job is completed before we start monitoring the port). Furthermore, MSDN states that "delivery of messages to the completion port is not guaranteed" which make this solution also unusable.
Thus we come to two "bad" options:
A) Wait for the main process - if the first child exits, we assume that all scripts should have finished and force-kill all other processes still associated with the job. This means that users should always keep the main process running until everything is finished.
B) Query jobs process count every N miliseconds till it is zero or a timeout occurs.
After talking to Sasha, decided to implement option A since it is rather reasonable and option B is really dirty. If there is a better solution, please let us know.

yesterday I experimented with jobs too and couldn't make WaitForSingleObject (I've also tried WaitForMultipleObjects) to signal when spawned process is finished. we just sit there until timeout hits despite the fact that spawned process and all its children has already been finished.

so basically we're stuck with what I've suggested in the first place it still is a pretty reasonable solution.
I'll keep this task in my head anyway and post if something interesting pops up. there's got to be a way to force jobs do what we want.

alix
added a comment - 2011 Sep 13 14:59 yesterday I experimented with jobs too and couldn't make WaitForSingleObject (I've also tried WaitForMultipleObjects) to signal when spawned process is finished. we just sit there until timeout hits despite the fact that spawned process and all its children has already been finished.
so basically we're stuck with what I've suggested in the first place it still is a pretty reasonable solution.
I'll keep this task in my head anyway and post if something interesting pops up. there's got to be a way to force jobs do what we want.

So it seems Moreover, my tests show that all the waiting is done in zbx_read_from_pipe() even when we have "detached" child processes - the output handle is kept open till the last process in the child tree exits (though I'm pretty sure there are other cases as well).

Rudolfs Kreicbergs
added a comment - 2011 Sep 13 17:23 So it seems Moreover, my tests show that all the waiting is done in zbx_read_from_pipe() even when we have "detached" child processes - the output handle is kept open till the last process in the child tree exits (though I'm pretty sure there are other cases as well).

It seems that we do NOT need to remove handle inheritance. Removing handle inheritance is important in a different situation:
1) We create a pipe in the parent process
2) Pass the read end to the child process
3) Run the child process and call ReadFile() in the child process() till it returns 0 and sets the error to ERROR_HANDLE_EOF (end of file)
4) Write the data to the pipe and use CloseHandle() to indicate that all data has been sent
5) Child process does not get ERROR_HANDLE_EOF since the child has inherited the write end of the pipe as well and it has not been closed that handle - only when ALL handles are closed the child will get ERROR_HANDLE_EOF.
If inheritance for the write end of the pipe had been removed, the child process would have correctly received ERROR_HANDLE_EOF.

In our case we pass the write end of the handle and in fact this ensures that we will read all the input of the child and it's child processes if the handle gets inherited to the whole tree - even if the first child exits, the parent will wait till all the handles are closed. So it is even more unlikely that waiting for the initial child process after reading from the pipe will do any harm as compared to waiting for the whole process tree if it was possible.

Note that the parent process does not pass the read handle and keeps it open till either a timeout occurs or the other end is closed by all child processes (and thus no process should be interested whether we are still keeping it open). So in our case there is no need to remove the inheritance for the read end.

Rudolfs Kreicbergs
added a comment - 2011 Sep 14 13:40 - edited Regarding handle inheritance, there is an article in MSDN explaining it: http://msdn.microsoft.com/en-us/library/aa365782%28v=vs.85%29.aspx
It seems that we do NOT need to remove handle inheritance. Removing handle inheritance is important in a different situation:
1) We create a pipe in the parent process
2) Pass the read end to the child process
3) Run the child process and call ReadFile() in the child process() till it returns 0 and sets the error to ERROR_HANDLE_EOF (end of file)
4) Write the data to the pipe and use CloseHandle() to indicate that all data has been sent
5) Child process does not get ERROR_HANDLE_EOF since the child has inherited the write end of the pipe as well and it has not been closed that handle - only when ALL handles are closed the child will get ERROR_HANDLE_EOF.
If inheritance for the write end of the pipe had been removed, the child process would have correctly received ERROR_HANDLE_EOF.
In our case we pass the write end of the handle and in fact this ensures that we will read all the input of the child and it's child processes if the handle gets inherited to the whole tree - even if the first child exits, the parent will wait till all the handles are closed. So it is even more unlikely that waiting for the initial child process after reading from the pipe will do any harm as compared to waiting for the whole process tree if it was possible.
Note that the parent process does not pass the read handle and keeps it open till either a timeout occurs or the other end is closed by all child processes (and thus no process should be interested whether we are still keeping it open). So in our case there is no need to remove the inheritance for the read end.