
Hi, I am writing a program that launches multiple applications, each of them as a separate process. I am using fork and execve to accomplish this.
What I'd like to do is find an efficient way to watch over these applications. Once I launch an application, I'd like to find out whether it exits normally or fails for any reason. If it fails, I'd like to relaunch it.

I was thinking of having another process as a watch process and calling waitpid() for each application, but I don't know if I can call waitpid multiple times or if the calls would block each other. The worst-case scenario was to have a thread for each application that waits until that application terminates, but I think this is not efficient at all.

How can I get signaled when any of the launched applications, which run in child processes, exit normally or fail? My understanding is that the wait calls block the process until it gets signaled, but since there are multiple applications running at the same time, more than one can exit or terminate at the same time, so how can I be notified that more than one application terminated?

Ideally I'd like to find out as many details as possible about why it failed: whether the application crashed because of a segfault, or whether it's hung in an infinite loop, etc. I do not know if there's a way to find this out. Is there?

Perhaps add the other processes to the parent's process group, then call waitpid() with the negated process group ID (minus the parent's PID, if the parent is the group leader).
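A minimal sketch of the process-group idea, assuming a hypothetical helper `spawn_group_and_reap()` and children that simply `_exit()` where a real launcher would call `execve()`. Both parent and child call `setpgid()`, so it doesn't matter which of them runs first. Note that `waitpid(-pgid, ...)` still only reports on the caller's own children; the group just narrows which of them it waits for.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start n children in one new process group, then reap
 * them with waitpid(-pgid, ...). Returns how many were reaped, or -1. */
int spawn_group_and_reap(int n)
{
    pid_t pgid = 0;   /* 0 = "make the first child the group leader" */

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            setpgid(0, pgid);   /* child side of the race-free pair */
            _exit(0);           /* a real launcher would execve() here */
        }
        setpgid(pid, pgid);     /* parent side; errors (e.g. child already exec'd) ignored */
        if (pgid == 0)
            pgid = pid;         /* first child's PID becomes the group ID */
    }

    int reaped = 0, status;
    for (;;) {
        pid_t pid = waitpid(-pgid, &status, 0);   /* any child in group pgid */
        if (pid > 0)
            reaped++;
        else if (errno != EINTR)
            break;              /* ECHILD: no children left in that group */
    }
    return reaped;
}
```

By default a forked child starts out in its parent's process group, which is why the sketch creates a new group for the launched applications instead.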

My design has actually changed a bit, and I have another question. Let's say the main process creates two threads via pthread_create(), say pthread_1 and pthread_2; then pthread_1 calls fork(), and let's say it returns process ID 4.

What I'd like to do is have the second thread, pthread_2, call wait4(-1, &status, options, &rusage) and wait for the processes created via the fork() call by another thread, in this case pthread_1.

Is this possible? I believe that since the threads are actually in the same process, when one of them forks a child process, that child process is a child of both threads. Correct?

Will this work?

Also, I've never changed the process group ID. That seems like another option, but how does it work? Does anybody have any sample code? What is the default process group ID, what are the risks, and what else happens besides assigning a process group ID to a process?

Mixing threads and processes makes my head hurt. What you may or may not want to consider is that in LinuxThreads, new threads really are new processes with shared memory. You can verify that by asking for the PID of the spawned threads. You can still make an implementation decision not to support LinuxThreads, since they are not POSIX-compliant. I'd just recommend sticking with either threads or processes, if you can, and saving yourself the hassle of figuring out the semantics of mixing them.

I agree with you that this can be a headache, but as of now, I think that's what I need to implement, for the following reasons:

1) The process that I fork has to be its own process since I call execve on it, and I can't lose the other threads once I do that.

2) The reason to make thread_1 and thread_2 threads and not processes is that I need to save memory, and I need those threads to access and change data structures created by the main process that created them.

So whether or not they are new processes, I still have the same original question: is thread_2 still considered the parent process of processes forked by thread_1?
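On Linux with the modern thread implementation (NPTL), the wait(2) man page states that since Linux 2.4 a thread can, and by default will, wait on children of other threads in the same thread group. The sketch below assumes that behaviour: a hypothetical `launcher` thread forks, a hypothetical `reaper` thread blocks in `wait4()`, and a POSIX semaphore keeps the reaper from calling `wait4()` before any child exists (it would otherwise fail at once with ECHILD). Compile with `-pthread`.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <semaphore.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static sem_t child_started;        /* posted once the child exists */
static pid_t launched_pid = -1;    /* set by the launcher thread */
static pid_t reaped_pid = -1;      /* set by the reaper thread */
static int reaped_status;

/* Plays the role of "thread_1": forks the application. */
static void *launcher(void *arg)
{
    (void)arg;
    pid_t pid = fork();
    if (pid == 0)
        _exit(7);                  /* child: stands in for execve() */
    launched_pid = pid;
    sem_post(&child_started);
    return NULL;
}

/* Plays the role of "thread_2": reaps a child it did not fork itself. */
static void *reaper(void *arg)
{
    (void)arg;
    struct rusage ru;
    sem_wait(&child_started);
    reaped_pid = wait4(-1, &reaped_status, 0, &ru);
    return NULL;
}

/* Returns 1 if the reaper thread collected the launcher thread's child. */
int run_demo(void)
{
    pthread_t t1, t2;
    sem_init(&child_started, 0, 0);
    pthread_create(&t2, NULL, reaper, NULL);
    pthread_create(&t1, NULL, launcher, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&child_started);
    return reaped_pid == launched_pid && WIFEXITED(reaped_status)
           && WEXITSTATUS(reaped_status) == 7;
}
```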

I have another question. What happens when you call waitpid() and the application it's waiting on crashes?

The man page says this:

Quote:

waitpid(): on success, returns the process ID of the child whose state has changed; on error, -1 is returned; if WNOHANG was specified and no child(ren) specified by pid has yet changed state, then 0 is returned.

So does waitpid() return -1 if the app crashes? Or does it return some indication of the signal that caused it to crash?
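A small experiment for this question, assuming a hypothetical child that "crashes" by raising SIGSEGV on itself. In this case waitpid() does not return -1: it returns the child's PID as usual, and the crash shows up in `status`, where WIFSIGNALED() is true and WTERMSIG() gives the signal number. The -1 return is for errors in the call itself, such as ECHILD when there are no children left to wait for.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that "crashes" with SIGSEGV and returns the signal number
 * waitpid() reports for it, 0 if it was not killed by a signal,
 * or -1 if fork() or waitpid() itself failed. */
int crash_signal_of_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);   /* make sure the default action (terminate) applies */
        raise(SIGSEGV);             /* simulate a segfault */
        _exit(0);                   /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)   /* a crash still returns the PID */
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```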

One program that's been doing this sort of thing since the very beginnings of Unix/Linux is init, which basically starts everything up and then sits around waiting for things to die. According to /etc/inittab, it can start them up again.
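A minimal sketch of an inittab-style "respawn" loop for a single application, assuming a hypothetical `supervise()` helper. It treats a nonzero exit status or death by a signal as a failure, and it gives up after `max_launches` launches so a program that always crashes can't spin forever. Using exit code 127 for a failed `execv()` is borrowed from shell convention.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical respawn loop for one application: relaunch argv[0] whenever
 * it fails (nonzero exit status or killed by a signal), up to max_launches
 * launches in total. Returns how many times it was launched, or -1. */
int supervise(char *const argv[], int max_launches)
{
    int launches = 0;

    while (launches < max_launches) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execv(argv[0], argv);
            _exit(127);                 /* only reached if execv() failed */
        }
        launches++;

        int status;
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;                      /* clean exit: don't relaunch */
        /* failed or crashed: go around and launch it again */
    }
    return launches;
}
```

For example, `char *app[] = { "/usr/bin/myapp", NULL }; supervise(app, 5);` would run a (hypothetical) `/usr/bin/myapp` up to five times.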

Basically, when a process dies, it becomes a "zombie" until its parent (or if there is no parent, init) "reaps" it. This lets you collect the status of the defunct process.

It is often the case that the progenitor process in a multi-thread application does little more than handle the spawning and reaping of child processes and threads, the latter of which actually do all of the work. It is also common that one of the children is basically a "watchdog" that wakes up periodically just to see if something is amiss.
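A process stuck in an infinite loop never changes state, so wait() alone can't tell it apart from a busy one. One common workaround, sketched here as an assumption rather than anything init itself does, is a deadline: poll the child with WNOHANG, and if it is still running after `timeout_ms`, treat it as hung, send it SIGKILL and reap it. The helper name `wait_or_kill()` and the 100 ms polling interval are hypothetical choices.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical watchdog for one child. Returns 0 if the child ended on its
 * own, 1 if it was presumed hung and killed, -1 on error. */
int wait_or_kill(pid_t pid, long timeout_ms, int *status)
{
    const struct timespec tick = { 0, 100L * 1000 * 1000 };   /* 100 ms */
    long waited = 0;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;           /* exited, or killed by something else */
        if (r < 0)
            return -1;
        if (waited >= timeout_ms)
            break;              /* r == 0 and out of time: presumed hung */
        nanosleep(&tick, NULL);
        waited += 100;
    }

    kill(pid, SIGKILL);
    if (waitpid(pid, status, 0) != pid)
        return -1;
    return 1;
}
```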

Quote:

I was thinking of having another process as a watch process and calling waitpid() for each application, but I don't know if I can call waitpid multiple times or if the calls would block each other. The worst-case scenario was to have a thread for each application that waits until that application terminates, but I think this is not efficient at all.

Having yet another process to watch the children of another won't work if it's not the parent of the applications started... This will only make things more complex without a real reason.

Quote:

How can I get signaled when any of the launched applications, which run in child processes, exit normally or fail? My understanding is that the wait calls block the process until it gets signaled, but since there are multiple applications running at the same time, more than one can exit or terminate at the same time, so how can I be notified that more than one application terminated?

Normally wait() and waitpid() block until a child has changed status (e.g. exited, stopped, ...). But you can specify the WNOHANG option ("return immediately if no child has exited") if you use waitpid() instead of just wait().

However, I think you don't need to worry about multiple threads and blocking at all. If I understand you correctly, you just need to call wait(&status) waitpid(-1, &status, 0) in a loop. When two or three or a hundred processes exit at the "same" time, your call to wait() will return immediately each time until all exited children are "reaped". Then it will start blocking again until the next child exits.
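A minimal sketch of that loop, using a hypothetical `reap_burst()` that starts several children which all exit at about the same time. Each `wait()` call returns exactly one child, so the loop simply keeps calling it until it fails with ECHILD.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: n children exit at (roughly) the same time and are then
 * reaped by one plain blocking loop. Returns the number reaped, or -1. */
int reap_burst(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(i);           /* every child exits immediately */
    }
    sleep(1);                   /* very likely they are all zombies by now */

    int reaped = 0, status;
    for (;;) {
        pid_t pid = wait(&status);   /* same as waitpid(-1, &status, 0) */
        if (pid > 0)
            reaped++;               /* one call reaps exactly one child */
        else if (errno != EINTR)
            break;                  /* ECHILD: nothing left to reap */
    }
    return reaped;
}
```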

An exited child process waits in the zombie (defunct) state until the parent calls wait(). It's not a problem at all if this takes, say, 1 second.

If you need to do other things in the parent process's loop that take time, just use waitpid(-1, &status, WNOHANG) instead and make sure the loop runs at least once a second or so. If your other activities in the parent process may also block, just register a SIGCHLD signal handler, so your parent will "know" when it needs to call wait().
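A minimal sketch of the signal-handler variant, with hypothetical names (`on_sigchld`, `event_loop`) and a 50 ms `nanosleep()` standing in for the parent's other work. The handler only sets a flag, since very little is safe to call inside a signal handler; the main loop then reaps with WNOHANG until waitpid() returns 0. Because a pending SIGCHLD is not queued a second time, one signal can stand for several exited children, so the inner loop must not stop after the first one.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited = 0;

/* Keep the handler async-signal-safe: just record that SIGCHLD arrived. */
static void on_sigchld(int sig)
{
    (void)sig;
    child_exited = 1;
}

/* Hypothetical parent loop: starts `expected` children, then alternates
 * "other work" with non-blocking reaping. Returns the number reaped, or -1. */
int event_loop(int expected)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;   /* no SIGCHLD for stopped children */
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    for (int i = 0; i < expected; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);
    }

    const struct timespec slice = { 0, 50L * 1000 * 1000 };   /* 50 ms */
    int reaped = 0;
    while (reaped < expected) {
        nanosleep(&slice, NULL);       /* the parent's "other work" */
        if (!child_exited)
            continue;
        child_exited = 0;              /* clear first, so no exit is missed */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped++;
    }
    return reaped;
}
```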

Quote:

Having yet another process to watch the children of another won't work if it's not the parent of the applications started... This will only make things more complex without a real reason.

I am actually not using another process to watch the children; it's a thread. A thread should work, right?

Quote:

Normally wait() and waitpid() block until a child has changed status (e.g. exited, stopped, ...). But you can specify the WNOHANG option ("return immediately if no child has exited") if you use waitpid() instead of just wait().

Yes, I am going to use waitpid(-1, &status, WNOHANG) or wait4(-1, &status, WNOHANG, &rusage).

Quote:

However, I think you don't need to worry about multiple threads and blocking at all. If I understand you correctly, you just need to call wait(&status) waitpid(-1, &status, 0) in a loop. When two or three or a hundred processes exit at the "same" time, your call to wait() will return immediately each time until all exited children are "reaped". Then it will start blocking again until the next child exits.

Are you saying I need to make TWO calls, wait(&status) AND waitpid(-1, &status, 0), or just one of the two?

What happens when there are multiple zombie processes between waitpid() or wait4() calls?
Does a call return the status of the first one and reap it, while the other zombie processes stay in the process table until waitpid() or wait4() is called again? Or does one call to waitpid() or wait4() reap all the zombie processes?

Quote:

An exited child process waits in the zombie (defunct) state until the parent calls wait(). It's not a problem at all if this takes, say, 1 second.

If you need to do other things in the parent process's loop that take time, just use waitpid(-1, &status, WNOHANG) instead and make sure the loop runs at least once a second or so. If your other activities in the parent process may also block, just register a SIGCHLD signal handler, so your parent will "know" when it needs to call wait().

Instead of what? Do you mean waitpid() instead of wait()?
The things that I need to do between wait calls are: figure out what the status was, send a message to another process to report the status of the apps, get rusage info, and use a semaphore to make sure there are still apps running. The one that might block is the semaphore, but that should only happen if there are no apps to watch over.

What's the best way to wake up this thread and have it run only once every second or so? Should I sleep for 500 ms or something like that?

Yet another question: what happens when an app crashes? Does that app also become a zombie process that can then be waited on? Or is there no way to find out with wait() whether an app crashed?

Quote:

I am actually not using another process to watch the children; it's a thread. A thread should work, right?

I'm not sure. I'd think so, but the Linux implementation of threads (LinuxThreads) gives every thread its own PID (which is, as I understand it, uncommon). That fact makes me unsure about this.

Quote:

Originally Posted by emge1

Are you saying I need to make TWO calls, wait(&status) AND waitpid(-1, &status, 0), or just one of the two?

No. I just forgot to type "or" between them. Sorry if that confused you.

Quote:

Originally Posted by emge1

What happens when there are multiple zombie processes between waitpid() or wait4() calls?
Does a call return the status of the first one and reap it, while the other zombie processes stay in the process table until waitpid() or wait4() is called again? Or does one call to waitpid() or wait4() reap all the zombie processes?

All the wait...() functions return the status of one process. The next zombie will be reaped the next time you call one of the wait...() functions. This means a process may be a zombie for a short time, but that is no problem at all, since its parent is still alive and will reap it at some point.

Quote:

Originally Posted by emge1

Instead of what? Do you mean waitpid() instead of wait()?

I meant: instead of running a separate process that watches and reaps the apps as they run and terminate (a different process from the one that starts the apps; that's what I thought you were trying to do), and instead of running a separate "watch thread" for each running app.

Quote:

Originally Posted by emge1

The things that I need to do between wait calls are: figure out what the status was, send a message to another process to report the status of the apps, get rusage info, and use a semaphore to make sure there are still apps running. The one that might block is the semaphore, but that should only happen if there are no apps to watch over.

What's the best way to wake up this thread and have it run only once every second or so? Should I sleep for 500 ms or something like that?

I would try using wait4() so you can get the rusage and the exit status in one go, but without WNOHANG, so your reaping loop will block unless there's a process to reap. Then you won't even need to wake it up once a second or so. It will collect the status and rusage when a process exits, and it will block (sleep, do nothing) if there's nothing to do. This is what I tried to explain in my first post.

Then you also don't need a semaphore, I think.
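A minimal sketch of that reaping loop, assuming a hypothetical `reap_with_rusage()` that logs to a `FILE *`. With options 0 (no WNOHANG), wait4() sleeps until a child ends, then hands back the PID, the exit status and the child's `struct rusage` in one call. On Linux, `ru_maxrss` is reported in kilobytes.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical reaper: blocks in wait4() until a child ends, logs how it
 * ended plus its CPU time and peak RSS, and returns once no children are
 * left. Returns the number of children reaped. */
int reap_with_rusage(FILE *log)
{
    int reaped = 0, status;
    struct rusage ru;

    for (;;) {
        pid_t pid = wait4(-1, &status, 0, &ru);   /* no WNOHANG: sleeps when idle */
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;                                /* ECHILD: no children left */
        }
        reaped++;
        if (WIFEXITED(status))
            fprintf(log, "pid %d exited with %d", (int)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            fprintf(log, "pid %d killed by signal %d", (int)pid, WTERMSIG(status));
        fprintf(log, "; user %ld.%06lds, sys %ld.%06lds, max RSS %ld KiB\n",
                (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec,
                ru.ru_maxrss);
    }
    return reaped;
}
```

In a long-running monitor the ECHILD case means "no apps running right now", which is the point where the thread would have to wait for the next launch.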

Quote:

Originally Posted by emge1

Yet another question: what happens when an app crashes? Does that app also become a zombie process that can then be waited on? Or is there no way to find out with wait() whether an app crashed?

I'm not sure, but what I think is: a program that really crashes receives a signal (e.g. SIGSEGV on a segfault) and dies. Its status will show that it was terminated by a signal, which you can detect from the status you got from one of the wait...() functions with something like:

Code:

if (WIFSIGNALED(status)) {
    /* killed by a signal, e.g. SIGSEGV after a segfault */
    int sig = WTERMSIG(status);
    /* do something with sig */
}
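Expanding that test into a hypothetical `describe_status()` helper gives roughly the level of detail asked about at the start of the thread: the exit code for a normal exit, or the signal number and name (via `strsignal()`) for a crash. `WCOREDUMP()` is not part of POSIX, so the sketch only uses it where the system defines it.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: turn a status from wait()/waitpid()/wait4() into text. */
void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status)) {
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
#ifdef WCOREDUMP
        int core = WCOREDUMP(status);
#else
        int core = 0;
#endif
        snprintf(buf, len, "killed by signal %d (%s)%s", sig, strsignal(sig),
                 core ? ", core dumped" : "");
    } else if (WIFSTOPPED(status)) {
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    } else {
        snprintf(buf, len, "unrecognized status 0x%x", (unsigned)status);
    }
}
```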

Note: I'm not an expert, and I do not entirely understand what you're trying to do.