
OK to run pcntl_fork in apache-hosted script?

So I've had some great luck using pcntl and posix functions in command-line scripts. I thought recently, "Gee, wouldn't it be great if you could pcntl_fork off a process from a script hosted by apache to perform some super-long task like converting a video file?"

So I concocted a test script and ran it via cli a couple of times, but when I tried to access it via apache, it errored out. Digging around, I found this in the apache php.ini:

; This directive allows you to disable certain functions for security reasons.
; It receives a comma-delimited list of function names. This directive is
; *NOT* affected by whether Safe Mode is turned On or Off.
; http://php.net/disable-functions
disable_functions = pcntl_alarm,pcntl_fork,pcntl_waitpid,pcntl_wait,pcntl_wifexited,pcntl_wifstopped,pcntl_wifsignaled,pcntl_wexitstatus,pcntl_wtermsig,pcntl_wstopsig,pcntl_signal,pcntl_signal_dispatch,pcntl_get_last_error,pcntl_strerror,pcntl_sigprocmask,pcntl_sigwaitinfo,pcntl_sigtimedwait,pcntl_exec,pcntl_getpriority,pcntl_setpriority,

It seems odd to me that one would use disable_functions to block the pcntl functions when this module doesn't even appear to be installed/loaded. Also, I can't seem to find anything that would explicitly load the pcntl extension in the cli php.ini file. Pretty weird.

Anyways, the big question: is it kosher to utilize pcntl_fork and related functions when running a PHP script hosted by apache? Obviously the ubuntu package maintainers didn't want it running. I found some mostly confused/confusing discussion about how to get it running, but also some suggestions that this may not be a good idea. I saw another suggestion to use ignore_user_abort, but that doesn't sound like a good idea to me -- I wouldn't want my pool of PHP processes available to apache to get used up processing video or calculating pi to the millionth decimal, starving apache of those processes.

I have managed in the past to fork off a CLI PHP process from apache like so:

Code:

$cmd = "/usr/bin/php /path/to/some/script.php [parameter] > /dev/null & echo \$!";
$cmd_output = NULL; // will contain an array of output lines from the exec command
$cmd_result = NULL; // will contain the exit code returned by the OS; zero for a valid command, non-zero otherwise
$cmd_return = exec($cmd, $cmd_output, $cmd_result); // $cmd_return will contain the last line of output, which should be the PID of the process we have spawned

and then utilized pcntl_fork and/or posix_setsid to liberate and divorce the cli process from its cruel apache master.

But that seems like a lot of work to me. I believe I had to do it this way a) because apache doesn't have pcntl enabled and b) just calling exec is not enough because the exec'ed process can get killed when the browser disconnects or a timeout happens or whatever.

Hmm, would system() et al work for that? Perhaps most important, is it outward facing, public, etc.?

I dunno if I have objections to allowing Apache to do it for secure, intranet type stuff anyway. I'm looking at my "big project" in this field and that's what it does. I do have some reasonable, pre-tested limits on how many processes I fork ....

Incidentally, the PHP script is called via Ajax and the page just sits there and waits for it to tell the JS it's finished....


If you read the code in my post, you see I'm using exec to run a separate process. I have the vaguest recollection that this process would be terminated once the script that initiated it had completed -- or sometime shortly after. So the script that gets exec'ed has to call posix_setsid at least and in my working example, it also uses pcntl_fork. The exec'ed script can do this because it runs via CLI. The job of these forked processes will be to take files uploaded to my server and transfer them to a cloud storage system / cdn.
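The detach dance the exec'ed script performs looks roughly like this (just a sketch; CLI only, since pcntl_* generally isn't available under Apache, and the CDN work is a placeholder):

```php
<?php
// detach-sketch.php -- rough outline of fork + setsid in the exec'ed worker.

$pid = pcntl_fork();
if ($pid === -1) {
    die("could not fork\n");
} elseif ($pid > 0) {
    // Parent: report the child's PID and get out of the way immediately.
    echo "forked child $pid, exiting parent\n";
    exit(0);
}

// Child: become a session leader so we no longer belong to the process
// group that Apache (or the invoking shell) might signal or clean up.
if (posix_setsid() === -1) {
    die("could not become session leader\n");
}

// ...long-running work goes here, e.g. pushing an uploaded file to the CDN...
```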

Originally Posted by dalecosp

Perhaps most important, is it outward facing, public, etc.?

Sort of. Users will upload files (images, maybe pdfs, perhaps other formats?). After some validation of the uploaded file to make sure it is in fact an image and/or a safe pdf, the uploaded file will be delivered into cloud storage. It's been our experience that allowing users to upload files takes long enough and that the transfer-to-cloud storage step is a bit too long. The idea is to make the forked process very robust such that it will try multiple times to get the file into the cloud before it gives up. Cloud storage can occasionally be flaky.
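The retry idea might be sketched like so (upload_to_cloud() is a made-up stand-in for whatever SDK call actually moves the file):

```php
<?php
// Sketch of the "try a few times before giving up" idea.

// Stand-in for the real cloud/CDN SDK call; returns true on success.
function upload_to_cloud(string $path): bool
{
    // ...real SDK call goes here...
    return false;
}

function upload_with_retries(string $path, int $max_attempts = 5): bool
{
    for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
        if (upload_to_cloud($path)) {
            return true;
        }
        // Cloud storage can be flaky; back off longer after each failure.
        sleep(pow(2, $attempt));
    }
    return false; // give up and leave the job for a later retry or an alert
}
```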

Originally Posted by dalecosp

I dunno if I have objections to allowing Apache to do it for secure, intranet type stuff anyway.

This will NOT be deployed on an intranet. The forked script will be forked from a php script accessed directly by public visitors. The parameters under which the forked process runs should be pretty constrained, though. Comments on potential security problems welcome.
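On the security front, one obvious precaution when building the exec command is to escape everything, even values we generate ourselves (a sketch; the worker path and id are invented):

```php
<?php
// Never interpolate values into a shell command raw. $db_id here comes from
// our own INSERT, but escaping it anyway costs nothing.

$db_id  = 42; // e.g. the id returned by insert_cdn_job_record() (hypothetical)
$worker = '/path/to/cdn-upload.php';

$cmd = '/usr/bin/php ' . escapeshellarg($worker) . ' '
     . escapeshellarg((string) $db_id)
     . ' > /dev/null 2>&1 & echo $!';

$output = NULL;
$result = NULL;
exec($cmd, $output, $result);
```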

Originally Posted by dalecosp

I'm looking at my "big project" in this field and that's what it does.

Not sure what you mean here?

Originally Posted by dalecosp

I do have some reasonable, pre-tested limits on how many processes I fork ....

I do suppose if I were to fork too many of these processes, my server might be vulnerable to some kind of DDOS. I am aware that every machine has its limits. I recall in the past tweaking a multi-processing program extensively to find its optimum process count.

Originally Posted by dalecosp

Incidentally, the PHP script is called via Ajax and the page just sits there and waits for it to tell the JS it's finished....

I'm not sure we perfectly understand one another. In my application, you visit a website, you upload a file, the file is thoroughly inspected and then a long-running external process is forked that will never return any data to the user. The original PHP script the user visited in the first place will terminate and the forked process must continue to run, even if the user closes their browser and smashes their computer over Justin Bieber's head.

Why not a process that LOOKS for things to do? In this fashion you could create /jobs/uploads/{pid}_{timestamp}.json (or something) from the apache server, and a daemon (or cron or whatever) would come along, see a new upload to deal with, and process accordingly. I imagine something like the following (super pseudo, but you're smart enough to deal).
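Roughly like this (a sketch of the watcher idea; the paths and process_upload() helper are invented):

```php
<?php
// Daemon sketch: scan the drop directory for new job files, process each,
// then remove it so it isn't picked up again.

// Stand-in for the real work: push $job['file'] to the CDN, thumbnail it, etc.
function process_upload(array $job): void
{
    // ...real processing goes here...
}

while (true) {
    foreach (glob('/jobs/uploads/*.json') as $job_file) {
        $job = json_decode(file_get_contents($job_file), true);
        if ($job !== null) {
            process_upload($job);
        }
        unlink($job_file); // done (or malformed); either way, don't reprocess
    }
    sleep(5); // nap before polling again
}
```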


Originally Posted by Derokorian

Why not a process that LOOKS for things to do? In this fashion you could create /jobs/uploads/{pid}_{timestamp}.json (or something) from the apache server, and a daemon (or cron or whatever) would come along, see a new upload to deal with, and process accordingly. I imagine something like the following (super pseudo, but you're smart enough to deal).

The reason this client came to us in the first place was because they had an import process that ran every night to download remote images and copy them to the server's web root (creating various thumbnail sizes, etc.). The number of images they were dealing with became so great that the script (which dealt with one image at a time) would still be running 24 hours later when its daily scheduled cron time came.

Your proposal of having a single process to be waiting for the images to be uploaded could, in theory, suffer from a similar problem. Relying on a JSON-encoded file further complicates matters (at least I think it does) in that it might prove to be a bottleneck on a busy server. If the daemon script or cron job is working on the JSON file, then the web server cannot simultaneously work on that file so we might have either a bottleneck or some kind of weird race conditions. I dunno. Maybe not. In any case, we have found that using a database for the job queue is waaaaay better -- transactions and record locking mean it is well-suited to many processes running at once.

So imagine you have 50 customers at once come rushing over to the site, all uploading large sensitive files (e.g., 10MB apiece). Like maybe a passport or driver's license or something. One process might take quite some time to sequentially load all those files into a CDN. On the other hand, 50 distinct processes all uploading files at once might totally hose the server's bandwidth.

Using a cron job is also a possibility, but I think this makes it even more likely that we'd see situations where we need to do a bunch of files at once.

I think the ideal situation is where we have a pool of processes waiting/running all the time, using a database to handle things. But that's a pain in the ass to code. Long-running processes are really hard to do. Memory leaks, unexpected fatal errors, etc., tend to kill your process pool or make zombie processes. You also have to be careful about locking your db records and how you dole out the workload without jobs getting dropped or skipped or permanently locked and never unlocked. You generally need some kind of cron job or something to kill and restart these things or supervise them to make sure they are healthy.
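The record-locking part of a db-backed queue can be sketched like this (table and column names invented; assumes an InnoDB-style engine that supports row locks):

```php
<?php
// Claim one pending job atomically with a transaction + row lock, so two
// workers can never grab the same record.

$db = new PDO('mysql:host=localhost;dbname=jobs', 'user', 'pass');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->beginTransaction();
$stmt = $db->query(
    "SELECT id, file_path FROM cdn_jobs
     WHERE status = 'pending'
     ORDER BY id LIMIT 1 FOR UPDATE" // lock the row we intend to claim
);
$job = $stmt->fetch(PDO::FETCH_ASSOC);
if ($job) {
    $db->prepare("UPDATE cdn_jobs SET status = 'running' WHERE id = ?")
       ->execute([$job['id']]);
}
$db->commit(); // releases the lock; the job is now ours (or there was none)
```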

Anyways, my initial thoughts on pseudo were something like this:

PHP Code:

// file upload handler
if (validate_file_upload()) {
    move_uploaded_file($_FILES["field"]["tmp_name"], '/jobs/uploads/files/' . time() . basename($_FILES["pictures"]["name"]));
    // create db record in our jobs table
    $db_id = insert_cdn_job_record("blah blah blah");
    $cmd = "/usr/bin/php /path/to/cdn-upload.php $db_id > /dev/null & echo \$!";
    $log->write("command for terminate:" . $cmd);
    $cmd_output = NULL; // will contain an array of output lines from the exec command
    $cmd_result = NULL; // will contain the exit code returned by the OS; zero for a valid command, non-zero otherwise
    $cmd_return = exec($cmd, $cmd_output, $cmd_result); // last line of output, which should be the PID of the spawned process
    if ($cmd_result) {
        die("uh oh there was a problem with cdn-upload script!");
    }
}

And then the cdn-upload script is designed to run via CLI so it can happily use pcntl_fork and fork off an entirely independent process:

Originally Posted by the PHP manual

Process Control support in PHP implements the Unix style of process creation, program execution, signal handling and process termination. Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment.

So I think that very obviously precludes Apache. I wonder if my PCNTL approach is still safe?

The user comments section also has this statement, which I find a bit suspect:

Originally Posted by sean dot kelly at mediatile dot com

The following statement left me searching for answers for about a day before I finally clued in:

"Process Control should not be enabled within a web server environment and unexpected results may happen if any Process Control functions are used within a web server environment."

At least for PHP 5.3.8 which I am using, and who knows how far back, it's not a matter of "should not", it's "can not". Even though I have compiled in PCNTL with --enable-pcntl, it turns out that it only compiles in to the CLI version of PHP, not the Apache module. As a result, I spent many hours trying to track down why function_exists('pcntl_fork') was returning false even though it compiled correctly. It turns out it returns true just fine from the CLI, and only returns false for HTTP requests. The same is true of ALL of the pcntl_*() functions.

This test may be of interest to others. I created a test script to be accessed via Apache, all it does is exec another php script in the background. This seemed to work fine:
* apache-test.php is accessed in a browser. it forks off cli-test.php and completes
* cli-test.php runs for at least nine minutes or so running its loop

The two processes seemed distinct and disconnected. However, I got to wondering about how apache (prefork? fastcgi?) might treat a pool of processes. My test is working fine on my workstation, but what happens on a production server with frequent page requests and a busy php process pool? I tried the test again and restarted apache while cli-test.php was running its 10-minute loop. This killed cli-test.php (in fact all instances of it that might be running).

PHP Code:

//apache-test.php
echo "preparing to fork cli-test<br>";

$script_path = dirname(__FILE__) . "/cli-test.php";

$cmd = "/usr/bin/php $script_path > /dev/null & echo \$!";
echo "command for terminate:" . $cmd . "<br>";
$cmd_output = NULL; // will contain an array of output lines from the exec command
$cmd_result = NULL; // will contain the exit code returned by the OS; zero for a valid command, non-zero otherwise
$cmd_return = exec($cmd, $cmd_output, $cmd_result); // last line of output, which should be the PID of the spawned process

PHP Code:

<?php
/**
 * cli-test.php: a test file to determine whether it's possible to execute an independent
 * process from apache, or whether the process will be coupled to the original apache
 * process and terminated when that process terminates.
 */

Soooo apparently calling posix_setsid in cli-test.php is enough to make that process survive beyond the apache restart. I'm not certain, but I'm inclined to think that this process is therefore entirely independent of apache and therefore not susceptible to any process pool management or memory reclamation work that apache might do. I could be wrong about this. Comments more than welcome.

// create a long-running process to see if it'll keep up or whether this process
// is terminated and garbage collected when the apache process that launched it dies
$mypid = getmypid();
for ($i = 0; $i < 600; $i++) {
    $log->write($mypid . " running {$i}th iteration"); // $log is a logger set up earlier
    sleep(1);
}

Was revisiting this concept today and thought I'd add some additional information.

First, using the ampersand to background the process means you don't get the non-zero return_var if the PHP script throws an error. To demonstrate, I created this bad.php which throws an exception:

PHP Code:

echo "so far so good BUT...\n";
throw new Exception("grrr! I am bad!");

Then I wrote this script, exec.php to execute it:

PHP Code:

// example 1
$cmd = "/usr/bin/php /tmp/foo/bad.php > /tmp/foo/out.txt &";
$cmd_output = NULL;
$cmd_result = NULL;
$cmd_return = exec($cmd, $cmd_output, $cmd_result); // returns immediately; the backgrounded command's exit code is lost

This presumably returns the successful zero $cmd_result because we successfully launched an independent PHP process. Note also that the exception appears AFTER the output of exec.php has finished and the command prompt has been displayed again. This is presumably because stderr was still being routed to our main process (exec.php) and, because we backgrounded the process, it was somewhat delayed. I.e., the error in bad.php was written to stderr, which was still somehow piped to exec.php and subsequently bubbled back up to the terminal window from which we invoked everything. The exception is not part of the output returned in $cmd_output; the text of an exception thrown in bad.php will not be available in exec.php.

Adding the echo $! at the end is how we get the PID of the forked process.
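The variant I tried next was along these lines (a sketch using the same throwaway paths; 2>&1 merges stderr into the same redirect as stdout):

```php
<?php
// example 2 (sketch): 2>&1 sends stderr to the same place as stdout, so the
// exception text lands in /tmp/foo/out.txt instead of the terminal.

$cmd = "/usr/bin/php /tmp/foo/bad.php > /tmp/foo/out.txt 2>&1 & echo \$!";
$cmd_output = NULL;
$cmd_result = NULL;
$cmd_return = exec($cmd, $cmd_output, $cmd_result); // last output line should be the PID
```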

Redirecting stderr as well (2>&1) routes it into our file along with stdout and also effects a more complete separation of bad.php from exec.php. I'm not really sure what tenuous link might otherwise hang around between exec.php and bad.php, but it just seems better to separate them more fully. The result is that the exception does not appear when we run exec.php, but it does appear in the output file, /tmp/foo/out.txt.

Using ampersand has nothing to do with PHP; it's a shell trick. Generally, you will redirect output and stderr to files in that case. As you determined -- sorry, I wrote that before finishing reading, but I'm leaving it so you know that this is a piece of shell trickery and has nothing to do with PHP's exec. Here's some light reading on the subject: http://ba****out.com/2013/05/18/Ampe...mand-line.html and https://linux.die.net/man/1/bash

Originally Posted by https://linux.die.net/man/1/bash

If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0.

The reason it waits for the error when you don't redirect stdErr is that you haven't disconnected all the stream handles. If you look at something like proc_open to manage this (which, btw, is a far more powerful way to run commands from PHP), you will see there are 3 streams associated with a shell command: stdIn, stdOut, stdErr. To fully detach the process with exec/shell_exec/backticks you would have to (again, as you figured out) redirect stdOut and stdErr away from the firing shell environment.


Originally Posted by Derokorian

Using ampersand has nothing to do with PHP; it's a shell trick. Generally, you will redirect output and stderr to files in that case. As you determined -- sorry, I wrote that before finishing reading, but I'm leaving it so you know that this is a piece of shell trickery and has nothing to do with PHP's exec. Here's some light reading on the subject: http://ba****out.com/2013/05/18/Ampe...mand-line.html and https://linux.die.net/man/1/bash

The reason it waits for the error when you don't redirect stdErr is that you haven't disconnected all the stream handles. If you look at something like proc_open to manage this (which, btw, is a far more powerful way to run commands from PHP), you will see there are 3 streams associated with a shell command: stdIn, stdOut, stdErr. To fully detach the process with exec/shell_exec/backticks you would have to (again, as you figured out) redirect stdOut and stdErr away from the firing shell environment.

Thanks for this detail, especially the 'return status is 0' bit and also the '3 streams' -- that explains a lot. I posted my results above as an attempt to clarify/share/remember where the stdErr and stdOut stuff go. I think it's quite noteworthy that an exception thrown in the script we have forked off is not available in the script that actually does the forking. The parent script has essentially no knowledge of the success or failure of any backgrounded process. If you want the parent process to know the success or failure of its child, you have to work out some other means of communicating between them.

I still think there's a bit of mystery about how separate these processes actually are. Some production code I wrote four years ago not only uses exec to fork off a separate process, the forked process also uses posix_setsid to further disconnect itself from the parent. It also unsets-and-redefines some $log and $db vars and I can't remember why. I vaguely remember that I might have had some crosstalk between processes if they used the same db connection. I can't seem to locate any notes, however.

Originally Posted by sneakyimp

The parent script has essentially no knowledge of the success or failure of any backgrounded process. If you want the parent process to know the success or failure of its child, you have to work out some other means of communicating between them.

Or don't background it? I'm confused: do you want it to run in the background, separate from the calling script, or do you want the calling script to know what's going on in the child? The point of forking it off into the background is that you can then terminate the parent while the child still runs.

Originally Posted by sneakyimp

I still think there's a bit of mystery about how separate these processes actually are. Some production code I wrote four years ago not only uses exec to fork off a separate process, the forked process also uses posix_setsid to further disconnect itself from the parent. It also unsets-and-redefines some $log and $db vars and I can't remember why. I vaguely remember that I might have had some crosstalk between processes if they used the same db connection. I can't seem to locate any notes, however.

Again, you're confusing me -- if you're running a shell command, no variables are shared between the script executing the command and the script being run, so I'm not sure why anything needs to be unset. Rather, it's a fresh new script execution, with its own bootstrapping required.


Originally Posted by Derokorian

Or don't background it? I'm confused: do you want it to run in the background, separate from the calling script, or do you want the calling script to know what's going on in the child?

Minimally, I'd like the parent process to know if the child process succeeds initially (i.e., no fatal errors in the first 10 milliseconds). If you fork off a process and have no idea what the forked process is doing, it makes it very hard to debug the forked process -- and you run the risk of having a bunch of poorly behaved scripts doing weird stuff. Even an obviously bad command will have a result of zero if you background it:

PHP Code:

$cmd = "/this/script/does/not/exist &";
exec($cmd, $cmd_output, $cmd_result); // $cmd_result is still 0 despite the bogus path
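One hedged way to at least catch a child that dies immediately: grab its PID, wait a beat, then send signal 0, which tests for the process's existence without actually signaling it (a sketch; the worker path is invented, and it assumes the posix extension is available in the web SAPI):

```php
<?php
// Launch a backgrounded worker, then verify it survived its first moments.

$cmd = '/usr/bin/php /path/to/worker.php > /dev/null 2>&1 & echo $!';
$pid = (int) exec($cmd); // last output line is the PID, thanks to echo $!

usleep(100000); // give it 100ms to crash on startup errors
if ($pid <= 0 || !posix_kill($pid, 0)) {
    echo "child $pid died almost immediately -- check its log\n";
}
```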

Originally Posted by Derokorian

The point of forking it off into the background, is now you can terminate the parent while the child still runs.

Not always. In my case, I want to delegate a time-consuming task full of lots of waiting (possibly ten minutes) to a separate process so my parent process can continue a loop which may need to launch still more servers. If I had to launch them all sequentially, it might take two hours to spin up 10 servers.

Originally Posted by Derokorian

Again, you're confusing me - if you're running a shell command, you share no variables with the script executing the command vs the script running the command, so not sure why anything needs to be unset? Rather it's a new fresh script execution, with its own bootstrapping required.

I was also confused/surprised to find that my production code from four years ago had the child process calling pcntl_fork. It seems unnecessary but I vaguely recall that I had problems keeping the child process running if I didn't. To summarize:
* master.php is my script that wants to fork off processes
* master.php calls exec on worker.php and backgrounds the process: /usr/bin/php /path/to/worker.php "some-param" > /dev/null 2>&1 & echo $!
* worker.php calls pcntl_fork, the parent process exits immediately and the child process calls posix_setsid to become a 'session leader' and then does the work
* the first thing the child process does is unset any db-related variables and create a new, distinct database connection. This is apparently necessary because if the parent process's connection gets closed, the child can no longer use the db through it. My parent process doesn't explicitly close the db connection, but I seem to recall having erratic db connection problems -- I think this was due to the parent process being garbage collected and its db connections getting closed.

I agree that it seems like a lot of effort to do all that. I don't recall why I went through all the trouble in the first place but I suspect it was to fix problems I was having with the production system. The code, although seemingly more complicated than it really needs to be, has been running steadily and reliably every five minutes for four years. When it has trouble, it's usually due to a third-party API (rackspace) acting up.

If you have a master script and you want to control child nodes, I would use proc_open. Far more flexible and much easier to facilitate communication between the two.
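A minimal sketch of that (worker path invented), with all three streams wired up so the master can actually read the child's stderr instead of guessing:

```php
<?php
// proc_open gives the parent real handles on the child's stdin/stdout/stderr
// plus the child's actual exit code -- none of which exec's & trick provides.

$descriptors = [
    0 => ['pipe', 'r'],  // child's stdin  (we write)
    1 => ['pipe', 'w'],  // child's stdout (we read)
    2 => ['pipe', 'w'],  // child's stderr (we read)
];

$proc = proc_open('/usr/bin/php /path/to/worker.php', $descriptors, $pipes);
if (is_resource($proc)) {
    fclose($pipes[0]);                     // nothing to feed the child
    $out = stream_get_contents($pipes[1]);
    $err = stream_get_contents($pipes[2]); // uncaught exceptions land here
    fclose($pipes[1]);
    fclose($pipes[2]);
    $exit = proc_close($proc);             // the child's real exit code
}
```

Note the trade-off: reading the pipes like this blocks until the child finishes, so for a fire-and-forget worker you'd still detach it, while for a supervised pool this is exactly what you want.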
