Simple Parallel Processing with PHP and proc_open()

Written by James Sinclair
on the 6th December 2012

You can find a reasonably sensible example of how to perform parallel processing with PHP using the popen() function at wellho.net. Sometimes, though, the requirement to pass all the parameters to your function via command line arguments can be limiting. For example, maybe you are processing a lot of HTML pages, or large associative arrays. In these cases, you can get slightly more flexibility by using the proc_open() function and Unix pipes to pass data.

With both techniques, you need an array of items you want to process in parallel, and a separate file that can process the item for you. Let’s say for example, you want to read a bunch of files, then append some complicated data to them. Your PHP class without parallel processing might look something like the following:
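The original class isn't reproduced here, but based on the calls used later (FileAppender::appendData(), $this->fileList, $this->forAppending), a minimal serial version might look like the sketch below. The constructor signature and the body of appendData() are assumptions made for illustration.

```php
<?php
/**
 * A minimal sketch of the serial version (details are assumptions).
 *
 * @file FileAppender.class.php
 */
class FileAppender
{
    protected $fileList;
    protected $forAppending;

    public function __construct(array $fileList, array $forAppending)
    {
        $this->fileList     = $fileList;
        $this->forAppending = $forAppending;
    }

    /**
     * Append data to a file and return the result.
     */
    public static function appendData($filename, $data)
    {
        // Imagine some complicated, slow processing happening here.
        file_put_contents($filename, $data, FILE_APPEND);
        return file_get_contents($filename);
    }

    /**
     * Append all items, one after the other.
     *
     * @return array
     */
    public function appendAll()
    {
        $output = array();
        foreach ($this->fileList as $i => $file) {
            $output[] = self::appendData($file, $this->forAppending[$i]);
        }
        return $output;
    }
}
```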

This is very simple, but can be rather slow. To run in parallel, we first create a separate PHP file to do the processing:

<?php
/**
 * Append data to a file.
 *
 * @file append_to_file.php
 */
// Load our File Appender Class.
require_once 'FileAppender.class.php';
// We expect the filename to be passed as the first command line argument.
$filename = $argv[1];
// We read the data to append from STDIN.
$data = file_get_contents('php://stdin');
// Push the result back out to STDOUT.
echo FileAppender::appendData($filename, $data);

We now add a parallel version of our appendAll() function:

/**
 * Append all in parallel.
 *
 * @return array
 */
public function appendAllParallel()
{
    $files   = $this->fileList;
    $strings = $this->forAppending;
    // Descriptor specification. This sets up our Unix pipes so that PHP can
    // pass data to the process as if it were writing to a file. It gets
    // data back the same way, by reading from a file. The final entry tells
    // PHP to pass any errors straight through to STDERR.
    $descriptorSpec = array(
        0 => array('pipe', 'r'),
        1 => array('pipe', 'w'),
        2 => array('file', 'php://stderr', 'a'),
    );
    // Kick off the parallel processing.
    $handles = array();
    foreach ($files as $i => $file) {
        // Create the command to run, being careful to use escapeshellarg().
        $cmd = 'php append_to_file.php '.escapeshellarg($file);
        // Run the process. This will modify $pipes so that it contains file
        // handles as specified by $descriptorSpec.
        $procHandle = proc_open($cmd, $descriptorSpec, $pipes);
        // We will just assume that $procHandle was created OK. Really, you
        // should check that proc_open() does not return false.
        // $pipes[0] is connected to the process's STDIN, so we write to it;
        // $pipes[1] is connected to its STDOUT, so we read from it.
        $stdinHandle  = $pipes[0];
        $stdoutHandle = $pipes[1];
        // Keep track of the output and process handles for later.
        $handles[] = array(
            'process' => $procHandle,
            'file'    => $stdoutHandle,
        );
        // Pass the data to the process so it can read it from STDIN.
        fwrite($stdinHandle, $strings[$i]);
        // Close the handle so the process sees end-of-file on STDIN.
        fclose($stdinHandle);
    }//end foreach
    // We've kicked all the processes off. Now we need to get the data back.
    $output = array();
    foreach ($handles as $handleData) {
        // Read the data back from our process and close the file handle.
        $output[] = stream_get_contents($handleData['file']);
        fclose($handleData['file']);
        // Close the process handle.
        proc_close($handleData['process']);
    }
    return $output;
}//end appendAllParallel()

As you can see, this makes the code a lot more complicated. It is more work to make things run in parallel, but sometimes to make things fast, that’s what you have to do.
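Tying it together, a call site might look like the following sketch. The constructor signature and file names are assumptions, since the original class definition is not shown in full.

```php
<?php
// Hypothetical usage of the class above (constructor signature is assumed).
require_once 'FileAppender.class.php';

$appender = new FileAppender(
    array('a.txt', 'b.txt', 'c.txt'),
    array("First\n", "Second\n", "Third\n")
);
// All three worker processes run at once; we block only when collecting
// their output.
$results = $appender->appendAllParallel();
print_r($results);
```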