Parallel Processing in PHP

May 17, 2011

Though not a first choice for long-running processes, many web shops end up writing daemons or batch processing scripts in PHP. As the business grows, processing records quickly enough to keep up with traffic becomes an issue. Often the processing is limited by something other than raw CPU power, with network latency and database query times being the usual slowdowns. When this is the case, the easiest way to increase throughput is multiprocessing: running multiple children so that while some are waiting, others can put the available processing power to use.

To this end, I have created a simple framework for managing child/worker multiprocessing in PHP. As in other high-level interpreted languages, the most straightforward way to spin things up is to use fork(2) to create new processes. While not as Hardcore and Awesome as the lightweight threads that other languages provide, OS-level process creation isn’t a huge hindrance if you code for it: make the child processes long-running so the startup cost is paid only once per child.
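As a rough illustration of the fork(2) approach in PHP, here is a minimal standalone sketch using the pcntl extension (PHP CLI on Unix-like systems only). It is not gosUtility_Parallel's code; the runWorkers() function and its callback signature are hypothetical, chosen just to show forking N children and reaping their exit statuses.

```php
<?php
// Minimal fork(2)-based worker sketch. Requires the pcntl extension, which is
// available to the PHP CLI on Unix-like systems but not under a web SAPI.

/**
 * Hypothetical helper: fork $numChildren workers, run $work($workerID, $numChildren)
 * in each child, and wait for all of them. Returns worker ID => exit status
 * (-1 if the child was killed by a signal).
 */
function runWorkers(int $numChildren, callable $work): array
{
    $pids = [];                                   // child pid => worker ID
    for ($id = 0; $id < $numChildren; $id++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("fork failed for worker $id");
        }
        if ($pid === 0) {
            // Child: do the (ideally long-running) work, then exit so the
            // child never continues into the parent's loop.
            exit($work($id, $numChildren));
        }
        $pids[$pid] = $id;                        // Parent: remember the child.
    }

    $statuses = [];
    while ($pids) {
        $pid = pcntl_waitpid(-1, $status);        // block until any child exits
        if ($pid <= 0) {
            break;                                // no children left, or an error
        }
        if (isset($pids[$pid])) {
            $statuses[$pids[$pid]] = pcntl_wifexited($status)
                ? pcntl_wexitstatus($status)
                : -1;
            unset($pids[$pid]);
        }
    }
    ksort($statuses);
    return $statuses;
}

// Usage: two workers that print a message, sleep briefly, and succeed.
$result = runWorkers(2, function (int $workerID, int $maxWorkers): int {
    echo "worker $workerID of $maxWorkers starting\n";
    usleep(100000);
    echo "worker $workerID done\n";
    return 0;
});
```

Because each child calls exit() inside the loop, only the parent ever reaches the waiting code; forgetting that exit is a classic way to end up with children that fork children of their own.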

This class creates simple workers that print a couple of debug messages with some sleeping in between, and then announce that they are done working. Now you can instantiate the class with a single argument: the number of children to run. gosUtility_Parallel will take care of all the details.

// Make with the go
$minimal = new Minimal(2);
$minimal->go();

If a child exits with a non-zero status, the parent spins up a replacement. The parent continues to run until all children have exited normally, or until it receives SIGINT (say, from Ctrl+C) or SIGTERM (the default signal sent by kill(1)), in which case it passes that signal on to the children, ensures they shut down, and then exits itself. gosUtility_Parallel provides ample logging information; running the above produces the following output:
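The supervision behavior described above can be approximated in a few dozen lines. This is a sketch, not gosUtility_Parallel's implementation: supervise() and the $attempt argument handed to the work callback are hypothetical (the latter exists only so a child can behave differently on a retry), and it assumes the pcntl and posix extensions are loaded. It also treats death by signal as a failure, which goes slightly beyond the "non-zero status" rule in the text.

```php
<?php
// Sketch of a supervising parent: respawn children that fail, and forward
// SIGINT/SIGTERM to the children before shutting down. Requires the pcntl and
// posix extensions; pcntl_async_signals() needs PHP 7.1 or later.

/**
 * Hypothetical helper: keep $numChildren workers running until each exits
 * with status 0 or a stop signal arrives. $work($workerID, $attempt) gets the
 * number of earlier attempts for that worker ID. Returns the total spawn count.
 */
function supervise(int $numChildren, callable $work): int
{
    $children = [];                               // child pid => worker ID
    $attempts = array_fill(0, $numChildren, 0);   // spawns so far per worker ID
    $stopSignal = null;                           // set by the signal handler

    $spawn = function (int $id) use (&$children, &$attempts, $work): void {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("fork failed for worker $id");
        }
        if ($pid === 0) {
            // Child: go back to default signal handling, do the work, exit.
            pcntl_signal(SIGINT, SIG_DFL);
            pcntl_signal(SIGTERM, SIG_DFL);
            exit($work($id, $attempts[$id]));
        }
        $children[$pid] = $id;
        $attempts[$id]++;
    };

    pcntl_async_signals(true);
    $onStop = function (int $signo) use (&$stopSignal): void {
        $stopSignal = $signo;
    };
    pcntl_signal(SIGINT, $onStop);
    pcntl_signal(SIGTERM, $onStop);

    for ($id = 0; $id < $numChildren; $id++) {
        $spawn($id);
    }

    $forwarded = false;
    while ($children) {
        if ($stopSignal !== null && !$forwarded) {
            // Pass the signal on to every live child, then keep reaping.
            foreach (array_keys($children) as $pid) {
                posix_kill($pid, $stopSignal);
            }
            $forwarded = true;
        }
        $pid = pcntl_waitpid(-1, $status, WNOHANG);
        if ($pid === 0) {
            usleep(50000);                        // nobody exited yet; poll again
            continue;
        }
        if ($pid === -1) {
            break;                                // no children left to reap
        }
        $id = $children[$pid] ?? null;
        unset($children[$pid]);
        if ($id === null) {
            continue;
        }
        // Treat a non-zero exit status or death by signal as a failure.
        $failed = !pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0;
        if ($failed && $stopSignal === null) {
            $spawn($id);                          // spin up a replacement
        }
    }

    pcntl_signal(SIGINT, SIG_DFL);                // restore default handling
    pcntl_signal(SIGTERM, SIG_DFL);
    return array_sum($attempts);
}

// Usage: each worker fails on its first attempt and succeeds on the retry.
$spawned = supervise(2, function (int $workerID, int $attempt): int {
    return $attempt === 0 ? 1 : 0;
});
```

Polling with WNOHANG, rather than blocking in pcntl_waitpid(), gives the loop a regular chance to notice a pending stop signal and forward it. A sturdier version would also escalate to SIGKILL if a child ignores the forwarded signal for too long; this sketch leaves that out.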

gosUtility_Parallel provides a number of overridable methods whose names explain their purpose: parentSetup(), parentCleanup(), and childCleanup(). Children can also read their $workerID and the $maxWorkers count, which makes partitioning work by modular division trivial. The example parallel class in the distribution demonstrates some of these features:
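The modular-division idea can be shown independently of the framework: worker $workerID out of $maxWorkers keeps only the records whose numeric ID leaves that remainder, so every record goes to exactly one worker with no coordination between children. The recordsForWorker() helper and the 'id' record key are hypothetical, and the sketch assumes non-negative IDs, since PHP's % takes the sign of its left operand.

```php
<?php
// Sketch of modular-division work partitioning. The helper name and the
// 'id' key are hypothetical; IDs are assumed to be non-negative integers.

function recordsForWorker(array $records, int $workerID, int $maxWorkers): array
{
    // Keep a record only if its ID maps to this worker.
    return array_values(array_filter(
        $records,
        fn (array $r): bool => $r['id'] % $maxWorkers === $workerID
    ));
}

// Ten sample records with IDs 1..10.
$records = [];
for ($i = 1; $i <= 10; $i++) {
    $records[] = ['id' => $i];
}

// With 3 workers, worker 0 gets IDs 3, 6 and 9.
$mine = recordsForWorker($records, 0, 3);
```

When records live in a database, the same filter is often pushed into the query itself (for example a condition on the ID modulo the worker count), so each child fetches only its own share.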