I have a PHP class that selects data about a file from a MySQL database, processes that data in PHP and then outputs the final data to the command line. Then it moves onto the next file within a foreach loop. ( later I'll be inserting this data into another table ... but that's not important now )

I want to make the processing as fast as possible.

When I run the script and monitor my system using top or iostat:

my cpus are never less than 65% idle ( 4 core EC2 instance )

the PHP script sits at about 45%

mysqld sits at about 8%

my memory usage never passes ~1.5GB ( 8GB of ram total )

there is very little disk IO

What other bottlenecks could be preventing this process from running faster and using the available CPU and Memory?

EDIT 1:

This does not need to be a procedural process and I've designed it to parallelize the processing if necessary. If I can speed it up some, it'd be simpler to leave it as procedural processing.

I've monitored the disk I/O using iostat -x 1 and there is very little.

I need to speed this up in general because it will ultimately be used to process hundreds of millions of files and I'd like it to be as fast as possible as it's part of a larger processing step.

5 Answers

Well, it may be because a single PHP process can only run on one core at a time and you're not loading up your system to the point where it will have four concurrent jobs running continuously.

Example: if PHP were the only thing running on that box, were inherently tied to a single core per "job", and only one request at a time were being made, I'd fully expect a CPU load of around 25%, even though it's already going as fast as it possibly can.

Of course, once that system started ramping up to the point where there are continuously four PHP scripts running, you may find higher CPU utilisation.
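The arithmetic behind that 25% figure can be sketched in a few lines. This is only an illustration in Python (not the asker's PHP); the function name `max_total_cpu_percent` is made up for the example, and it assumes every job is strictly single-threaded.

```python
def max_total_cpu_percent(cores: int, single_threaded_jobs: int = 1) -> float:
    """Upper bound on whole-machine CPU usage when each job can occupy one core at most."""
    # Jobs beyond the number of cores cannot add any more usable CPU.
    busy_cores = min(single_threaded_jobs, cores)
    return 100.0 * busy_cores / cores


# One single-threaded job on a 4-core box can never exceed a quarter of the machine.
one_job = max_total_cpu_percent(cores=4, single_threaded_jobs=1)
# Four concurrent jobs are needed before the machine can be fully loaded.
four_jobs = max_total_cpu_percent(cores=4, single_threaded_jobs=4)
print(one_job, four_jobs)
```

Note that this is a ceiling, not a prediction: a job that spends part of its time waiting will sit below it.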

In my opinion, you should only really worry about a performance problem if it's an actual problem (such as not being able to keep up with incoming requests). Optimisation just because you want it using more CPU and/or memory resources seems to be looking at it the wrong way around. I would just get it running as fast as possible without worrying about the actual resources used.

If you want to process hundreds of millions of files as fast as possible (as per your update) and PHP is core-bound, you should think about horizontal scaling.

In other words, if the processing of a single file is independent, you can simply start two or three PHP processes and have them process one file each. That will be more likely to get them running on distinct cores.
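One simple way to hand each process its own share of the files is to split on the row id. The sketch below is Python rather than PHP, and it assumes (this is not stated in the question) that every file row has an integer id; `files_for_worker` is a hypothetical helper name.

```python
def files_for_worker(file_ids, worker_index, worker_count):
    """Return the ids that one worker should process, using id modulo worker_count."""
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in the range 0..worker_count-1")
    # Each id lands in exactly one shard, so no two workers touch the same file.
    return [fid for fid in file_ids if fid % worker_count == worker_index]


# Made-up ids 1..10 split across three workers.
all_ids = list(range(1, 11))
shards = [files_for_worker(all_ids, i, 3) for i in range(3)]
for index, shard in enumerate(shards):
    print("worker", index, "handles", shard)
```

Each worker would be started as a separate OS process with its own index, so the scheduler is free to place them on different cores. The same filter could probably be pushed into the SELECT itself, so that each worker only fetches its own rows.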

You can even scale across physical machines if necessary though that's likely to introduce network latency on the DB access (unless the DB is replicated across all the machines as well).

Without a fair bit more detail, the options I can provide will be mostly generic ones.

When this runs in production it will be processing ~100M files so speed is crucial. I understand your point about the single PHP process only using a single core, but shouldn't that single PHP process be using ~100% CPU in top, even if the overall reported CPU usage is not at 100%?
– T. Brian Jones, Nov 9 '11 at 2:05

@T.BrianJones: that's not the way it works. If you have a rogue "suck-up-as-much-cpu-as-possible" single-core-bound process running on your dual-core machine, it will never use more than 50% of the total CPU capacity.
– paxdiablo, Nov 9 '11 at 2:10


@T.BrianJones: If its job is to pump files, you want it doing that and as little else as possible. High CPU usage tends to mean it's not keeping the disks busy. So if you're seeing high CPU usage you might consider using a compiled, not interpreted, language.
– Mike Dunlavey, Nov 9 '11 at 14:18

The first problem you need to fix is the word "bottleneck", because it means everything and nothing.
It conjures an image of some sort of constriction in the flow of whatever the machine does, which is so fast that it must be like water running through pipes.

Computation isn't like that.
I find it helps to see how a very simple, slow, computer works, namely Harry Porter's Relay Computer.
You can watch it chug along, at a very slow clock rate, executing every little step within each instruction and finishing them before it starts the next.
(Now, obviously, machines these days are multi-core, pipelined, multi-level cache, blah blah. That's all fine, but that makes you think computation is like water flowing, and that prevents you from understanding software performance.)

Think of any computer and software as just like in that relay machine, except on a scale of nanoseconds, not seconds.
When a computer is calculating in a program, it is executing instructions one after the other. Call that "X".
When a program wants to read or write some bits to external hardware, it has to request that hardware to start, and then it has to find a way to kill time until the result is ready.
Call that "y".
It could be an idle loop, or letting another "thread" run, etc.

So the execution of a program looks like
XXXXXyyyyyyyXXXXXXXXyyyyyyy
If there are more "y"s in there than "X"s we tend to call it "I/O bound".
If not, we might call it "compute bound".
Either way, it's just a matter of proportion of time spent.
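To make the "proportion of time" point concrete, here is a toy Python sketch that labels a timeline string like the one above. The function name `classify` and the sample timeline are invented for illustration; a real program's timeline would have to come from measurement.

```python
def classify(timeline: str):
    """Label a timeline of 'X' (computing) and 'y' (waiting) steps by which kind dominates."""
    compute = timeline.count("X")
    waiting = timeline.count("y")
    total = compute + waiting
    if total == 0:
        raise ValueError("timeline contains no 'X' or 'y' steps")
    label = "I/O bound" if waiting > compute else "compute bound"
    # Also report the fraction of steps spent waiting.
    return label, waiting / total


# Made-up timeline: 13 compute steps and 14 waiting steps.
sample = "X" * 5 + "y" * 7 + "X" * 8 + "y" * 7
label, waiting_fraction = classify(sample)
print(label, round(waiting_fraction, 2))
```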

If you say it's "memory bound", that's just like I/O except it could be different external hardware.
It still occupies some fraction of the overall sequential timeline.
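One rough way to see that fraction for a real piece of code is to compare CPU time with wall-clock time. The Python sketch below does that with the standard `time` module; `cpu_share` is a made-up helper name, and `time.sleep` merely stands in for waiting on external hardware or a database.

```python
import time


def cpu_share(task) -> float:
    """Run task() and return the fraction of wall-clock time this process spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time of this process; time asleep is not counted
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def mostly_waiting():
    # Stand-in for the "y" periods: the process is idle while something else works.
    time.sleep(0.2)


share = cpu_share(mostly_waiting)
print("fraction of time on CPU:", round(share, 2))
```

A process that shows up well below 100% of one core is, by this measure, spending the rest of its time in "y".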

Now for any given task, there are infinitely many programs that could be written to do it. Some of them will get done in fewer steps than all the others.
When you want performance, you want to get as close as possible to writing one of those programs.
One way to do it is to find "X"s and "y"s that you can get rid of, and get rid of as many as possible.

Now, within a single thread, if you pick an "X" or "y" at random, how can you tell if you can get rid of it?
Find out what its purpose is!
That "X" or "y" represents a moment in the execution sequence of the program, and if you look at the state of the program at that time, and look at the source code, you will be able to figure out why that moment is being spent.
Do that a few times.
As soon as you see two moments in time having a similar less-than-absolutely-necessary purpose,
there are probably a lot more like them, and you've found something you can get rid of.
If you do so, the program will no longer be spending that time.
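The "pick a moment at random and look at the state" step can be approximated in code. The following Python sketch is only a stand-in for manually pausing the program: a helper thread records the caller's stack at intervals, using `sys._current_frames()` (a CPython-specific function). The names `sample_stacks`, `work` and `slow_helper` are made up for the example.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(target, interval=0.01):
    """Run target() while a helper thread records the calling thread's stack at intervals."""
    caller_id = threading.get_ident()
    samples = []
    done = threading.Event()

    def sampler():
        # wait() returns False on timeout, so this loops until done is set.
        while not done.wait(interval):
            frame = sys._current_frames().get(caller_id)
            if frame is not None:
                samples.append([f.name for f in traceback.extract_stack(frame)])

    helper = threading.Thread(target=sampler, daemon=True)
    helper.start()
    try:
        target()
    finally:
        done.set()
        helper.join()
    return samples


def slow_helper():
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def work():
    # Keep calling the helper for roughly 0.3 seconds.
    deadline = time.perf_counter() + 0.3
    while time.perf_counter() < deadline:
        slow_helper()


samples = sample_stacks(work)
# Count which function was executing (innermost frame) in each sample.
innermost = collections.Counter(stack[-1] for stack in samples)
print(innermost.most_common(3))
```

Reading a handful of the recorded stacks, rather than only the counts, is closer to the spirit of the method: the point is to see why each moment was being spent.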

That's the basic idea behind this method of performance tuning.
Here's an example where that method was used, over several iterations, to remove over 97% of the time spent in a program.
Not all programs are that far away from optimal.
(Some are much farther.)
Many programs just have to do a certain amount of "X"s or "y"s, and there's no way around it.
Nevertheless, it is often very surprising how much room you can find for speedup in otherwise perfectly good code - provided - you forget about "bottlenecks" and look for steps that it's doing, over time, that could be removed or done better.

I suspect you're spending most of your time communicating with MySQL and reading the files. How are you determining that there's very little IO? Communicating with MySQL is going to be over the network, which is very slow compared to direct memory access. Same with reading files.
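One way to check that suspicion is to accumulate the time spent inside database calls separately from the time spent processing. The sketch below is Python, not PHP, and the `time.sleep` calls are stand-ins for a real query and real processing; the `timed` helper and the labels are invented for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Total seconds spent under each label.
totals = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time of the enclosed block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start


for _ in range(3):
    with timed("db"):
        time.sleep(0.02)     # stand-in for running the SELECT and fetching rows
    with timed("process"):
        time.sleep(0.005)    # stand-in for the in-process work on that row

for label, seconds in sorted(totals.items()):
    print(label, round(seconds, 3))
```

If the "db" total dominates, the script is mostly waiting on MySQL, which would fit a process that never gets close to 100% of a core.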

@OliCharlesworth Yeah, it would be faster. My point is mainly that no matter how they're doing it, waiting for MySQL is going to be slow (comparatively).
– Brendan Long, Nov 9 '11 at 1:41

@Brendan Long - MySQL is local. I don't read the files off disk, just info about them that has already been put into a MySQL table. How does one monitor MySQL communication times?
– T. Brian Jones, Nov 9 '11 at 2:08

I understand your point about the single PHP process only using a single core, but shouldn't that single PHP process be using ~100% CPU ( as reported for that process in top) , even if the overall reported CPU usage is not at 100%?
– T. Brian Jones, Nov 9 '11 at 2:10

100% of 1 core = 25% cpu utilization in a 4 core machine. You should be able to get "top" to report utilization by core.
– James Anderson, Nov 9 '11 at 5:08

Sorry to resurrect an old thread, but thought this might help someone out.

I had a similar problem, and it had to do with a command line script that was throwing numerous 'Notice' warnings. That somehow led to it performing slowly and using less than 10% of the CPU. This behavior only showed up on migrating from Mac OS X to Ubuntu, as the default on OS X seems to be to suppress the warnings. Once I fixed the offending code it performed much better, with processes using around 100% CPU consistently.