OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

I have a pretty general question which I didn't find answered in any forum post yet:

What determines the CPU usage of OpenMP threads?

Background and details:

I have a C application that uses a #pragma omp parallel for loop to do some pretty heavy processing that typically takes several tens of seconds. I am running it on 64-bit Linux (using gcc 4.5.1 or 4.4.6 depending on the machine) on machines with 8 to 32 cores. While developing it and "in production" (for the past half year or so) it used to basically saturate the machine, so that when I checked the CPU usage, I saw as many threads as there were cores, all using 100% CPU, i.e. a total of 1600% CPU usage on the 16-core machine. I check this using top.

Now, in the last few weeks, I see something different: there are as many threads as before, but I see a total CPU usage of 100% or 500% or some n*100% in between, where a few threads are using 100% and the rest of the threads are getting less CPU. Here is an example top output with 500% usage (run on a 16-core machine with OMP_NUM_THREADS=12 set in the environment):

showing only a very moderate gain from the parallelization, despite having 12 CPUs available to OpenMP.
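As an aside, per-thread CPU usage like this can also be captured non-interactively; a sketch (using this shell's PID, $$, as a stand-in for the application's actual PID):

```shell
# One non-interactive snapshot of per-thread CPU usage:
# -H lists individual threads, -b is batch mode, -n 1 takes one iteration.
top -H -b -n 1 -p $$

# The same information via ps: one row per thread (LWP) with its CPU share.
ps -L -o pid,tid,pcpu,comm -p $$
```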

The problem is that I don't see anything that has changed, neither in the code nor in the installation of the machine. The loop in question is a simple parallel OpenMP for loop with some explicitly shared variables:

pwei wrote:The problem is that I don't see anything that has changed, neither in the code nor in the installation of the machine.

Maybe the running environment changed: are there new processes/users that were not there before? I see something "suspicious" in the usage of memory (too close to the limit) and swap (every thread using some swap memory).

ftinetti wrote:Maybe the running environment changed: are there new processes/users that were not there before? I see something "suspicious" in the usage of memory (too close to the limit) and swap (every thread using some swap memory).

The swap is indeed weird; I would have thought that the process is not running long enough to get partly swapped out, especially since there is plenty of RAM just taken up by the cache. But I don't think that swapping happens while the CPU-heavy loop in question is running. I see that the swap increases while I read the data from disk at the beginning of the process, but it doesn't change during the threaded loop. And in my program the bulk of the used RAM is in the shared OpenMP variables. Would that not show up in top for all threads?
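For what it's worth, the SWAP column in many procps versions of top is computed as VIRT minus RES rather than actual swap usage, so it is worth cross-checking against what the kernel reports directly; a sketch (again with $$ standing in for the real PID):

```shell
# Actual resident set and swap usage of one process, from the kernel
# (VmSwap is reported on kernels >= 2.6.34):
grep -E 'VmRSS|VmSwap' /proc/$$/status

# System-wide totals for RAM, cache, and swap:
free -m
```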

Otherwise I don't find the memory use special; in fact, in production this process takes up to 60 GB of RAM on this machine (even more on another machine that shows the same problem), and it still used to run in parallel much more efficiently.

I don't think anything has changed regarding users or special processes. At least I didn't see any other processes that would compete significantly for resources while I run my program.

MarkB wrote:Given that it hasn't had one for over a year, a reboot might be worth a try!

I run these things on three machines. They all have slightly different hard- and software installed. Curiously, the two machines that show(ed) the problem had zombie processes, the one where I never saw the problem doesn't have any zombies. I wonder if that has anything to do with it...

Fernando, I have forgotten what hardware we ordered exactly, but I can tell you about the properties I see in /proc. One machine has Xeon E5520, 8 cores total (16 if you count hyperthreading) and 72 GB RAM; another has Xeon E5645, 24 cores including hyperthreading and 24 GB RAM; the last one has Opteron 6134, 32 cores including hyperthreading, and 256 GB RAM.

pwei wrote:OK, so rebooting indeed helped! Thanks again for that idea.

You're welcome!

pwei wrote:I run these things on three machines. They all have slightly different hard- and software installed. Curiously, the two machines that show(ed) the problem had zombie processes, the one where I never saw the problem doesn't have any zombies. I wonder if that has anything to do with it...

As I understand it, zombie processes are supposed to be harmless and not consume any system resources (e.g. CPU, memory). You might have had orphaned processes which did not release their memory, causing your running threads to swap???


MarkB wrote:As I understand it, zombie processes are supposed to be harmless and not consume any system resources (e.g. CPU, memory). You might have had orphaned processes which did not release their memory, causing your running threads to swap???

The swapping is apparently normal: even now, when I get full speed again, I see this. And most of the RAM is still unused...

ftinetti wrote:Thanks for the details of the computers. Which one is used for

13.355s (wall-clock) 140.140s (CPU, 12 CPUs)

? (I guess it is the one with the two Xeon E5645s, though.)

Sorry for the confusion. That was the first one, with the E5520s, but run with OMP_NUM_THREADS=12. I found that for this machine and what I'm doing, 12 threads give the optimal runtime.