I asked a previous question to try to isolate the source of an increase in CPU usage when moving an application from RHEL 5 to RHEL 6. The analysis that I did for that seems to indicate that it is being caused by CFS in the kernel. I wrote a test application to try to verify whether this was the case (the original test application was removed to fit within the size limit, but it is still available in the git repo).
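
For context, the removed test application had roughly the following shape. This is only a minimal sketch based on the description above, not the actual code; the real program supports several sleep mechanisms selected by -DSLEEP_TYPE and records per-iteration statistics, and the constants here are illustrative:

    /* Minimal sketch of the test's structure (not the actual removed program):
     * each thread does a bit of busy work and then optionally sleeps ~1 ms.
     * Compile with: gcc -O2 -pthread -DSLEEP_TYPE=1 sketch.c -o sketch */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    #ifndef SLEEP_TYPE
    #define SLEEP_TYPE 0  /* 0 = no sleep, 1 = usleep (other mechanisms omitted) */
    #endif

    #define ITERATIONS 1000
    #define WORK_SIZE  250

    static void *thread_func(void *arg)
    {
        volatile unsigned long sink = 0;
        int i, j;

        (void)arg;
        for (i = 0; i < ITERATIONS; i++) {
            for (j = 0; j < WORK_SIZE; j++)  /* a small amount of busy work */
                sink += j;
    #if SLEEP_TYPE == 1
            usleep(1000);                    /* ~1 ms sleep between iterations */
    #endif
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int num_threads = (argc > 1) ? atoi(argv[1]) : 4;
        pthread_t threads[64];
        struct timeval start, end;
        long elapsed_us;
        int i;

        if (num_threads > 64)
            num_threads = 64;

        gettimeofday(&start, NULL);
        for (i = 0; i < num_threads; i++)
            pthread_create(&threads[i], NULL, thread_func, NULL);
        for (i = 0; i < num_threads; i++)
            pthread_join(threads[i], NULL);
        gettimeofday(&end, NULL);

        elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L +
                     (end.tv_usec - start.tv_usec);
        printf("%d threads: %ld us total, %ld us per iteration\n",
               num_threads, elapsed_us, elapsed_us / ITERATIONS);
        return 0;
    }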

On both versions, the results were about what I expected, with the average amount of time per iteration scaling relatively linearly. I then recompiled with -DSLEEP_TYPE=1 and re-ran the tests.

On RHEL 5, the results were about what I expected (4 threads taking twice as long because of the 1 ms sleep, the 8 threads taking the same amount of time since each thread is now sleeping for about half the time, and the increase remaining fairly linear).

However, with RHEL 6, the time taken with 4 threads increased by about 15% more than the expected doubling, and the 8 thread case increased by about 45% more than the expected slight increase. The increase in the 4 thread case seems to be because RHEL 6 is actually sleeping for a handful of microseconds more than 1 ms while RHEL 5 is only sleeping for about 900 us, but this doesn't explain the unexpectedly large increase in the 8 and 40 thread cases.
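
One way to quantify that overshoot directly (a standalone sketch I am using for illustration, not part of the test program) is to time a requested 1 ms sleep against a monotonic clock:

    /* Standalone sketch: measure how long a requested 1 ms sleep actually
     * takes on average. May need -lrt on older glibc for clock_gettime(). */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec req = { 0, 1000000 };  /* request a 1 ms sleep */
        struct timespec start, end;
        long long total_ns = 0;
        int i;
        const int iterations = 1000;

        for (i = 0; i < iterations; i++) {
            clock_gettime(CLOCK_MONOTONIC, &start);
            nanosleep(&req, NULL);
            clock_gettime(CLOCK_MONOTONIC, &end);
            total_ns += (end.tv_sec - start.tv_sec) * 1000000000LL +
                        (end.tv_nsec - start.tv_nsec);
        }
        printf("average actual sleep: %lld ns (requested 1000000 ns)\n",
               total_ns / iterations);
        return 0;
    }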

I saw similar types of behaviour with all 3 -DSLEEP_TYPE values. I also tried playing with the scheduler parameters in sysctl, but nothing seemed to have a significant impact on the results. Any ideas on how I can further diagnose this issue?

UPDATE: 2012-05-07

I added measurements of the user and system CPU usage from /proc/<pid>/task/<tid>/stat as an output of the test to try to get another point of observation. I also found an issue with the way the mean and standard deviation were being updated that was introduced when I added the outer iteration loop, so I will add the new plots that have the corrected mean and standard deviation measurements. I have included the updated program. I also made a git repo to track the code and it's available here.
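
A minimal sketch of reading those per-thread counters (utime and stime, fields 14 and 15 of the stat file, in clock ticks) might look like the following; this is an illustration of the measurement, not the test program's actual code:

    /* Sketch: read a thread's user and system CPU time (fields 14 and 15 of
     * /proc/<pid>/task/<tid>/stat, in clock ticks). Assumes the comm field
     * contains no spaces, which holds for this test program. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        char path[64];
        unsigned long utime, stime;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/task/%ld/stat",
                 (int)getpid(), (long)syscall(SYS_gettid));
        f = fopen(path, "r");
        if (!f)
            return 1;
        /* Skip pid, comm, state, and fields 4-13, then read utime and stime. */
        if (fscanf(f, "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
                   &utime, &stime) != 2) {
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("utime=%lu stime=%lu (ticks; %ld ticks per second)\n",
               utime, stime, sysconf(_SC_CLK_TCK));
        return 0;
    }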

I re-ran the tests on a Dell Vostro 200 (dual core CPU) with several different OS versions. I realize that several of these will have different patches applied and won't be "pure kernel code", but this was the simplest way that I could run the tests on different versions of the kernel and get comparisons. I generated plots with gnuplot and have included the version from the bugzilla about this issue.

All of these tests were run using the following script with the command ./run_test 1000 10 1000 250 8 6 <os_name>.

#!/bin/bash

if [ $# -ne 7 ]; then
    echo "Usage: `basename $0` <sleep_time> <outer_iterations> <inner_iterations> <work_size> <max_num_threads> <max_sleep_type> <test_name>"
    echo "  max_num_threads: The highest value used for num_threads in the results"
    echo "  max_sleep_type: The highest value used for sleep_type in the results"
    echo "  test_name: The name of the directory where the results will be stored"
    exit -1
fi

sleep_time=$1
outer_iterations=$2
inner_iterations=$3
work_size=$4
max_num_threads=$5
max_sleep_type=$6
test_name=$7

# Make sure this results directory doesn't already exist
if [ -e "$test_name" ]; then
    echo "$test_name already exists"
    exit -1
fi

# Create the directory to put the results in
mkdir "$test_name"

# Run through the requested number of SLEEP_TYPE values
for i in $(seq 0 $max_sleep_type)
do
    # Run through the requested number of threads
    for j in $(seq 1 $max_num_threads)
    do
        # Print which settings are about to be run
        echo "sleep_type: $i num_threads: $j"
        # Run the test and save it to the results file
        ./test_sleep $sleep_time $outer_iterations $inner_iterations $work_size $j $i >> "$test_name/results_$i.txt"
    done
done

Here's the summary of what I observed. I will compare them in pairs this time because I think that it is a bit more informative that way.

CentOS 5.6 vs CentOS 6.2

The wall clock time (gettimeofday) per iteration on CentOS 5.6 is more varied than on 6.2, but this makes sense since CFS should do a better job of giving the processes equal CPU time, resulting in more consistent results. It's also pretty clear that CentOS 6.2 is more accurate and consistent in the amount of time that it sleeps for with the different sleeping mechanisms.

The "penalty" is definitely apparent on 6.2 with a low number of threads (visible on gettimeofday and user time plots) but it seems to be reduced with a higher number of threads (the difference in user time may just be an accounting thing since the user time measurements are so course).

The system time plot shows that the sleep mechanisms in 6.2 are consuming more system time than they did in 5.6, which corresponds with the previous result of the simple test of 50 processes just calling select consuming a non-trivial amount of CPU on 6.2 but not on 5.6.
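
For reference, that earlier test amounted to running ~50 copies of a loop like this (a sketch of the idea, not the exact program):

    /* Sketch of the "process just calling select" test: sleep in select()
     * with a 1 ms timeout and no file descriptors, forever. */
    #include <sys/select.h>

    int main(void)
    {
        struct timeval tv;

        for (;;) {
            tv.tv_sec = 0;
            tv.tv_usec = 1000;                  /* 1 ms timeout */
            select(0, NULL, NULL, NULL, &tv);   /* no fds; this is just a sleep */
        }
        return 0;
    }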

Something that I believe is worth noting is that the use of sched_yield() doesn't induce the same penalty as the sleep methods. My conclusion from this is that it's not the scheduler itself that is the source of the issue; rather, it's the interaction of the sleep methods with the scheduler that is the issue.
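
To make the distinction concrete, here is a sketch of the two iteration styles being compared (illustrative, not from the test program); the yielding thread stays runnable, while the sleeping thread blocks on a timer and has to be woken and rescheduled:

    /* Sketch contrasting a yield-based iteration with a sleep-based one.
     * Build with -DUSE_YIELD to yield instead of sleeping. */
    #include <sched.h>
    #include <unistd.h>

    int main(void)
    {
        int i;

        for (i = 0; i < 1000000; i++) {
    #ifdef USE_YIELD
            sched_yield();   /* stays on the run queue; no timer wakeup involved */
    #else
            usleep(1000);    /* blocks on a ~1 ms timer; needs a wakeup from the kernel */
    #endif
        }
        return 0;
    }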

Ubuntu 7.10 vs Ubuntu 8.04-4

The difference in kernel version between these two is smaller than that between CentOS 5.6 and 6.2, but they still span the time period when CFS was introduced. The first interesting result is that select and poll seem to be the only sleep mechanisms that have the "penalty" on 8.04, and that penalty continues to a higher number of threads than what was seen with CentOS 6.2.

The user time for select and poll on Ubuntu 7.10 is unreasonably low, so this appears to be some sort of accounting issue that existed then, but I believe it is not relevant to the current issue/discussion.

The system time does seem to be higher with Ubuntu 8.04 than with Ubuntu 7.10, but this difference is FAR less distinct than what was seen with CentOS 5.6 vs 6.2.

Notes on Ubuntu 11.10 and Ubuntu 12.04

The first thing to note here is that the plots for Ubuntu 12.04 were comparable to those for 11.10, so they are not shown, in order to prevent unnecessary redundancy.

Overall the plots for Ubuntu 11.10 show the same sort of trend that was observed with CentOS 6.2 (which indicates that this is a kernel issue in general and not just a RHEL issue). The one exception is that the system time appears to be a bit higher on Ubuntu 11.10 than on CentOS 6.2, but once again, the resolution of this measurement is very coarse, so I think that any conclusion other than "it appears to be a bit higher" would be stepping onto thin ice.

Ubuntu 11.10 vs Ubuntu 11.10 with BFS

A PPA that uses BFS with the Ubuntu kernel can be found at https://launchpad.net/~chogydan/+archive/ppa and this was installed to generate this comparison. I couldn't find an easy way to run CentOS 6.2 with BFS, so I ran with this comparison instead; since the results for Ubuntu 11.10 compare so well with those for CentOS 6.2, I believe it is a fair and meaningful comparison.

The major point of note is that with BFS only select and nanosleep induce the "penalty" at low numbers of threads, but that BFS seems to induce a similar "penalty" (if not a greater one) to the one seen with CFS at a higher number of threads.

The other interesting point is that the system time appears to be lower with BFS than with CFS. Once again, this is starting to step onto thin ice because of the coarseness of the data, but some difference does appear to be present, and it matches the simple 50 process select loop test, which did show less CPU usage with BFS than with CFS.

The conclusion that I draw from these two points is that BFS does not solve the problem, but it at least seems to reduce its effects in some areas.

Conclusion

As previously stated, I don't believe that this is an issue with the scheduler itself, but with the interaction between the sleeping mechanisms and the scheduler. I consider this increased CPU usage in processes that should be sleeping and using little to no CPU a regression from CentOS 5.6 and a major hurdle for any program that wants to use an event loop or polling style of mechanism.

Is there any other data I can get or tests I can run to help further diagnose the problem?

UPDATE: 2012-06-29

I simplified the testing program a little bit; it can be found here (the post was starting to exceed the length limit, so I had to move it).

Wow, thorough analysis - but with so much data, the original question is getting fuzzier to me. Can you boil it down to 1) a single test, 2) a single distro, 3) two different kernels, and 4) the 15% slowdown? If your hypothesis in the last paragraph is right, it's time to start diffing kernel sources, but it feels like the other variables should be eliminated first.
– ckhan, May 6 '12 at 6:13

I added some outputs from the test application and now do the comparison in pairs to try and make it a little easier to digest all of the info.
– Dave Johansen, May 7 '12 at 17:46

I tried to take a look at that bugzilla, but Red Hat is saying that it is "an internal bugzilla, and not visible to the public." Have there been any updates on this?
– user21585, Aug 6 '12 at 13:40

I'm new to the whole Red Hat bug thing, so that may have been something that I did (or didn't do) when creating the bug, but the only update I've heard so far is a change to a parameter that makes it behave better with hyper-threaded processors; there's no real fix yet.
– Dave Johansen, Aug 7 '12 at 16:25


CFS is the Completely Fair Scheduler? This sounds interesting - I ran into a performance problem with a Java-based application on SLES 11 SP2, too. The difference (from SP1) is the change to CFS...
– Nils, Sep 27 '12 at 20:25