Measure the impact of remote versus local NUMA node access thanks to processor_group_name

Starting with oracle database 12c you can use the processor_group_name parameter to instruct the database instance to run itself within the specified operating system processor group. You can find more detail about this feature into Nikolay Manchev’s blog post.

So basically, once the libcgroup package has been installed:

> yum install libcgroup

you can create an “oracle” (for example) group by editing the /etc/cgconfig.conf file so that it looks like:

For accurate comparison, each test will be based on the same SLOB configuration and workload: 8 readers and fix run time of 3 minutes. Then I’ll compare how many logical reads have been done in both cases.

Test 1: Measure local NUMA node access:

As explained previously, I do it by updating /etc/cgconfig.conf file that way:

As you can see, during the 3 minutes run time, SLOB generated about 4 500 000 logical reads per second with local NUMA node access in place. Those are also FULL cached SLOB runs as the top event is “DB CPU” with about 100% of the DB time.

Test 2: Measure remote NUMA node access:

As explained previously, I do it by updating /etc/cgconfig.conf file that way:

As you can see, during the 3 minutes run time, SLOB generated about 2 100 000 logical reads per second with remote NUMA node access in place. Those are also FULL cached SLOB runs as the top event is “DB CPU” with about 100% of the DB time.

This is about 2.15 times less than with the local NUMA node access. This is about the ratio of the distance between node 0 to node 7 and node 0 to node 0 (22/10).

Remarks:

The SGA is fully made of large pages as I set the use_large_pages parameter to “only”.

To check that only the CPUs 1 to 9 worked during the SLOB run you can read the mpstat.out file provided by SLOB at the end of the test.

To check that the oracle session running the SLOB run are bind to CPUs 1 to 9 you can use taskset:

To check that your SLOB Logical I/O test delivers the maximum, you can read this post.

With SLOB schema of 8 MB, I was not able to see this huge difference between remote and local NUMA node access. I guess this is due to the L3 cache of 24576K on the cpu used during my tests.

Conclusion:

Thanks to the 12c database parameter processor_group_name, we have been able to measure the impact of remote versus local NUMA node access from the database point of view.

If you plan the use the processor_group_name database parameter, then pay attention to configure the group correctly with cpuset.cpus and cpuset.mems to ensure local NUMA node access (My tests show 2.15 times less logical reads per second with my farest remote access).