Industry Standard Architecture (ISA) technologies have progressed extremely rapidly in the last 10 years. Both Intel- and AMD-based systems are dramatically different from those of even just 5-6 years ago. Ten years ago servers usually had either 2 or 4 physical processors plugged into sockets on the server motherboard. The typical memory installed in these systems was 1-2 GB, and very few ISA-based systems supported 8 or more processors. The Windows operating system displayed each CPU as one bar in Windows Task Manager. The machine architecture was also very simple, with each processor having the same access latency to memory and other resources.

Today ISA servers have increased dramatically in both processing capacity and complexity, to the point of being on par with high-cost proprietary UNIX systems. These developments and the associated terminology sometimes confuse people a little. The implications for licensing the Windows operating system and the SQL Server database are also sometimes unclear.

Today SAP on Win/SQL customers routinely run on servers with 8 processors and 128 logical processors, and 512 GB of RAM is nothing unusual. Even on 4-processor commodity Intel servers such as the HP DL 580 G7 we have customers with 512 GB of RAM. Typical 2-CPU servers are now configured with 128 GB of RAM and have 24 logical processors. Let’s go through some terms:

Processor

Sometimes referred to as “CPU” or “Socket”. This is the packaged physical piece of silicon that contains all the cores and the required shared components. It is the package that is plugged into the processor socket on the motherboard. Besides the multiple cores contained on each processor, the current generation of AMD and Intel processors also contains the Memory Controller and the bus to the external memory that this one processor manages. In the Windows and SQL Server space, we use the terms socket and processor interchangeably, partly because Microsoft licenses per socket or per processor.

Intel & AMD both observed that the physical limitations of CPU manufacturing processes and materials meant that increasing Processor clock speed much beyond 3 GHz resulted in several unwanted side effects. The most apparent side effect was heat. In order to keep improving performance, Intel and AMD added multiple CPU cores onto each Processor. Sometimes these CPU cores share L3 caches and other components. All of these CPU cores are integrated onto one physical processor and plugged into one socket on the server motherboard. Today servers with 6-, 8- and 12-core processors developed by AMD and Intel are commonly deployed in the SAP customer base we monitor. Proprietary UNIX hardware has since followed the Intel & AMD multicore approach, after some years of attempting to increase CPU speeds to 5 GHz or higher.

Logical Processor

A Logical Processor is the unit onto which the Windows operating system schedules threads: at any moment, each Logical Processor executes one thread. When a server boots, the BIOS reports the number of Logical Processors to the operating system at the very earliest stage of startup. Opening Task Manager and going to the ‘Performance’ tab will show you the number of Logical Processors. The current supported limit with Windows 2008 R2 Datacenter Edition is 256 Logical Processors. SQL Server 2008 R2 also supports a maximum of 256 Logical Processors. Another common term for this unit is ‘CPU thread’. In SAP Benchmark publications, the term ‘threads’ is used to describe this unit. All new Intel Nehalem Processors (such as Xeon 55xx, 56xx & 75xx) are Hyperthreaded, meaning each physical core is presented as two logical processors. As a result, Windows Task Manager displays twice as many logical processors as there are physical cores.

NUMA Node

Each node on a Non-Uniform Memory Access based system is a collection of processors which access the same local memory. Usually a hardware architecture has more than one NUMA node, and the nodes are connected via a bus or other topologies. Each NUMA node has its own memory. Applications running on one NUMA node but accessing memory on another NUMA node usually encounter longer latency, which in turn greatly reduces performance. Therefore Windows 2003 and SQL Server 2005 included a lot of optimizations to reduce ‘remote’ memory access to a minimum. Today the unit of a NUMA node is usually one processor or socket. This means that in most cases there is a 1:1 relationship between a NUMA node and a socket/processor. The exception is AMD’s current 12-core processor, which represents 2 NUMA nodes due to the processor’s internal architecture. All new ISA servers are NUMA based now that Intel has replaced Front Side Bus technology with QuickPath Interconnect on Nehalem processors. AMD has used HyperTransport for some years already.

The SAP Kernel is completely NUMA unaware; therefore we do not recommend running SAP application servers on large scale-up NUMA systems.

Processor Group

In order to get beyond the former Windows limitation of supporting a maximum of 64 Logical Processors, a new grouping system was designed. This unit is a Processor Group, or ‘Group’ for short. Each processor group can contain a maximum of 64 Logical Processors. In order to get to the current supported limit of 256 Logical Processors, four processor groups are defined by Windows 2008 R2. More details can be found here: http://msdn.microsoft.com/en-us/library/dd405503%28VS.85%29.aspx
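As a rough illustration of that arithmetic, the smallest number of processor groups follows from dividing the logical processor count by the 64-per-group limit and rounding up. This is only a sketch: the function name is hypothetical, and it ignores how Windows actually assigns NUMA nodes to groups, so treat the result as a lower bound rather than what a given server will report.

```python
import math

# Limit per processor group, as stated in the text above.
MAX_LOGICAL_PROCESSORS_PER_GROUP = 64


def min_processor_groups(logical_processors: int) -> int:
    """Return the smallest number of groups that can hold the given count.

    Hypothetical helper for illustration; not a Windows API.
    """
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return math.ceil(logical_processors / MAX_LOGICAL_PROCESSORS_PER_GROUP)


print(min_processor_groups(24))   # 1
print(min_processor_groups(128))  # 2
print(min_processor_groups(256))  # 4
```

The last call reproduces the four groups needed to reach the 256 Logical Processor limit.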

This graphic displays the hierarchy:

The graphic below shows an Intel Nehalem EX 8 core Processor. Each of the 8 cores can be seen. Each of these cores is Hyperthreaded and displays as two bars in Windows Task Manager. A server such as an HP DL980 has 8 of these 8 core Processors. Total Logical Processors = 8 Processors x 8 cores x 2 for Hyperthreading = 128

Let’s look at a server which has 8 Intel Xeon 7560 CPUs. Let’s assume Hyperthreading is enabled. Then we look at:

Logical Processors: 128 (this is what is displayed in Windows Task Manager)

Cores: 64

Sockets/Processors: 8

NUMA nodes: 8

Processor Groups: 2

Let’s compare that with a server having 2 brand-new AMD Opteron 6174 processors, where we look at:

Logical Processors: 24 (this is what is displayed in Windows Task Manager)

Cores: 24

Sockets/Processors: 2

NUMA nodes: 4

Processor Groups: 1
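The two examples above can be reproduced with a small sketch that derives each figure from the socket count, cores per socket, threads per core and NUMA nodes per socket. The class and field names are made up for illustration, and the processor group count is again only the lower bound implied by the 64-per-group limit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerTopology:
    """Hypothetical description of a server, for illustration only."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int       # 2 with Hyperthreading enabled, otherwise 1
    numa_nodes_per_socket: int  # 1 in most cases, 2 for the 12-core Opteron

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_processors(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def numa_nodes(self) -> int:
        return self.sockets * self.numa_nodes_per_socket

    @property
    def processor_groups(self) -> int:
        # Lower bound: each group holds at most 64 logical processors.
        return math.ceil(self.logical_processors / 64)


# 8 x Intel Xeon 7560 (8 cores each) with Hyperthreading enabled.
xeon_7560_server = ServerTopology(
    sockets=8, cores_per_socket=8, threads_per_core=2, numa_nodes_per_socket=1
)
# 2 x AMD Opteron 6174 (12 cores each, 2 NUMA nodes per processor).
opteron_6174_server = ServerTopology(
    sockets=2, cores_per_socket=12, threads_per_core=1, numa_nodes_per_socket=2
)

for name, server in [("Xeon 7560", xeon_7560_server),
                     ("Opteron 6174", opteron_6174_server)]:
    print(name, server.logical_processors, server.cores, server.sockets,
          server.numa_nodes, server.processor_groups)
```

Keeping the NUMA nodes per socket as a separate field makes the Opteron exception explicit instead of assuming one node per socket.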

Hope this explains the terms we’re using a bit. Oh yes, what about the term ‘CPU’? Good question. We are seeing it used all over the place. In the most common usage we still see it used for what we defined as the Logical Processors. However we also find it used quite a lot for what we defined as socket/processor. Therefore if somebody talks about a server with x number of CPUs, it is better to ask what is really meant.

Join the conversation

Question on NUMA. In this blog posting and this one (blogs.msdn.com/…/frequently-asked-questions-we-heard-on-the-sap-on-sql-2008-training-course-this-year.aspx), you mentioned not to run SAP systems on large scale NUMA servers, and that we should stick to a 2 CPU system. I take it that this is a 2 socket system with multiple cores. Does the recommendation hold true for virtual servers running on Microsoft's Hyper-V or VMware's software? When creating virtual servers with vCPUs, do you have control over which sockets / cores you want your VM to use to minimize the remote memory accesses?

1. Yes – the best performance and lowest cost application server solution is a scale out 2 processor solution. Customers often use blades.

2. Yes – 2 socket systems do have multiple cores

3. Yes – the recommendation to use 2 socket systems is particularly important if virtualization software is used. This is due to NUMA issues.

4. Yes – both Hyper-V and VMware give control over which physical cores the vCPUs run on, but there are limits around this (for example if the VM’s vCPU count or memory is larger than what one single NUMA node provides)

5. The only way I know to eliminate remote memory accesses is to make the VM smaller than one single NUMA node. So this means your VM has only 1/nth (where n is the number of Processors) the SAPS of the server. Possibly this will have a performance impact on large & busy SAP systems.
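To make the sizing rule in answers 4 and 5 concrete, here is a minimal sketch that checks whether a VM fits inside one NUMA node and what share of the host's SAPS that leaves it. All names are hypothetical and the SAPS figure is invented purely for illustration. It also assumes one vCPU per logical processor, memory spread evenly across NUMA nodes, and no memory reserved by the hypervisor, so real limits will be tighter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host description, for illustration only."""
    numa_nodes: int
    logical_processors: int
    memory_gb: int
    saps: int  # invented rating, not a benchmark result

    def fits_in_one_numa_node(self, vcpus: int, vm_memory_gb: int) -> bool:
        # Assumes logical processors and memory are split evenly per node.
        lp_per_node = self.logical_processors // self.numa_nodes
        memory_per_node_gb = self.memory_gb / self.numa_nodes
        return vcpus <= lp_per_node and vm_memory_gb <= memory_per_node_gb

    def saps_per_numa_node(self) -> float:
        # The 1/nth rule: a VM confined to one node gets at most this share.
        return self.saps / self.numa_nodes


# A 2-socket host with 24 logical processors and 128 GB of RAM,
# assuming one NUMA node per socket and a made-up rating of 30000 SAPS.
host = Host(numa_nodes=2, logical_processors=24, memory_gb=128, saps=30000)

print(host.fits_in_one_numa_node(vcpus=8, vm_memory_gb=48))   # True
print(host.fits_in_one_numa_node(vcpus=16, vm_memory_gb=48))  # False
print(host.fits_in_one_numa_node(vcpus=8, vm_memory_gb=96))   # False
print(host.saps_per_numa_node())                              # 15000.0
```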