SQL 2016 – It Just Runs Faster: Automatic Soft NUMA

As hardware continues to expand and evolve SQL Server testing and customer reports have highlighted the need to partition activities for optimal scaling.Partitioning based designs are common ways to localize activities and improve performance and scalability.An example of how SQL Server leverages partitioning is the CMemThread object.

For thread safety various synchronization mechanisms are utilized (spinlock, latch, mutex, semaphore, …)For discussion purposes let's focus on a spinlock with imperial, computer industry testing results.A highly contended spinlock does not scale well beyond 8 CPUs.The CPU is utilized and N-1 spinners are unable to acquire the lock, only one owner is possible.

The SQL Server product team studies and tracks internal structures and partitioned designs.Many of the SQL Server, common structures are designed with various partitioning schemes and rooted around the NUMA layout of the machine.

The core inception of NUMA partitioned based designs evolved in SQL Server 2000 and more so in SQL Server 2005.As the SMP, single node machines, advanced beyond 8 CPUs the scalability issues were uncovered and design changes made to address the issues.To combat the scalability NUMA partitioning was a standardized choice.During SQL Server 2000 and 2005 development 8 CPUs per NUMA node was a high-end system.Hardware advancements have 18 cores in a single NUMA node today and expose SMP like scalability issues within a single NUMA node.

Even today, Soft NUMA can be used to divide a physical node into multiple logical nodes presenting a different layout to the entire SQL Server and adjusting the partitioning to optimize scalability and performance.Microsoft recommends use of Soft NUMA on the newer, large CPU NUMA system deployments to increase performance.

DMV: The softnuma_configuration_desc column in sys.dm_os_sys_info can have one of the three values: OFF / ON / MANUAL

Your mileage may vary but, here is a testing results from the SQL Server 2016 test harness:"With HT aware auto soft-NUMA, we get up-to 30% gain in query performance when DOP is set to the number of physical cores on a socket (12 in this case) using Automatic Soft NUMA."

The automatic, soft NUMA behavior is Hyperthread (HT/logical processor) aware.When determining the optimal node layout the logical CPU information is queried and used to prevent groupings of logical only and physical only nodes which could lead to performance variations across the nodes.

Furthermore, many of the background processes are created within each node.The partitioning and creation of additional nodes scales background processing.For example, each node contains a worker to listen for network activity and performs encryption activities.The additional nodes created with a soft NUMA configuration increases the number of listeners, scaling and network and encryption capabilities.