Diagnosing Spinlock Problems By Doing The Math

In this blog post I’m going to outline a very simple technique for working out whether spin lock activity is the cause of excessive CPU consumption. What is A Spinlock ?

A spinlock synchronizes access to internal data structure within the database engine, any thread wishing to acquire a spinlock will spin up until the point it can be acquired. The key thing to note is that the wait is largely a “Bust wait”, meaning that whilst the thread is waiting to acquire the spinlock it burns CPU cycles. A spinlock is a much more light weight mechanism than a latch and it is mainly intended for use in scenarios where the ‘Expected’ wait for the lock is short. This simple pseudo code illustrates the logic behind a spinlock acquire and release:

After a certain amount of time the thread will yield, this is referred to as a backoff:

SQL Server 2008 R2 introduces the concept of “Exponential back off” whereby the amount of time that elapses between a thread attempting to acquire the spinlock and the thread sleeping decreases exponentially over time. This call stack excerpt illustrates a Sleep associated with a spinlock acquire:

If you have genuine cause to look into spinlocks in anger, its highly likely that your server has at least two CPUs and hardware NUMA enabled. Spinlocks travel between NUMA nodes as memory cache lines:

NUMA node memory transfers between i Series Xeon’s CPUs is via the Quick Path Interconnect (QPI). The latency is passing cache lines between NUMA nodes is considerably more than that when passing cache entries between CPU cores on the same socket or NUMA node:

It is therefore important that the quick path interconnect is working as efficiently as possible, this usually involves making sure that the server firmware is uptodate, coreinfo from sys internals can be used to check the latency between NUMA nodes.

Why Spin ?

This simple graph represents the latency in CPU cycles for accessing main memory and different parts of the CPU cache hierarchy:

Let me add something to this, which will dwarf everything on my original chart:

The context switch is the great white shark of performance killers, but what is this ?. When a CPU core has to switch between threads, this is known as a “Context switch”. The switching bit should be relatively self explanatory, the context element refers to the saving of the execution context of the thread being switched out and the loading of the execution context being switched in. Execution context includes:

CPU register contents

Call stack of the thread

Virtual memory mapping information of the threads working set:memory pages it is currently using.

Because of this; fewer CPU cycles are expended if a busy wait is performed compared to switching a thread out if the wait is brief.

When Should I Suspect Spinlock Activity Might Be An Issue ?

The usual suspects to look out for are:

High CPU utilization

High concurrency, this typically affects OLTP systems

As the number of threads performing transactions goes up, transactions per second remains static but CPU utilization increases.

Determining that spin locking is an issue can be a bit more subtle than this, hence this blog post ! ;-), however, some simple sums can help us get to the bottom of this. Let me provide a simple example in the form of a test I ran:

Server has two CPU sockets, each CPU is clocked at 2.4Ghz with six cores and hyper threading turned on, this results in SQL Server seeing 24 logical processors.

Most of the check boxes for spinlock activity being a potential issue have been checked:

High CPU utilization, 100%, plus because the CPU is running at 2.49Ghz, turbo boost has kicked in on the CPU clock.

Concurrency

Latches with lots of spins and back offs, the most active spin lock has eleven digits.

We have everything except the nice graph depicting that transactions per second plateaus off whilst CPU consumption increases as the number of threads is increased.

Where Does The Math Bit Come In ?

It comes in here, what I’m going to do is to turn the number of spins into a percentage of the CPU cycles expended:

First we calculate the number of CPU cycles (roughly) that have taken place during the test, this is:Time (s) x CPU cycles per second x physical cores used 84 x 2,490,000,000 x 12 = 2,509,920,000,000

The second step is to compare our spins to the total number of CPU cycles expended, I will do this for the XDESMGR spin lock and the total number of spins:XDESMGR spins as % of CPU cycles 100 – ( ( ( 2,509,920,000,000 – 12,123,870,745 ) / 2,509,920,000,000 ) * 100 ) = 0.48% Total spins as % of CPU cycles 100 – ( ( ( 2,509,920,000,000 – 14,581,540,862 ) / 2,509,920,000,000 ) * 100 ) = 0.58%

Can We Do Better ? Yes we can, if we take our most active spin lock, windows performance tool kit can be used to quantify the CPU time spent spinning on it. The top of the stack provides the overall CPU time SQL Server has used:

This is a more accurate calculation in terms of providing something that is directly tangible to CPU time, however it still spells out the fact that spins on XDESMGR are not responsible for maxing out the server’s CPUs.

Using the computation graph in the windows performance analyzer analysis view the XDESMGR spinlock spin CPU time can be visualized compared to that for the database engine as a whole. The green line at the bottom highlighted by the red box is the CPU consumed by the XDESMGR spinlock:

There you have it, a two simple ways of determining if spinlock activity is causing excessive CPU consumption “By doing the math”.

I must give thanks to Thomas Kejser for letting me borrow the first three images in this blog post from his “Master Tuning” slide deck.

In honesty, based on my testing I suspect I need at least eight physical cores per socket under load to perform test in which spin lock activity is a cause for concern. As part of the work that went into this blog and the one of singleton inserts I’ve got enough material for another two blogs which cover the ramifications of spinlock activity from different angles, these should be posted in the next couple of weeks or so.

Chris,
I have looked at Thomas Kejsers blog on spinlocks too. I am trying to track them now. Your formula is a great start point on when to be concerned. I was wondering at what level should we worry about spinlocks if you don’t see 100% CPU spikes? Do you have a major percentage point when this should be a concern? Your example showed the XDESMGR spinlock just being 2.43%. If it had been say over 40% then might you consider this a being a major reason for the high CPU?

Just trying to get a clear idea of when to be worried about spinlocks.

As a general rule of thumb you need to be maxing out your CPU capacity and be seeing spins of the order of billions (US billion with 9 zeros). However above and beyond this it depends on which spin lock you are seeing a high number of spins and back offs on. For example, the LOGCACHE_ACCESS spin lock governs access to the log buffer, its not unheard of 20% of a servers CPU utilization to be spent on spins on this one spinlock under a heavy OLTP load. Apart from affinitizing the rest of the data engine away from the core the log writer is running on and using CPUs with the fastest clock speed you can buy, so that threads can obtain and then release this spinlock as fast as possible your options are limited. A full answer to this warrants a blog post in itself which I will endeavor to write.