I also decided for a test of a system that’s already under CPU pressure - so the Processor Usage is almost constantly hitting 100%. Under these circumstances I was hoping to get the maximum impact of Extended Events as well as SQL Trace. (Remember: SQL Profiler is based on SQL Trace) One could think that an I/O driven workload would be even more influenced, but that depends on the targets of Extended Events. - 4 of which only go to memory, vs. one file target (which one should put onto a non-data volume anyways). And since I also want to measure the pure overhead of the tracing technologies (without any target), this makes even more sense. Besides that, I/O usually can be tuned quite easily by writing to an otherwise idle and fast device, whereas for CPU it is not as simple to keep the costs invisible. Keeping this in mind helps to evaluate the results presented here and bringing them into the right context.

1 Windows 2008 R2 Server with SQL Server 2012 SP1. No running services above the minimum required, to avoid side-effects. I also shut down the default trace as well as the system_health_session. The Distributed Replay-client was run locally for all tests, so yes, this has effect on the server, but it was equally for all tests. All tests have been repeated 3 times (some even more) to make sure the average/median is valid. The server, as I already said, was physical (which comes in handy when you are running benchmarks for a total of ~50 hours): 1 Dual Core Intel Xeon with 8 GB of RAM (which was by far enough), Data Volumes and Volumes for Trace-Targets on separate Controllers.

The workload was a mixture of mostly short (sub-second) but also several queries of several second runtime, (Ad Hoc and stored procedures) to simulate a somehow realistic decision-support-systems’ load – no data changes though. The baseline workload took 00:24:24 minutes. The statistics stayed the same for all workloads cycles. The database used was AdventureWorks2012.

This gave me the advantage to have a valid test for all Extended Event Targets that are provided (only ETW-Tracing was left out) - specifically histogram and Pair Matching (I did not care about the few occasions of lock escalation on purpose).

The following Trace technologies and -Targets were used:

XEvent Trace, Target: None

XEvent Trace, Target: Ring Buffer

XEvent Trace, Target: File

XEvent Trace, Target: Counter

XEvent Trace, Target: Histogram

Here I chose to filter on lock acquired and to bucketize on the lock mode

XEvent Trace, Target: Pair Matching

Guess, what I matched here ;-)

SQL Trace, Target: None (I had to trick a little bit to get this to work, but I made sure, the behavior of this unsupported configuration did not distort the results: It’s just a SQL Trace with no Provider processing, so all events are lost by nature.)

For the sake of saving space and time, I will focus on the benchmarks with the options single event loss and no event loss without causality tracking. In fact, the cost of causality tracking was less than 1% for all targets.

A plain XEvent Session itself, with no Targets, already does have a noticeable impact of ~15 % performance decrease.

The overhead of the artificial (not supported) version plain SQL Trace, without the overhead of a rowset or file provider (aka target), is not too far off the Extended Events Session without the dispatching to any target (~19% versus ~15%).

The Ring Buffer is the XE-Target with the greatest overhead (~40%). (!)

The File-Target really does have the lowest overhead as expected, but it’s still 25%.

The differences between the Targets Counter, Histogram and Pair Matching are irrelevant.

Yes, even the simple Counter has a higher overhead than the File Target.

Configuring the file target with no event lossadds approximately 4 times as much performance overhead.

The difference for the other targets between single, multiple and even no event loss is mostly negligible in this scenario.

Compared to Extended Events, the overhead through SQL Trace is almost “brutal”: It quadruplicated the total runtime.

And there's even more: Tracing with SQL Profiler is absolutely impossible for such a scenario: I filled in 21 hours, but that’s actually an estimate: I stopped those test cycles after around 3 hours (forgive me, but it’s just a waste of time...), counted the finished events and calculated how much else it would have taken, roughly – in an actually rather optimistic manner. Profiler, as I always say, is a “no go” for tracing longer periods of time, especially if performance is already at stake.

Now it is important to note what I actually used as a filter: The database_id / source_database_id. - I essentially added a filter that would never return “true” by adding a filter on database_id = 77 – since no session/query/lock was on this non-existing database. Why did I do that? – This is a totally artificial test/benchmark. I am interested what the pure overhead of tracing is, and not what happens if a certain percentage of activity is not logged. (That might be something for a different benchmark. :-))

As one can see, the overhead of the pure XEvent session without target drops from ~ 15% to 9% respectively ~ 16.5% vs. 12.5 % (no Event loss)

The other targets do in fact show more differences in this setup:

The Counter Target with single event loss benefits most of the filter and drops from ~33.5 to 8.4 % overhead.

Only Ring Buffer and Pair Matching show no difference between single and no event loss.

SQL Trace with and without rowset provider (file) as target have almost the same overhead – no surprise if you know the architecture. This is in fact another huge difference which counts for Extended Events, where filters get honored much sooner within the Trace.

That the overhead of SQL Trace without target and at the same time all events being filtered outis actually higher than the overhead without target but no filtering (from the former test cycle) must be caused by the processing of for filtering out. I will not dive more into this though, since this setup is not documented and or supported. And in my eyes, “SQL trace is dead” anyways ;-) – okay, I am exaggerating a bit here, but I am also just being frank: For SQL Server 2012 there is almost no need to use it any more. – Running benchmarks using Distributed Replay is such a reason – isn’t that ironic?

I did not run SQL Profiler for this Test, forgive me. You probably know why by now. ;-)

XE_DISPATCHER_WAIT - From BOL: Occurs when a background thread that is used for Extended Events sessions is waiting for event buffers to process. - You should be able to safely ignore this unless you believe a problem is occurring with processing of events for async targets. Since this works on a queue you can have bursts of high wait times especially when no XEvent sessions are active.

XE_TIMER_EVENT – From BOL: Used to indicate a background task is waiting for "expired" timers for internal Xevent engine work. - You should be able to safely ignore this one. Just used by the Xevent engine for internal processing of its work. If something was possibly wrong with Xevent processing you might see if this thread ever "wakes up"

XE_BUFFERMGR_FREEBUF_EVENT - which by BOL means: An Extended Events session is configured for no event loss, and all buffers in the session are currently full. This can indicate that the buffers for an Extended Events session are too small, or should be partitioned.

So, I hope this was interesting for you. You may draw more and other conclusions out of these results. Remember though: this is a very specific scenario with no CPU reserves, so the Tracing/observer overhead has to show up – no choice but to take away (CPU) resources.