Performance impact: SQL Server 2008 R2 Audit and Trace

We are told that SQL Server 2008 R2 Audit (and SQL Server 2008 Audit) has much less performance overhead than SQL Trace when capturing the same information. Knowing how SQL Server 2008 R2 Audit is implemented (i.e. on top of the Extended Events infrastructure), I’ve always taken that as a given and never bothered to check it out.

Recently, I had to capture some object access information, and it turned out that SQL Audit was not the most convenient tool for the task. I had to go back to SQL Trace. There, the question of how much cheaper an audit is relative to an equivalent trace came up. I thought it was time to explore that a bit.

I wanted to explore two questions:

In the most extreme case (i.e. when the processors are heavily loaded), how much degradation in transaction throughput should I expect if I turn on SQL Audit vs. if I turn on SQL Trace?

If the processors are moderately loaded, should I expect a performance degradation if I turn on SQL Audit or SQL Trace?

The first question is a worst-case scenario you should probably never run into. But it’s always good to know where the boundary is. The second question is a realistic and more common scenario.

Heavy CPU load. This was generated by 100 simulated users submitting the standard TPC-C read-only queries without any wait between any two consecutive calls. The total processor time on the 12-core (24-logical-processor) server was sustained at ~92%, and the transaction throughput (as measured by one of the TPC-C transactions) was steady at ~16,000 per second, or about 32,000 batch requests per second, on the SQL Server 2008 R2 instance.

Medium CPU load. This was generated by 100 simulated users submitting the standard TPC-C read-only queries with a 10ms wait between two consecutive calls. Compared to the previous load scenario, this otherwise rather small wait caused the total processor time on the server to hold at a very low level of about 2%. In a way, it may be more appropriate to label this a light load level. But the sustained transaction throughput was about 3,100 per second, or 6,200 batch requests per second, which still kept SQL Server busy looking up the pages.

For each of the above two load levels, three conditions were tested:

No Audit/No Trace. In this case, I did not enable any SQL Audit session or run any SQL Trace. This was our baseline.

With Audit. In this case, I turned on a standard audit with 1000ms queue delay and the output was sent to files with each file capped at 500MB. The database audit specification included all the SELECT statements on all the TPC-C user tables. No SQL Trace was enabled.

With Trace. A SQL Trace that was more or less equivalent to the audit described above was started, with the trace results sent to the same drive folder as in the above audit. The trace files were also specified to roll over to a new file every 500MB. No SQL Audit was enabled.
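A server-side trace along those lines could be set up roughly as follows. This is a sketch, not the exact script used in the test; the output path and the choice of statement-completed events are assumptions:

```sql
-- A sketch only: the output path is an assumption. Event 41 = SQL:StmtCompleted,
-- event 45 = SP:StmtCompleted; column 1 = TextData, 12 = SPID, 14 = StartTime.
DECLARE @TraceID int, @maxfilesize bigint = 500, @on bit = 1;

EXEC sp_trace_create
     @TraceID OUTPUT,
     2,                              -- TRACE_FILE_ROLLOVER
     N'D:\AuditOut\tpcc_trace',      -- .trc extension is appended automatically
     @maxfilesize;                   -- roll over to a new file every 500MB

EXEC sp_trace_setevent @TraceID, 41, 1,  @on;   -- SQL:StmtCompleted, TextData
EXEC sp_trace_setevent @TraceID, 41, 12, @on;   -- SPID
EXEC sp_trace_setevent @TraceID, 41, 14, @on;   -- StartTime
EXEC sp_trace_setevent @TraceID, 45, 1,  @on;   -- SP:StmtCompleted, TextData
EXEC sp_trace_setevent @TraceID, 45, 12, @on;
EXEC sp_trace_setevent @TraceID, 45, 14, @on;

EXEC sp_trace_setstatus @TraceID, 1;            -- start the trace
```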

The SQL script to create the audit and the database audit specification is listed below:
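Here is a sketch of such a script, with the 1000ms queue delay and 500MB file cap from the description above; the audit name, output path, database name, and TPC-C table names are assumptions:

```sql
-- A sketch only: audit name, output path, database and table names are
-- assumptions; the 1000ms queue delay and 500MB file cap match the test setup.
USE master;
GO
CREATE SERVER AUDIT TpccSelectAudit
TO FILE
(
    FILEPATH = N'D:\AuditOut\',   -- same folder later used for the trace files
    MAXSIZE  = 500 MB             -- cap each audit file at 500MB
)
WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE);
GO
ALTER SERVER AUDIT TpccSelectAudit WITH (STATE = ON);
GO
USE tpcc;   -- the TPC-C benchmark database (name assumed)
GO
-- Audit all SELECT statements on the TPC-C user tables, for all users
CREATE DATABASE AUDIT SPECIFICATION TpccSelectAuditSpec
FOR SERVER AUDIT TpccSelectAudit
    ADD (SELECT ON OBJECT::dbo.warehouse  BY public),
    ADD (SELECT ON OBJECT::dbo.district   BY public),
    ADD (SELECT ON OBJECT::dbo.customer   BY public),
    ADD (SELECT ON OBJECT::dbo.orders     BY public),
    ADD (SELECT ON OBJECT::dbo.order_line BY public),
    ADD (SELECT ON OBJECT::dbo.stock      BY public)
    -- (remaining TPC-C tables added the same way)
WITH (STATE = ON);
GO
```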

With all the CPUs on the server heavily loaded, turning on SQL Audit to log all the SELECT statements caused the transaction throughput to drop from 16,043 per second to 13,825 per second. That was about a 14% drop. Given how heavy the load was and how busy the traffic was on the server, that’s a relatively small drop. The impact of turning on the trace, however, was quite a different story. There, the transaction throughput went from 16,043 per second to a mere 5,012 per second, a drop of nearly 69%. If we use 5,012 as the base reference, turning off the trace more than tripled the transaction throughput (an increase of over 200%). The difference in performance impact between SQL Audit and SQL Trace was striking.

Again, I should note that this was an extreme case. In a real-world situation, the system most likely wouldn’t be this busy. So the 14% drop in throughput is probably the ceiling, and you should expect a much smaller adverse impact from turning on a similar audit on a real production server. This next chart shows the result of turning on an audit (and separately a trace) when we don’t beat up the server with non-stop transaction traffic. As you can see, by just giving the server a little bit of breathing room with a 10ms wait between the calls, the load on the server was reduced significantly, and turning on either the audit or the trace had no impact at all on the throughput. Yes, the total CPU consumption went up a little bit as a result of turning on the audit or the trace, but the difference was rather insignificant.

These test results are good news for the SQL Server 2008 R2 audit feature. Now, if only it could be made a bit more useful!

The test environment was a DL360 G7 with two X5690 (6 core) processors and 64GB of RAM. Hyperthreading was enabled. The OS was Windows 2008 R2 Enterprise x64 Edition with SP1 and DBMS was SQL Server 2008 R2 Enterprise x64 Edition (10.50.1600).

Comments

Any chance you reconciled the data captured between the sources to ensure they captured consistent output? I've never reconciled the output from XE & SQLTrace from systems under pressure & would be interested to know whether they actually produce consistent data under pressure.

XE is often touted as more efficient & this might be true, but it's a marginal difference in most real-life systems (very few systems are fully traced without filtering as in this test). Most systems capture trace with some sort of filtering, which eliminates most of the material performance overhead of SQLTrace over XE.

However, XE is clearly more functional than SQLTrace (eg event matching etc) which should be good enough reason for people to use it on SQL 2008 + systems, but I argued against introducing XE years ago because there are already too many sources of performance information (Perfmon, DMVs, SQLTrace etc) which are already hard to collect & cross-interpret.

Adding another major source muddies the waters even further & the benefits aren't that significantly worthwhile imo.

It's actually quite arguable whether the audit setup and the trace setup I tested are in fact similar. And I'm ready to concede that the test was kind of 'rigged' in favor of the audit setup. I just wanted to get a feel for some sort of worst-case scenario.

I'm repeating the test with the trace setup using one of the security audit events, the 'Audit Schema Object Access Event' to be precise. This would be a more apples-to-apples comparison between audit and trace. Also, I used the statement completed events in this test because that's how most people would set it up.

I did not include any filtering because this was an isolated test with no other activity going on at the same time, and the statement completed events are not the most convenient for setting up filters. The Audit Schema Object Access Event, however, is much more convenient for filtering for specific objects, in this case, the tables used in the benchmark.
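A filtered trace along those lines might look like this sketch (event 114 is the Audit Schema Object Access event; the trace path and table names are assumptions):

```sql
-- A sketch only: trace path and table names are assumptions.
-- Event 114 = Audit Schema Object Access; column 34 = ObjectName.
DECLARE @TraceID int, @maxfilesize bigint = 500, @on bit = 1;

EXEC sp_trace_create @TraceID OUTPUT, 2, N'D:\AuditOut\tpcc_access', @maxfilesize;

EXEC sp_trace_setevent @TraceID, 114, 1,  @on;  -- TextData
EXEC sp_trace_setevent @TraceID, 114, 34, @on;  -- ObjectName

-- OR'ed equality filters (logical_operator = 1, comparison_operator = 0)
-- restrict the trace to the benchmark tables
EXEC sp_trace_setfilter @TraceID, 34, 1, 0, N'orders';
EXEC sp_trace_setfilter @TraceID, 34, 1, 0, N'order_line';
EXEC sp_trace_setfilter @TraceID, 34, 1, 0, N'stock';

EXEC sp_trace_setstatus @TraceID, 1;
```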