Sunday, October 16, 2011

Probably the first thing I learned about query tuning is that "clock-on-the-wall" time is definitely not the thing you want to measure. The elapsed time a query takes can be influenced by countless factors (other processes running, disk speed, number of processors, amount of RAM, phase of the moon, a few hundred database settings) that it's too rough a metric for serious performance tuning.

What I've been taught to measure, instead, are logical reads, which are the number of disk I/O reads needed to execute the query. If the query doesn't change, and the underlying data doesn't change, it's been my experience (and it makes sense theoretically), that you can run a given query all day long, under any load conditions you can create, and the number of logical reads will always be the same. This immunity to outside "noise" makes it ideal for tuning: you can be pretty sure (given the same schema, of course - indexes leap to mind - and at least similar data) that the logical read counts on your machine will be the same on another machine (such as the Production server, or your boss's laptop).

Ok, so how do we capture logical reads? Easy - simply enable the STATISTICS IO option, like so:

SET STATISTICS IO ON

Once turned on for a given session, this emits the number logical reads made for every table (including temp tables) for every statement executed, until you turn it off (by saying SET STATISTICS IO OFF). Let's look at a single line that would be emitted:

You'll notice a lot of other values displayed on this line, but for now, ignore them. One that you should especially ignore (for now) is physical reads, because its count is included in the logical reads value. That is, in our one-line example, we can see there are going to be 332 pages read from somewhere. That "somewhere" will be either in RAM (the buffer pool), or from disk (your "C:" drive, for example), but both of these "somewheres" are counted by logical reads. Physical reads only counts what was read from disk this time, which is the kind of unpredictable "clock-on-the-wall" information we know to be unreliable for query tuning (for now). Why are they unpredictable? Because who knows why the page wasn't in buffer pool in memory, and so had to read from disk, at the precise moment we needed it? It's unknowable, and even if it weren't - if you had perfect knowledge of the innards of the computer at that moment - it's still unpredictable.

Looking at our one-line example above, everything seems pretty rosy. SET STATISTICS IO ON spits out the logical reads we need, so all we have to do is total them up. But this is where it gets ugly, because the information we're interested in is interspersed with our own PRINT messages, RAISERROR output, warning and informational messages from the system, and data output from SELECT statements (if we're using "Output to Text" mode, which I often do, as it's the only way to get the dang output window to scroll). This can all be quite a mess to read:

Now, try to total up the logical reads in that output. If there's only a few lines of output, it's manageable, but imagine trying to decipher hundreds of lines of stuff like this. Over time, I developed a rigamarole system:

copy the entire contents of the Output window to the clipboard

fire up Excel, and paste the output into it

from Excel's "Data" tab, click the "Sort A to Z" button

select only the rows that start with "Table..." and copy them to the clipboard

paste the rows into a new window in Management Studio

use a regular expression search-and-replace to strip out all the text on each line before the logical read numbers

use a regular expression search-and-replace to strip out all the text on each line after the logical read numbers

copy the column of logical read numbers back into Excel

from Excel's "Formulas" tab, use the AutoSum button to, finally, calculate the total number of logical reads

"Crude but effective" doesn't quite do this process justice. There must be a better way, I thought.

My first idea was an AddOn for Management Studio, but the more I looked into it, I realized I just couldn't face the tedium of application programming (yes, I'm now permanently spoiled by a platform - SQL Server - that actually works without swearing at it). And while SQL Server isn't exactly famous for its text-manipulation ability, what I was was doing didn't have to be fast or elegant. And I'd rather hack something together in T-SQL in a few hours than beat my head against the C# wall for a few days. (Don't believe me? Ask someone who's written a truly great Visual Studio add-on.) As a bonus, the people using the code would be able to understand it, and maybe even build upon and improve it.

So, after an embarrassingly long delay caused by forgetting, again, that REPLACE replaces all occurrences, not just the first one, here it is. It works correctly as far as I can tell, but don't use it as an example of how to write good T-SQL code. (Please tell me about any bugs you find. Use at your own risk. Some settling of contents may occur during shipping.)

To use this script, just copy-and-paste the entire contents of the Output window between the two lines in red in the @Text variable's definition, and hit F5 (Ctrl-E, whatever) to execute the script. You should get output like this:

The first set of results represent the SET STATISTICS IO rows that have non-zero logical reads, useful mostly for debugging the script. You can suppress this output by commenting out the first SELECT statement in the script (third line of code from the bottom). The second set of results is what we're looking for: the total number of logical reads that appear in the text of the Output window.

Sunday, October 9, 2011

Remember that Twilight Zone episode, where a man sells his soul to the Devil for immortality, only to be sentenced to life in prison the next day?

I'm learning SSIS (2005, for now), and have discovered, like all those before me, that Microsoft implemented Annotations, but... they don't word-wrap! Your Annotation must fit on one really long one line! Bwa-ha-ha-ha-ha-ha!

Another very interesting read is [SSIS] Modifier l'apparence des packages via l'API [FAIL] by François Jehl. (We non-Francophones can use Google's translate-to-English, or the page itself has a Microsoft translation button.) After some research, his conclusion is that Annotations are a feature from a product created before SSIS ("DaVinci Design Surface", hence "MSDDS"), which might explain why Annotations are hacked implemented in SSIS package files as XML nested inside the document's XML. (Hmmm... that would explain all the &lt;s and &gt;s floating around!) For what François was attempting, that was a fatal roadblock. Fortunately, we're not trying anything nearly so ambitious or useful as he was - certainly nothing Notepad can't handle!

Anyway, using what they've uncovered, here's how to make your Annotations word-wrap. (This is how I do it - use at your own risk, and make a backup of your package file first). Close Visual Studio, and open the package file in Notepad. Search for controlprogid="MSDDS.Text", and you'll be treated to nested XML that looks something like this:

See the red "2" at position 61 of the ddsxmlobjectstreaminitwrapper item? To enable word-wrap, simply OR in a value of 1: in this case, we change the "10" (decimal 2) to "11" (decimal 3). Once that's done, save the package file (you made a backup first, right?), exit Notepad, open the package with Visual Studio, and - viola! - your Annotation is word-wrapped! Now do the same thing for all your other Annotations. (Hey, it's a hack, not magic.)

So, if it's just a bit-flip, why didn't Microsoft simply put a checkbox on the Properties page for Annotations? Because... Annotations don't have Property pages! Bwa-ha-ha-ha-ha-ha!

If you entered the value for the source and/or target filename variable and enclosed it with double-quotes, try removing them. Worked for me.

(Before coming over to the SQL Server side, I was a full-time Windows application developer, working in C, C++, and C#, ever since Microsoft QuickC for Windows was released, way back in 1990. If you've ever wondered what it's like being a Windows application developer, imagine that everything in SQL Server was designed and implemented just like SSIS. If you've ever spent much time writing application code for Windows, you'll know exactly what I mean. If not, count your blessings!)

Before making the move to SQL Server, Larry specialized in Windows programming in C/Win32, C++/MFC, and C#/.NET. His blog focuses on query tuning, database optimization, and system performance, and features handy scripts, practical tips, and occasional dispatches from the many dark corners of SQL Server.