Those that have been reading this series might be thinking, “Is he going to go there?” Well, the answer is “Yes.” This is the GUID clustered index post that had to be. It’s inevitable with this tool.

If you follow SQL Server at all, you are probably aware of the long-standing debate about whether it is wise, desirable, smart, useful, or what have you, to identify rows using GUIDs. I won’t take a position on that, but I will show here, I hope objectively, a few things that the visualizer shows about file layout vs. distributed inserts, distributed inserts being one of the main challenges around using GUIDs as clustering keys. Just to recap the argument very, very briefly:

Advantages

GUID keys can be generated at the client, which saves a round-trip to the database server to create a collection of related rows.

GUID keys can make certain architectures like sharding, or peer to peer replication, or merging multiple source databases, simpler.

Disadvantages

GUID keys are wider, therefore they take more space in memory and on disk. The additional space is multiplied by their presence in both clustered and non-clustered indexes if they are a clustering key.

GUID keys don’t only take more space in RAM and on disk because of their width. They also cause distributed inserts into the clustered index – that is, new rows are added to any and all pages in the index. Each time a row has to be added, the target page must be read into memory, and at a checkpoint, the whole changed page (both existing and new rows) must be written to disk. This has two effects:

The amount of RAM and disk IO required for inserts is probably much higher, as pages with existing data must come into cache, get changed, and then be written back out again. Essentially, large parts of the table have to be rewritten to disk to append rows to pages that have data already.

The pages that store the index will individually fill up, and have to split such that half the existing rows are written back out to the “old” page and half written out to a “new” page in a different location on disk. This causes the pages to be less full, the same number of rows to require more space on disk and in RAM, and the resulting index to be massively fragmented on disk.

I am not writing to argue these points, which I think have been established by both sides of the debate, only to see if the visualizer shows these effects clearly. Most of the argument isn’t actually about these facts (they are all true, as far as I know) but rather about which of them are more important, and I think that is the main source of disagreement on the issue.

Visual Example of Distributed Inserts

It’s very easy to create an example of this with a small sample database. I created one called “VizDemo2.” VizDemo2 has a slightly modified structure to illustrate what’s going on here – I need two tables that are stored separately on disk, so that they cannot interfere with one another. The simplest way to do that is with a couple of file groups containing one file each. So here’s the structure:

I created the database with a 50MB, single file, Primary file group

I added a file group FG1 with one 75MB file

I added a second file group FG2 with one 75MB file
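That setup can be scripted roughly as follows. This is a sketch under assumptions: the file paths, logical names, and log size are placeholders, not the originals.

```sql
-- Hypothetical recreation of the VizDemo2 layout; paths and names are placeholders.
CREATE DATABASE VizDemo2
ON PRIMARY ( NAME = VizDemo2,     FILENAME = 'C:\Data\VizDemo2.mdf',     SIZE = 50MB )
LOG ON     ( NAME = VizDemo2_log, FILENAME = 'C:\Data\VizDemo2_log.ldf', SIZE = 25MB );
GO
ALTER DATABASE VizDemo2 ADD FILEGROUP FG1;
ALTER DATABASE VizDemo2 ADD FILE
    ( NAME = VizDemo2_FG1, FILENAME = 'C:\Data\VizDemo2_FG1.ndf', SIZE = 75MB )
    TO FILEGROUP FG1;
ALTER DATABASE VizDemo2 ADD FILEGROUP FG2;
ALTER DATABASE VizDemo2 ADD FILE
    ( NAME = VizDemo2_FG2, FILENAME = 'C:\Data\VizDemo2_FG2.ndf', SIZE = 75MB )
    TO FILEGROUP FG2;
GO
```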

When the database is empty, the visualizer shows only the system pages at the start of each file, as shown here:

To that database I added two sample tables identical in structure but with different clustering keys:
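The original definitions aren’t reproduced in the post, but the pair might look something like this sketch. Only dbo.SampleCustomersGuid is named later in the post; the int table’s name and the filler column are my assumptions. Every column has a default so that INSERT … DEFAULT VALUES works.

```sql
-- Hypothetical table definitions; identical except for the clustering key.
CREATE TABLE dbo.SampleCustomersInt
(
    CustomerID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_SampleCustomersInt PRIMARY KEY CLUSTERED,
    CustomerName nvarchar(100) NOT NULL DEFAULT REPLICATE(N'X', 50)  -- filler to occupy pages
) ON FG1;

CREATE TABLE dbo.SampleCustomersGuid
(
    CustomerID uniqueidentifier NOT NULL DEFAULT NEWID()  -- random, not sequential
        CONSTRAINT PK_SampleCustomersGuid PRIMARY KEY CLUSTERED,
    CustomerName nvarchar(100) NOT NULL DEFAULT REPLICATE(N'X', 50)
) ON FG2;
```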

Compare

After inserts, the resulting graphic does show some facts we know to be true:

First, the data in the integer-clustered index takes about eight bands of the diagram, while storing the same data in a GUID clustered index has required about twelve bands of data pages. The database itself supports that impression with space allocation – it reports these figures:

Part of the extra space required is the width of the key, but part of it is the empty space on each page resulting from page splits. If a page that needs a new row is too full, then half the rows from that page are moved to a net-new page, half left in place, and the new row added to one or the other of the resulting pages. Afterward, they are often both partly empty.

Second, the whole graphic in the GUID clustered index area is a dark blue that the visualizer uses to show fragmentation – in fact, the object is almost perfectly fragmented, with practically no contiguous pages at all. The sequence of pages in the leaf level of the index is still a linked list, as always, but it is physically stored in essentially random order on disk.

Does Re-Indexing Help?

The next question is whether we can combat these problems by doing huge amounts of index maintenance – if we rewrite the GUID index, will that make it take less space, or make it more efficient? The answer is, “well, sort of, temporarily.”

First, re-indexing will put the table in “GUID” order. Whether that really helps or not is debatable, perhaps. It would enable read-ahead for the index, which is otherwise clobbered by the fragmentation. Having the table in “GUID” order might or might not be of any help to performance. Second, re-indexing will make the pages denser, or less dense, depending on the fill factor applied. For the sake of demonstration, let’s re-index with the default fill factor, because I think that happens a lot out in the world, and it may tell us something:

ALTER INDEX ALL ON dbo.SampleCustomersGuid REBUILD;

After re-indexing, this is a view just of the second file group with the GUID clustered table (note that I scrolled down in the display):

The arrow shows where the data was moved from the old data pages into a new region of the file. And, sure enough, it’s not fragmented (note the lighter color) and it takes less space in the file.

That might sound good, but if this is a real database, inserts probably will continue. In the int clustered case, as we know, new data will be appended to the end of the page sequence, but in this case, new data will have to be inserted into most of the existing pages on disk. Those are all full now, so they will have to be split 50/50 to make room for the new data; both the old and new pages will have to be written out, and the new pages by definition can’t be in index order with the existing pages.

INSERT dbo.SampleCustomersGuid DEFAULT VALUES;
GO 20000

What we get after more rows are added to the table is what a layperson might call a “hot mess:”

Here everything is fragmented – back to that dark blue – even the pages we just re-indexed a moment ago, because they all split. The table has more than doubled in size, even though we just doubled the number of rows, because the individual pages contain less data.

Would an appropriate fill factor be a workaround? In some measure, yes, but it only mitigates the issue. The write activity on the table, even with a low fill factor, will still be higher, as more existing pages have to be flushed at checkpoints. The pages will still be less dense, and therefore take up more space on disk and in cache. In short: maybe helpful, but no silver bullet.
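For illustration, a rebuild with an explicit fill factor would look like this; the value 70 is just an example, not a recommendation:

```sql
-- Leave roughly 30% free space on each leaf page to absorb future random inserts.
ALTER INDEX ALL ON dbo.SampleCustomersGuid REBUILD WITH (FILLFACTOR = 70);
```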

What about Sequential GUIDs? Here I will venture my opinion. Sequential GUIDs have never made sense to me. They solve one part of this problem – the distributed insert part – but at the expense of the very things GUIDs might be good for, namely not demanding a visit to the database to generate an identifier. If you have to come to the database, you already lost this whole argument. Use an integer and solve the rest of the problem at the same time. I can only see it as a sort of band-aid for existing systems that could not be refactored, but, like a bad SUV that combines the worst properties of a car and a truck, it feels like a really poor compromise to me.

I hope this helps to illustrate some of the physical database design challenges that surround the use of GUID cluster keys. In the next installment I’m planning to demonstrate the interleaving of objects, which is one argument for multiple file groups.

In Part 1 of this series, I introduced a little demo app that renders the layout of pages in SQL Server files by object. Today I’ll put that app through its paces to show, in vivid color (well, teal, anyway) the destructive power of the famous Re-Index Then Shrink anti-pattern for index maintenance.

This one is very easy to demo, so let’s go!

First, I created a demo database VizDemo1, with a single 200 MB data file. Into that database I placed a canonical table – highly simplified for this example – clustered on an ever-increasing integer, using identity():
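The original table definition isn’t shown here, but a minimal sketch might look like this; the dbo.SampleCustomers name appears in the rebuild statements later, while the column list and row count are my assumptions:

```sql
-- Hypothetical version of the demo table, clustered on an ever-increasing identity.
CREATE TABLE dbo.SampleCustomers
(
    CustomerID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_SampleCustomers PRIMARY KEY CLUSTERED,
    CustomerName nvarchar(100) NOT NULL DEFAULT REPLICATE(N'X', 50)  -- filler data
);
GO
-- Populate with sample rows:
INSERT dbo.SampleCustomers DEFAULT VALUES;
GO 100000
```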

And finally, fire up the little visualizer app and process the database:

The small color bands at the top left corner of the image are the system tables and such that are in every “empty” database to make it run. The blue/green/teal area is the new table we created and populated with sample data, and the gray area represents empty regions in the file.

As expected, the table started writing into the first available space, and, because the cluster key is increasing, pages were allocated to the end of the page sequence in order, and we end up with a crazy-perfect, contiguous linked list on disk.

You can see small darker bars at intervals within the table – most of the pages in the index are “type 1” pages, which are the leaf-level/rows in the clustered index. Those bars are “type 2” index pages that have the upper level(s) of the index. The reason they are darker is that those are a disruption in the leaf level linked list, and the app shades such disruptions as a way to see fragmentation. The list has to “hop over” those pages and then continue on the other side. It’s technically fragmentation, but at this point not harmful at all – but remember that darker color that shows a break in the page order.

A side note: in the midst of the gray area you can see one orange line (and another in the sea of teal). Those are “type 11” PFS pages, which happen on a fixed interval in every file. I don’t think they ever move – they track file allocation and free space metadata. They are like rocks in the stream…

Now, what happens if we re-index this bad boy? Well, a re-index operation has to write all the pages for the object into new, blank pages in the file, and then abandon the old pages. I run:

-- This "moves" all the data toward the end of the file, into free areas
ALTER INDEX ALL ON dbo.SampleCustomers REBUILD;
GO

Then re-analyze the file. As expected, the table has “moved” toward the end of the file, and left free space toward the beginning. It’s still not fragmented, because we had enough room, and it was written in order into that new area by the rebuild:

We can see the gray area near the top is all the “abandoned” pages where the index was, and the data has all moved down into the free area. Ah, but that seems wasteful to some people, am I right? All that empty space – the file could be smaller!

Let’s see the damage that Shrink File does. Imagine that I do this:

DBCC SHRINKFILE (N'VizDemo1' , 70)
GO

First, before we shrink, let’s just scroll down and look at the end of the file:

We have two conditions – the gray part of the file is sort of OK to shrink. There’s just a lone PFS page out there, and removing that does no harm. But once we get into that blue area, the data has to be moved back up into the beginning of the file. Here’s where the problem lies, as I learned from Mr. Paul Randal – the shrink routine will move a page at a time back into that free space, starting from the end, going backward. That makes the pages land in approximately reverse order from the correct index order. Perfect fragmentation. Let’s see if this tool proves him right. Shrink, then re-analyze:

Yep. It’s not immediately apparent, perhaps, but that teal is the darker shade indicating that, through most of the index, every single page is a fragment boundary – perfect fragmentation! Here’s a better view:

So, how can we clean that up? Well, with a rebuild. But … we need that bigger file. In fact, practically any database in production needs this overhead of available space to be able to perform index maintenance. It’s not “wasted” space at all.

ALTER INDEX ALL ON dbo.SampleCustomers REBUILD;
GO

After the rebuild, the index is back toward the end of the file, but it’s also back in order:

Now, in light of this information, imagine nightly re-indexing on a database with … AutoShrink! <shudder>

Part 1 of a blog series visually demonstrating the layout of objects on data pages in SQL Server

Some years ago a gentleman called Danny Gould created a free tool called Internals Viewer for SQL Server. I’m a visual sort of guy, and I always thought it would be fun and educational to make a simple visualizer, like the one he created, in order to view how objects are laid out in SQL Server files, and to use it to demonstrate how operations like re-index and shrink affect the layout of files.

To that end, and reinventing the wheel a little bit, truth be told, I spent this past holiday creating a simple .NET app that renders the file layout of a database into a color-coded bitmap:

Fig 1

The app can scan the pages in a database, grab the header output from DBCC PAGE, parse that, and create a structure with a few key bits of information about every page. It then renders a bitmap from those structures showing a few things (Fig 1):

Each data object (index or table) partition is identified with a unique partition ID in SQL Server. Those IDs are used in this tool to color-code the output by object, from a color lookup table. Each color in the example screenshot represents the pages that are dedicated to a single partition of an object. This screenshot shows AdventureWorks, which doesn’t use the Enterprise Edition partitioning feature, so for this case each color represents one object – every object having exactly one partition in Standard Edition (or in databases that don’t use partitioning).

Unallocated pages are shown as gray gaps. These are regions that are part of the physical file(s), but not used to store anything.

The app flags pages at the end of any fragment of an object using a darker colored band, so it will reveal any non-contiguous structures in the data file(s). Sometimes these happen at the end of a region of the file where one object is stored, but, interestingly, sometimes these can happen in the middle – as shown in the image above where a dark band interrupts a continuous region of the same color.

The app has some very basic mouse-over capability where you can run the mouse over the image and the text fields at right will reveal information about the pages, including the object schema.table.index and partition, and also whether the page represents a fragmentation boundary.

Finally, the app shows what page types are located where in the file using the narrower white/gray/black bands. White represents data or index pages, while other shades of gray or black indicate other kinds of system pages, per Paul Randal’s excellent blog post here.

The Pixels Already Tell a Story

So, what can we learn about the sample database in this image? Here are a few things:

The part of the file shown in the bitmap is fairly dense. There aren’t big regions of unallocated space in the file. A gap in the allocated pages looks like this (enlarged): Empty Region

Objects in the file are not contiguous, and may “hop around.” That is, if you follow the linked list of pages that compose an index, a bunch of them will be in a row, and then there will be a page that links to the next page composing the index but it’ll be in a different location in the file. I’ve called these “frag boundaries” – pages that do link to another page, but where that next logical page isn’t the next physical page in the file. In the graphic the frag boundary pages are colored with a darker dithered pattern. You can mouse over these and look in the text fields at the right in the app, and see the page they link to. Fragment Boundaries

Sometimes the end of a fragment will be adjacent to pages from another object, but it can be the case that there’s a fragment boundary in the middle of the pages for one object – it’s just that the linked list goes up to that point in the file, but then the next page in the index (in index order) isn’t the next page in the file, even though the next page in the file is part of the same object. Imagine a page split in the “middle” of an index – the existing page with half the rows stays in place, and a new page with the other half of the rows gets created in the middle of the logical index order but possibly stored in some other location in the physical file.

Right at the very beginning of the file there’s a special sequence of metadata pages that describe the database, allocations, and so on (again, well documented by Paul Randal). In our diagram this shows up as a series of pages at top left with varying page type (the gray and white) indicators: Database and file metadata pages

In the next installment, I’ll run some test databases through this and we can see what more severe fragmentation looks like, the effect of GUID cluster keys, shrink, and how the data moves around in a re-index operation.

Here’s a short demo video of the mouse-over working (quality is YouTube-limited):

Back in August I posted a first version of a rules-driven solution for backups, index and statistics maintenance and integrity checks. The system in general has been working well, and has saved my team a huge amount of time and effort. We are coming to the anniversary of its use in production soon.

Today I offer an update that contains a few enhancements, performance improvements and a bug fix.

To recap, this is a system or framework to manage many small databases across many instances on many servers in a policy-based/automated way. The details about how to deploy and how to use the system are in the original post, but at a high level:

First deploy the code across a collection of servers using the included PowerShell script DeployMaintenance.ps1. That will create an administrative database on every instance (or use one that you provide) and populate it with the required tables, procedures, etc. The deployment script will also make a handful of SQL Agent jobs that handle all the maintenance on all the databases.

Visit the instances. If there is maintenance already in place, disable or remove it so as not to perform this work twice. If necessary, change or override the default preferences for this solution in the admin database for each type of maintenance. This includes the time of maintenance windows, whether to use features like differential backups or Litespeed, whether there are databases that should be treated as exceptions, with different settings, etc.

Enable the new policy jobs in SQL Agent.

Monitor to make sure things are running smoothly.

The theory behind this project is to save work configuring servers (efficiency), to ensure that the maintenance is in fact deployed and working for ALL databases in the environment (quality), to ensure that it works in a truly consistent way everywhere (consistency), and to make it so that the servers don’t have to be touched by a DBA when changes happen, such as adding or dropping databases (resiliency).

Changes

The updates in this 1.1 version include:

Better reporting of errors. The jobs now write log files to the default error log location for SQL Server.

Better recovery from errors. The jobs in some cases continue on error – for example, one failed backup will not stop the backup job and leave other databases without backups.

Performance enhancements for examining backup history and system counters (mainly % log used).

One bug fix for the log backup logic on a server that uses both log shipping and Litespeed.

Disclaimers

The code for the system is posted here, and you are welcome to download it and try it out. Please note that this is not a commercial solution, and that while you may test or deploy this, you do so at your sole risk. Edits to the code almost certainly will be required for your environment. Please read, understand and test the code thoroughly before you even imagine using this in production. I do not want to cause you or your employer any pain. License for using this solution is GPL 3.0: http://opensource.org/licenses/GPL-3.0. You may use and modify this code, but not sell it.

This has been run on SQL Server versions from 2005 to 2008 R2. It probably works on 2012, but I have not tested it thoroughly.

I run only case-insensitive collation; if you are on a case-sensitive system, then you have my sympathy, and you might have to edit a few things to make this work.

There is coverage for Quest Litespeed backups in the solution; other third-party backup products could be incorporated easily, but have not been. It should be possible to follow the pattern used to handle Litespeed and extend it to any other third-party stored procedures.

If you haven’t done index maintenance for a while, and you flip this solution on, you might blow up your transaction log. Be sure that you have plenty of log space and plenty of log backup space, and watch the process the first few runs. This is especially true if the combination of Standard Edition of SQL Server and the preference for online index maintenance applies in your environment, because the maintenance process will perform index maintenance using reorganize (defrag), which produces a lot of log.

Lastly, this is a solution for the type of environment that has many small databases, not the type with a few big, performance intensive databases. Databases that are large and require detailed attention may not be suitable for a one-size-fits-all solution like this.

As I talked about in my last post, I just went through a re-indexing project that took the partitioned fact rows from our warehouse and relocated them into new files. There are a lot of tables and indexes involved, so I have a PowerShell “helper” script to generate the actual T-SQL that moves the data. The idea is to find all the indexes that use a particular partition scheme in the database, and make the CREATE INDEX statements that would recreate them on the new partition scheme. This script doesn’t do the re-indexing work, it just outputs the T-SQL that would do the work, so that the SQL can be verified and/or edited and run later, in the context of other modifications to the warehouse.

(This is a much improved version of the work described in this even older post. It frankly wasn’t all that great back then.)

The script was developed on PowerShell 2 and SQL Server 2012 client tools against SQL Server 2005, but I believe the work that the script does (the process of generating the T-SQL and the results) is compatible with any SQL Server EE 2005 or later. It assumes you’ve loaded the SQL provider or SQLPS module into PowerShell.

Most of this is pretty basic – put the tables into a collection, loop over them, and for each table go through each partitioned index, scripting them out. The only parts that were tricky to develop involve the indexes that support primary keys.

The scripter from SMO wants to script out a PK as “ALTER TABLE … ADD CONSTRAINT,” and the problem with that is you can’t use it to recreate the existing index using the DROP_EXISTING option. But, in fact, in SQL Server it is perfectly valid to do a CREATE INDEX … WITH ( DROP_EXISTING = ON ) against the index that supports a primary key. It’s just that the scripter isn’t designed to deal with that idea (as far as I know).

I searched around for some solution to this issue to no avail, but instead fell back on good old hacky find and replace. There are a few semi-hairy regex expressions in the script that locate instances of ALTER TABLE … ADD CONSTRAINT and substitute the equivalent CREATE INDEX statement, and also locate the WITH clauses that don’t have the DROP_EXISTING option, and add that. The gibberish parts of those expressions are mostly looking for object names and escaping square brackets.

If it’s not clear what this sort of thing means (\[[^\]]*\])\.(\[[^\]]*\]) drop me a comment and I’ll try to clarify. Conversely, if you are better at regex than I am, which is likely, any advice you might have to simplify this would be welcome!

A side note about regex and PowerShell:

If you are used to using regex in the find and replace dialog in SSMS / Visual Studio, it’s not the same flavor of regex that PowerShell uses, and there are some important differences:

Most epically confusing, the dollar signs in regex are not the dollar signs in PowerShell. So

"$1" and '$1'

in a –replace expression are different in important and confusing ways. One will look for a PowerShell variable $1 and try to expand it, while the other is the regex matching group #1 and does regex replacement.
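A quick illustration of the difference (a contrived example, not from the actual script):

```powershell
# Single quotes: '$1' reaches the regex engine literally, as matching group 1.
'abc123' -replace '(\d+)', '[$1]'   # → abc[123]

# Double quotes: PowerShell expands the variable $1 (usually undefined, so empty)
# before the regex engine ever sees the replacement string.
'abc123' -replace '(\d+)', "[$1]"   # → abc[]
```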

What makes this sad is that the regex one might use in editing T-SQL in SSMS can’t be moved over to PowerShell without a lot of changes.

I had a long night last night of watching Perfmon counters while I coaxed our data warehouse fact data into new files. I learned something through this little project, perhaps dumb and obvious, but important: don’t assume that your re-indexing work, even the biggest flavor, is automatically I/O limited.

Our systems are relatively small, but we do have a nice disk array. This server is just a 2 x 6-core machine with 64 GB of RAM and two HBAs, but it has a mix of SAN Flash and RAID 1+0 FC disks for the SQL Server files. The operation is one I have done a few times: take the fact data, which is partitioned into monthly files, and move it into new files by mapping it onto a new partition scheme. We do this about annually to optimize indexes and re-tier the storage.

Basically, I take all the existing tables and indexes, one at a time, and run CREATE INDEX … WITH ( DROP_EXISTING = ON ) statements mapped onto the new partition scheme, to move the data into new, pre-allocated files. The data involved is about 1 TB.
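The pattern for each index looks roughly like the following; the table, index, column, and partition scheme names are all placeholders, not the real warehouse objects:

```sql
-- Hypothetical example of one such rebuild onto a new partition scheme.
CREATE UNIQUE CLUSTERED INDEX PK_FactSales
ON dbo.FactSales ( SaleDateKey, SaleID )
WITH ( DROP_EXISTING = ON, SORT_IN_TEMPDB = ON )
ON psFactMonthlyNew ( SaleDateKey );
```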

Eight Cores Seems like Enough? Wrong.

I thought, wrongly, that if I did this work with 8 cores in play, the machine would be running it about as fast as it was capable of going, figuring I was sure to be I/O limited. My reasoning was that “create with drop existing” is a really simple pass-through operation, just reading the existing index, which is already in order, and writing it into new pages. It parallelizes very well because of the arrangement of the data. How could eight cores be kept busy enough with that work to saturate the I/O subsystem?

Instead, the results surprised me: running with DOP = 8 I was getting, depending on the index, between 200 and 450 MB per second reads and writes to and from the disks. Changing the DOP to use all 12 cores on the machine, the throughput actually increased to 600 – 700 MB per second, making the maintenance significantly shorter. As it turned out, the work was CPU limited somehow, even with eight cores allowed. The eight cores would run at 100%, while the disk system didn’t.

For this case I had the luxury of being the only one on the machine during the maintenance window, which meant I could set MAXDOP to any value I wanted without side effects. I set that to 12 for the duration of the window, to use all cores, and then set it back again.
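Changing the instance-wide default can be done with sp_configure; the “restore” value of 8 below is an assumption about the prior setting:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Raise the instance default for the maintenance window...
EXEC sp_configure 'max degree of parallelism', 12;
RECONFIGURE;
-- ...run the index rebuilds, then restore the prior value (assumed to be 8 here).
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;
```

Alternatively, a per-statement WITH ( MAXDOP = 12 ) option on each rebuild would avoid touching the instance setting at all.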

Your mileage may vary, but with this new world of storage I learned to test for optimal DOP when I have any large-scale index operations.

I enjoy Sudoku as a way to relax on occasion but, being an IT guy, I suppose I am predictably meta: solving them is fun, but making a machine to solve them is even more fun. I created a clunky solver once before based on the idea of using as few T-SQL statements as possible. This year I decided to try to make a fast solver instead. It’s working fairly well and solves most puzzles in under a second on my laptop. It helps I’ve been sick with a cold for the past two weeks, giving me the sort of “enforced relaxation” that drives me to puzzles.

Here’s the basic algorithm:

Create a table for the puzzle solution, and insert the given values from the puzzle definition.

Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This is a bit like the “notes” feature of an electronic Sudoku app, where you can indicate possible values for unsolved cells.

Loop while there is no solution

Using some list of solving rules, “push” solved values from the cells in the candidates table to the solution table.

Use a process of elimination to remove candidate values that the Sudoku rules make impossible.

End Loop

I started with that fundamental idea, but I found that the solver rules I could encode in T-SQL (at least with my level of math skill) covered most but not all puzzles. Some of the advanced ones would stall or reach an impasse where the available rules couldn’t make progress eliminating values. For those cases I added some branching logic to do a minimal level of trial and error. The adjusted algorithm became this more complicated version:

Create a table for 1-n puzzle solutions with a branching identifier. Insert the given values from the puzzle as branch 1.

Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This table also has a branch identifier, and the first set of candidates is also labeled as branch 1.

Loop while there is no solution

Using some list of solving rules, “push” solved values from any branches in the candidates table into the corresponding branch in the solution table

If no cells are solved, then “branch” the solution by duplicating the solved values and the candidates into multiple versions/branches on some cell that has a minimal number of candidate values. Example: if we see no progress, and cell 3B might be a 2 or might be a 3, then make two versions, one using 2 and one using 3, and proceed with both.

If a branch contains a “contradiction” – meaning an impossible combination of values – then it cannot be the solution to the puzzle, so remove it.

Use a process of elimination to remove candidate values that the rules make impossible, from all branches.

End Loop

I found that getting started with this felt complicated, but I was able to distill the code down gradually to a few statements. Admittedly, they are dense, and take some time to understand, but in all it didn’t end up being very much code at all.

Modeling the Puzzle

First, how to represent the puzzle space in SQL Server? A Sudoku puzzle looks like a table, but in fact it isn’t like a database table at all, because position is so important to the logic of the puzzle – in SQL the order of rows and columns is, by definition, undefined. For example, there’s no “last row” in a table. Further, in creating the solver I tried to stay in the spirit of relations and sets, and not resort to techniques like string manipulation or stuffing multiple values into the same column.

For me, it was simplest to think of the puzzle as several intersecting sets (“rows”, “columns,” “squares,” available digits) and make a table representing the sets instead of the physical layout of the Sudoku square.

The schema looks like this: a Solution table has columns for Value, Row, Column, and Square. Value is the digit 1-9 in the cells of the puzzle, and Row, Column and Square represent the position of the cell in the larger Sudoku square. Rows are numbered, columns assigned letters A-I and the squares are labeled S1-S9, reading left to right and top to bottom in the physical puzzle. Unfortunately, Row, Column and Square are all keywords in T-SQL, so I abbreviated them as Rw, Cl, Sq to keep the square brackets at bay in the code. Lastly, a Branch column is appended to allow the solver to make versions of the puzzle:
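A sketch of that table follows; the exact definition is in the attached script, and this version assumes the dbo.SquareFor function shown below already exists:

```sql
-- Approximation of the Solution table described above.
CREATE TABLE dbo.Solution
(
    Rw     tinyint NOT NULL,
    Cl     char(1) NOT NULL,
    Sq     AS dbo.SquareFor(Rw, Cl) PERSISTED,  -- square derived from row and column
    Value  tinyint NOT NULL,
    Branch int NOT NULL DEFAULT 1               -- versioning for the branching logic
);
```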

Because the square is implied by the row and column location in the puzzle, the Sq value is computed using a UDF, as:

CREATE FUNCTION dbo.SquareFor
(
    @Rw tinyint, @Cl char(1)
)
RETURNS char(2)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @Result char(2)
    SELECT @Result =
        CASE
            WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'A' AND 'C' THEN 'S1'
            WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'D' AND 'F' THEN 'S2'
            WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'G' AND 'I' THEN 'S3'
            WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'A' AND 'C' THEN 'S4'
            WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'D' AND 'F' THEN 'S5'
            WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'G' AND 'I' THEN 'S6'
            WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'A' AND 'C' THEN 'S7'
            WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'D' AND 'F' THEN 'S8'
            WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'G' AND 'I' THEN 'S9'
        END
    RETURN @Result
END

[These code samples are just to explain the solution and won’t run in the order presented, but all the code in a runnable script is attached to this post.]

The domains for allowed values, rows and columns are very useful to explicitly define, as we’ll see later, so I have four reference tables containing the possible digits, row, column and square labels, all following this pattern:
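For example, the digits table follows this pattern (the row, column and square tables are the same single-column shape; the names here match the attached script only approximately):

```sql
CREATE TABLE dbo.Digits
(
    Digit tinyint NOT NULL PRIMARY KEY
);

INSERT dbo.Digits ( Digit )
VALUES ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 );
```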

Given those objects, the rules of the game can be encoded with some foreign keys and unique constraints. Only one of each digit allowed per row, column and square, and only those values defined in the domain tables:
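In sketch form, those rules might look like the following (the attached script has the full set, including all the foreign keys to the domain tables):

```sql
-- Only one of each digit per row, per column and per square, per branch:
ALTER TABLE dbo.Solution ADD CONSTRAINT UQ_Solution_RwValue UNIQUE ( Branch, Rw, Value );
ALTER TABLE dbo.Solution ADD CONSTRAINT UQ_Solution_ClValue UNIQUE ( Branch, Cl, Value );
ALTER TABLE dbo.Solution ADD CONSTRAINT UQ_Solution_SqValue UNIQUE ( Branch, Sq, Value );

-- Values restricted to the defined domains, e.g. for digits:
ALTER TABLE dbo.Solution ADD CONSTRAINT FK_Solution_Digits
    FOREIGN KEY ( Value ) REFERENCES dbo.Digits ( Digit );
```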

Note the addition of Branch in the indexes, so that we can store multiple versions of a puzzle, while still being constrained to the puzzle rules.

Now, how to populate the Solution table with a puzzle? The code sample attached to this post has several example puzzles prebuilt, but the basic method is plain INSERT statements. There’s a default on the Solution table to make new inserts as Branch 1:
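Entering the given cells looks like this (the positions and values below are arbitrary, just to show the shape):

```sql
-- Branch defaults to 1 and Sq is computed, so only position and value are needed:
INSERT dbo.Solution ( Rw, Cl, Value )
VALUES ( 1, 'A', 5 ),
       ( 1, 'C', 9 ),
       ( 2, 'B', 3 );
-- ... and so on for each given cell in the puzzle
```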

It’s even faster to enter the values and positions using SSMS > Edit Top 200 Rows (gasp!) on the Solution table. I also found it helpful to back the puzzles up with SELECT INTO while working on the solver, because entering them is definitely tedious. In the code sample there are several example puzzles that can be loaded like this:
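Loading a saved puzzle is just the reverse of the SELECT INTO backup (dbo.SamplePuzzle1 here is a placeholder name):

```sql
-- Clear any prior puzzle, then restore a saved one as Branch 1:
DELETE dbo.Solution;

INSERT dbo.Solution ( Branch, Rw, Cl, Value )
SELECT Branch, Rw, Cl, Value
FROM dbo.SamplePuzzle1;
```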

One last bit of setup: even though a relational table doesn’t directly model the Sudoku square (which is why the puzzle is modeled as intersecting sets), it’s incredibly helpful to be able to view the puzzle as one. In fact, I found working on this was impossible without rendering the puzzle in the paper layout. The paper display can be produced with a pivot:

CREATE VIEW dbo.DisplaySolution AS
SELECT Branch, Rw,
MIN( CASE Cl WHEN 'A' THEN Value ELSE NULL END ) AS A,
MIN( CASE Cl WHEN 'B' THEN Value ELSE NULL END ) AS B,
MIN( CASE Cl WHEN 'C' THEN Value ELSE NULL END ) AS C,
MIN( CASE Cl WHEN 'D' THEN Value ELSE NULL END ) AS D,
MIN( CASE Cl WHEN 'E' THEN Value ELSE NULL END ) AS E,
MIN( CASE Cl WHEN 'F' THEN Value ELSE NULL END ) AS F,
MIN( CASE Cl WHEN 'G' THEN Value ELSE NULL END ) AS G,
MIN( CASE Cl WHEN 'H' THEN Value ELSE NULL END ) AS H,
MIN( CASE Cl WHEN 'I' THEN Value ELSE NULL END ) AS I
FROM dbo.Solution
GROUP BY Branch, Rw

Now we have all the setup required to make the solver work. Returning to the original algorithm above, we have a few discrete tasks that we can break out into procedures:

1. Deriving a table of all the possible candidate values for empty cells in the solution

2. Searching that candidates table for solved cells and pushing those into the solution

3. Eliminating candidates

4. Branching the puzzle into versions if we get stuck in a situation where the rules in #2 and #3 can’t make progress

Candidates

So, deriving all possible candidate values is fairly straightforward: if we cross join the rows, the columns and the digits 1-9, we get a derived table of every possibility for any puzzle. From that we can eliminate the candidate values from the cells that are already in our specific puzzle. We can also eliminate all the candidates that conflict with the values in the given puzzle, in the same row, column or square.

In order to make this manageable, I used a “bag” analogy. Imagine a bag full of tiles, like Scrabble tiles, containing 9 ones, 9 twos, 9 threes, and so on. From that bag we know that one of the ones will go into the first Sudoku square, one into the second, one into the third, and so on. It’s possible to “label” the tiles ahead of time with the 3x3 square to which they will belong, even though we don’t know which specific cell they will ultimately occupy.

So this code starts with a CTE called “bag” that has just this structure, made using a cross join of all digits and all squares. From the bag, we discard a bunch of the tiles for the already-occupied cells in the solution. Next, we take the remaining contents of the bag and use it to make a Candidates table, further discarding tiles that conflict with the already-solved cells in the solution by occupying the same row, column or square. It’s a great CTE exercise:
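Condensed, the idea looks something like this (domain and working table names are approximate; the attached script is the real version):

```sql
WITH bag AS
(
    -- 81 tiles, one of each digit per square, minus the tiles already
    -- "used up" by the occupied cells in the starting solution
    SELECT d.Digit, q.Sq
    FROM dbo.Digits AS d
    CROSS JOIN dbo.Squares AS q

    EXCEPT

    SELECT Value, Sq
    FROM dbo.Solution
    WHERE Branch = 1
)
INSERT dbo.Candidates ( Branch, Rw, Cl, Sq, Value )
SELECT 1, r.Rw, c.Cl, b.Sq, b.Digit
FROM bag AS b
CROSS JOIN dbo.Rws AS r
CROSS JOIN dbo.Cls AS c
WHERE dbo.SquareFor( r.Rw, c.Cl ) = b.Sq
  -- skip cells that already hold a given value
  AND NOT EXISTS ( SELECT * FROM dbo.Solution AS s
                   WHERE s.Branch = 1 AND s.Rw = r.Rw AND s.Cl = c.Cl )
  -- discard tiles conflicting with a solved cell in the same row or column
  -- (same-square conflicts are already gone, via the EXCEPT above)
  AND NOT EXISTS ( SELECT * FROM dbo.Solution AS s
                   WHERE s.Branch = 1 AND s.Value = b.Digit
                     AND ( s.Rw = r.Rw OR s.Cl = c.Cl ) );
```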

The result is that we have a Solution table with one version of the puzzle at its starting point, and a Candidates table that contains all the plausible values that might go into all empty cells in the puzzle. All the rows in both tables are identified as “Branch 1.” Going forward we’ll work back and forth between the two tables to solve the puzzle by process of elimination.

The procedure processes one branch at a time given by @branch, and outputs the number of cells that were solved with the rules in the code. The count will become important when we need to track “stalled” branches of the solution.
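The simplest of those rules, "a cell with exactly one remaining candidate is solved," can be sketched as follows (the parameter names are my shorthand for the attached procedure):

```sql
-- Inside a PushSolvedCells-style procedure taking @branch int, @count int OUTPUT:
INSERT dbo.Solution ( Branch, Rw, Cl, Value )
SELECT Branch, Rw, Cl, MIN( Value )
FROM dbo.Candidates
WHERE Branch = @branch
GROUP BY Branch, Rw, Cl
HAVING COUNT( * ) = 1;  -- exactly one candidate left for this cell

SET @count = @@ROWCOUNT;  -- how many cells this pass solved
```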

Elimination

The next chunk of the algorithm says, “OK, now that those cells’ values are known, what other candidates can be eliminated?” This is the other half of the solving logic: take the remaining candidates and eliminate as many as possible with other solving techniques:
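The core elimination, removing any candidate that now conflicts with a solved cell, reduces to a single DELETE, roughly:

```sql
DELETE c
FROM dbo.Candidates AS c
JOIN dbo.Solution AS s
  ON s.Branch = c.Branch
 AND (    ( s.Rw = c.Rw AND s.Cl = c.Cl )   -- the cell is now occupied
       OR ( s.Value = c.Value               -- or the digit is taken in the
            AND ( s.Rw = c.Rw               -- same row, column or square
                  OR s.Cl = c.Cl
                  OR s.Sq = c.Sq ) ) )
WHERE c.Branch = @branch;
```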

Alternating these two procedures – locating solved cells and pushing them into the solution, and then eliminating other candidates – will solve most of the “easy” and some “intermediate” level puzzles in books or Sudoku apps. The next challenge is what to do when these rules are exhausted and can’t narrow the field of candidate values. A mathematician might be able to add more solving methods to the handful I have here and make the algorithm solve ANY puzzle. I’m not quite that smart, so for some advanced puzzles I had to introduce branching, or live with the fact that some would not be solved.

(Technically I did a lot of work implementing one more advanced method, which was pages and pages of code – but I found that it was never invoked in any sample puzzles I tried, perhaps because the logic was already implied by one of the other statements above.)

Branch

Branching presents a few challenges. First, how do we know when to branch? Second, some branches are by definition invalid, so how do we identify and cull them? Third, what are the actual logistics of creating a branch?

The answer I came up with to the first is just to count how many cells are solved in each pass through the solver. If we execute a pass and we don’t see any new values being inserted, then the branch(es) we have in play are obviously stalled. When they stall, make a new branch.

Invalid branches are interesting. They fail in one of two ways: either there is a cell that has NO candidates, which ultimately is caught by the check above, or there is a contradiction that surfaces where, for example, a digit appears to belong in two places. The contradictions violate the rules of the puzzle, and because we encoded the rules of the puzzle in a series of unique constraints they become constraint violations in SQL Server. They actually cause the script to throw an error where the implied solved cells can’t be inserted into the solution because they violate a unique index. I decided just to roll with that idea: branches work until they throw an error, at which point they can be discarded from the process.

So, when we run the PushSolvedCells procedure, and @count shows we didn’t solve any cells, then we branch. Branching works by identifying a cell in Candidates that has a minimal number of possible values (practically always 2). We then rank the possible values for that cell, insert the #1 possibility into the existing branch and use the #2 and higher in new copy(ies) of the solution and all the candidates:
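The selection and ranking can be sketched like this (the copy step itself is omitted here; the attached script is the full version):

```sql
-- Find the cell with the fewest remaining candidates, then rank its values.
WITH pick AS
(
    SELECT TOP ( 1 ) Rw, Cl
    FROM dbo.Candidates
    WHERE Branch = @branch
    GROUP BY Rw, Cl
    ORDER BY COUNT( * )          -- minimal number of possibilities
)
SELECT c.Value,
       ROW_NUMBER() OVER ( ORDER BY c.Value ) AS Possibility
FROM dbo.Candidates AS c
JOIN pick AS p
  ON p.Rw = c.Rw AND p.Cl = c.Cl
WHERE c.Branch = @branch;
-- Possibility 1 goes into the existing branch; each higher-numbered value
-- seeds a copy of the Solution and Candidates rows under a new Branch.
```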

If you are still with me, wonderful! This is a bit of a marathon post, but we are near the end.

Solver

The last piece we need is just a “driver” script to loop through these procedures in the right order, with some flow control. The idea here follows the algorithm at the top of this post: run a loop, alternate inserting solved cells and removing candidates, until a solution surfaces. If the loop stalls, then branch, and if a branch fails, then delete it. The script is also decorated with statements to load and display the puzzle, to track progress, and a timer to see how long the solution took to produce:
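Stripped of the display, load and timing statements, the control flow is roughly this (procedure names other than PushSolvedCells are placeholders for the ones in the attached script):

```sql
WHILE EXISTS ( SELECT * FROM dbo.Candidates )  -- unsolved cells remain
BEGIN
    BEGIN TRY
        EXEC dbo.PushSolvedCells @branch = @branch, @count = @count OUTPUT;
        EXEC dbo.EliminateCandidates @branch = @branch;

        IF @count = 0  -- stalled: no new cells solved on this pass
            EXEC dbo.MakeBranch @branch = @branch;
    END TRY
    BEGIN CATCH
        -- A unique constraint violation means this branch is
        -- contradictory, so cull it and move on
        DELETE dbo.Candidates WHERE Branch = @branch;
        DELETE dbo.Solution   WHERE Branch = @branch;
    END CATCH;

    -- ... advance @branch to the next live branch ...
END
```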

Working sample code for the database and the solver script will be attached to this post if you want to try this out. It was authored on SQL Server 2008 R2. I’d welcome any and all observations, suggestions, improvements! Just leave notes below. Enjoy!

Over at erinstellato.com there is a great post this week about tips for new presenters. Ms. Stellato suggests, insightfully, that we record ourselves, which is really a fantastic piece of advice. What’s extra-cool is that today you don’t need any special equipment or expensive software to do just that. This week I “filmed” two run-throughs of my talk for SQL Saturday tomorrow. For me, the timing is the hardest thing – figuring out how much content I can really present in the time allowed without going way over, and how much time each demo or module should take – and filming myself was a great way to zero in on that.

What You Need

Getting over feeling awkward is the hard part. The tech is super simple:

Microsoft Expression Encoder, to capture the screen and audio (the setup below uses it).

Simple webcam software. Here I use Lync’s video conferencing window without a call. It’s also possible to use something like WebcamViewer. (The fact that this function is missing from out-of-the-box Windows 7 is a major blunder IMO.)

Simple Setup

First, start an Expression Encoder session of the type “live broadcasting project.” In Encoder, add the screen as a live source, and if you have more than one monitor, choose the one where your presentation will run. Set the audio source to your laptop’s mic.

Next, on the Encode tab, boost the resolution of the video output to something legible like 1024 x 768. We are not going for quality here, but it has to be good enough for you to evaluate yourself.

Set the Output tab to go to File Archive and choose an output folder for the video files.

Now set up the display for your presentation. If you are using PowerPoint, fire that up. Set up your demos.

Start ZoomIt. You do have ZoomIt, right? If not, stop now and go get ZoomIt. I’ll wait.

(It’s vital to practice setting up the presentation, surprisingly, because at the start of one you’ll be nervous, and fumbling with tech in an unfamiliar place, in front of people, and with a new and probably dicey projector to deal with. It’s easy to miss something.)

Finally, start the web cam working and put it right on the screen with your presentation (bottom right corner works well). It helps if your camera viewer has a “keep on top” option.

Presenter, Record Thyself

That’s basically it. Start the Encoder, and pretend you are presenting to someone on Skype. Or your mom. Or the Greater Moose Factory Ontario virtual chapter of PASS. Whatever makes you feel less ridiculous. That awkward feeling lasts about ten minutes, and then you are just presenting away.

I owe my career to the SQL Server community, specifically the Internet SQL Server community, so this month’s T-SQL Tuesday is especially poignant. I changed careers “cold” about eight years ago, and, while I had some educational background in computer science, I had relatively little real-world DBA experience. Someone gave me a shot in the form of an entry level job, for which I am grateful, but I also had to make the argument to him that I would figure out whatever I needed to do to be successful in that position. Without the SQL community, that would never have been possible.

Here’s how:

Early on I used the heck out of traditional forums, especially places like SQL-Server-Performance.com, back when it was run by Brad McGehee, and SQLServerCentral.com. I read a lot after hours, Joe Celko and CJ Date for theory and Kalen Delaney’s phenomenal “Inside SQL Server 2000” first, then a stack of other titles. I read SQL Server Magazine. I read forums by the hour, just soaking up the issues other people were facing, and the solutions.

I still always keep Kalen’s book in a special spot near my desk. It’s in pretty good shape:

When, at my first DBA job, we really needed professional help wrangling an ISV, the internet community helped me find Ron Talmage and Joe Chang, when the company had no idea where to look for high caliber SQL Server consulting. They were amazing.

Once I got a little experience and knew at least one or two things, I started trying to give back. On the forums I ventured answers for some easy questions. (I still didn’t know much. Heck, I still don’t now.) Brad McGehee was kind enough to accept a couple of articles I wrote based on technical challenges from work – articles which lost their relevance years ago, but gave me some practice and some confidence writing. When StackOverflow arrived on the scene I switched to that pretty exclusively as a forum site, if only because I like the format and the UI so much.

When SQL-Server-Performance.com changed hands in 2006, I wanted to keep writing, and realized that blogging was probably the thing to do rather than formal articles. My writing is uneven, but the practice helps, and I find that forcing myself to compose an argument or explain a solution really helps to solidify my own knowledge. The old saying, “if you want to learn, then teach” really holds true. I was able to secure a spot blogging here at sqlblog.com in the summer of 2009 with Adam Machanic, and I still try to contribute odds and ends if I think they will be useful to someone out there on the interwebs.

Somewhere in there I reluctantly accepted the fact that I should try Twitter, and unfortunately or fortunately, depending on whether you ask my wife, I’m fairly addicted. #SQLHelp is a marvel. I have also begun attending and submitting sessions to SQL Saturday events, which is both a blast socially, and incredibly helpful professionally.

That’s my history – I hope, if you are reading this from a position like mine, that you’ll consider doing two things: take advantage of the community resources out there, then, when you are able, pay it forward.

“Storage is just like gasoline,” said a fellow DBA at the office the other day.

This DBA, Mike is his name, is one of the smartest people I know, so I pressed him, in my subtle and erudite way, to elaborate.

“Um, whut?” I said.

“Yeah. Now that everything is shared – VMs or consolidated SQL Servers and shared storage – if you want to do a big project, like, say, drive to Vegas, you better fill the car with gas. Drive back and forth to work every day? Gas. Same for storage.”

This was a light-bulb-above-my-head moment.

Now that everything is consolidated onto shared infrastructure, all the way down to complete servers, the way we think about funding IT projects has to change too. It used to be that if you wanted to do a project, you would enumerate what the systems would cost, then price and go buy them. It was like this: this new project will need a bulldozer and an excavator, and maybe a Super-Zooper-Flooper-Do, let’s buy them for the project, and then they will arrive on a truck and we will install them, and the project will move forward. Many people are still thinking this way, but it’s now officially backward. We don’t buy discrete items for projects anymore, we buy a slice of shared infrastructure. And planning for that infrastructure has to change, or you will be, as many organizations are, forever, endlessly, exasperatingly short of it.

Gas Up Early

Imagine you and your friends are cruising down the road on a beautiful day, and someone decides you need to, simply MUST drive to Southern California. Do you at that point look around at each other and say “OK, who has gas money?” Perhaps. But hopefully not if you run a large business.

Worse, do you just start driving that direction, and when you get down to 1/8 of a tank, then ask everyone in the car? Again, maybe, but not too many people travel this way who are over 25. I think, anyway.

So the obvious question is, and I see this in many companies, why do we pile projects onto shared infrastructure like SAN storage and VM clusters without planning what infrastructure will be required to take us where we want to go? Answer: the organizations haven’t finished shifting their thinking. They think, hey presto, now we don’t need to buy those unique pieces of equipment any more, we just get “free” VMs and databases and storage from that magic bottomless pool. But that’s only the first stage. They haven’t realized yet, at an organizational level, who fills that pool of resources up, and how quickly, and how much it costs.

Watch the Gauge

Part of the difficulty is there’s no single “gas gauge” to tell an organization how the shared infrastructure is doing – you need some clever, forward-thinking administrators to do that, and they, in turn, need some tools. Further, it’s pretty hard today to estimate what “slice” of shared infrastructure a project will need, and how or whether to literally charge back for that resource. That means you have one arm making plans for all the places the organization will drive, with no idea how much gas is in the tank, and perhaps another arm with its eye on the fuel level, but which doesn’t know what the travel plans are. If you just start driving, at some point someone’s going to be standing by the side of the road with a thumb out and a gas can.

And here’s another gotcha – you can’t, from a practical point of view, keep on filling this tank one gallon at a time, while always near empty. It’s not safe or economical. Do you really want to buy disks or shared servers and try to install them monthly? Weekly?

So start thinking about your servers and storage as a commodity, and do it now. Try to get your organization to make this simple shift – we don’t buy pieces of equipment for projects anymore. We buy a platform, then estimate how much more of that platform we need for all upcoming work, and to sustain growth, then implement it.

I am happy to be presenting a session on beginning PowerShell for DBAs at my new home town’s SQL Saturday! (I moved from Seattle to Saint Paul, MN a short time ago.) I will be sharpening this presentation up to make sure anyone who comes will not go away empty handed.

BTW, WOW, the schedule is up and I must admit I did not expect nine tracks of awesome. This looks amazing.

My session is geared toward helping those DBAs who have not seen PowerShell, or perhaps may find PowerShell syntax opaque or confusing, get a conceptual understanding of how it works, in order to get past that first learning bump and into the useful stuff. I think that if you can write a SELECT statement and you know in very basic terms how a query plan works, the PowerShell pipeline is not really so different. Come and see why!

I haven’t posted for a while, but I hope that today I’ve got something really interesting to share.

Even though I love SQL Server, there are a few areas that have always sort of disappointed me:

People around the world have constantly had to implement and re-implement essentially the same patterns for database maintenance and backups.

Even though those patterns are well established, there isn’t much out of the box in SQL Server -- and I include Maintenance Plans and Central Management Server/Policy-Based Management -- that works well to implement them, especially across a large number of instances. There are some very good community-supplied solutions ( http://ola.hallengren.com/, http://sqlfool.com/2011/06/index-defrag-script-v4-1/ ) that help, which is great, but the popularity of those solutions also demonstrates that the gap in the product is real.

I personally found I was spending a lot of time setting up, checking on and troubleshooting basic maintenance and backup jobs, and after a while it becomes very repetitive work. We have about 2000 databases and a team that has ranged from one to four people.

Some simple examples:

You want all databases on all production instances to back up nightly, in sequence, during a maintenance window. If a new database is added, it should also start being backed up. Old files should be removed in some safe, reasonable fashion.

You want log backups for any databases that are in full recovery mode, on any instance, to happen at least every 15 minutes, and more often if the log files are filling up.

You want any indexes that are fragmented, and only those that are fragmented, to be rebuilt or defragged nightly during a defined maintenance window.

You want all databases on all instances to be verified with CheckDB nightly or weekly, in sequence, at a specified time.

Basically anybody who manages small to mid-size databases has these rules about maintenance, and they are always very similar, but we have to go around and set up many, maybe hundreds, of jobs and scripts to implement those rules. Editing the jobs when the rules, or perhaps the deployed databases, change is really tedious.

Solution: Rules-Driven Maintenance

I came up with a system to help manage this with less manual effort. The system consists of tables containing preferences that describe how maintenance and backups should operate, and some stored procedures and functions that implement those preferences in an automated, instance-wide maintenance routine. I was thinking of calling this “Policy-Based Maintenance,” but I don’t want it to be confused with the SQL Server feature Policy-Based Management (PBM) because it’s not really the same animal. (I like PBM, but don’t get me started about how the SQL Server team forgot to finish that feature before moving on to other things...)

Here is a brief synopsis of how the system is implemented, using full backups as an example:

There’s a policy table called FullBackupDefaults that defines the basic preferences about when and where full backups should be taken.

There’s a second table called FullBackupOverrides where it’s possible to make exceptions to the default settings for specific databases.

A table-valued function compares those preferences to the state of the databases on the instance including their backup history, and generates a “work queue” – a list of databases to back up – for the current time.

A SQL Agent job is spawned that processes that work queue, iterating through the listed databases and backing them up.

When the spawned job completes successfully, it is deleted.

If a new database is added to the instance, it’ll automatically become part of this process, using the default settings. If you have a database that needs special treatment, let’s say weekly full backups instead of nightly full backups, then you can make an entry in the FullBackupOverrides table that describes the different policy to apply to the exception database.

Using that general idea, I created similar maintenance code for both data (full and differential) and log backups, index maintenance, statistics maintenance and integrity checks. Each type of activity has its own pair of preference tables following the same design as the example above. With this solution, I can deploy maintenance across a large number of SQL Server instances, and set preferences for how it should act, rather than making and maintaining detailed SQL Agent jobs or maintenance plans, which would be hard to administer centrally. The maintenance will handle new databases fairly seamlessly, so there is no need to go back and constantly check the servers, other than to run some PBM rules against them from the Central Management Server to be sure things are healthy.

What about the existing community solutions?

They are great, and have some better and different features than this solution. The one feature that I think distinguishes mine is just the implementation of preference tables to drive all the maintenance activity.

Disclaimers

The code for the system is posted here, and you are welcome to download it and try it out. Please note that this is not a commercial solution, and that while you may test or deploy this, you do so at your sole risk. Edits to the code almost certainly will be required for your environment. Please read, understand and test the code thoroughly before you even imagine using this in production. I do not want to cause you or your employer any pain. License for using this solution is GPL 3.0: http://opensource.org/licenses/GPL-3.0. You may use and modify this code, but not sell it.

This has been run on SQL Server versions from 2005 to 2008 R2. It might work on 2012, but I have not tested it.

I run only case-insensitive collation; if you are on a case-sensitive system, then you have my sympathy, and you might have to edit a few things to make this work.

There is coverage for Quest Litespeed backups in the solution; other third-party backup products could be incorporated easily, but have not been. It should be possible to follow the pattern used to handle Litespeed and extend it to any other third-party stored procedures.

If you haven’t done index maintenance for a while and you flip this solution on, you might blow up your transaction log. Be sure that you have plenty of log space and plenty of log backup space, and watch the process the first few runs. This is especially true if the combination of Standard Edition of SQL Server and the preference for Online index maintenance applies in your environment, because the maintenance process will perform index maintenance using reorganize (defrag), which produces a lot of log.

Lastly, this is a solution for the type of environment that has many small databases, not the type with a few big, performance intensive databases. Databases that are large and require detailed attention may not be suitable for a one-size-fits-all solution like this.

Walkthrough

OK, with all the fine print out of the way, here’s how you can try this code. You will need these prerequisites:

PowerShell 2 and SSMS / SMO installed on the workstation from which you will deploy the solution. The deployment is handled by a PowerShell 2 script that uses invoke-sqlcmd. Before running it, you’ll need to have the SQL Server Snap-ins for PowerShell installed and working. The solution itself, after deployment, does not require PowerShell on the servers where it is deployed.

The full backup procedure in this solution calls a file delete procedure that I have implemented as a CLR stored procedure. You’ll need either mine, or you’ll need to substitute your own, or comment out the calls to that procedure and handle deletes some other way. The one I use is available here.

Windows-authenticated sysadmin access to your SQL Server environment. The deployment script uses windows auth and your own identity. It could be edited to support SQL auth if necessary, but it does not by default.

To install on your test server

Download and unpack the code into a convenient folder.

Read and edit Deploy.ps1, entering the name of your test server, and the location of the deployment SQL scripts, according to the instructions in the file. The deployment script can push the solution out to one or to multiple SQL Server instances.

Execute your edited Deploy.ps1 to install the solution to the specified server.

The deployment process will populate an admin database on the SQL instance with the policy tables, stored procedures and functions, and will add five SQL Agent jobs to the instance. You can use your own admin database or allow the script to create one for you.

There is one SQL Agent job for each type of activity required for maintenance:

Policy-Based Data Backups

Policy-Based Log Backups

Policy-Based Integrity Checks

Policy-Based Index Maintenance

Policy-Based Stats Maintenance

Each job typically works like this:

On the job schedule, each job will check to see if it’s time to do any work. (For example, is it now the start of the backup maintenance window for any database(s)?) The work, if any, is listed using a table-valued function that takes the current time as a parameter.

If there is work to be done according to the maintenance preferences, then the system will spawn a new SQL Agent job, specifically for that work. Example: Run index maintenance as of Feb 5, 2012 at 21:00.

The newly spawned SQL Agent job will iterate through the work queue for that time. If the job is successful, it will be automatically deleted. If there is an error, it will remain on the server for troubleshooting, or for monitoring software to alert on.
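The first of those steps, listing the work due, amounts to something like this (the function name is a placeholder; the real names are in the deployment scripts):

```sql
-- List the full backups that are due as of right now:
SELECT DatabaseName
FROM dbo.FullBackupWorkQueue( GETDATE() );
-- If any rows come back, spawn a one-off SQL Agent job to process them.
```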

Fix This One Broken Thing

In the Full Backups stored procedure, there are calls to a proc called DeleteAgedBackupFiles. You, most likely, don’t have said procedure :-). You can either remove this call, substitute your own, or compile the CLR code and install it from Prereq #2 above. I didn’t do this to be mean – this is a cheap trick to make you, gentle reader, not install in production until you have read and understood the code.

Setting Preferences and Policies

The rules driving the maintenance jobs are stored in a few simple tables. There are views over those tables that give a summary of what preferences are in effect:

SELECT * FROM dbo.FullBackupPolicies

SELECT * FROM dbo.DiffBackupPolicies

SELECT * FROM dbo.LogBackupPolicies

SELECT * FROM dbo.IndMaintPolicies

SELECT * FROM dbo.StatsMaintPolicies

SELECT * FROM dbo.IntegCheckPolicies

Behind each view is a pair of tables, one for default settings that apply across the whole instance, and a second for database-specific overrides or exceptions to those defaults. To adjust the behavior or the timing of the different kinds of maintenance, just adjust the preference values in those underlying tables.

Some examples:

Q. How do I set the time for my integrity checks window for all databases?

A. In the table dbo.IntegCheckDefaults set the value for WindowStartTime to the desired time, as 24-hour time, format hh:mi.

Q. How would I limit integrity checks for one larger database to only Saturday nights?

A. In the table dbo.IntegCheckOverrides, make a row with the database name, and set the value for all days of the week, except Saturday, to 0. Set Saturday to 1.

Q. How do I make one larger database use differential backups nightly, with one weekly full backup, while the other databases get a nightly full?

A. In the table dbo.FullBackupOverrides, make a row for the large database. Set all the values for days of the week to 0 except the day you want the full backup. In the table dbo.DiffBackupOverrides, make a row for the database and set all the days of the week to 1 except the day you want the full backup.

Q. How can I make the system ignore backups for one large database that I explicitly don’t want to back up?

A. In the table dbo.FullBackupOverrides, add a row for the database where Exclude = 1.
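As T-SQL, that last answer is a one-row insert (the override tables have more columns than shown; DatabaseName and Exclude are documented in the FAQ below, and the database name here is made up):

```sql
INSERT dbo.FullBackupOverrides ( DatabaseName, Exclude )
VALUES ( N'BigArchiveDB', 1 );
```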

This same pattern should work to set any and all preferences.

Turning On Automated Maintenance

The deployment script will install the SQL Agent jobs as “disabled,” as a safety net. When you are ready to begin using the automated maintenance, just enable the five SQL Agent jobs.

With the jobs running, watch the SQL Error Log periodically, and the file system, to be sure you are getting all the maintenance activities when you expect to, and that the agent jobs are not failing.

Feedback

I would LOVE feedback on this solution and ideas about how to make it better. If you try it, please come back here and let me know how it went.

(F.) A. Q.

Here are some details that may be helpful:

Q. What are the available preferences?

WindowStartTime

All maintenance activities except log backups have a specified time; this is the start time for that window, daily.

WindowDurationMins

Index maintenance can be limited to a set duration past which new rebuilds will not start. This value is the length of the window. (A rebuild in progress, however, will run to completion.)

Sunday - Saturday

Days of the week on which to perform the maintenance activity (Bit columns)

MaxIntervalHours / MaxIntervalMinutes

For backups, the maximum length of time between backups. For example, you can set a rule that says a full backup should be taken at least every 24 hours, or a log backup at least every 15 minutes.

MaxLogPercentUsed

The limit of log percent full that will trigger a log backup. The frequency of log backups is governed both by MaxIntervalMinutes and by MaxLogPercentUsed.

Path

The path for file output. If NULL, the default path set at the instance level will be used.

ThirdPartyBackup

Optional third party backup. Only Litespeed is implemented. Can be NULL (implies Native), ‘Native’ or ‘Litespeed’

UseDifferentialBackup

Bit indicating that differentials should be used.

FullBackupSequencesToRetain

For the backup cleanup process, indicates how many sets of Full, Differential and Log backups to keep on disk.

DeleteOnlyArchivedFiles

For the backup cleanup process, indicates whether to delete files where the “ready to archive” attribute is set in the OS. (This can prevent deleting files that have not yet gone to tape.)

DatabaseName

For override tables, the name of the database where the overrides apply.

Exclude

Ignore/skip the database (Bit)

Online

For index maintenance, try to keep the indexes online (1) or do not try (0). For Enterprise Edition, Online = 1 will cause online index rebuild; for Standard Edition, Online = 1 will cause reorganize instead of rebuild. Watch the log file size and backup disk capacity if using defrag!

DefragThreshold

When examining indexes for fragmentation, the threshold for percent fragmentation that will cause index reorganize.

RebuildThreshold

When examining indexes for fragmentation, the threshold for percent fragmentation that may cause a rebuild (subject to the Online preference and SQL Server edition as described above).

Q. Why does the log backup policy job never stop?

A. The log backup policy job is implemented as a loop with a one-minute delay, so as not to flood the job history with executions on a one-minute schedule. It runs every minute to keep up with log files that might be filling at any given time, but it doesn’t do work every minute; a backup is only executed for a given database when MaxIntervalMinutes or MaxLogPercentUsed is exceeded.

Q. I set my WindowStartTime to 03:15 or 11:45 and now I don’t see any maintenance/backups? Huh?

A. The Policy jobs can run on any interval, at any time, but at least one execution has to “line up,” within a few minutes, with the WindowStartTime. That is, if the policy job runs every hour, on the hour, but your window is set to start at 15 minutes after the hour, the job will never find work to do. Set the job schedules to align with the corresponding WindowStartTime.

Q. Why did the index maintenance job fill my Standard Edition log?

A. Check whether you have the combination of Standard Edition and Online = 1 for index maintenance. That combination causes indexes to be maintained using Reorganize, which is very log-intensive. You can: (a) watch and wait, because the level of fragmentation, and therefore the amount of log generated, may drop with more regular maintenance depending on the database; in other words, it might be a temporary issue. (b) Turn the Online preference off, if your application is not 24x7 and you can tolerate blocking during a maintenance window. (c) Get out the checkbook and perform an edition upgrade to Enterprise.

Q. How the heck do these crazy overrides work?

A. The design of the overrides is: for each database, for every value in the preferences, IF the override table has a non-null value, use it. If the override table has no value or the value is NULL then use the corresponding default value. This means that when you add a row to an overrides table, you only have to fill in the specific values that are different from the defaults, and leave the rest as NULL. The NULLs will “inherit” the default values from the defaults table at run time.
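A minimal sketch of that run-time inheritance, assuming a defaults table named dbo.FullBackupDefaults (the solution's actual query shape may differ):

```sql
-- Effective preference = override value when non-NULL, else the default.
-- The defaults table holds a single instance-wide row.
SELECT  o.DatabaseName,
        COALESCE(o.WindowStartTime, d.WindowStartTime) AS WindowStartTime,
        COALESCE(o.Exclude, d.Exclude)                 AS Exclude
FROM    dbo.FullBackupOverrides AS o
CROSS JOIN dbo.FullBackupDefaults AS d;
```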

Q. What happens if a job fails?

A. Failure typically will result in an extra job (or many extra jobs) showing in SQL Agent Job Activity Monitor, with a name that indicates the date and time that the job was intended to run. The job history can be used to troubleshoot what happened. Successful maintenance jobs will delete at the end of their run, but failed jobs will remain for troubleshooting.

Q. How can I see what index maintenance was done?

A. Select from the TrackFragmentation table.

Q. Does this work with:

Log Shipping

Yes. Log backups should work by firing the log backup job in SQL Agent instead of running a conventional log backup. You may elect to disable the log shipping log backup job’s schedule (but don’t delete the job) to prevent collisions between the schedule starting the job and this solution starting the job.

Mirroring

Yes.

Replication

Probably. I don’t use replication much, so it has not been tested.

Clustering

Yes.

Named Instances

Yes. For multiple instances on the same machine, consider staggering the WindowStartTime values on each instance.

2012

I hope to find out soon. Looks promising.

2000

Um … no.

VLDB

Not really. This solution targets many small, simple databases. It does operations like sp_updatestats that might not be appropriate for a big system.

Partitioning

Yes. There is automatic detection and index maintenance for partitioning on Enterprise Edition.

Q. I have an admin database already. Can I use it?

A. Yes. Check for name collisions for objects in the database, and then just enter your preferred admin database name in the Deploy PowerShell script.

Q. The SQL scripts don’t include “USE <database>”?

A. True. The T-SQL scripts are designed to be used with a database that has the name of your choosing. (In my shop, all the admin databases have unique names per instance, by design.) The PowerShell deployment script handles the database name. If you run the scripts from SSMS, be sure to set the database manually.

Q. The deployment script croaked in the middle. Can I rerun it?

A. Yes, the deployment scripts are all designed with “If not exists…” and can be rerun if necessary.

Q. I have tested the solution and I want to deploy on multiple instances. How?

A. The Deploy.ps1 file can use a list of SQL Server instances to deploy across multiple servers if desired. Keep in mind that the Policy agent jobs are disabled by default when deployed and will need to be switched on.

Q. Why is WindowStartTime implemented as text! You Hack! :-)

A. SQL Server 2005

Note: I’ve published some updates to this code, so please grab the latest version here.

Today’s T-SQL Tuesday, hosted by Aaron Nelson (@sqlvariant | sqlvariant.com) has the theme Logging. I was a little pressed for time today to pull this post together, so this will be short and sweet. For a long time, I wondered why and how a database in Full Recovery Mode, which you’d expect to have an ever-growing log -- as all changes are written to the log file -- could in fact have a log usage pattern that looks like this:

This graph shows the Percent Log Used (bold, red) and the Log File(s) Used Size KB (light, green) counters for a demo database. The database was in Full Recovery mode the whole time this graph was generated, and there were no log backups taken. It seems, naively, that the graph should just climb ever upward as changes are added. How could it possibly drop and then start to climb again? Mysterious.

>> If I had time and wit, this is where I would do some clever Scooby Doo or Nancy Drew thing. Not tonight, I’m afraid. <<

I learned the answer in an amazing session with SQLskills, during their IE1 class. The answer is that the amount of space counted as “used” in the log includes the log records physically present in the log file and in use (for full recovery, mirroring, replication, etc.) plus the amount of space that would be required to add rollback records, which do not actually exist, for all in-flight transactions.

When you make a change to a database, the change is recorded first, obviously, as a series of transaction log records that can be played forward. In the event of a crash, when recovery runs, those records are replayed forward to recreate the changes, as one stage of the recovery process. But if a transaction is incomplete, somehow the instructions need to be present to undo and reverse out the changes made in that transaction. The way that works is that at the time of rollback, the reverse operations from the original log records are added to the log, then those reversal records are also played forward, which “undoes” all the changes. The reversal records do not exist, though, unless a rollback is actually invoked.

If the server ran out of space in the log file to append these rollback records, obviously there would be a major problem, and we might not be able to recover the database. In order to prevent that condition, SQL Server will reserve the space necessary in the log file(s) and have it appear “used” until a given transaction commits, at which point the crash recovery space for these hypothetical reversal records is no longer a requirement.

Here’s a demo script if you want to see this in action. Note that it takes a fair amount of disk space:
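The original script was attached to the post; here is a minimal reconstruction of the same idea, with assumed database and table names. It backs up to the NUL device just to start the log chain, then holds a large transaction open so you can compare DBCC SQLPERF(LOGSPACE) before and after the COMMIT:

```sql
CREATE DATABASE LogReserveDemo;
GO
ALTER DATABASE LogReserveDemo SET RECOVERY FULL;
BACKUP DATABASE LogReserveDemo TO DISK = 'NUL';  -- start the log chain
GO
USE LogReserveDemo;
CREATE TABLE dbo.Filler (c CHAR(8000) NOT NULL DEFAULT 'x');
GO
BEGIN TRAN;
GO
INSERT dbo.Filler DEFAULT VALUES;
GO 5000
-- With the transaction still open, "used" includes reserved rollback space:
DBCC SQLPERF(LOGSPACE);
COMMIT;
-- After the commit the reservation is released, and "used" drops
-- even though no log backup has run:
DBCC SQLPERF(LOGSPACE);
```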

Here’s a handy trick – if you have a SQL script, stored as a text file, and need to run it against an arbitrary list of different databases on different SQL Server instances, try a little PowerShell. There are a lot of ways to accomplish this task, but personally I like this one for its flexibility.

First, make sure the SQL script does NOT include the common USE <database>. Generally that statement is your friend, but not in cases where a single script should work against multiple databases having different names.

Next, make a “collection of collections” in PowerShell, which will act as a list of instances and databases on those instances. Example:
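The list might look like this; the server and database names are placeholders:

```powershell
# A list of (instance, database) pairs; each element is itself an array.
$instances = @(
    @('SERVER1',       'AdminDB'),
    @('SERVER2\INST2', 'AdminDB'),
    @('SERVER3',       'OtherAdminDB')
)
```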

Each item in $instances is a mini-collection, and each of those collections has two elements: an instance name and a database name, which will be available using indexes [0] and [1].

If you prefer, it’s also possible to pull these values from a text file using Get-Content, and split each line on some delimiter character.

Next, load the content of your SQL script file into a variable. There’s an important caveat here: we have to load it as a single string, not an array of strings. The default behavior of Get-Content, though, is to split a file on line terminators and make each line into a separate object in a collection of strings.

There are a few ways to accomplish this, but I learned a simple one in a forum from Alex K. Angelopoulos – we can direct Get-Content to split on a character that doesn’t actually exist in the file. This makes the whole content of the file “look like” one line to Get-Content. He suggests the Null character, which in PowerShell is `0 (back tick zero). Note that the Null character is not the same or related to $null.
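With a hypothetical file path, that trick looks like this:

```powershell
# Read the whole file as one string by "splitting" on the Null character,
# which will not appear in a text file. The path is just an example.
$deployScript = Get-Content -Path 'C:\Scripts\deploy.sql' -Delimiter "`0"
```

(On PowerShell 3.0 and later, Get-Content -Raw accomplishes the same thing directly.)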

Finally, loop over the list of ( instances + databases ) to run the script everywhere. Be cautious and test!

$instances | ForEach-Object {
    # Print the instance and database from the list, which is handy for troubleshooting:
    Write-Host $_[0] $_[1]
    # This sort of thing can be used to validate that you are connecting to the right databases:
    Invoke-SqlCmd -ServerInstance $_[0] -Database $_[1] -Query 'SELECT @@SERVERNAME, DB_NAME()'
    # Finally, this executes the script:
    Invoke-SqlCmd -ServerInstance $_[0] -Database $_[1] -Query $deployScript
}

This is lame. Still, here goes: I need, increasingly, to author both PowerShell and SQL Server scripts, bundle them into a solution and store that in TFS. Usually the PowerShell scripts are very closely related to SQL Server, and have a lot of SQL in them. I am hopeful that 2012 SSDT, or the tighter integration of SSMS and Visual Studio in 2012, might help put all of this in one place, but for now I am stuck in SSMS 2008 R2. So here are my blunt attempts to marry these activities.

(This post is rather like the scene at the end of MythBusters, when the explosion didn’t really work, so they just put a lot of C4 under the <whatever is being used> and set it off.)

First, I made a new SQL Server script solution in SSMS. I then right-clicked on it in the Solution Explorer and used the context menu to add it to source control.

Make a PowerShell Script by Force

Now for the PowerShell “integration.” In SSMS, in a solution, the “Miscellaneous” folder can contain any sort of files you like. So I right-clicked on the project, chose Add New Item. In the Add New Item dialog, I picked a template at random and forced the file name to myPowerShellScript.ps1. This makes the file, in the Miscellaneous folder (admittedly with a few lines of unwanted SQL code in it):

Create an “External Tools” entry for PowerShell ISE

SSMS has long had the ability to launch an external program from a menu command. I hadn’t used this much in the past (never really saw the point) but in this case it works nicely. I clicked Tools > External Tools… and created a new entry for the PowerShell ISE:

In Arguments and Initial Directory, it’s possible to put $()-style variables that are sensitive to the context in SSMS, in this case to pass the file selected in the Solution Explorer as an argument to launch the PowerShell ISE – basically allowing me to open a .ps1 file, in the right app, from inside SSMS.
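For reference, my entry looked roughly like this. The ISE path is the usual default location, and $(ItemPath)/$(ItemDir) are the substitution tokens inherited from the Visual Studio shell SSMS is built on; verify them against the Tools > External Tools dialog on your build:

```
Title:             PowerShell ISE
Command:           C:\Windows\System32\WindowsPowerShell\v1.0\powershell_ise.exe
Arguments:         $(ItemPath)
Initial directory: $(ItemDir)
```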

At this point, it’s possible to create a source-controlled PowerShell script in the SSMS solution, to open and edit it with the PowerShell ISE, and to check it in to TFS. Huzzah!

Option: TFS Windows Shell Extension

That’s pretty grand, I think, but as an added option, it can be handy to check out a PowerShell script, make a change, and then check it back in, without using SSMS. That’s possible using a little Team System widget called the Windows Shell Extension, which is bundled in this TFS Power Tools download: http://www.microsoft.com/download/en/details.aspx?id=15836.

Note: by default, this installer installs a bunch of features you don’t want for this case, and not the one you do, so choose the Custom option in the installer and select the Windows Shell Extension feature.

With the Shell Extension, you can see the source control state of files right in Windows Explorer, with a little green icon overlay. There is a right-click context menu to check files in and out.

Adding this widget means I can check a file out right in Windows, edit it with the PowerShell ISE, and then check it back in – great for those PoSh scripts that don’t have a lot of SQL dependencies.

It’s not pretty, but it works. And, provided you have a legal TFS license, by my count the cost comes to … $0!