Microsoft's 'Katmai' gives both big shops and small shops reasons to make the upgrade

SQL Server 2008, aka "Katmai," gives SQL Server shops plenty of reasons to get excited. The best SQL Server release to date, it sports more nice new features than you can count, and the improvements extend to both performance and manageability. In a few cases, such as the Resource Governor, you'll wish Microsoft had taken the functionality a little further. But whether you manage an OLTP environment, or an OLAP environment, or both, you will most likely find Katmai compelling. It easily passes my own five-point test for upgrades.

My five-point test requires a new release to bring at least five significant improvements to my environment, or it’s not worth upgrading. Each improvement has to change my life in a significant way, either by dramatically shortening the time to do common tasks or by allowing me to do something I couldn't do before. For my environment, which is a large data warehouse, these five Katmai features easily pass the test: Change Data Capture, Lookup Cache, Data Compression, PowerShell integration, and Policy-Based Management. And I could easily keep counting.

Change Data Capture and Lookup Cache will be popular with DBAs who want to speed up ETL (Extract, Transform, and Load) processes, as will the pipeline improvements in Integration Services, which enable it to push data much faster. Data Compression, PowerShell integration, and Policy-Based Management, not to mention Server Groups, should make a big splash in almost any environment. Just keep in mind that the benefits you get from Data Compression (which works at the page level, replacing repeating data with lookup pointers) will depend on the nature of the data in your table and how it's ordered.
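To see why ordering matters so much, here is a toy model of page-level dictionary compression. It is only a sketch and not SQL Server's actual on-disk format: the four-row page size, the one-byte pointer, and the function names are all assumptions made for illustration. Each page stores every distinct value once and then one pointer per row.

```python
from typing import Sequence


def raw_cost(values: Sequence[str]) -> int:
    """Bytes needed to store every value in full (toy model)."""
    return sum(len(v) for v in values)


def page_dictionary_cost(values: Sequence[str], rows_per_page: int = 4) -> int:
    """Bytes needed when each page keeps one copy of every distinct value
    plus an assumed one-byte pointer per row (toy model)."""
    cost = 0
    for start in range(0, len(values), rows_per_page):
        page = values[start:start + rows_per_page]
        cost += sum(len(v) for v in set(page))  # per-page dictionary entries
        cost += len(page)                       # one pointer per row
    return cost


# Sixteen rows, four distinct cities, interleaved so every page sees all four.
cities = ["Seattle", "Boston", "Denver", "Austin"] * 4

print(raw_cost(cities))                      # 100
print(page_dictionary_cost(cities))          # 116
print(page_dictionary_cost(sorted(cities)))  # 41
```

In this model the interleaved column actually grows, because every page has to carry all four values plus the pointers, while the same rows sorted by city shrink to well under half the raw size. Real tables will behave less dramatically, but the direction of the effect is the point.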

Backup Compression is also new in SQL Server 2008. Although the compression itself works well, the feature has too many limitations (including being available only in Enterprise Edition) to be effective in an enterprise setting. Another space-saving feature, called Sparse Columns, allows you to store nulls without taking up physical space. Sparse Columns will come in handy for large tables containing many null values. It's just too bad you can't use Sparse Columns and Data Compression on the same table.

Policy-Based Management allows you to define almost any configuration or administration policy you might think of for any number of servers, and be alerted whenever those policies are violated. Although spanking new in Katmai, Policy-Based Management already seems mature.

The inclusion of PowerShell will reinvent the way DBAs manage their environments by taking complicated cursors out of management scenarios. This is the debut of PowerShell in SQL Server, and there are some rough spots. But I expect they'll be ironed out soon enough. I've even heard rumors that PowerShell will eventually replace sqlcmd as the command line interface.

Policy-Based Management and PowerShell integration are the biggies, but Management Studio has added some other nice features, such as Server Groups, which allow you to run queries against multiple servers at once. However, I’m not fond of the new Activity Monitor. It may be a step in the right direction, but it just isn't useful in its current state.

Another new feature that isn't quite there yet is the ballyhooed Resource Governor. The Resource Governor lets you define limits on CPU and memory usage for certain workloads. This is good enough to prevent some traffic jams -- like query processes consuming too many resources on an OLTP server, for example -- but it's a far cry from what's needed to define and isolate rogue queries.

Improved management

If there's any one new feature in SQL Server 2008 that will change the way DBAs manage their environments, it's probably Server Groups. Server Groups let you run any query you like against an entire group of servers simultaneously. So instead of having to cycle through all of your servers to check job status, deploy stored procedures, and so on, you can manage or push code to any number of servers with a single query. On top of that, Server Groups can be further extended and enhanced to do some great things. It does take a paradigm shift, however. DBAs will have to start thinking in terms of groups instead of single servers.
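For readers who want a mental picture of the fan-out, the idea can be sketched in a few lines of Python. This is not how Management Studio implements Server Groups; the server names are made up, and `run_query` is a hypothetical stand-in for whatever database driver call you would really use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_on_group(servers: Iterable[str], sql: str,
                 run_query: Callable[[str, str], object]) -> Dict[str, object]:
    """Send the same statement to every server in the group at once and
    return the results keyed by server name. A failure on one server is
    recorded as that server's result instead of aborting the whole run."""
    names = list(servers)
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(run_query, name, sql) for name in names}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going; report the error per server
                results[name] = exc
    return results


def fake_run_query(server: str, sql: str) -> str:
    """Hypothetical driver call: pretends one server is unreachable."""
    if server == "DW02":
        raise ConnectionError(f"cannot reach {server}")
    return f"{server}: ok"


group = ["OLTP01", "DW01", "DW02"]  # made-up server names
results = run_on_group(group, "SELECT @@VERSION", fake_run_query)
for name in group:
    print(name, "->", results[name])
```

Keying the results by server name is the part worth copying: once you think in groups, you still need to know which member of the group said what, and which one didn't answer.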

SSMS (SQL Server Management Studio) also has much better object-level stats than ever before. This is a perfectly executed feature, and I love it. In the object details pane, you can now click on an object folder and get info for all objects in that folder. For example, click the Databases folder to immediately see the key info on all the databases on the server, including size, space used, owner, index space used, and much more. This is an incredibly useful list because you can sort it and reorder the columns, and your customizations persist. You can get detailed info on all of your tables too. In this new detail view, you can see row count, data and index usage, schema, and so on. Not only is it very easy to view object-level stats, but this level of info is available for all folders, not just databases and tables. This is one feature that I wouldn't change in any way.

One of the most important features included in this release is PowerShell. PowerShell is going to change how DBAs manage their environments, because it lets you do complicated things much more easily than in Visual Basic or T-SQL.

Let's take an example of scripting databases. You can script a database pretty easily in SSMS, but you can't schedule it. Until now, if you wanted to schedule a script, or code the scripting process to apply the exact same options every time without having to click through the wizard or worry about making mistakes, all you could do was code a solution in VB and compile it. But not everyone has VB skills, and writing SMO (SQL Server Management Objects) code is neither easy nor intuitive. PowerShell allows you to script a database in just 21 characters of code and schedule it inside an agent job. You don't have to compile it, so you don't have to keep up with separate source code. PowerShell also makes it a lot easier to work with multiple objects -- and multiple servers -- in SQL Server.

Another good example is adding user permissions to all the schemas in a database. You can code this in T-SQL, but T-SQL requires the inclusion of unsafe dynamic SQL code inside a cursor. You can accomplish the same thing in PowerShell with just a couple of lines of code, and without introducing any unsafe constructs into your environment.

Pretty blinkies

SSMS has a new Activity Monitor that provides information DBAs formerly had to gather from multiple sources. At the top is a series of live graphs where you can see useful stats such as CPU, waiting tasks, and batch requests. Under the graphs, you can see detailed information on processes, resource waits, recent expensive queries, and data file I/O. In fact, the Activity Monitor looks exactly like the resource overview from Vista.

So the Activity Monitor is presentable enough, but using it is a mixed bag. If you want just the top-level server stats from the graphs, then you won't be disappointed. The graphs load fast enough and the information is useful. The trouble begins when you start digging into the drill-down information. For starters, on a really busy system it takes too long for these drill-down areas to populate. I tested this on a 64-bit data warehouse system with 1,500 reports running and a few dozen more ad-hoc queries going on. The drill-downs took so long to populate -- more than five minutes -- that I thought they weren't working. That's not what you need in a troubleshooting scenario.

Second, much of the information you'd expect isn't there. When the drill-down info did come up, I immediately saw that my server was maintaining about 67 percent CPU -- so far, so good. But the next pieces of information you would want -- namely, memory usage and disk usage -- aren't in the Activity Monitor at all. You can see disk throughput, but that's not really going to tell you if you're having disk problems.

In short, while there are some useful server-level stats, they're incomplete enough to render them pointless. If I have to go to another tool for my memory and disk stats, then I might as well skip the Activity Monitor and use perfmon (or another tool) to begin with.

The processes, resource waits, and data file I/O drill-downs are more useful. The resource waits monitor shows all of the server-level resource wait types and gives you stats about each one. Similarly, data file I/O puts all of the important info at your fingertips, though it does take forever to load.

Processes is really just the same sys.sysprocesses table that DBAs are used to, but with a GUI that makes it a little more user-friendly than a manual query. You can easily filter data that you want to see, which makes for easy troubleshooting, especially when you're looking for a specific user or blocking process. One beef: Although you can rearrange the columns in the display, the order doesn't persist, so the next time you open the Activity Monitor, you'll be arranging your columns again.

The recent expensive queries drill-down is almost completely pointless. Does it give you the most recent expensive queries? Yes, it does. Does it give you important stats on these queries such as CPU, number of reads and writes, average duration, and plan count? Absolutely. Does it tie these stats to a user so that you can tell who or what is performing these offending actions? No. What is a DBA supposed to do with that?

One thing you get from the GUI drill-downs that you don't get from their manual counterparts is auto-refresh. You can configure your refresh rate, but here again, your changes don't persist. A bigger issue is being able to collect the performance data you want within the refresh period. This can be a problem on busy systems that collect a lot of performance data.

For example, if you have dozens of data files on your system, it could take longer than the 10-second default refresh rate to pull the data file I/O data. And while you can increase the refresh interval to, say, one minute, the CPU graph will be refreshed only once per minute as well. Unfortunately, you can't set different refresh intervals for different monitors. It's one size fits all.

For all that it tries to be, I have to give the Activity Monitor a thumbs-down, or at least a thumbs-sideways. In its current form, it's just not useful enough to do DBAs much good. Not only did it take too long to give me the info I needed, but the info was so sketchy that I needed another tool to complete it. If I'm going to look at CPU, I'm immediately going to want to see memory and disk. And as long as I have to go somewhere else for those other metrics, I might as well get CPU there too. Sorry Microsoft, better luck next time.

Resource Governor

Microsoft touts the new Resource Governor as one of the biggest enhancements in this release, and there's no question it will come in handy for shops that want to keep some processes from interfering with others. For example, one of the best fits for a resource governor would be on an OLTP system where you're forced to run reports against live OLTP data. You don't want the reports to get in the way of transactions, so you would push them to a group that caps their resources.

Still, this is Microsoft's first stab at granular resource management, and while it's a nice start, it's not going to be as immediately useful as many shops hope. First of all, you can specify only two resource measures to define a resource group: CPU and memory. You can specify min and max values for each one, but that's not nearly enough to define a rogue process on a busy system. Rogue processes take many forms, and capturing them requires additional metrics (number of I/Os, amount of time, etc.) beyond CPU and memory usage.

Further, once you define a process to be in a particular resource group, it stays there. Misbehaving queries cannot be moved dynamically to a different group, as you can do in Oracle Database. In order to assign a query to a different resource group, you would have to kill the process, assign it to the group, and then restart the Resource Governor. Something tells me you won't want to do that often on a really busy system.

Finally, you have to be careful with how you set up your groups. For example, if you have three groups and each has a maximum of 50 percent of memory defined, then there's obviously going to be a problem: the caps add up to 150 percent, so they can't all be honored at once. So there have to be rules that define how much everyone gets; you just have to be careful about how you define your groups and monitor them to make sure they're giving you what you want.
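The arithmetic behind that three-group example is easy to check before you commit a configuration. The snippet below is a back-of-the-envelope sketch only: the group names are invented, and it makes no claim about how the Resource Governor itself validates or enforces limits.

```python
from typing import Dict


def oversubscription(max_percent_by_group: Dict[str, int]) -> int:
    """Return how many percentage points the combined maximums exceed 100,
    or 0 if every group could reach its cap at the same time."""
    total = sum(max_percent_by_group.values())
    return max(0, total - 100)


# Three hypothetical groups, each capped at 50 percent of memory.
memory_caps = {"reports": 50, "etl": 50, "adhoc": 50}

print(sum(memory_caps.values()))      # 150
print(oversubscription(memory_caps))  # 50
```

A result above zero doesn't necessarily mean the setup is wrong, since not every group peaks at the same moment. It does tell you how far the caps are from a guarantee, which is exactly what you'll want to watch once the groups are live.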