Category Archives: SQL Server

Post navigation

One of our production database clusters has developed a nasty habit of losing network connectivity with no error messages or indicators of why, necessitating 1am drives to the NOC by a network admin to growl at it and scramblings to put the system it backs into emergency maintenance mode. The second time this happened, the operations team quite sensibly requested that I move all the databases to another server so that they could diagnose this one.

We had just transitioned our other production cluster from a set of servers attached to an MSA 1000 to a new set of servers attached to an EVA 8100, and the MSA and old servers were still hanging around. It was a matter of a few hours to coordinate offline time (thankfully, this database cluster housed non-24-hour-available systems), and agitate at developers until they changed connection strings and deployed code.

All went smoothly except that my job monitor — a job that runs every two hours and emails me about jobs which have failed or taken longer than usual in the meantime — failed. When it went to send email, it complained that service broker needed to be enabled for the database (no mention of which database, but I assumed msdb). I nipped over the BOL, found the appropriate command, and ran it. Twenty minutes later, it was still running. I killed it and started it again. Same hanging. I hopefully ran the command in the database that houses the job monitor sproc. It finished immediately, but it didn’t make mail work.

Eventually, it occurred to me that the SQL Server Agent, with its dirty little fingers constantly in msdb, could be causing a database configuration change to wait indefinitely for some sort of exclusive access. I stopped SQL Server Agent, ran the command to enable service broker, and it returned immediately. Yay!

When you change servers and restore msdb, you may find your maintenance plans failing and putting the following messages into various logs:

The SSIS subsystem failed to load

Subsystem could not be loaded

The job has been suspended

The specified module could not be found

The problem is most likely that the location of your SQL Server installation directory differs from that of the old server. Ours happened to be on a different disk drive. SQL Server keeps a table of pointers to the SSIS DLLs in msdb, and you need to flush this table and repopulate it with the new locations (KB article 914171). This is very simple.

To verify that this is in fact the problem,

SELECT * FROM msdb.dbo.syssubsystems

If the directory pointed to by the rows doesn’t exist, you’ve found the problem! Two quick queries will fix you right up:

-- Delete the existing rows. DELETE FROM msdb.dbo.syssubsystems -- Fill the table with new rows pointing to the proper location of the DLLs. EXEC msdb.dbo.sp_verify_subsystems 1

And finally, you need to restart SQL Server Agent for it to pick up the new table entries and unsuspend the SSIS jobs.

One More Thing

If you’ve changed servers and have SSIS packages (like maintenance plans), there’s another thing you most likely need to do. Each package/plan has at least one database connection, and they’re probably pointing to the old server. You need to modify each package/plan individually, changing the existing connection if you can (I couldn’t — all of the input boxes were disabled) or creating a new connection and changing all of the steps to use it. I’ve never been a user of DTS packages — too rickety and GUI-driven for me — but I can see why no one likes the direction Microsoft went with turning maintenance plans into DTS packages in SQL Server 2005.

We migrated our production database cluster from an MSA 1000 to an EVA 8100 with new servers this morning. (This is much easier than in-place upgrades where you take down the system, wipe the cluster nodes and rebuild them on the spot if you can swing it.) It went fairly smoothly, though when I went to compare the amount of time the EVA took to make full backups of all the databases (8 minutes) to the time it took the MSA (21 minutes), I found that even though I’d restored msdb from the previous server, there was no job history. Apparently SQL Agent doesn’t take kindly to foreign job history and zorched it as soon as the new server ran its first job. I restored a copy of msdb, shut down SQL Agent and sucked the contents of sysjobhistory back in from the copy, changing the server column to the new server’s name (and having to bump up the value of instance_id since the new server’s SQL Agent had already made a few records), and the transplant took.

Recently, our tempdb log went completely bonkers, growing from its normal 30 GB to fill its 68 GB drive (with all the subsequent freakings-out by SQL Server and required service bouncings which you would anticipate). While glaring at job durations and recently-released builds, I threw a couple of small jobs into SQL Server to monitor tempdb data and log usage so it could page me if it decided to grow the tempdb log excessively and so that I could see when growth started and what it might correlate to.

I made two small tables and a sproc to fill each and scheduled the sprocs to run every 10 minutes. I found two interesting things.

Our tempdb data files could be much smaller. They’re currently 20 GB, and we have 8 of them, as we’re using the T1118 tempdb concurrency enhancement at the direction of Microsoft (it’s ostensibly for 2000 only, but we’re running 2005 and they have us using it), and they top out at 1.8 GB of data each.

The tempdb checkpoint interval, which as far as I can tell I have no control over, as it has nothing to do with the server-wide recovery interval, appears to be 2-3 hours during the week and 6-7 hours on the weekend. When I graphed the log usage in Excel, the amount of data in the log very clearly repeatedly grew steadily to a point (13 GB during the week, 19 GB on the weekend), then plummeted to 0, then grew steadily back to that same point.

If you haven’t changed your tempdb file names, you can use a CHAR(7) in tempdbDataSize. Since we have 8, I renamed the first one from “tempdev” to “tempdev1” for consistency and numbered them (tempdev1 – tempdev8). We have a busy system, so I’m recording my tempdb log sizes in gigabytes. You may need to use megabytes instead, in which case you’ll want to kick those TINYINTs up to SMALLINTs and remove the division by 1024 from the statement (below) which audits the log usage.

I found the code to get the file usage by turning on SQL Profiler and then opening the Shrink Files pane in SSMS (remember not to click OK in this pane if you try this yourself — shrinking tempdb while people are using it is Supremely Bad Idea).

The sendEmail sproc is my custom wrapper for sp_send_dbmail — you can just drop in a sp_send_dbmail call.

Diagnosis

My findings haven’t been particularly useful in diagnosing the mysterious tempdb log growth. As a bandaid to prevent further system unavailability, I changed my sproc that rebuilds indexes overnight to do its sorts in the user database rather than in tempdb. This causes them to take a little longer, as the happy parallelism of having the index and the rebuild spread across multiple physical disks is lost, but the tempdb log has behaved since. One of the first things I checked for was recent large growth in table size (you record rowcounts daily so that you can see growth over time, right?), figuring this would correspond to large growth in index size and thus tempdb log usage during rebuilds, and there wasn’t any, so this isn’t particularly helpful.

You might expect that you could add full-text search to an existing SQL Server instance by firing up the install media. You’d be wrong. You must go into the Control Panel, into Add/Remove programs, click the Microsoft SQL Server 2005 component, and click its Change button. Don’t forget to re-apply your service packs!