This is a blog about stuff I come across as a DBA; fun stuff, hard stuff, weird stuff...just the random day to day stuff that I encounter.

Wednesday, October 21, 2009

File Type changes on a very large table

I had a situation recently where, in looking at some tables, we realized that they were originally created with a number of columns set to BIGINT. As we know, the BIGINT type can hold values up to 9,223,372,036,854,775,807, at a cost of 8 bytes per value. We don't think we will ever have values in these specific fields bigger than the max of the INT data type (2,147,483,647), and INT uses half the space (4 bytes). This was the case for several columns in each of around 15-20 tables, and each table was around 20 GB. In many of the tables every field was a BIGINT, so changing the data type to INT would save us half the storage space needed for the table.

Without thinking it through, one of the DBAs tried to change the data type on one of the tables (after hours) and the transaction log went through the roof, to the point that we actually got down to almost no free space left on our tranlog drive...not good. So how to solve the problem? These tables were set up on a partition, where each month gets its own file; this will come in handy later on.
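For context, the in-place change that caused the trouble would have looked something like this (table and column names here are made up). ALTER COLUMN rewrites every row as a fully logged operation, so on a 20 GB table the transaction log grows on roughly the same scale as the table itself:

```sql
-- Don't do this in place on a 20 GB table: every row rewrite is fully logged.
ALTER TABLE dbo.TableName
    ALTER COLUMN SomeKeyID int NOT NULL
```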

/***********************************************
First: create the staging table:
***********************************************/
1) This I do with the GUI... Right click on the table, choose Script Table As: Create To: New Query Editor Window.
2) Do the same thing for indexes, triggers, etc. (but here I copy to clipboard and paste into the new table's window).
3) Do a replace-all on the TableName to add a _Staging to the end. Note: depending on your naming, this should add _Staging to the indexes and triggers as well. This is also where you change the BIGINT columns to INT in the scripted CREATE.
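After those steps, the scripted-out staging table might look roughly like this. All column names and the partition scheme name below are illustrative, not from the original tables:

```sql
CREATE TABLE dbo.TableName_Staging (
    TimeKeyID  int NOT NULL,  -- was bigint
    MetricA    int NOT NULL,  -- was bigint
    MetricB    int NOT NULL   -- was bigint
) ON MonthlyPartitionScheme (TimeKeyID)
-- ...plus the scripted indexes/triggers, renamed with the _Staging suffix
```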

/***********************************************
Next: determine the min date of the data:
***********************************************/

--We use a date/time dimension to store date values, so dates/times are stored as int values in all tables
SELECT MIN(TDT.DateTimeField)
FROM TableName TN WITH(NOLOCK)
JOIN TimeDimTable TDT WITH(NOLOCK) ON TN.TimeKeyID = TDT.TimeKeyID

/***********************************************
Next: we BCP out all the data for each month and
import it into our staging table. I do this in
yearly batches so that I can check numbers to make
sure everything matches as I go. Also, stop before
you get to the current month.
***********************************************/
SET NOCOUNT ON

--The date you enter below needs to be the first of the month, 2 months after the last
--month you want to process. So 2/1/2009 means that we will process up to 12/31/2008.
WHILE @ThisEndDate < '11/01/2009'
BEGIN
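The body of this loop isn't shown in the post, so here is a hedged sketch of what each iteration might do. Everything below is illustrative: it assumes @ThisEndDate and a @cmd variable are declared before the loop, that xp_cmdshell is enabled, and that the server name and file path are placeholders. Character-mode bcp (-c) is used so the BIGINT values convert on load into the INT staging columns:

```sql
    -- export one month of rows from the original table
    SET @cmd = 'bcp "SELECT TN.* FROM DBName.dbo.TableName TN '
             + 'JOIN DBName.dbo.TimeDimTable TDT ON TN.TimeKeyID = TDT.TimeKeyID '
             + 'WHERE TDT.DateTimeField >= ''' + CONVERT(varchar(10), DATEADD(month, -1, @ThisEndDate), 101)
             + ''' AND TDT.DateTimeField < ''' + CONVERT(varchar(10), @ThisEndDate, 101)
             + '''" queryout C:\bcp\TableName.dat -c -T -SServerName'
    EXEC master..xp_cmdshell @cmd

    -- import that month into the staging table (a bulk load, minimally logged)
    SET @cmd = 'bcp DBName.dbo.TableName_Staging in C:\bcp\TableName.dat -c -T -SServerName'
    EXEC master..xp_cmdshell @cmd

    -- advance to the next month
    SET @ThisEndDate = DATEADD(month, 1, @ThisEndDate)
END
```

Because bcp imports are bulk operations, the transaction log stays small here, which was the whole point of avoiding the in-place ALTER COLUMN.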

/*****************************************************
Next, check that the row counts match up. Here is
where having the data in a partition is nice: I pull
the row count for each partition of the tables. I
paste this into Excel with some code to alert me if
the values don't match:
    =IF(B3,IF(D3<>E3,"BAD",""),"")
The data I paste into Excel is:
    IndexID, PartitionNumber, FileGroupName, RowCt, NewRowCt

Here is the query to get this data:
*****************************************************/
DECLARE @FileGroupName varchar(50), @TableName varchar(50)
SET @FileGroupName = 'Months_Partition_Im_Working_With'
SET @TableName = 'TableName'  -- do this twice, adding _Staging the second time

SELECT  OBJECT_NAME(p.object_id) AS TableName,
        -- i.name AS IndexName,
        p.index_id AS IndexID,
        --NULL AS PartitionScheme,
        p.partition_number AS PartitionNumber,
        fg.name AS FileGroupName,
        NULL AS LowerBoundaryValue,
        NULL AS UpperBoundaryValue,
        -- NULL AS Boundary,
        p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id
JOIN sys.filegroups AS fg ON fg.data_space_id = i.data_space_id
WHERE OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
  AND OBJECT_NAME(p.object_id) LIKE @TableName
  AND fg.name LIKE @FileGroupName

UNION ALL

--get info for partitioned tables/indexes
SELECT  OBJECT_NAME(p.object_id) AS TableName,
        -- i.name AS IndexName,
        p.index_id AS IndexID,
        --ds.name AS PartitionScheme,
        p.partition_number AS PartitionNumber,
        fg.name AS FileGroupName,
        prv_left.value AS LowerBoundaryValue,
        prv_right.value AS UpperBoundaryValue,
        -- CASE pf.boundary_value_on_right WHEN 1 THEN 'RIGHT' ELSE 'LEFT' END AS Boundary,
        p.rows AS Rows
FROM sys.partitions AS p
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id
JOIN sys.partition_schemes AS ps ON ps.data_space_id = ds.data_space_id
JOIN sys.partition_functions AS pf ON pf.function_id = ps.function_id
JOIN sys.destination_data_spaces AS dds2 ON dds2.partition_scheme_id = ps.data_space_id
    AND dds2.destination_id = p.partition_number
JOIN sys.filegroups AS fg ON fg.data_space_id = dds2.data_space_id
LEFT JOIN sys.partition_range_values AS prv_left ON ps.function_id = prv_left.function_id
    AND prv_left.boundary_id = p.partition_number - 1
LEFT JOIN sys.partition_range_values AS prv_right ON ps.function_id = prv_right.function_id
    AND prv_right.boundary_id = p.partition_number
WHERE OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
  AND OBJECT_NAME(p.object_id) LIKE @TableName
  AND fg.name LIKE @FileGroupName
-- AND p.rows > 0
-- AND p.index_id IN (0,1)
-- AND p.partition_number = 2
ORDER BY TableName, IndexID, PartitionNumber
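If you'd rather skip the Excel step, a similar check can be done directly in T-SQL. This is a sketch, and it assumes the staging table was built on the same partition scheme as the original so that partition numbers line up one-to-one:

```sql
-- Any row returned is a mismatch between the original and staging partitions.
SELECT  p1.partition_number,
        p1.rows AS OrigRows,
        p2.rows AS StagingRows
FROM sys.partitions p1
JOIN sys.partitions p2
  ON p2.partition_number = p1.partition_number
 AND p2.index_id = p1.index_id
WHERE p1.object_id = OBJECT_ID('dbo.TableName')
  AND p2.object_id = OBJECT_ID('dbo.TableName_Staging')
  AND p1.index_id IN (0, 1)   -- heap or clustered index only
  AND p1.rows <> p2.rows
```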

/*****************************************************
Next: repeat for all months until you get to the
current one. We were lucky that this was for a data
warehouse and the data was imported hourly, so we
didn't have to be concerned about getting the current
month's data and having live data coming in at the
same time, so we just did the same thing for the time
span first of current month - today+1, then as soon
as we verified it, we did the swap below.

If we had to worry about live data, I would have done
a date span of first of current month to today-1 for
the bcp, then done an after-hours insert to get
today's data within a transaction, and then done the
swap below:
*****************************************************/
USE [DBName]
EXEC sp_rename 'TableName.IndexName', 'ZZ_IndexName', 'INDEX'
--Repeat for all original table indexes, triggers, etc.
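The post stops at renaming the original indexes out of the way, but the swap it describes presumably also renames the tables themselves. A sketch of the rest of the swap, using the naming convention from earlier (illustrative, not from the original):

```sql
BEGIN TRANSACTION
    -- move the original table out of the way
    EXEC sp_rename 'TableName', 'ZZ_TableName'
    -- promote the staging objects to the original names
    EXEC sp_rename 'TableName_Staging', 'TableName'
    EXEC sp_rename 'TableName.IndexName_Staging', 'IndexName', 'INDEX'
    -- repeat for the remaining indexes, triggers, etc., then
    -- drop the ZZ_ objects once everything is verified
COMMIT TRANSACTION
```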

About Me

Hello, I'm a database architect with a background in software engineering and website design. I have been working with databases since 2000, and specifically as a DBA since 2005. In my off time, I enjoy geocaching (high-tech treasure hunting), and Scouts.