Managing Data Space

To efficiently manage storage for your databases, you need to understand what objects take up disk space and how SQL Server stores those objects. In SQL Server 2000, for example, one simple system table tracks space usage, only two objects consume disk space, and only three types of pages exist to store user data. This structure is relatively easy to manage, but it also has its limitations, especially regarding how SQL Server stores and retrieves large object (LOB) data.

Storage Space in SQL Server 2000

In SQL Server releases preceding SQL Server 2005, only two things in a database actually use storage space: tables and indexes. In addition, there are only a few ways that the information in tables and indexes can be stored.

SQL Server 2000 provides three types of pages for storing application information: index pages, data pages, and a third type of page for storing LOB data. LOB data is data defined as one of three data types: text, ntext, or image.

This relatively simple storage model requires only one system table, sysindexes, to keep track of all SQL Server's space-consuming objects. Sysindexes holds one row for each table and one row for each index on the table. It also optionally stores one row per table to keep track of any LOB data the table contains. Every row in sysindexes contains information about how much disk space the table, index, or LOB data consumes and where you can find the pages for that structure.

If a table has a clustered index, the table's data is considered part of the index, so the data rows are actually index rows in sysindexes. For a table with a clustered index, sysindexes has a row with an index ID (indid) value of 1. If a table has no clustered index, there is no organization to the table data; we call such a table a heap. A heap in sysindexes has an indid value of 0. Every additional index has a row in sysindexes that contains an indid value from 2 to 250. (See "Digging Up the Dirt on Indexes," December 2004, for information about how to interpret sysindexes' first and root columns to determine which file and page number contain the first page of the table and the root of the index.)

The sysindexes column that contains the indid value is a tinyint data type, which means it can hold values up to only 255.The maximum indid value for an index is 250 because the values 251-254 are reserved, and SQL Server 2000 uses the special indid value of 255 to keep track of pages holding LOB data. SQL Server uses the same sysindexes columns that keep track of the space a table or index uses to keep track of the space consumed by all the LOB data in any row or column of the table. (For more information about sysindexes, see "Something About Sysindexes," April 2005.)

The simplicity of the sysindexes table is one of its strengths. However, SQL Server 2005 addresses several problems with the sysindexes table structure.

One problem is that the sysindexes table also stores rows for column statistics not associated with an index, which means these statistics must have a unique indid value. If you have a lot of statistics on a table, you could run out of indid values before you build all the indexes you need on the table.

Sysindexes also doesn't allow for other types of pages or for redesign if the relationship between pages changes. Finally, because SQL Server 2000 considers LOB data as a special kind of data associated with a table, indexes can't contain LOB data. In SQL Server 2000 and earlier releases, you can't create an index on a text, image, or ntext column, so the sysindexes structure remained workable. But things changed in SQL Server 2005.

New Ways of Storing Data

SQL Server 2005 still needs to store regular rows for data and indexes. But SQL Server 2005's new varchar(max) data type lets you define a column to hold both regular row data and LOB data. In addition, you can define indexes to have columns of type varchar(max), which means an index can contain LOB data. (For more information about varchar(max), see the Web-exclusive T-SQL 2005 column, "MAX Specifier Boosts Variable-Length Data Type," September 2005.)

Another new mechanism lets you define multiple large varchar fields, but instead of using MAX as the maximum length, you can use an integer up to the SQL Server 2000 maximum length of 8000.This technique lets you populate multiple large varchar columns in a row so that the total length of the row exceeds the maximum length that SQL Server can store on a page. SQL Server simply stores any varchar fields that can't fit on the row on special pages called ROW_OVERFLOW pages, which I discussed a bit in "Piecing Together Fragmentation," December 2005. SQL Server can also store index rows on ROW_OVERFLOW pages.

SQL Server 2005 also lets you partition a table or index and have its rows stored in multiple different locations. Sysindexes has no way to indicate that a single structure occupies multiple storage areas.

Thus, in SQL Server 2005, instead of considering indexes as a subsidiary storage structure to tables, you need to consider tables and indexes on an equal footing. Both tables and indexes need to store regular rows, LOB data, and ROW_OVERFLOW data. And both tables and indexes can be partitioned.

Hobts and Allocation Units

In SQL Server 2000, the sysindexes table contains all the information you need to determine how much space a table, its LOB data, and all its indexes consume. In SQL Server 2005, you need a bit more.

SQL Server 2005 introduces a new logical structure, a hobt (pronounced—you guessed it—"hobbit"),to keep track of both heaps and B-Trees. (Indexes, including tables that have clustered indexes, are stored as B-Trees.) Although SQL Server 2005's syshobts table isn't typically visible, you can see references to hobts in some other system objects. For example, the sys.partitions view includes the column hobt_id, and every partition of every index or heap has a unique hobt_id value.

The sys.partitions view contains one row for each partition of each table and each index in the database, along with the number of rows in that partition. All tables and indexes in SQL Server 2005 contain at least one partition, even if you don't explicitly partition the structures. Each partition can contain data of up to three different storage types—regular IN_ROW data (for either data or index rows), LOB data, or ROW_OVERFLOW data—and each type of data is stored on its own set of pages. (In future columns, I'll discuss how SQL Server actually stores these three types of data.)

SQL Server allocates space for each of these three types of pages separately, and the set of pages of the same type for a single hobt for one partition is called an allocation unit. The sys.allocation_units view contains a row for each allocation unit. Every partition has at least an IN_ROW data allocation unit, and it might optionally have a LOB data allocation unit and a ROW_OVERFLOW allocation unit, depending on the table or index definition.

The sys.allocation_units view has a column called container_id, which maps to the partition_id in sys.partitions. The view also has columns that report how many pages are reserved for the allocation unit and how many of those reserved pages have been used. In addition, sys.allocation_units contains a type column, which holds one of three values that indicate which type of pages it's keeping track of:

1 = IN_ROW data

2 = LOB data

3 = ROW_OVERFLOW data

Investigating with sp_spaceused

To determine the amount of space reserved and used for a particular object in the current database, as well as the number of rows for all partitions of that object, you can call the sp_spaceused stored procedure.You can also use sp_spaceused to find the disk space reserved and used by the entire database.To retrieve this information, sp_spaceused performs a simple join between the sys.partitions and sys.allocation_units views that looks similar to the join in Listing 1's sample code.

Note that the procedure retrieves information about pages used from the sys.allocation_units view and accesses row-count information from the sys.partitions view. Sp_spaceused doesn't count rows when sys.partitions indicates the partition is for an index or when sys.allocation_units indicates the data isn't IN_ROW data.

The variable in the WHERE clause at the end of Listing 1 is the object ID you want information about. The sp_spaceused procedure has determined this ID based on the parameters passed to the procedure, but you could use the object_id value to specify any object you were interested in. For example, if you wanted space-usage information for the Sales.SalesOrderDetail table in the AdventureWorks database, you could replace the WHERE clause in Listing 1 with

WHERE p.object_id = object_id ('Sales.SalesOrderDetail')

Give Me More Space

This column gives you just a glimpse of the full storage picture in SQL Server 2005.The new release has other structures besides tables and indexes that can take up space, including full-text indexes, XML indexes, and structures to support query notifications and SQL Server Service Broker operations. The sp_spaceused stored procedure also needs to take those structures into account, particularly when reporting space usage for the entire database.

Next month, I'll continue my coverage of the information that sp_spaceused needs. I'll also explain how SQL Server 2005 handles other information that the sysindexes table keeps track of in SQL Server 2000.

Discuss this Article 2

I could not find this information anywhere else. MCTS information jumps right in. This article is the only piece I could find to augment "meaning" to the technical task to partitioning (especially WHY we should).
Top notch, thanks.

From the Blogs

Duplicate records clutter databases and render the data within them unclear. This kind of problem is very common, and it’s the main reason that deduping software exists. But there’s another benefit to deduplication software: the ability to infer connections between individual records from various data sets....More

Companies looking to grow and extract value from their data are increasingly turning to Chief Data Officers (CDOs) to execute their data strategy. The role is new, and a playbook is necessary to address the many challenges CDOs face....More

After spending 20 years building analytics, BI and database solutions, I've focused on Cloud data solutions over the past 2 years. I've chosen 5 common challenges that I face every day with Cloud migrations and that you'll face in your Cloud BI projects....More