Solving SQL Server 2000 index fragmentation

Indexing saves you from having to scan an entire table for data search and retrieval, but indexes can easily become fragmented over time. Edgewood Solutions' Jeremy Kadlec explains how they become fragmented and how to maintain beneficial indexes to ensure high performance.


Indexes improve access to your data so SQL Server does not have to scan an entire table -- a costly endeavor in terms of CPU, I/O and memory usage. Over time, however, indexes can become fragmented, yielding lower-than-optimal SQL Server performance, longer transaction times, blocking, locking and lower throughput.

The extent of index fragmentation depends on table fragmentation, which can be caused by page splits. A page split occurs when an INSERT or UPDATE statement produces more data than the page can hold, so the overflow must be moved to another page. When the page split occurs, SQL Server leaves the first half of the data on the original page and moves the remainder to another page. This operation is costly in terms of time and concurrency. Page splitting can ultimately be caused by the index fill factor and pad_index being set too high (i.e., closer to 100), compacting the pages too tightly.

Resolving index fragmentation is both an art and a science, and it is also a long-term initiative to ensure high performance as the application environment changes. The tips below walk through the process.

First and foremost, it is imperative to understand the data that is being stored and how it will be accessed. For new applications, speak with users and application developers about their application flow and data usage requirements -- data access (SELECT) and data manipulation (INSERT, UPDATE and DELETE). For an existing system, run sp_depends, which lists the database objects that reference a table. If any of the T-SQL is embedded in the front end, middle tier or ad-hoc code, speak with the application developers to gain insight into their needs. Also speak to users to understand the application issues they experience. In either case, leverage tools or scripts to better understand the business flow and potential for beneficial indexes, as well as to gain validation from users and developers.
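For example, against the Authors table in the Pubs sample database, sp_depends lists the stored procedures, views and triggers that reference it:

```sql
-- List the database objects that depend on the Authors table
-- in the Pubs sample database.
USE Pubs
GO
EXEC sp_depends 'Authors'
GO
```

Run the same command for each key table to build a picture of which code paths touch which columns.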

Once the data is understood, it is time to design or redesign the indexes. In SQL Server 2000 Books Online, there is a good article about index design that I encourage you to review. As a first rule of thumb, indexes should be considered for columns used in JOIN conditions as well as WHERE, ORDER BY and GROUP BY clauses. Read the article for additional recommendations.
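As a sketch of that rule of thumb -- assuming a hypothetical Customer table where queries join on CustomerID and filter and group on State (both names are illustrative, not from a real schema):

```sql
-- Clustered index on the column used in JOIN conditions;
-- Customer and CustomerID are sample names for illustration.
CREATE CLUSTERED INDEX IX_Customer_CustomerID
ON Customer (CustomerID)
GO

-- Nonclustered index on a column used in WHERE and GROUP BY clauses.
CREATE NONCLUSTERED INDEX IX_Customer_State
ON Customer (State)
GO
```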

A CRUD chart (Create, Read, Update, Delete) can also help you gain insight into your indexing needs based on data access. Although building a CRUD chart can be a daunting task (most people don't even know how to begin one), creating a chart for key tables may help with overall indexing; since most systems have performance problems related to a subset of the tables, focusing on these will improve performance with the least amount of effort. The following is a sample CRUD chart for a single table (Customer) by stored procedure/ad-hoc statement and operation (JOIN, WHERE, ORDER BY, etc.):

Table Name = Customer

| ID | Column       | Object                   | Create | Read            | Update | Delete |
|----|--------------|--------------------------|--------|-----------------|--------|--------|
| 1  | CustomerID   | spCustomerSelectAll      | -      | INNER JOIN      | -      | -      |
|    |              | spCustomerUpdate         | -      | -               | WHERE  | -      |
| 2  | Name         | spCustomerSelectAll      | -      | ORDER BY        | -      | -      |
|    |              | spCustomerDelete         | -      | -               | -      | WHERE  |
|    |              | spCustomerLookup         | -      | WHERE           | -      | -      |
| 3  | Address      | -                        | -      | -               | -      | -      |
| 4  | City         | -                        | -      | -               | -      | -      |
| 5  | State        | spCustomerSelect_ByState | -      | WHERE, GROUP BY | -      | -      |
| 6  | ZipCode      | -                        | -      | -               | -      | -      |
| 7  | PhoneNumber  | -                        | -      | -               | -      | -      |
| 8  | EmailAddress | Ad-Hoc                   | -      | WHERE           | -      | -      |
| *  | -            | spCustomerInsert         | None   | -               | -      | -      |

This chart provides a scientific understanding of indexes on a per-table basis. From this data, you need to balance index needs based on transaction types, which is more of an art form. Indexes typically benefit SELECT commands, but they can hinder INSERT, UPDATE or DELETE statements because SQL Server has to maintain the index during these operations, yielding additional overhead.

Another critical piece of data is the frequency (daily, weekly, monthly, quarterly, yearly, never) of index rebuilds. The more frequently indexes are rebuilt, the more aggressive the index approach can be. If a system needs to run 24/7 for the next three years to support a single function, the indexes must be designed differently than for a database supporting multiple functions with indexes rebuilt on a weekly basis. Nevertheless, systems are not stagnant; they change over time with shifting business needs. It pays to plan for these changes or to make corrections proactively.

SQL Server has two types of indexes: clustered and nonclustered. Refer to SQL Server 2000 Books Online for detailed information on each, because this is a topic unto itself. In a nutshell, a table can have no indexes, a clustered index only, a clustered index plus nonclustered indexes, or nonclustered indexes only. A single clustered index can be created per table, and the data is physically sorted based on the clustered index key. For a table without a clustered index, the data is stored in the order of data entry.

The two primary configurations for a CREATE INDEX that pertain to index fragmentation are:

FILLFACTOR -- determines how full the leaf-level (data) pages are filled.

PAD_INDEX -- applies the FILLFACTOR setting to the intermediate-level pages (the index pages that point to the data pages), so they are typically filled to the same value as the FILLFACTOR.
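Both options are set when the index is created or rebuilt. A minimal sketch, again assuming a hypothetical Customer table and the 80% starting point discussed below:

```sql
-- Fill leaf-level pages to 80% and, via PAD_INDEX, apply the same
-- fill to the intermediate index pages. Customer and CustomerID
-- are sample names for illustration.
CREATE CLUSTERED INDEX IX_Customer_CustomerID
ON Customer (CustomerID)
WITH PAD_INDEX, FILLFACTOR = 80
GO
```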

If you are not sure of the fragmentation level over a specific interval (i.e., day, week, month, quarter, semi-annual, annual, etc.), a good starting point for the fill factor configuration is 80%. Monitor the fragmentation to determine the ideal configuration on a per-table, per-clustered-index basis; the amount of storage required can differ dramatically between configurations only 5% to 10% apart. The lower the fill factor (i.e., 40% to 50%), the more storage is needed and the more pages an index will have to scan or seek to fulfill the query request. With a high fill factor (90% to 100%), less storage is needed and fewer pages are scanned, but costly page splits can occur, causing performance degradation and index fragmentation. The recommendation? Find the ideal fill factor that avoids page splits and fragmentation without forcing you to support an excessively large database.

The key SQL Server command to identify index fragmentation is DBCC SHOWCONTIG. The following is sample code to determine the index fragmentation for the Authors table in the Pubs database:

USE Pubs
GO
DBCC SHOWCONTIG ('Authors')
GO

To determine the index fragmentation for the clustered index on the Authors table in the Pubs database, issue the following command:

USE Pubs
GO
DBCC SHOWCONTIG ('Authors',1)
GO

As a point of reference, the 1 value indicates the clustered index. A value of 2 to 255 indicates a specific nonclustered index.

One rule of thumb for determining whether or not to rebuild an index is when the scan density is less than 90%.
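To check that threshold without parsing the text output by hand, SQL Server 2000 can return the DBCC SHOWCONTIG results as a rowset; a sketch against the Pubs sample database:

```sql
-- Return fragmentation data as a result set for every index on the
-- Authors table; the ScanDensity column in the output can then be
-- compared against the 90% rule of thumb.
USE Pubs
GO
DBCC SHOWCONTIG ('Authors') WITH TABLERESULTS, ALL_INDEXES
GO
```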

Getting back to understanding your data from tip 1: If you notice that particular tables consistently have low scan density, consider lowering the fill factor and pad_index in 5% to 10% increments to reduce the fragmentation.

With the rate of data growth yielding larger databases and businesses requiring more uptime, every second counts during the maintenance window. Leverage your backup server or restore a recent backup of the production database to a development/test server and then issue DBCC commands. That will give you insight into the database fragmentation at a point in time, so the maintenance window can be focused on rebuilding indexes. Once you have this data, you can issue one of the commands listed in tip 4 to rebuild the fragmented indexes. Then reissue DBCC SHOWCONTIG to validate that the index fragmentation has been corrected. This will validate your scripts and provide a rough estimate on the time needed for the production system.
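A minimal sketch of that restore-and-check approach; the backup file, paths and logical file names below are hypothetical and must match your environment:

```sql
-- Restore a recent production backup onto a development/test server,
-- then measure fragmentation there instead of in production.
-- All file names and paths are assumptions for illustration.
RESTORE DATABASE Pubs
FROM DISK = 'D:\Backups\Pubs.bak'
WITH MOVE 'Pubs' TO 'D:\Data\Pubs.mdf',
     MOVE 'Pubs_log' TO 'D:\Data\Pubs_log.ldf',
     REPLACE
GO

-- Measure fragmentation on the restored copy.
DBCC SHOWCONTIG ('Authors')
GO
```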

Issue a DROP INDEX followed by a CREATE INDEX when:

- no users are on the system
- the indexes are changing for the table, requiring old indexes to be eliminated and replaced by a new index configuration
- the clustered index is changing for the table, because all nonclustered indexes depend on the clustered index and will need to be rebuilt

Issue a DBCC DBREINDEX when:

- no users are on the system
- referential integrity exists between tables and needs to be maintained
- the index configuration is not changing
- the fill factor needs to change for the clustered index
- it supports an atomic transaction, ensuring no indexes will be lost
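One command in SQL Server 2000 that satisfies these conditions is DBCC DBREINDEX, which rebuilds indexes as an atomic operation and can apply a new fill factor. A sketch against the Pubs sample database:

```sql
-- Rebuild all indexes on the Authors table with an 80% fill factor;
-- the empty string means "all indexes on the table." A fill factor
-- of 0 would instead reuse each index's existing value.
USE Pubs
GO
DBCC DBREINDEX ('Authors', '', 80)
GO
```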

Related to index fragmentation is disk-level fragmentation -- a non-contiguous file system resulting from file deletions and reorganization of the logical disk file system -- which can yield poor performance for I/O-intensive processes. The following recommendations minimize the amount of disk-level fragmentation:

Have a dedicated disk for your databases.

Write the database across the disk in a contiguous manner.

Pre-allocate the database size in large allocations relative to the database (i.e., 10 GB increments for a 100 GB database that grows 1 GB on a monthly basis).

Do not rely on auto-grow, and do not auto-shrink the database.

Conclusion

Resolving index fragmentation is critical to a high-performance database. Identifying the beneficial indexes and maintaining them over time will ensure high performance throughout the life of the application. Good luck!

About the author: Jeremy Kadlec is the principal database engineer at Edgewood Solutions, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and delivers frequent presentations at regional SQL Server users groups and nationally at SQL PASS. Kadlec is the SearchSQLServer.com Performance Tuning expert.
