Paging in SQL Server 2005

Thursday Jan 10th 2008 by Quin Street

Share:

Developers and database administrators have long debated methods for paging recordset results from Microsoft SQL Server, trying to balance ease of use with performance. The simplest methods were less efficient because they retrieved entire datasets from SQL Server before eliminating records which were not to be included, while the best-performing methods handled all paging on the server with more complex scripting. The ROW_NUMBER() function introduced in SQL Server 2005 provides an efficient way to limit results relatively easily.

By David Beahm

Introduction

Developers and database administrators have long debated methods for paging recordset results from Microsoft SQL Server, trying to balance ease of use with performance. The simplest methods were less efficient because they retrieved entire datasets from SQL Server before eliminating records which were not to be included, while the best-performing methods handled all paging on the server with more complex scripting. The ROW_NUMBER() function introduced in SQL Server 2005 provides an efficient way to limit results relatively easily.

Paging Efficiency

In order to scale well, most applications only work with a portion of the available data at a given time. Web-based data maintenance applications are the most common example of this, and several data-bindable ASP.NET classes (such as GridView and Datagrid) have built-in support for paging results. While it is possible to handle paging within the web page code, this may require transferring all of the data from the database server to the web server every time the control is updated. To improve performance and efficiency, data which will not be used should be eliminated from processing as early as possible.

Paging Methods

Many popular databases offer functions allowing you to limit which rows are returned for a given query based upon their position within the record set. For example, MySQL provides the LIMIT qualifier, which takes two parameters. The first LIMIT parameter specifies which (zero-based) row number will be the first record returned, and the second parameter specifies the maximum number of records returned. The query:

SELECT * FROM table LIMIT 20,13

...will return the 20th through the 32nd records -- assuming at least 33 records are available to return. If fewer than 33 records are available, the query will return all records from record 20 on. If fewer than 20 records are available, none will be returned.

SQL Server does not have this functionality, however the 2005 release does have a number of other new tricks. For instance, support for CLR procedures means it is possible to use existing paging methods to write VB.NET or C# code that would execute within the SQL Server environment. Unfortunately, CLR procedures are not as efficient as native Transact SQL. To ensure best performance, queries should still be written in TSQL whenever practical.

Using ROW_NUMBER()

TSQL in the 2005 release includes the ROW_NUMBER() function, which adds an integer field to each record with the record's ordinal result set number. Stated more simply, it adds the record's position within the result set as an additional field so that the first record has a 1, the second a 2, etc. This may appear to be of little value, however by using nested queries we can use this to our advantage.

To demonstrate ROW_NUMBER() and to explore how the paging solution works, create a simple salary table and populate it with random data using the following commands:

The ROW_NUMBER() function has no parameters - it simply adds the row number to each record in the result set. To ensure the numbering is consistent, however, SQL Server needs to know how to sort the data. Because of this, ROW_NUMBER() must immediately be followed by the OVER() function. OVER() has one required parameter, which is an ORDER BY clause. The basic syntax for querying the Salaries table is:

SELECT ROW_NUMBER() OVER(ORDER BY person), person, income
FROM Salaries

This returns the following result:

(No column name)

person

income

1

Amelia

36000.00

2

Anna

49000.00

3

Constance

51000.00

4

Danielle

68000.00

5

Elizabeth

23000.00

6

Emerson

84000.00

7

Eva

51000.00

8

Joe

28000.00

9

John

67000.00

10

Jorge

48000.00

11

Karen

73000.00

12

Michael

45000.00

13

Ralph

18000.00

14

Stanley

59000.00

15

Stephanie

47000.00

16

Sue

96000.00

17

Waldo

47000.00

The Salaries data now appears sorted by person, and it has an extra column indicating each record's position within the results.

If for any reason you wanted the results to display in a different order than they were numbered in, you can include a different ORDER BY clause as part of the normal SELECT syntax:

SELECT ROW_NUMBER() OVER(ORDER BY person), person, income
FROM Salaries
ORDER BY income

This returns the following result:

(No column name)

person

income

13

Ralph

18000.00

5

Elizabeth

23000.00

8

Joe

28000.00

1

Amelia

36000.00

12

Michael

45000.00

15

Stephanie

47000.00

17

Waldo

47000.00

10

Jorge

48000.00

2

Anna

49000.00

3

Constance

51000.00

7

Eva

51000.00

14

Stanley

59000.00

9

John

67000.00

4

Danielle

68000.00

11

Karen

73000.00

6

Emerson

84000.00

16

Sue

96000.00

If we want to limit the results displayed to a certain range, we need to nest this SELECT inside another one and provide a name for the ROW_NUMBER() column. To limit our results to records 5 through 9, we can use the following query:

SELECT *
FROM (SELECT ROW_NUMBER() OVER(ORDER BY person) AS
rownum, person, income FROM Salaries) AS Salaries1
WHERE rownum >= 5 AND rownum <= 9

This returns the following result:

rownum

person

income

5

Elizabeth

23000.00

6

Emerson

84000.00

7

Eva

51000.00

8

Joe

28000.00

9

John

67000.00

Again, we can change the sort order by adding an ORDER BY clause. This is most easily accomplished by using the outer SELECT statement:

SELECT *
FROM (SELECT ROW_NUMBER() OVER(ORDER BY person) AS
rownum, person, income FROM Salaries) AS Salaries1
WHERE rownum >= 5 AND rownum <= 9
ORDER BY income

This returns the following result:

rownum

person

income

5

Elizabeth

23000.00

8

Joe

28000.00

7

Eva

51000.00

9

John

67000.00

6

Emerson

84000.00

If we want to support the same type of arguments that MySQL's LIMIT() supports, we can create a stored procedure that accepts a beginning point and a maximum number of records to return. ROW_NUMBER requires that the data be sorted, so we will also have a required parameter for the ORDER BY clause. Execute the following statement to create a new stored procedure:

The pageSalaries procedure begins with SET NOCOUNT ON to disable the record count message (a common step for optimizing query performance). We then declare two necessary variables, @STMT and @ubound. Because we want to be able to change what ORDER BY argument is used, we need to dynamically generate our query statement by storing it in @STMT. The next lines ensure that only positive numbers are used for the starting position and maximum size, then calculate the range of ROW_NUMBER() values being requested. (If we wanted to be zero-based like MySQL's LIMIT, we could do so with a few minor tweaks.) Once the dynamic SQL command has been strung together, it is executed so that the results are returned.

Execute the following statement to test the stored procedure:

pageSalaries 4, 7, 'income'

This returns the following result:

person

income

Amelia

36000.00

Michael

45000.00

Stephanie

47000.00

Waldo

47000.00

Jorge

48000.00

Anna

49000.00

Constance

51000.00

If we execute:

pageSalaries 13, 7, 'income'

we receive back:

person

income

John

67000.00

Danielle

68000.00

Karen

73000.00

Emerson

84000.00

Sue

96000.00

... because the query goes beyond the number of records available.

Taking this one step further, we can make a stored procedure that does a more general form of paging. In fact, it can be generalized to the point that it can be used to return any collection of fields, in any order, with any filtering clause. To create this wunderkind marvel, execute the following command:

You may receive the following error message from SQL Server, which you can confidently ignore:

Cannot add rows to sys.sql_dependencies for the stored procedure because it depends on the missing table 'sp_executeSQL'. The stored procedure will still be created; however, it cannot be successfully executed until the table exists.

The utilPage procedure accepts 6 parameters:

@datasrc

- the table (or stored procedure, etc.) name

@orderBy

- the ORDER BY clause

@fieldlis

- the fields to return (including calculated expressions)

@filter

- the WHERE clause

@pageNum

- the page to return (must be greater than or equal to one)

@pageSize

- the number of records per page

The stored procedure needs the name of a data source to query against (such as a table) and one or more fields to sort by (since OVER() requires an ORDER BY clause). If @filter is blank (the default), it will be set to "1 = 1" as a simple way to select all records. If @pageSize is not supplied, the query will run without paging and will not return a record count.

If, however, @pageSize is supplied, a version of the query is executed to get the total number of records. In order to have this record count available within the procedure and as a returned value, we use sp_executeSQL to support executing the statement while returning an output parameter. The record count is used to prevent returning empty results when possible, and to support paging interfaces that calculate the number of pages available (such as GridView). If we were calling this stored procedure to populate a GridView, we would return @recct as a ReturnValue parameter instead of using a result set, but we will use a result set for demonstration purposes.

The procedure calculates what the actual record positions will be for the requested page. Rather than allow the query to fail, there are safety checks ensuring that @pageSize and @pageNum are greater than zero, and that the result set will not be empty. If the specified page is out of range, this procedure will return the last possible page of records. This is helpful if a user changes more than one setting before refreshing their data, or if a significant amount of data is deleted between requests.

The remainder of the procedure is virtually identical to the pageSalaries procedure. To test the utilPAGE stored procedure, execute the following statement:

utilPAGE 'Salaries', 'person', '*', 'income > 1000', 2, 4

This returns the following two result sets:

recct

17

row

person

income

5

Elizabeth

23000

6

Emerson

84000

7

Eva

51000

8

Joe

28000

If we execute:

utilPAGE 'Salaries', 'person', 'person, income', '', 13, 3

...we receive back:

recct

17

person

income

Stephanie

47000

Sue

96000

Waldo

47000

Even though the request should be for records 36 through 38 - far outside of what is available - the procedure returns the last available page of records. In contrast, requesting the third page with seven records per page using:

utilPAGE 'Salaries', 'person', 'person, income', '', 3, 7

...returns the last three records, as the page is not completely out of bounds:

person

income

Stephanie

47000

Sue

96000

Waldo

47000

All of these examples are based on simple single-table queries, which may not reflect what you need in the real world. While the utilPAGE procedure does not support ad-hoc JOINs, it does work with SQL Views. If you want paging support for multi-table queries, you should create a View (with all of the necessary JOINs) to use as the data source. Using a View follows good design practices as it ensures that your Joins are performed consistently, allows easier ad-hoc querying from the command line, and is much easier to troubleshoot than a stored procedure's dynamic SELECT statement logic.

Conclusion

While SQL Server does not have as simple a method for paging results as some other databases, features introduced in the 2005 release have made it possible to page results efficiently more easily than ever before. In the next article in this series, we will go a step further and integrate this paging logic with a GridView through a Data Access Layer.