Create Pivoted Tables in 3 Steps

Many people find the PIVOT operator syntax in SQL Server Books Online (BOL) hard to understand. The syntax in SQL Server Books Online uses a derived table as the basis for the PIVOT query. What SQL Server Books Online doesn't point out is that you can use a common table expression (CTE) instead. Kathi Kellenberger walks you through creating a PIVOT query that uses a common table expression.

The PIVOT operator, which was introduced in SQL Server 2005, lets you create results that are pivoted, essentially using the data from one of the columns as column headers. For example, suppose you want to create a report that breaks down sales by year and month so that you can compare sales months for different years. Using the 2005 or 2008 version of the AdventureWorks database, you can create a query summarizing the data with the code in Listing 1. Table 1 shows an excerpt from the results. As you can see, looking for trends by month isn't easy.

Listing 1: Query that Summarizes Sales by Year and Month

SELECT SUM(TotalDue) TotalDue, YEAR(OrderDate) AS YearOrdered,
MONTH(OrderDate) AS MonthOrdered
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate),MONTH(OrderDate)
ORDER BY YEAR(OrderDate),MONTH(OrderDate)

TotalDue

YearOrdered

MonthOrdered

Table 1: Partial Results from the Query in Listing 1

1172359.4289

2001

7

2605514.9809

2001

8

2073058.5385

2001

9

...

3781879.0708

2002

7

...

Table 2 shows the results as you would like to see them. In Table 2, the data is summarized and displayed so that the months can be easily compared from year to year. I'll explain how to write queries that use the PIVOT operator to produce the results shown in Table 2.

Year

January

February

March

...

November

December

Table 2: The Pivoted Results

2001

NULL

NULL

NULL

...

3690018.6652

3097637.3384

2002

1605782.1915

3130823.0378

2643081.0798

...

4427598.0006

3545522.7380

2003

2233575.1127

3705635.4979

2611621.2596

...

5961182.6761

6582833.0438

2004

3691013.2227

5207182.5122

5272786.8106

...

NULL

NULL

Note that I won't be using the PIVOT syntax shown in SQL Server Books Online (BOL) because that syntax can be difficult to understand at first glance. The syntax in BOL uses a derived table as the basis for the PIVOT query. What BOL doesn't point out is that you can use a common table expression (CTE) instead. Listing 2 shows the syntax for a PIVOT query that uses a CTE. As you can see, there are two main parts: a base query (callout A) and a PIVOT expression (callout B). This syntax might look intimidating, but I'll guide you through it step by step.

Listing 2: Syntax for a PIVOT Query that Uses a CTE

BEGIN CALLOUT A
-- The base query
WITH AS (

Step 1. Write the Base Query

Writing the base query takes a bit of planning. It's important that this query include only the columns that will be needed in the final results. Any columns not pivoted or aggregated will end up as groupings, so any unnecessary columns will cause extraneous grouping levels and unexpected results.

In this case, the Sales.SalesOrderHeader table has more columns than what is needed in the results. All you need is the OrderYear, OrderMonth, and TotalDue columns. You're going to group by the OrderYear column, pivot by the OrderMonth column, and aggregate the TotalDue column, so the base query is

SELECT TotalDue, YEAR(OrderDate)
AS OrderYear,
DATENAME(MONTH,OrderDate)
AS MonthName
FROM Sales.SalesOrderHeader

The code in Listing 3 shows this base query as a CTE. You should execute the CTE to make sure that the necessary columns are present and that there aren't any extraneous columns. (The results will not be aggregated or pivoted at this point.)

Step 2. Create the PIVOT Expression

The next step is to create the PIVOT expression. The first element in the PIVOT expression is an aggregate function. Often this function will be SUM. The parameter of the aggregate function is the name of the column to be aggregated. The function's results will show up under the pivoted columns. In this example, you want to compare the sum of the TotalDue values by month, so TotalDue is the aggregated column. The PIVOT expression with the aggregate function is then

PIVOT(SUM(TotalDue)

After the aggregate function, you must type the keyword FOR followed by the name of the pivoted column. To determine the pivoted column, you need to figure out which column contains the values that you want displayed as column headers. In other words, which values that are currently displayed vertically do you want to display horizontally? In this example, the pivoted column is OrderMonth, so the code looks like

PIVOT(SUM(TotalDue) FOR OrderMonth

The pivoted column's name is followed by an IN list that's similar to one found in a WHERE clause. This IN list serves two purposes. First, it restricts the rows that are pivoted. Second, it supplies the pivoted column names. If the values that will end up as column names don't follow the rules for regular identifiers, they must be surrounded by brackets ([ ]). For example, if this example used month numbers instead of month names, you'd need to place each month number inside brackets.

One limitation of PIVOT queries is that they aren't dynamic, so all the column headers need to be hard-coded. If the pivoted column has values that vary over time, this part of the expression must be modified each time the data in the pivoted column changes. In this example, the column headers aren't likely to change because they're the months of the year. However, if a query compared sales by territories or sales by salespeople, the query would probably have to be modified frequently. Another option would be to write a stored procedure using dynamic SQL to determine the column headings. If you're interested in learning how to create a dynamic pivot query, see Itzik Ben-Gan's web-exclusive article "Dynamic Pivoting" or Inside Microsoft SQL Server 2005 T-SQL Programming (Microsoft Press, 2006) by Ben-Gan, Dejan Sarka, and Roger Wolter.

The IN list needs to be enclosed in parentheses so the PIVOT expression now looks like

Next, you must give an alias to the PIVOT expression. An alias is required because the PIVOT function's results are treated as a table. The alias goes after the final closing parenthesis, so in this case, the PIVOT expression looks like

The PIVOT expression is now complete. As callout B in Listing 2 shows, it goes after the FROM clause and before the ORDER BY clause if there is one.

Step 3. Add the Column Names to the SELECT List

At this point, you need to add the column names to the SELECT list in the main query. In this case, the columns are OrderYear and the pivoted columns. You should not list the aggregated column, TotalDue, so the SELECT list looks like

Variations

There are many ways the PIVOT query can vary. Take, for example, the PIVOT query in Listing 5.

Listing 5: Another Example of a PIVOT Query

WITH BaseQuery AS(
SELECT TotalDue, YEAR(OrderDate) AS OrderYear,
MONTH(OrderDate) AS OrderMonth
FROM Sales.SalesOrderHeader
)
SELECT OrderYear,[1] AS [January],[2] AS [February],
[3] AS [March],[4] AS [April],[5] AS [May],[6] AS [June],
[7] AS [July], [8] AS [August],[9] AS [September],
[10] AS [October], [11] AS [November],[12] AS [December]
FROM BaseQuery
PIVOT(SUM(TotalDue) FOR OrderMonth IN ([1],[2],[3],[4],[5],
[6],[7],[8],[9],[10],[11],[12])) AS PVT
ORDER BY January DESC

This query varies from the one in Listing 4 two ways:

It uses aliased column names. In this query, the numeric month of the order date is used in the base query. The pivoted column names are aliased in the SELECT list so that the column headers are month names rather than month numbers. To save typing, it's advantageous to produce the desired column headings in the base query, which in this case is the CTE.

It uses an ORDER BY clause to order the returned data. The ORDER BY clause needs to go after the PIVOT expression. The only columns allowed in the ORDER BY clause are those that actually show up as columns in the results. You can include the columns used for grouping and those specified in the IN list. For example, this query uses ORDER BY January DESC to display the results in order of highest to lowest sales in January. If you want to display the year in descending order, you would put ORDER BY OrderYear DESC after the PIVOT expression.

A Useful Tool

Although the PIVOT operator might look intimidating, writing a PIVOT query isn't that difficult if you take it step by step. The PIVOT operator is perfect for pivoting results when the pivoted columns aren't likely to change. Although it's disappointing that the PIVOT operator isn't dynamic, the PIVOT operator is still a very useful tool. (A dynamic PIVOT operator is on my wish list for the next release of SQL Server since the feature didn’t make it into SQL Server 2008.)