The PIVOT Operator

In my next few posts, I’m going to look at how SQL Server implements the PIVOT and UNPIVOT operators. Let’s begin with the PIVOT operator. The PIVOT operator takes a normalized table and transforms it into a new table where the columns of the new table are derived from the values in the original table. For example, suppose we want to store annual sales data by employee. We might create a schema such as the following:

Notice that this schema has one row per employee per year. Moreover, notice that in the sample data employees 2 and 3 only have sales data for two of the three years worth of data. Now suppose that we’d like to transform this data into a table that has one row per employee with all three years of sales data in each row. We can achieve this conversion very easily using PIVOT:

I’m not going to delve into the PIVOT syntax which is already documented in Books Online. Suffice it to say that this statement sums up the sales for each employee for each of the specified years and outputs one row per employee. The resulting output is:

Notice that SQL Server inserts NULLs for the missing sales data for employees 2 and 3.

The SUM keyword (or some other aggregate) is required. If the Sales table includes multiple rows for a particular employee for a particular year, PIVOT does aggregate them – in this case by summing them – into a single data point in the result. Of course, in this example, since the entry in each “cell” of the output table is the result of summing a single input row, we could just as easily have used another aggregate such as MIN or MAX. I’ve used SUM since it is more intuitive.

This PIVOT example is reversible. The information in the output table can be used to reconstruct the original input table using an UNPIVOT operation (which I will cover in a later post). However, not all PIVOT operations are reversible. To be reversible, a PIVOT operation must meet the following criteria:

All of the input data must be transformed. If we include a filter of any kind including on the IN clause, some data may be omitted from the PIVOT result. For example, if we altered the above example only to output sales for 2006 and 2007, clearly we could not reconstruct the 2005 sales data from the result.

Each cell in the output table must derive from a single input row. If multiple input rows are aggregated into a single cell, there is no way to reconstruct the original input rows.

The aggregate function must be an identity function (when used on a single input row). SUM, MIN, MAX, and AVG all return the single input value unchanged and, thus, can be reversed. COUNT does not return its input value unchanged and, thus, cannot be reversed.

Here is an example of a non-reversible PIVOT operation. This example, calculates the total sales for all employees for all three years. It does not itemize the output by employee.