oracle-developer.net

pivot and unpivot queries in 11g

Pivot queries involve transposing rows into columns (pivot) or columns into rows (unpivot) to generate results in crosstab format. Pivoting is a common technique, especially for reporting, and it has been possible to generate pivoted resultsets with SQL for many years and Oracle versions. However, the release of 11g includes explicit pivot-query support for the first time with the introduction of the new PIVOT and UNPIVOT keywords. These are extensions to the SELECT statement and we will explore the syntax and application of these new features in this article.

pivot

We will begin with the new PIVOT operation. Most developers will be familiar with pivoting data: it is where multiple rows are aggregated and transposed into columns, with each column representing a different range of aggregate data. An overview of the new syntax is as follows:

In addition to the new PIVOT keyword, we can see three new pivot clauses, described below.

pivot_clause: defines the columns to be aggregated (pivot is an aggregate operation);

pivot_for_clause: defines the columns to be grouped and pivoted;

pivot_in_clause: defines the filter for the column(s) in the pivot_for_clause (i.e. the range of values to limit the results to). The aggregations for each value in the pivot_in_clause will be transposed into a separate column (where appropriate).

The syntax and mechanics of pivot queries will become clearer with some examples.

a simple example

Our first example will be a simple demonstration of the PIVOT syntax. Using the EMP table, we will sum the salaries by department and job, but transpose the sum for each department onto its own column. Before we pivot the salaries, we will examine the base data, as follows.

We can see that the department salary totals for each job have been transposed into columns. There are a few points to note about this example, the syntax and the results:

Line 8: our pivot_clause sums the SAL column. We can specify multiple columns if required and optionally alias them (we will see examples of aliasing later in this article);

Lines 1-4: pivot operations perform an implicit GROUP BY using any columns not in the pivot_clause (in our example, JOB and DEPTNO). For this reason, most pivot queries will be performed on a subset of columns, using stored views, inline views or subqueries, as in our example;

Line 9: our pivot_for_clause states that we wish to pivot the DEPTNO aggregations only;

Line 10: our pivot_in_clause specifies the range of values for DEPTNO. In this example we have hard-coded a list of four values which is why we generated four pivoted columns (one for each value of DEPTNO). In the absence of aliases, Oracle uses the values in the pivot_in_clause to generate the pivot column names (in our output we can see columns named "10", "20", "30" and "40").

It was stated above that most pivot queries will be performed on a specific subset of columns. Like all aggregate queries, the presence of additional columns affects the groupings. We can see this quite simply with a pivot query over additional EMP columns as follows.

In this case, all the EMP columns apart from SAL have become the grouping set, with DEPTNO being the pivot column. The pivot is effectively useless in this case.

An interesting point about the pivot syntax is its placement in the query; namely, between the FROM and WHERE clauses. In the following example, we restrict our original pivot query to a selection of job titles by adding a predicate.

This appears to be counter-intuitive, but adding the predicates before the pivot clause raises a syntax error. As an aside, in our first example we used subquery factoring (the WITH clause) to define the base column set. We can alternatively use an inline-view (as follows) or a stored view (we will do this later).

aliasing pivot columns

In our preceding examples, Oracle used the values of DEPTNO to generate pivot column names. Alternatively, we can alias one or more of the columns in the pivot_clause and one or more of the values in the pivot_in_clause. In general, Oracle will name the pivot columns according to the following conventions:

Pivot Column Aliased?

Pivot In-Value Aliased?

Pivot Column Name

N

N

pivot_in_clause value

Y

Y

pivot_in_clause alias || '_' || pivot_clause alias

N

Y

pivot_in_clause alias

Y

N

pivot_in_clause value || '_' || pivot_clause alias

We will see examples of each of these aliasing options below (we have already seen examples without any aliases). However, to simplify our examples, we will begin by defining the input dataset as a view, as follows.

This time, Oracle generated column names from the aliases only. In fact, we can see from all of our examples that the pivot_in_clause is used in all pivot-column naming, regardless of whether we supply an alias or value. We can therefore be selective about which values we alias, as the following example demonstrates.

pivoting multiple columns

Our examples so far have contained a single aggregate and a single pivot column, although we can define more if we wish. In the following example we will define two aggregations in our pivot_clause for the same range of DEPTNO values that we have used so far. The new aggregate is a count of the salaries that comprise the sum.

We have doubled the number of pivot columns (because we doubled the number of aggregates). The number of pivot columns is a product of the number of aggregates and the distinct number of values in the pivot_in_clause. In the following example, we will extend the pivot_for_clause and pivot_in_clause to include values for JOB in the filter.

We have limited the query to just 3 jobs within department 30. Note how the pivot_for_clause columns (DEPTNO and JOB) combine to make a single pivot dimension. The aliases we use apply to the combined value domain (for example, "D30_SLS" to represent SALES in department 30).

Finally, because we know the pivot column-naming rules, we can reference them directly, as follows.

general restrictions

There are a few simple "gotchas" to be aware of with pivot queries. For example, we cannot project the column(s) used in the pivot_for_clause (DEPTNO in most of our examples). This is to be expected. The column(s) in the pivot_for_clause are grouped according to the range of values we supply with the pivot_in_clause. In the following example, we will attempt to project the DEPTNO column.

Oracle raises an ORA-00904 exception. In this case the DEPTNO column is completely removed from the projection and Oracle tells us that it doesn't exist in this scope. Similarly, we cannot include any column(s) used in the pivot_clause, as the following example demonstrates.

We attempted to project the SAL column but Oracle raised the same exception. This is also to be expected: the pivot_clause defines our aggregations. This also means, of course, that we must use aggregate functions in the pivot_clause. In the following example, we will attempt to define a pivot_clause with a single-group column.

Oracle raises a new ORA-56902 exception: the error message numbers are getting much higher with every release!

execution plans for pivot operations

As we have stated, pivot operations imply a GROUP BY, but we don't need to specify it. We can investigate this by explaining one of our pivot query examples, as follows. We will use Autotrace for convenience (Autotrace uses EXPLAIN PLAN and DBMS_XPLAN to display theoretical execution plans).

The plan output tells us that this query uses a HASH GROUP BY PIVOT operation. The HASH GROUP BY is a feature of 10g Release 2, but the PIVOT extension is new to 11g. Pivot queries do not automatically generate a PIVOT plan, however. In the following example, we will limit the domain of values in our pivot_in_clause and use Autotrace to explain the query again.

This time the CBO has costed a simple aggregation over a group by with pivot. It has correctly identified that only one record will be returned from this query, so the GROUP BY operation is unnecessary. Finally, we will explain our first pivot example but use the extended formatting options of DBMS_XPLAN to reveal more information about the work that Oracle is doing.

DBMS_XPLAN optionally exposes the column projection information contained in PLAN_TABLE for each step of a query. The projection for ID=2 shows the base columns that we select in the PIVOT_DATA view over EMP. The interesting information, however, is for ID=1 (this step is our pivot operation). This clearly shows how Oracle is generating the pivot columns. Many developers will be familiar with this form of SQL: it is how we write pivot queries in versions prior to 11g. Oracle has chosen a CASE expression, but we commonly use DECODE for brevity, as follows.

pivot performance

From the evidence we have seen, it appears as though Oracle implements the new PIVOT syntax using a recognised SQL format. It follows that we should expect the same performance for our pivot queries regardless of the technique we use (in other words the 11g PIVOT syntax will perform the same as the SUM(DECODE...) pivot technique. We will test this proposition with a larger dataset using Autotrace (for general I/O patterns) and the wall-clock (for elapsed time). First we will create a table with one million rows, as follows.

The most important outputs are highlighted. We can see that the query completed in 4.5 seconds and generated approximately 14,000 PIOs and LIOs. Interestingly, the CBO chose a SORT GROUP BY over a HASH GROUP BY for this volume, having estimated almost 1.2 million records.

By way of comparison, we will run the pre-11g version of pivot, as follows.

With a couple of minor exceptions, the time and resource results for this query are the same as for the new PIVOT syntax. This is as we expected given the internal query re-write we saw earlier. In fact, the new PIVOT version of this query generated more recursive SQL and more in-memory sorts, but we can conclude from this simple test that there is no performance penalty with the new technique. We will test this conclusion with a higher number of pivot columns, as follows.

We have generated 20 pivot columns with this example. Note that the above output is from a third or fourth run of the example to avoid skew in the results. Ultimately, the I/O patterns and elapsed time are the same as our original example, despite pivoting an additional 16 columns. We will compare this with the SUM(DECODE...) technique, as follows.

We can begin to see how much more convenient the new PIVOT syntax is. Furthermore, despite the workloads of the two methods being the same, the manual pivot technique is 25% slower (observable over several runs of the same examples and also a version using CASE instead of DECODE).

pivoting an unknown domain of values

All of our examples so far have pivoted a known domain of values (in other words, we have used a hard-coded pivot_in_clause). The pivot syntax we have been using doesn't, by default, support a dynamic list of values in the pivot_in_clause. If we use a subquery instead of a list in the pivot_in_clause, as in the following example, Oracle raises a syntax error.

FOR deptno IN (SELECT deptno FROM dept))
*
ERROR at line 4:
ORA-00936: missing expression

Many developers will consider this to be a major restriction (despite the fact that pre-11g pivot techniques also require us to code an explicit set of values). However, it is possible to generate an unknown set of pivot values. Remember from the earlier syntax overview that PIVOT allows an optional "XML" keyword. As the keyword suggests, this enables us to generate a pivot set but have the results provided in XML format. An extension of this is that we can have an XML resultset generated for any number of pivot columns, as defined by a dynamic pivot_in_clause.

When using the XML extension, we have three options for generating the pivot_in_clause:

we can use an explicit list of values (we've been doing this so far in this article);

we can use the ANY keyword in the pivot_in_clause. This specifies that we wish to pivot for all values for the columns in the pivot_for_clause; or

we can use a subquery in the pivot_in_clause to derive the list of values.

We will concentrate on the dynamic methods. In the following example, we will use the ANY keyword to generate a pivoted resultset for any values of DEPTNO that we encounter in our dataset.

The XML resultset is of type XMLTYPE, which means that we can easily manipulate it with XPath or XQuery expressions. We can see that the generated pivot columns are named according to the pivot_clause and not the pivot_in_clause (remember that in the non-XML queries the pivot_in_clause values or aliases featured in all permutations of pivot column-naming). We can also see that the XML column name itself is a product of the pivot_for_clause: Oracle has appended "_XML" to "DEPTNO".

We will repeat the previous query but add an alias to the pivot_clause, as follows. If we wish to change the column name from "DEPTNO_XML", we use standard SQL column aliasing.

We can see a key difference between this XML output and the resultset from the ANY method. When using the subquery method, Oracle will generate a pivot XML element for every value the subquery returns (one for each grouping). For example, ANALYST employees only work in DEPTNO 20, so the ANY method returns one pivot XML element for that department. The subquery method, however, generates four pivot XML elements (for DEPTNO 10,20,30,40) but only DEPTNO 20 is non-null. We can see this more clearly if we extract the salaries element from both pivot_in_clause methods, as follows.

Despite the fact that three departments do not have salary totals, Oracle has generated an empty element for each one. Again, only department 20 has a value for salary total. Whichever method developers choose, therefore, depends on requirements, but it is important to recognise that working with XML often leads to inflated dataset or resultset volumes. In this respect, the subquery method can potentially generate a lot of additional data over and above the results themselves.

unpivot

We have explored the new 11g pivot capability in some detail above. We will now look at the new UNPIVOT operator. As its name suggests, an unpivot operation is the opposite of pivot (albeit without the ability to disaggregate the data). A simpler way of thinking about unpivot is that it turns pivoted columns into rows (one row of data for every column to be unpivoted). We will see examples of this below, but will start with an overview of the syntax, as follows.

The syntax is similar to that of PIVOT with some slight differences, including the meaning of the various clauses. These are described as follows:

unpivot_clause: this clause specifies a name for a column to represent the unpivoted measure values. In our previous pivot examples, the measure column was the sum of salaries for each job and department grouping;

unpivot_for_clause: the unpivot_for_clause specifies the name for the column that will result from our unpivot query. The data in this column describes the measure values in the unpivot_clause column; and

unpivot_in_clause: this contains the list of pivoted columns (not values) to be unpivoted.

The unpivot clauses are quite difficult to describe and are best served by some examples.

simple unpivot examples

Before we write an unpivot query, we will create a pivoted dataset to use in our examples. For simplicity, we will create a view using one of our previous pivot queries, as follows.

The PIVOTED_DATA view contains our standard sum of department salaries by job, with the four department totals pivoted as we've seen throughout this article. As a final reminder of the nature of the data, we will query this view.

We can see from the results that Oracle has transposed each of our pivoted columns in the unpivot_in_clause and turned them into rows of data that describes our measure (i.e. 'D10_SAL', 'D20_SAL' and so on). The unpivot_for_clause gives this new unpivoted column a name (i.e "SALDESC"). The unpivot_clause itself defines our measure data, which in this case is the sum of the department's salary by job.

It is important to note that unpivot queries can work on any columns (i.e. not just aggregated or pivoted columns). We are using the pivoted dataset for consistency but we could just as easily unpivot the columns of any table or view we have.

handling null data

The maximum number of rows that can be returned by an unpivot query is the number of distinct groupings multiplied by the number of pivot columns (in our examples, 5 (jobs) * 4 (pivot columns) = 20). However, our first unpivot query has only returned nine rows. If we look at the source pivot data itself, we can see nine non-null values in the pivot columns; in other words, eleven groupings are null. The default behaviour of UNPIVOT is to exclude nulls, but we do have an option to include them, as follows.

By including the null pivot values, we return the maximum number of rows possible from our dataset. Of course, we now have eleven null values, but this might be something we require for reporting purposes or "data densification".

unpivot aliasing options

In the pivot section of this article, we saw a wide range of aliasing options. The UNPIVOT syntax also allows us to use aliases, but it is far more restrictive. In fact, we can only alias the columns defined in the unpivot_in_clause, as follows.

SQL> SELECT job
2 , saldesc
3 , deptsal
4 FROM pivoted_data
5 UNPIVOT (deptsal
6 FOR saldesc IN (d10_sal AS 'SAL TOTAL FOR 10',
7 d20_sal AS 'SAL TOTAL FOR 20',
8 d30_sal AS 'SAL TOTAL FOR 30',
9 d40_sal AS 'SAL TOTAL FOR 40'))
10 ORDER BY
11 job
12 , saldesc;

JOB SALDESC DEPTSAL
---------- -------------------- ----------
ANALYST SAL TOTAL FOR 20 6600
CLERK SAL TOTAL FOR 10 1430
CLERK SAL TOTAL FOR 20 2090
CLERK SAL TOTAL FOR 30 1045
MANAGER SAL TOTAL FOR 10 2695
MANAGER SAL TOTAL FOR 20 3272.5
MANAGER SAL TOTAL FOR 30 3135
PRESIDENT SAL TOTAL FOR 10 5500
SALESMAN SAL TOTAL FOR 30 6160
9 rows selected.

This is a useful option because it enables us to change the descriptive data to something other than its original column name. If we wish to alias the column in the unpivot_clause (in our case, DEPTSAL), we need to use standard column aliasing in the SELECT clause. Of course, aliasing the unpivot_for_clause is irrelevant because we have just defined this derived column name in the clause itself (in our case, "SALDESC").

general restrictions

The UNPIVOT syntax can be quite fiddly and there are some minor restrictions to how it can be used. The main restriction is that the columns in the unpivot_in_clause must all be of the same datatype. We will see this below by attempting to unpivot three columns of different datatypes from EMP. The unpivot query itself is meaningless: it is just a means to show the restriction, as follows.

It appears that using datatype conversions within the unpivot_in_clause is not even valid syntax and Oracle raises an exception accordingly. The workaround is, therefore, to convert the columns up-front, using an in-line view, subquery or a stored view. We will use subquery factoring, as follows.

The output has been reduced, but we can see the effect of unpivoting on the EMP data (i.e. we have 3 unpivot columns, 14 original rows and hence 42 output records).

Another restriction with UNPIVOT is that the columns we include in the unpivot_in_clause are not available to us to project outside of the pivot_clause itself. In the following example, we will try to project the DEPTNO column.

The points of interest are highlighted. First, we can see a new UNPIVOT step (ID=3). Second, we can see a filter predicate to remove all NULL values for DEPTSAL. This is a result of the default EXCLUDING NULLS clause. If we use the INCLUDING NULLS option, this filter is removed. Note that the GROUP BY PIVOT operation at ID=5 is generated by the pivot query that underlies the PIVOTED_DATA view.

We will extract some more detailed information about this execution plan by using DBMS_XPLAN's format options, as follows. In particular, we will examine the alias and projection details, to see if it provides any clues about Oracle's implementation of UNPIVOT.

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'UNPIVOT'
2 FOR
3 SELECT job
4 , saldesc
5 , deptsal
6 FROM pivoted_data
7 UNPIVOT (deptsal
8 FOR saldesc IN (d10_sal AS 'SAL TOTAL FOR 10',
9 d20_sal AS 'SAL TOTAL FOR 20',
10 d30_sal AS 'SAL TOTAL FOR 30',
11 d40_sal AS 'SAL TOTAL FOR 40'))
12 ORDER BY
13 job
14 , saldesc;

The projection of the unpivoted columns is highlighted between operations 1 and 3 above. This does not really provide any clues to how Oracle implements UNPIVOT. Note that a 10046 trace (SQL trace) provides no clues either, so has been omitted from this article.

The alias information is slightly more interesting, but still tells us little about UNPIVOT. It might be a red herring, but when Oracle transforms a simple query, the generated alias names for query blocks usually follow a pattern such as "SEL$1", "SEL$2" and so on. In our unpivot query, the aliases are as high as SEL$17, yet this is a relatively simple query with few components. This could suggest that a lot of query re-write is happening before optimisation, but we can't be certain from the details we have.

other uses for unpivot

Unpivot queries are not restricted to transposing previously pivoted data. We can pivot any set of columns from a table (within the datatype restriction described earlier). A good example is Tom Kyte's print_table procedure. This utility unpivots wide records to enable us to read the data down the page instead of across. The new UNPIVOT can be used for the same purpose. In the following example, we will write a static unpivot query similar to those that the print_table utility is used for.

Turning this into a dynamic SQL solution is simple and can be an exercise for the reader.

unpivot queries prior to 11g

To complete this article, we will include a couple of techniques for unpivot queries in versions prior to 11g and compare their performance. The first method uses a Cartesian Product with a generated dummy rowsource. This rowsource has the same number of rows as the number of columns we wish to unpivot. Using the same dataset as our UNPIVOT examples, we will demonstrate this below.

The resultset is the equivalent of using the new UNPIVOT with the INCLUDING NULLS option. The second technique we can use to unpivot data joins the pivoted dataset to a collection of the columns we wish to transpose. The following example uses a generic NUMBER_NTT nested table type to hold the pivoted department salary columns. We can use a numeric type because all the pivoted columns are of NUMBER. We will create the type as follows.

SQL> CREATE OR REPLACE TYPE number_ntt AS TABLE OF NUMBER;
2 /

Type created.

Using this collection type for the pivoted department salaries, we will now unpivot the data, as follows.

While we have unpivoted the department salaries, we have lost our descriptive labels for each of the values. There is no simple way with this technique to decode a row number (like we did in the Cartesian Product example). We can, however, change the collection type we use to include a descriptor. For this purpose, we will first create a generic object type to define a single row of numeric unpivot data, as follows.

We can see that the new 11g UNPIVOT syntax is easier to use than the pre-11g alternatives. We will also compare the performance of each of these techniques, using Autotrace, the wall-clock and our MILLION_ROWS test table. We will start with the new 11g syntax and unpivot the three numeric columns of our test table, as follows.

The 11g UNPIVOT method generated 3 million rows in under 10 seconds with only slightly more logical I/O than in our PIVOT tests. We will compare this with the Cartesian Product method, but using a rowsource technique that generates no additional I/O (instead of the ALL_OBJECTS view that we used previously).

The Cartesian Product method is considerably slower than the new 11g UNPIVOT syntax. It generates considerably more I/O and takes over twice as long (note that these results are repeatable across multiple re-runs). However, investigations with SQL trace indicate that this additional I/O is a result of direct path reads and writes to the temporary tablespace, to support a large buffer sort (i.e. the sort that accompanies a MERGE JOIN CARTESIAN operation). On most commercial systems, this buffer sort will probably be performed entirely in memory or the temporary tablespace access will be quicker. For a small system with slow disk access (such as the 11g database used for this article), it has a large impact on performance. We can tune this to a degree by forcing a nested loop join and/or avoiding the disk sort altogether, as follows.

We have significantly reduced the elapsed time and I/O for this method on this database, but have introduced one million tiny sorts. We can easily reverse the nested loops order or use the NO_USE_MERGE hint (which also reverses the NL order), but this doubles the I/O and adds 10% to the elapsed time.

This method is comparable in I/O to the new UNPIVOT operation but is approximately 35-40% slower. Further investigation using SQL trace suggests that this is due to additional CPU time spent in the collection iterator fetches. Therefore, the new UNPIVOT operation is both easier to code and quicker to run than its SQL alternatives.

further reading

For more information on the new PIVOT and UNPIVOT syntax, see the SQL Reference documentation. For some more examples of the use of pivot and unpivot queries, see the Data Warehousing Guide here and here.

source code

The source code for the examples in this article can be downloaded from here.