Introduction

Since their introduction in SQL Server 2005, window functions like ROW_NUMBER and RANK have proven to be extremely useful in solving a wide variety of common T-SQL problems. In an attempt to generalize such solutions, database designers often look to incorporate them into views to promote code encapsulation and reuse. Unfortunately, a limitation in the SQL Server query optimizer often means that views* containing window functions do not perform as well as expected. This post works through an illustrative example of the problem, details the reasons, and provides a number of workarounds.

*This problem can also occur in derived tables, common table expressions and in-line functions, but I see it most often with views because they are intentionally written to be more generic.

Window functions

Window functions are distinguished by the presence of an OVER() clause and come in three varieties:

Ranking window functions

ROW_NUMBER

RANK

DENSE_RANK

NTILE

Aggregate window functions

MIN, MAX, AVG, SUM

COUNT, COUNT_BIG

CHECKSUM_AGG

STDEV, STDEVP, VAR, VARP

Analytic window functions

LAG, LEAD

FIRST_VALUE, LAST_VALUE

PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, CUME_DIST

The ranking and aggregate window functions were introduced in SQL Server 2005, and considerably extended in SQL Server 2012. The analytic window functions are new for SQL Server 2012.

All of the window functions listed above are susceptible to the optimizer limitation detailed in this article.

Example

Using the AdventureWorks sample database, the task at hand is to write a query that returns all product #878 transactions that occurred on the most recent date available. There are all sorts of ways to express this requirement in T-SQL, but we will choose to write a query that uses a windowing function. The first step is to find transaction records for product #878 and rank them in date order descending:

The results of the query are as expected, with six transactions occurring on the most recent date available. The execution plan contains a warning triangle, alerting us to a missing index:

As usual for missing index suggestions, we need to remember that the recommendation is not the result of a through analysis of the query – it is more of an indication that we need to think a bit about how this query accesses the data it needs.

The suggested index would certainly be more efficient than scanning the table completely, since it would allow an index seek to the particular product we are interested in. The index would also cover all the columns needed, but it would not avoid the sort (by TransactionDate descending). The ideal index for this query would allow a seek on ProductID, return the selected records in reverse TransactionDate order, and cover the other returned columns:

With that index in place, the execution plan is much more efficient. The clustered index scan has been replaced by a range seek, and an explicit sort is no longer necessary:

The final step for this query is to limit the results to just those rows that rank #1. We cannot filter directly in the WHERE clause of our query because window functions may only appear in the SELECT and ORDER BY clauses.

We can workaround this restriction using a derived table, common table expression, function, or view. On this occasion, we will use a common table expression (aka an in-line view):

The execution plan is the same as before, with an extra Filter to return only rows ranked #1:

The query returns the six equally ranked rows we expect:

Generalizing the query

It turns out that our query is very useful, so the decision is taken to generalize it and store the definition in a view. For this to work for any product, we need to do two things: return the ProductID from the view, and partition the ranking function by product:

Our expectation is that the execution plan for this new query will be exactly the same as before we created the view. The query optimizer should be able to push the filter specified in the WHERE clause down into the view, resulting in an index seek.

We need to stop and think a bit at this point, however. The query optimizer can only produce execution plans that are guaranteed to produce the same results as the logical query specification – is it safe to push our WHERE clause into the view?< The answer is yes, so long as the column we are filtering on appears in the PARTITION BY clause of the window function in the view. The reasoning is that eliminating complete groups (partitions) from the window function will not affect the ranking of rows returned by the query. The question is, does the SQL Server query optimizer know this? The answer depends on which version of SQL Server we are running.

SQL Server 2005 execution plan

A look at the Filter properties in this plan shows it applying two predicates:

The ProductID = 878 predicate has not been pushed down into the view, resulting in a plan that scans our index, ranking every row in the table before filtering for product #878 and rows ranked #1.

SQL Server 2008+ execution plan

This is the execution plan for the same query on SQL Server 2008 or later:

The ProductID predicate has been successfully pushed past the ranking operators, replacing the index scan with the efficient index seek.

The 2008 query optimizer includes a new simplification rule SelOnSeqPrj (select on sequence project) that is able to push safe outer-scope predicates past window functions. To produce the less efficient plan for this query in SQL Server 2008 or later, we have to temporarily disable this query optimizer feature:

Unfortunately, the SelOnSeqPrj simplification rule only works when the predicate performs a comparison with a constant. For that reason, the following query produces the sub-optimal plan on SQL Server 2008 and later:

The problem can still occur even where the predicate uses a constant value. SQL Server may decide to auto-parameterize trivial queries (one for which an obvious best plan exists). If auto-parameterization is successful, the optimizer sees a parameter instead of a constant, and the SelOnSeqPrj rule is not applied.

For queries where auto-parameterization is not attempted (or where it is determined to be unsafe), the optimization may still fail, if the database option for FORCED PARAMETERIZATION is on. Our test query (with the constant value 878) is not safe for auto-parameterization, but the forced parameterization setting overrides this, resulting in the inefficient plan:

Note: The pre-execution (‘estimated’) execution plan still shows an index scan because the value of the variable is not actually set yet. When the query is executed, however, the execution plan shows the desired index seek plan:

The SelOnSeqPrj rule does not exist in SQL Server 2005, so OPTION (RECOMPILE) cannot help there. In case you are wondering, the OPTION (RECOMPILE) workaround results in a seek even if the database option for forced parameterization is on.

All versions workaround #1

In some cases, it is possible to replace the problematic view, common table expression, or derived table with a parameterized in-line table-valued function:

This produces the desired index seek plan on all versions of SQL Server that support window functions. This workaround produces a seek even where the predicate references a parameter or local variable – OPTION (RECOMPILE) is not required.< The function body could of course be simplified to remove the now-redundant PARTITION BY clause, and to no longer return the ProductID column. I left the definition the same as the view it replaced to more clearly illustrate the cause of the execution plan differences.

All versions workaround #2

The second workaround only applies to ranking window functions that are filtered to return rows numbered or ranked #1 (using ROW_NUMBER, RANK, or DENSE_RANK). This is a very common usage however, so it is worth mentioning.

An additional benefit is that this workaround can produce plans that are even more efficient than the index seek plans seen previously. As a reminder, the previous best plan looked like this:

That execution plan ranks 1,918 rows even though it ultimately returns only 6. We can improve this execution plan by using the window function in an ORDER BY clause instead of ranking rows and then filtering for rank #1:

Both pre- and post-execution plans show an index seek without needing an OPTION (RECOMPILE) query hint. The following is a post-execution (‘actual’) plan:

If the view had used ROW_NUMBER instead of RANK, the replacement view would simply have omitted the WITH TIES clause on the TOP (1). The new view could also be written as a parameterized in-line table-valued function of course.

One could argue that the original index seek plan with the rnk = 1 predicate could also be optimized to only test 7 rows. After all, the optimizer should know that rankings are produced by the Sequence Project operator in strict ascending order, so execution could end as soon as a row with a rank greater than one is seen. The optimizer does not contain this logic today, however.

Final Thoughts

People are often disappointed by the performance of views that incorporate window functions. The reason can often be traced back to the optimizer limitation described in this post (or perhaps because the view designer did not appreciate that predicates applied to the view must appear in the PARTITION BY clause to be safely pushed down).

I do want to emphasise that this limitation does not just apply to views, and neither is it limited to ROW_NUMBER, RANK, and DENSE_RANK. You should be aware of this limitation when using any function with an OVER clause in a view, common table expression, derived table, or in-line table-valued function.

SQL Server 2005 users that encounter this issue are faced with the choice of rewriting the view as a parameterized in-line table-valued function, or using the APPLY technique (where applicable).

SQL Server 2008 users have the extra option of using an OPTION (RECOMPILE) query hint if the issue can be solved by allowing the optimizer to see a constant instead of a variable or parameter reference. Remember to check post-execution plans when using this hint though: the pre-execution plan cannot generally show the optimal plan.

I have noticed you regularly use the 'WITH SCHEMABINDING' option on your VIEWs. Do you have post or something already written for the pros and cons of this? None of my views/UDFs are schemabound now and i am comtemplating the value of it because it is the requirement for putting an index on a view. However, introducing something like this into a well established process and sizable schema seems scary. Thoughts?

There are important reasons to use schema binding with functions (see my recent posts on the Halloween Problem on this site for details). As far as views are concerned, this is partly habit, but I do find it useful when cleaning up test scripts – I can't drop the tables I create and leave the views hanging. More practically, I would often forget to run sp_refreshview or sp_refreshsqlmodule when changing an object that some view depends on, without which unexpected results could occur.