Analytic Functions – II

In many of our posts, we have discussed different kinds of Analytic Functions and their usage to solve real world problems. However, we have seen at many client sites that many developers or DBAs still do not use many of these analytic functions in their code. These are very well documented by Microsoft (SQL Server) and Oracle. In this post, we will take a look at some of them. We will pick up this post from the first post that we had done about this topic – here.

Let’s take an example first which will distinguish an aggregate function from an analytic function since that is one of differences that need to be cleared up. So, let’s compare GROUP BY with the COUNT(*) using the PARTITION BY clause. Here are two examples (we are using SQL Server syntax in this post – the coverage of analytic functions in Oracle is much more robust than SQL Server and has been around since a very long time … we will cover the differences as well as the functions that are only available in Oracle in this series):

SELECT [TYPE], COUNT(*) AS CNT
FROM SYS.OBJECTS
GROUP BY [TYPE]
ORDER BY [TYPE]
GO

In this above SQL, it will give us the counts grouped by the type of the objects in the schema. Another thing to note is that if we have a column in the “Select” list, then it has to be in the GROUP BY clause as well. Here is the output from the schema that we are using:

As you can see from above, the analytic function was computed as part of the select and we have got the value repeated per object name of that type. The syntax as per BOL for ranking and window functions is:

In the above example, if we take away the partition section and just do: COUNT(*) OVER () – in that case, the aggregate function COUNT(*) will apply to the entire result set returned by the query i.e. a total count of the number of records since we are not asking it to partition the result set based on anything. Example:

SELECT [NAME], [TYPE], COUNT(*) OVER () AS CNT
FROM SYS.OBJECTS
ORDER BY [TYPE]
GO

So, bottom line is that if say we have 1000 total records returned by a query which can be partitioned into say 20 different groups based on the type, we can partition the data based on those “groups” and then apply the aggregate at the top of it and get all those results back in a single SQL statement. This allows us to do a lot of very complicated computations in very simple SQL statement(s). What we used as an aggregator at the top of those partitions in the above example was the COUNT(*). We can also use other aggregation functions like SUM(), AVG(), MIN(), MAX() etc. or we can make use of Ranking functions like RANK(), DESNSE_RANK(), NTILE(), ROW_NUMBER() etc.. We will look at those in a second. Another thing that we would like to point out is that as you can see from the BOL syntax above, the ranking functions also have the facility to use an ORDER BY clause in addition to the partition clause. So, what does that do? We have already seen that we can partition the result set into groups and then apply the Ranking/Aggregate function(s) at the top of them. The “ORDER BY” clause after the PARTITION clause helps us to order the data set in a particular way, example:

SELECT [NAME], [TYPE], RANK() OVER (PARTITION BY [TYPE] ORDER BY [OBJECT_ID]) AS RNK
FROM SYS.OBJECTS
GO

The rank above is the rank of the object within a particular partition. So, for the check constraints that are shown above, the total count was 23 i.e. the partition of check constraints had 23 records and the rank above is obtained for those 23 records by doing an ordering of those records within that partition. Another thing to note is that when you are using the ranking functions, you can also choose not to use a partition and just order the result:

What this has done is that it has assigned a dense rank number to all the records that have the same [TYPE] value.

And here is an example demonstrating all the windowing function usage:

SELECT
[NAME]
, [TYPE]
/*************************************
Ranking Windowing Functions
**************************************/
, RANK() OVER (PARTITION BY [TYPE] ORDER BY [OBJECT_ID]) AS RNK
, DENSE_RANK() OVER (ORDER BY [TYPE]) AS DENSE_RNK
, ROW_NUMBER() OVER (PARTITION BY [TYPE] ORDER BY [NAME]) AS ROWNUMBER
, NTILE(10) OVER (ORDER BY [TYPE]) AS NTILE_VAL
/*************************************
Aggregate Windowing Functions
— Using random columns just to illustrate usage
**************************************/
, SUM([SCHEMA_ID]) OVER (PARTITION BY [TYPE]) AS SUM_AGGR
, AVG([SCHEMA_ID]) OVER (PARTITION BY [TYPE]) AS AVG_AGGR
, COUNT(*) OVER (PARTITION BY [TYPE]) AS COUNT_AGGR
, MIN([OBJECT_ID]) OVER (PARTITION BY [TYPE]) AS MIN_AGGR
, MAX([OBJECT_ID]) OVER (PARTITION BY [TYPE]) AS MAX_AGGR
FROM SYS.OBJECTS
GO

We have used random columns (Object_ID or Schema_ID) above just to illustrate the usage of the ranking and aggregate windowing functions. BOL has some very good examples using Sales tables in the AdventureWorks database – here.

In the next post in this series, we will look into the rich functionality of analytics in Oracle. We will cover functions like LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE() and a couple of other very useful ones and will also look at ROW Type and RANGE Type Windows clauses.

In future releases of SQL Server, there will hopefully be a lot of additions in the space of Analytics – some of those features have been requested by many SQL Server MVPs at MSFT Connect – you can read more on it here. It also has a link to MVP Itzik’s article.

Oracle Guru Tom Kyte has written extensively about analytics in his books and his portal – you will benefit from going through those discussions – here.

Good information on SQL Standards and also about the differences between different RDBMS (includes analytics as well) – here.