Main menu

Tag Archives: Subquery Factor

Post navigation

In my article A Note on Running Sums and Products in SQL, I used three different SQL techniques to get running products: Analytic Functions, Model Clause and Recursive Subquery Factors. I explained this in a recording on Twitter. I then wondered whether I could explain each of these SQL techniques in general using a single Twitter recording (which has a time limit of 140 seconds) each, and you can see the results in this Twitter thread.

In this article I set out the example queries that I used along with the results. You can get the complete scripts and recordings on GitHub, Oracle SQL Projects.

It’s a common requirement to compute running sums over time in SQL; for example, to find sales volumes to date. This is easy to do using analytic functions, like this, schematically:

SELECT SUM(sales) OVER (PARTITION By partition_key ORDER BY date)
FROM ...

The ORDER BY clause implicitly adds a window range of UNBOUNDED PRECEDING, and if you omit it you get the overall totals by partition_key.

Recently I needed to compute the products of some numeric factors across records over time. This kind of requirement often arises in financial calculations, and is slightly more tricky since Oracle doesn’t have an analytic function for products as it does for sums. However, we can achieve the same functionality using the well known mathematical equivalence:

log(xy) = log(x) + log(y)

and therefore:

xy = exp(log(x) + log(y))

More generally, if we have a set of N records , and a function, f, defined on a record, we can write the running sum, for function f, and n in (1..N) as:

and the running product as:

Then from the above equivalence, we can write:

This means we can get the running products in SQL using the analytic SUM combined with the (natural) log (LN in Oracle SQL) and exp functions. Let’s see how it works using Oracle’s HR demo schema. We’ll take the employees table and use:

department_id as the partition key

employee_id as the dimension to order by

salary as the measure

To start with, let’s get the running and total sums, showing results for department_id = 60:

If we didn’t have this technique we could compute the results using explicit recursion, either by MODEL clause, or by recursive subquery factors. Let’s do it those ways out of interest. First here’s a MODEL clause solution:

In this article I show how the concept can also be useful in testing, and writing, SQL queries regardless of whether or not automated testing is in use. The idea is that queries often contain complex logic involving CASE, Nvl and other logical constructs, as well as retrieval of database data. If we could somehow separate out the pure logical part from the impure database accesses, we may be able to do more effective testing, since pure functions are inherently easier to test than impure ones. We will show this by means of a simple example against the Oracle HR demo schema.

Suppose we want to calculate an employee bonus using the following logic:

Use a 10% multiplier applied to one of two salaries…

…for department managers, use the departmental average salary; for others, use their own salary

For employees who have been previously employed, i.e. who have a job history record, add a further 10%

We see the bonus calculation in the select list with fields embedded from tables and a subquery. Setting up test data in multiple tables, and filtering out database noise can be a difficult task, so it would be nice if we could bypass that to test the calculation logic independently. If we are on version 12.1 or higher we can facilitate this by making the calculation into a WITH function, like this:

Now the declared function, which is ‘pure’, separates out the calculation logic from the impure parts of the query that reference database fields. We can now test this function by replacing the rest of the query with a test data generator designed to cover all scenarios.

In the presentation referenced above I discussed how to assess test coverage properly, in terms of behavioural, or scenario, coverage, rather than the popular but spurious ‘code coverage’ metrics. I explained the value of thinking in terms of domain and subdomain partitioning to maximise true test coverage. If the subdomains are orthogonal (or independent) we can test behaviour across their partitions in parallel. What about the current case? We can see that we have three subdomains, each having two partitions, and in fact they are interdependent (because they multiply together an error in one factor could neutralise an error in another): that means we need 2x2x2 = 8 test records. There is no need to vary the base salary, so we will use a bind variable:

The results can be checked manually, and there is probably little value in automating this beyond scripting.

Ok, but what if we are on a database version prior to 12.1, or for some reason we don’t want to use a WITH function? In that case, we can do something similar, but not quite as cleanly because we will need to modify the code under test slightly, to reference the test data subquery:

Conclusions
We have shown how extracting pure functionality from a query can help in making testing more rigorous and modular.

We have also shown how the WITH function feature, new in v12.1, can be used to extract pure functions from the main SQL and so enhance modularity and testability. This is a usage for the feature that is not commonly noted, the advantage usually cited being replacement of database functions to avoid context switches.