How Many Ways to Solve this SQL Problem?

6072011

July 6, 2011

Since there were so many unique solutions to the last blog article that posed a SQL challenge, I thought that I would try another blog article that asks a similar type of question. Assume that someone showed you the following output:

Assume that you know nothing other than the fact that the C2 values are listed in ascending order when sorted by column C1. How many different ways can this particular problem be solved. Yes, there is an easy way, but assume that you were trying to “help educate” the person who provided the requested output.

In the above, the row with the 0 in the D column was the hardest part of the solution. Why would I use UNION ALL and not UNION – what was not in the specification?

This blog article was inspired by an old question found in a Usenet group from 1998 – if you were answering the question in 1998, would your answer be any different? Be creative with your solution. While you are thinking about a solution, take a look at this old Usenet thread and consider how difficult it was to find the “50 highest paid workers” in the last century.

17 responses

Nice example, and very close to what I had in mind when I wrote “Yes, there is an easy way”:

SELECT
C2,
C2-NVL(LAG(C2,1) OVER (ORDER BY C2), C2) D
FROM
T2
ORDER BY
C2;

As can be seen by the above, I forgot that it was possible to supply a default value for the result of the LAG function on the first row.

Any other solutions? Possibly something that either uses different analytic functions, or no analytic functions at all (for those pre-Oracle 8i Enterprise Edition databases / pre-Oracle 9i Standard Edition databases?

I just realized that in the above example using the LAG analytic funtion, I *should* have ordered the rows by column C1 rather than C2 – the way that Stew ordered the rows was the *intended* ordering. The above SQL statement was an example of a quick note that I made while writing the article. In this case, Stew’s example has a couple of advantages over the LAG example that I had in mind.

That is a bit more efficient than what I posted in the blog article – nice example.

So far, we have:
* An example that demonstrates a query similar to what someone might write, if that person knows about UNION ALL, subqueries in the WHERE clause, and inline views, but not the syntax for an Oracle specific outer join (possibly the SQL in 10 Minutes book reader… that was me at one point, but I do not think that the book covers inline views, so that part of the SQL statement would need to be collapsed into the main query)
* A potentially efficient SQL statement using a self-join that works with both older and newer Oracle Database release versions
* A potentially (more) efficient SQL statement using the LAG analytic function to read the previous row’s value in a column that works with more recent Oracle Database release versions

Are there any other techniques for solving this problem? Ask yourself, “What if Oracle does not have this capability, what would I do different?” If we set efficiency aside for a moment, and assume that the following features were not supported (or a developer did not know that these were possibilities), how else could this problem be solved:
* Scalar subqueries (Gary’s example)
* LAG/LEAD analytic functions to peek at previous and next row values for a column (Stew’s example)
* Self join (Maxim’s example)

For instance, could the following be used as a starting point for a solution?:

It’s also possible to generate the result by unioning the table with itself, but generating row numbers for the two sets that are out of sequence with each-other by one. Then pivoting the results to get the values from each set as separate columns and then subtracting the column values:

Nice job finishing up the starting point that I left as a challenge. The fact that PRIOR can be used in the SELECT clause when using a CONNECT BY clause in the query probably is not well known (I occasionally forget the syntax).

Complicating your second example is the fact that the C1 values in the table have gaps in the sequence – this was intentional so that people could not create a simple self join (outer-join) that specified T1.C1 = T2.C1+1. What you posted is a very good work-around for the lack of gapless sequential values, and is a solution that I had not previously considered as a solution for this problem.

Here is a variation of Chris’ second solution. I use a cartesian join instead of union all (which should save a full scan) and I don’t really pivot since I can just make C2 negative when needed and use SUM.
select max(c2) c2, sum(decode(pass,1,decode(rn,1,0,C2),-C2)) D from
(select level pass from DUAL connect by level <= 2) a,
(select rownum RN, T2.* from T2)
group by RN+pass
having SUM(PASS) <> 2
order by max(C1);

I saw your comment and quietly fixed the SQL statement to match your description. The less than and greater than characters tend to get lost because those characters have special meaning in HTML. != is a good substitute for <> or you can use (without spaces):
& lt; (for <)
& gt; (for >)

I am wondering if this part of the SQL statement could cause problems if the data is not added to the table blocks in a completely sequential order possibly due to the amount of free space remaining in the table’s blocks when the new rows are added:

What you posted is an interesting approach to the problem – I believe that your query can be fixed by removing the ROWNUM from “(select rownum RN, T2.* from T2)”, sliding that into an inline view with an ORDER BY clause, and adding ROWNUM outside the inline view. I have NOT attempted this change in your SQL statement, but I suspect that it will work.

Thank you for supplying the link to the AskTom thread. I need to find some time to take a look at that thread.

Thank you, Charles. This is poetic justice: I have recently pointed out the lack of ORDER BY in other queries, and in my turn I am guilty of disorderly conduct 🙂 Here is a corrected (and perhaps correct) version:select max(C2) C2, SUM(DECODE(PASS,1,DECODE(RN,1,0,C2),-C2)) D from
( select level PASS from DUAL connect by level <= 2 ),
( select rownum RN, C1, C2 from ( select * from T2 order by C1 ) )
group by RN+PASS
having SUM(PASS) != 2
order by max(C1);

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:
<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: