[ https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573867#action_12573867
]
A B commented on DERBY-2998:
----------------------------
Thanks for patch 19, Thomas. I think your approach to handling the virtual column nodes is
a good one (certainly better than the alternatives seen thus far). In the interest of getting
something into the codeline for others to play with--and to simplify future patches--per Dan's
comment a few days back, I'll try to commit patch 19 sometime today.
Once that is committed, perhaps you can post one or more follow-up patches to address the
following un-related comments, which I noticed while looking into the VCN issue. Of these
# 3 is probably the most important one...
1) Instead of using "row_number()" for the default name of a ROW_NUMBER() column, I think
it'd be better to use a normal generated column name, as is done for all other un-named expressions
in Derby (I believe?). So something like:
select i, j, i+j, row_number() over() from t1
should return four columns, "I", "J", "3", and "4", where the last two are generic generated
names used for unnamed expressions. With patch 19 such a query currently returns "I", "J",
"3", "row_number()", which deviates from the existing pattern.
2) Regarding the following logic in PredicateList.pushExpressionsIntoSelect():
ColumnReference ref = (ColumnReference)e.nextElement();
if (!ref.pointsToColumnReference() &&
!ref.pointsToWindowFunction())
{
state = false;
break;
}
I think it says "if the column reference points to something that is neither a) another column
reference NOR b) a window function, then do *not* push it down." So if the column reference
points to a window function, we we could still potentially push it down. Is that what we
want? For the record, I commented the call to "pointsToWindowFunction()" out completely and
all of the queries still ran the same. So is this check still necessary?
3) The following query returns one row, regardless of how many rows there are in T1:
select distinct row_number() over(), 'HMM' from t1
It appears that we will evaluate the entire SELECT result set, then apply the DISTINCT, and
finally, assign the row numbers. So if there are five rows in T1, we'll get five rows having
a single "HMM" column, then we'll apply the distinct to get a single "HMM" row, and that row
will have row number "1". But is that the correct behavior? Or is the row_number() supposed
to be included in the "DISTINCT" qualification, in which case the query would return 5 rows
because each row has a different row number and thus each is distinct from the others?
Just to see what would happen I ran the above query on DB2 v8 and it returned one row for
each row in T1. I took a look at the syntax rules for <window function> in SQL 2003
spec section 6.10 but could not, in my inexperience with reading specs, determine one way
or the other. That said, though, I think functions in a ResultColumnList are typically evaluated
once per row, *prior* to the DISTINCT being applied, so I would guess ROW_NUMBER() should
do the same?
4) Might be good to add some test cases for GROUP BY and HAVING clauses in the presence of
ROW_NUMBER(). I tried a few quick ones and they seem to be working fine, but it wouldn't
hurt to add some tests for posterity. These are the ones I tried, but use your imagination
:)
select r from (select i, row_number() over() as r, j from t1) x group by r;
select * from (select i, row_number() over() as r, j from t1) x group by i, j, r;
select * from (select i, row_number() over() as r, j from t1) x group by j, r, i;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2 and i >=3
order by i desc;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2 and i >=3
order by r desc;
select * from
(select i, row_number() over() as r, j from t1) x
group by j, r, i
having r > 2 and i >=3
order by i asc, r desc;
And yes I realize that grouping by a ROW_NUMBER() is kind of silly since row numbers are unique
across rows--but it's good make sure the behavior is correct :)
5) SQL spec indicates that window functions like ROW_NUMBER() can be used in an ORDER BY clause,
but that is not yet implemented (right?). Do you think it would be worth it to add a statement
saying as much to the documentation?
> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
> Key: DERBY-2998
> URL: https://issues.apache.org/jira/browse/DERBY-2998
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Thomas Nielsen
> Assignee: Thomas Nielsen
> Priority: Minor
> Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, d2998-12.diff, d2998-12.stat,
d2998-13.diff, d2998-13.stat, d2998-14.diff, d2998-14.stat, d2998-15.diff, d2998-15.stat,
d2998-16.diff, d2998-16.stat, d2998-17.diff, d2998-17.stat, d2998-18.diff, d2998-18.stat,
d2998-19.diff, d2998-19.stat, d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff,
d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff, d2998-8.stat, d2998-9-derby.log, d2998-9.diff,
d2998-9.stat, d2998-doc-1.diff, d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, d2998-test2.diff,
d2998-test2.stat, d2998-test3.diff, d2998-test3.stat, d2998-test4.diff, d2998-test4.stat,
d2998-test6.diff, d2998-test7.diff, d2998-test8.diff, d2998-test9.diff
>
>
> As part of implementing the overall OLAP Operations features of SQL (DERBY-581), implement
the ROW_NUMBER() window function.
> More information about this feature is available at http://wiki.apache.org/db-derby/OLAPRowNumber
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.