[ http://issues.apache.org/jira/browse/DERBY-805?page=all ]
A B updated DERBY-805:
----------------------
Attachment: d805_phase4_v2.patch
DERBY-805_v5.html
[ Attaching d805_phase4_v2.patch, which only has a 1-line diff from the v1 patch ]
After stepping away from the code for the weekend and then re-reading the document for Phase
4, I noticed a small (one-line) error in the first version of my Phase 4 changes.
In the document I correctly wrote the following regarding the need to revert access paths
for subtrees when an Optimizable's current permutation is found to be "not the best":
<begin_quote>
In order to fix this behavior, we have to add logic to save the "best plans" for an Optimizable's
subtree before each new permutation, and then if the current permutation isn't the best one,
we need to revert the entire subtree's plans back to what they were for the "best" permutation.
<end_quote>
But the code that I wrote didn't quite match this statement. In the following diff:
+ // If the final path that we considered for curOpt was _not_ the best
+ // path for this round, then we need to revert back to whatever the
+ // best plan for curOpt was this round. Note that the cost estimate
+ // for bestAccessPath could be null here if the last path that we
+ // checked was the only one possible for this round.
+ if (!retval &&
+ (curOpt.getBestAccessPath().getCostEstimate() != null) &&
+ (curOpt.getCurrentAccessPath().getCostEstimate() != null))
+ {
the check for "!retval" means that we will only revert the plans if we have exhausted all
permutations for the current Optimizable. But as I wrote in the document, we need to check
(and potentially revert) the subtree's plans after _each_ new permutation, not just after
the final one.
To show why this matters, assume we have some Optimizable with three possible permutations
P1, P2, and P3, of which P1 is the "best". With the code as I wrote it in d805_phase4_v1.patch,
we would end up doing the following:
-- Pick permutation P1.
-- Since P1 is the first permutation, there is no previous "best" path
so there's nothing to save.
-- Optimize P1 and get the cost; say it's 25.
-- Since P1 isn't our final permutation, we don't revert. So the plans
corresponding to P1 are still saved at the Optimizable and its subtree.
-- Pick permutation P2.
-- Before optimizing P2, save the best paths for the current Optimizable's
subtree; this means we'll save off the paths corresponding to P1.
-- Optimize P2 and get the cost; say it's 50.
-- Since P2 isn't our final permutation, we *don't* revert. So the plans
corresponding to P2 are still saved at the Optimizable and its subtree.
-- Pick permutation P3.
-- Before optimizing P3, save the best paths for the current Optimizable's
subtree; this means we'll save off the paths corresponding to *P2*.
(This is wrong.)
-- Optimize P3 and get the cost; say it's 75.
-- Since P3 is our final permutation, and since it is NOT the best permutation
that we found, we'll now revert the paths of this Optimizable's subtree.
That mean's we'll load the plans that we most recently saved and
generate the query plan based on those. But that will give us the
plans for _P2_, when what we wanted was the plans for _P1_.
So d805_phase4_v2.patch fixes this by removing the check for "!retval" in the above diff.
Then after we optimize P2 and get the cost, we'll see that it's not the best so we'll immediately
revert the paths back to what they were for P1. The P1 paths (instead of the P2 paths) will
then get saved off before optimization of P3 starts, and thus when we do the "revert" after
P3, we'll load the correct paths (P1).
As it turns out this particular change doesn't change any of the other Phase 4 behavior.
The reason is that an Optimizable that is not a FromBaseTable only (currently) has two permutations--nested
loop join and hash join--and thus the above scenario can't currently happen. FromBaseTable's
can have more than two permutations, but since they don't have subtrees beneath them this
whole save/revert logic is not required. Thus even though the v1 patch was technically incorrect,
everything ran as expected. For accuracy, though, I think this small error should be fixed,
so that's what the second version of the phase 4 patch (d805_phase4_v2.patch) does.
I've also added a _v5.html document that is almost identical to _v4 except for two things:
1) the small change that I just described, and 2) I've included a note with the info about
index statistics that Andrew posted to derby-dev (thanks to Bryan for asking the question
and to Andrew for providing the answer).
Just to be safe I re-ran derbyall on Red Hat Linux with ibm142 sane jars and saw no failures.
> Push join predicates into union and other set operations. DERBY-649 implemented scalar
(single table) predicate pushdown. Adding join predicate push down could improve performance
significantly.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-805
> URL: http://issues.apache.org/jira/browse/DERBY-805
> Project: Derby
> Type: Sub-task
> Components: SQL
> Versions: 10.1.2.0, 10.2.0.0
> Environment: generic
> Reporter: Satheesh Bandaram
> Assignee: A B
> Fix For: 10.2.0.0
> Attachments: DERBY-805.html, DERBY-805_v2.html, DERBY-805_v3.html, DERBY-805_v4.html,
DERBY-805_v5.html, d805_phase1_v1.patch, d805_phase1_v1.stat, d805_phase1_v2.patch, d805_phase1_v2.stat,
d805_phase1_v3.patch, d805_phase1_v3.stat, d805_phase2_v1.patch, d805_phase2_v1.stat, d805_phase3_v1.patch,
d805_phase3_v1.stat, d805_phase4_v1.patch, d805_phase4_v1.stat, d805_phase4_v2.patch, phase2_javadocFix.patch,
predPushdown_testFix.patch
>
> Fix for DERBY-649 implemented scalar (single table) predicate push down into UNIONs.
While this improves performance for one set of queries, ability to push join-predicates further
improves Derby performance by enabling use of indices where possible.
> For example,
> create view V1 as select i, j from T1 union all select i,j from T2;
> create view V2 as select a,b from T3 union all select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like
> select * from V1, V2 where V1.j = V2.b and V1.i =1;
> If the join order choosen is V1,V2, V1 can use index on V1.i (if present) following fix
for DERBY-649. But if there is a index on V2.b also, Derby currently can't use that index.
By pushing join predicate, Derby would be able to use the index and improve performance. Some
of the queries I have seen (not the one shown here...) could improve from 70-120 seconds to
about one second.
> Note there is a good comment by Jeff Lichtman about join-predicate push down. I am copying
parts of it here for completeness of this report: (Modified)
> If predicate push down is done during optimization, it would be possible to push joins
into the union as long as it's in the right place in the join order.
> For example:
> create view v as select * from t1 union all select * from t2;
> select * from v, t3 where v.c1 = t3.c2;
> In this select, if t3 is the outer table then the qualification could be pushed into
the union and optimized there, but if t3 is the inner table the qualification can't be pushed
into the union.
> If the pushing is done at preprocess time (i.e. before optimization) it is impossible
to know whether a join qualification like this can be safely pushed.
> There's a comment in UnionNode.optimizeIt() saying:
> /* RESOLVE - don't try to push predicated through for now */
> This is where I'd expect to see something for pushing predicates into the union during
optimization.
> BTW, the business of pushing and pulling predicates during optimization can be hard to
understand and debug, so maybe it's best to only handle the simple cases and do it during
preprocessing.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira