On Tue, Sep 22, 2009 at 11:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Currently, however, we only consider this possibility when the inner
>> rel is NOT a joinrel. It seems like it might be possible to change
>> this, but it doesn't look straightforward.
>
> Well, it's straightforward enough in theory, it's just the exponential
> growth in the number of join paths to consider that's a problem :-(.
> I think what we'd need is a heuristic to limit the paths considered.
[ picking this up again now that CommitFest is over ]
So that I don't have to keep referring to "this possibility" and
similar circumlocution, let's define an index-accelerated path (feel
free to suggest a different term) to be the generalization to joinrels
of what we now call a nested inner index-scan. In other words, they
can only be usefully used on the inner side of a nested loop, where
they'll be rescanned with different parameters for each outer row, and
they can only ever win when some part of the path involves a *partial*
index scan that can used one of the passed parameters as an index
qual.
For a fixed a set of parameters to be pushed down from the outer side,
it seems to me that we could build up a set of index-accelerated paths
for a joinrel {A B} by considering the combination of (1) an
index-accelerated path for A joined to an index-accelerated path for
B, or (2) an index-accelerated path for A joined to the cheapest total
path for B. I believe we don't need to worry about path keys because
this will only ever be used on the inner side of a nested loop, so the
pathkeys will be ignored anyway. In other words, the first heuristic
is that you can't build up an index-accelerated path for the joinrel
unless there is an index-accelerated path for a least one of its
baserels.
That still leaves a lot of silly paths, though. In many cases, if
you're thinking about joining A to {B C} using an index-accelerated
path, you'd be just as well off joining to B first and then to C. So
it might be that we only need to consider index-accelerated paths when
there is no legal join between the LHS and a subset of the RHS. But
I'm not completely sure that I'm right about this, nor whether it's
easy to check for.
The other problem I see here is that the bottom-up approach that we
use in general is going to be difficult to apply here, because the set
of paths will vary depending on what parameters are pushed down from
the outer side.
> I think Andrew was suggesting that we not attempt to consider this
> automatically, but only when the user writes the query in a way that
> exposes it directly via LATERAL. While that goes against my normal
> instincts for the planner, it isn't unreasonable as a first step.
Possibly, but I'm not sure we have a good enough design sketch at this
point to even know whether such a restriction would be useful.
Another thought, only semi-related. One of the big use cases for
LATERAL in general is to use a set-returning function in the FROM
clause that uses vars from a preceding FROM item. I am idly wondering
if there's a reason why ExecProject is not its own node type. It
seems that we have close to the same logic scattered through a whole
bunch of different node types...
...Robert