Here is the new patch.
Our experiments show no noticeable performance issue when using the
patch for cases where the optimization is not used because the number
of extra statements executed when the optimization is disabled is
insignificant.
We have updated the patch to remove a couple of if statements, but
this is really minor. The biggest change was to MultiExecHash that
avoids an if check per tuple by duplicating the hashing loop.
To demonstrate the differences, here is an analysis of the code
changes and their impact.
Three cases:
1) One batch hash join - Optimization is disabled. Extra statements
executed are:
- One if (hashtable->nbatch > 1) in ExecHashJoin (line 356 of nodeHashjoin.c)
- One if optimization_on in MultiExecHash (line 259 of nodeHash.c)
- One if optimization_on in MultiExecHash per probe tuple (line 431
of nodeHashjoin.c)
- One if statement in ExecScanHashBucket per probe tuple (line 1071
of nodeHash.c)
2) Multi-batch hash join with limited skew - Optimization is disabled.
Extra statements executed are:
- One if (hashtable->nbatch > 1) in ExecHashJoin (line 356 of nodeHashjoin.c)
- Executes ExecHashJoinDetectSkew method (at line 357 of
nodeHashjoin.c) that reads stats tuple for probe relation attribute
and determines if skew is above cut-off. In this case, skew is not
above cutoff and no extra memory is used.
- One if optimization_on in MultiExecHash (line 259 of nodeHash.c)
- One if optimization_on in MultiExecHash per probe tuple (line 431
of nodeHashjoin.c)
- One if statement in ExecScanHashBucket per probe tuple (line 1071
of nodeHash.c)
3) Multi-batch hash join with skew - Optimization is enabled. Extra
statements executed are:
- One if (hashtable->nbatch > 1) in ExecHashJoin (line 356 of nodeHashjoin.c)
- Executes ExecHashJoinDetectSkew method (at line 357 of
nodeHashjoin.c) that reads stats tuple for probe relation attribute
and determines there is skew. Allocates space for XXX which is 2% of
work_mem.
- One if optimization_on in MultiExecHash (line 259 of nodeHash.c)
- In MultiExecHash after each tuple is hashed determines if its join
attribute value matches one of the MCVs. If it does, it is put in the
MCV structure. Cost is the hash and search for each build tuple.
- If all IM buckets end up frozen in the build phase (MultiExecHash)
because they grow larger than the memory allowed for IM buckets then
skew optimization is turned off and the probe phase reverts to Case 2
- For each probe tuple, determines if its value is a MCV by
performing hash and quick table lookup. If yes, probes MCV bucket
otherwise does regular hash algorithm as usual.
- One if statement in ExecScanHashBucket per probe tuple (line 1071
of nodeHash.c)
- Additional cost is determining if a tuple is a common tuple (both
on build and probe side). This additional cost is dramatically
outweighed by avoiding disk I/Os (even if they never hit the disk due
to caching).
The if statement on line 440 of nodeHashjoin.c (in ExecHashJoin) has
been rearranged so that in the single batch case short circuit
evaluation requires only the first test in the IF to be checked.
The "limited skew" check mentioned in Case 2 above is a simple check
in the ExecHashJoinDetectSkew function.
- Bryce Cutt
On Thu, Feb 26, 2009 at 12:16 PM, Bryce Cutt <pandasuit(at)gmail(dot)com> wrote:
> The patch originally modified the cost function but I removed that
> part before we submitted it to be a bit conservative about our
> proposed changes. I didn't like that for large plans the statistics
> were retrieved and calculated many times when finding the optimal
> query plan.
>
> The overhead of the algorithm when the skew optimization is not used
> ends up being roughly a function call and an if statement per tuple.
> It would be easy to remove the function call per tuple. Dr. Lawrence
> has come up with some changes so that when the optimization is turned
> off, the function call does not happen at all and instead of the if
> statement happening per tuple it is run just once per join. We have
> to test this a bit more but it should further reduce the overhead.
>
> Hopefully we will have the new patch ready to go this weekend.
>
> - Bryce Cutt
>
>
> On Thu, Feb 26, 2009 at 7:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Heikki's got a point here: the planner is aware that hashjoin doesn't
>> like skewed distributions, and it assigns extra cost accordingly if it
>> can determine that the join key is skewed. (See the "bucketsize" stuff
>> in cost_hashjoin.) If this patch is accepted we'll want to tweak that
>> code.
>>
>> Still, that has little to do with the current gating issue, which is
>> whether we've convinced ourselves that the patch doesn't cause a
>> performance decrease for cases in which it's unable to help.
>>
>> regards, tom lane
>>
>