=?ISO-8859-2?Q?Miroslav_=A9ulc?= <[EMAIL PROTECTED]> writes:
> [ concerning a deeply nested LEFT JOIN to get data from a star schema ]

> So I have some results. I have tested the query on both PostgreSQL 8.0.1
> and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL
> result is 11,667.916 ms, MySQL result is 448.4 ms.
That's a fairly impressive discrepancy :-(, and even the slot_getattr()
patch that Atsushi Ogawa provided isn't going to close the gap.
(I got about a 4x speedup on Miroslav's example in my testing, which
leaves us still maybe 6x slower than MySQL.)
Looking at the post-patch profile for the test case, there is still
quite a lot of cycles going into tuple assembly and disassembly:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
24.47 4.49 4.49 _mcount
8.01 5.96 1.47 9143692 0.00 0.00 ExecEvalVar
6.92 7.23 1.27 6614373 0.00 0.00 slot_deformtuple
6.54 8.43 1.20 9143692 0.00 0.00 slot_getattr
6.21 9.57 1.14 103737 0.01 0.03 ExecTargetList
5.56 10.59 1.02 103775 0.01 0.01 DataFill
3.22 11.18 0.59 103775 0.01 0.01 ComputeDataSize
2.83 11.70 0.52 ExecEvalVar
2.72 12.20 0.50 9094122 0.00 0.00 memcpy
2.51 12.66 0.46 encore
2.40 13.10 0.44 427448 0.00 0.00 nocachegetattr
2.13 13.49 0.39 103775 0.00 0.02 heap_formtuple
2.07 13.87 0.38 noshlibs
1.20 14.09 0.22 225329 0.00 0.00 _doprnt
1.20 14.31 0.22 msquadloop
1.14 14.52 0.21 chunks
0.98 14.70 0.18 871885 0.00 0.00 AllocSetAlloc
0.98 14.88 0.18 $$dyncall
0.76 15.02 0.14 594242 0.00 0.00 FunctionCall3
0.71 15.15 0.13 213312 0.00 0.00 comparetup_heap
0.65 15.27 0.12 6364 0.02 0.13 printtup
0.60 15.38 0.11 790702 0.00 0.00 pfree
(_mcount is profiling overhead, ignore it.) It looks to me like just
about everything in the top dozen functions is there as a result of the
fact that join steps form new tuples that are the merge of their input
tuples. Even our favorite villains, palloc and pfree, are down in the
sub-percent range.
I am guessing that the reason MySQL wins on this is that they avoid
doing any data copying during a join step. I wonder whether we could
accomplish the same by taking Ogawa's patch to the next level: allow
a TupleTableSlot to contain either a "materialized" tuple as now,
or a "virtual" tuple that is simply an array of Datums and null flags.
(It's virtual in the sense that any pass-by-reference Datums would have
to be pointing to data at the next level down.) This would essentially
turn the formtuple and deformtuple operations into no-ops, and get rid
of a lot of the associated overhead such as ComputeDataSize and
DataFill. The only operations that would have to forcibly materialize
a tuple would be ones that need to keep the tuple till after they fetch
their next input tuple --- hashing and sorting are examples, but very
many plan node types don't ever need to do that.
I haven't worked out the details, but it seems likely that this could be
a relatively nonintrusive patch. The main thing that would be an issue
would be that direct reference to slot->val would become verboten (since
you could no longer be sure there was a materialized tuple there).
I think this would possibly affect some contrib stuff, which is a strong
hint that it'd break some existing user-written code out there.
Thoughts?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly