RE: Using PQ in FTS

One thing I am struggling with is how to determine the final "cpu",
"elapsed", and "disk" statistics of a parallelized statement from its
10046 trace files.

This is a 9.2.0.6 instance and when I look at the 10046 trace files of
the parallelized statement (QC and its Slave processes) and compare the
statistics with the 10046 trace file of the serialized statement, I see
the following:
-- The "rows" and "fetch" count statistics from QC's trace file match
exactly with the statistics obtained from the serialized execution of
the statement. So, these are the final statistics.
-- The "disk" statistic aggregated from the trace files of "QC + P000 +
... + P007" came out much lower than what I see from the serialized
execution. For example, the aggregated "disk" statistic from the QC and
all eight slave processes is 3,110,518, whereas it is 5,860,777 for the
serialized statement. I expected one of two outcomes. In the optimal
scenario, where the serialized statement found some percentage of the
data blocks in the buffer cache during the FTS, the aggregated disk
reads of the parallelized statement would be greater than those of the
serialized statement. In the worst-case scenario, where the serialized
statement found no data blocks in the buffer cache and had to read every
block from disk, the disk reads of the parallelized statement would be
very close to those of the serialized statement.

Jonathan has shed some light on the "query" statistic obtained from the
QC and P00n trace files:
"
PX only bypasses the cache for table scans and index fast full scans.
There may be indexed access components in your plan. However, even if you
do no indexed access, the blocks that have been read direct have to be
made read-consistent. 10g has a statistic to make it clear that this
happens: "consistent gets direct".
"

So, how do I answer the following from the 10046 trace files:
-- How much "CPU time" was spent by a query that was run in parallel
with "x" number of slaves?
-- What was the actual "elapsed time" of a query that was run in
parallel with "x" number of slaves?
-- If the "disk" statistic is the aggregated statistic obtained from the
QC and all slave processes then why is it much smaller than that
obtained from the serialized statement?
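
For reference, this is roughly how I am aggregating the numbers. It is
only a sketch and it rests on assumptions: that the PARSE/EXEC/FETCH
lines in the raw trace carry comma-separated c=, e=, p=, cr=, cu= and r=
fields, and that c and e are in microseconds on 9.2. The function names
are my own, and it sums calls at every recursive depth, which may or may
not be what you want.

```python
import re
from collections import Counter

# Database-call lines in a raw 10046 trace, e.g.
# FETCH #1:c=1000,e=2000,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=4,tim=100
CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH) #\d+:(.*)$")

# c=cpu, e=elapsed, p=disk, cr=query, cu=current, r=rows
FIELDS = ("c", "e", "p", "cr", "cu", "r")


def sum_calls(lines):
    """Sum the call statistics found in an iterable of trace lines."""
    totals = Counter()
    for line in lines:
        match = CALL_RE.match(line.strip())
        if not match:
            continue  # WAIT, STAT, header lines and so on are skipped
        stats = dict(
            pair.split("=", 1)
            for pair in match.group(2).split(",")
            if "=" in pair
        )
        for field in FIELDS:
            totals[field] += int(stats.get(field, 0))
    return totals


def sum_files(paths):
    """Sum the call statistics over several trace files (QC plus slaves)."""
    grand = Counter()
    for path in paths:
        with open(path) as handle:
            grand.update(sum_calls(handle))
    return grand
```

Calling sum_files() with the QC trace and the eight slave traces (the
file names depend on your user_dump_dest and background_dump_dest) gives
one set of totals. Since the slaves run concurrently, the summed "e" is
total process time across all of them rather than wall-clock time, which
is part of why I am unsure how to read it.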