WHERE mil.business_date = '2010-02-01' --(SEL cal_date FROM bidw_sales.d_date d WHERE d.cal_date BETWEEN '2010-02-01' AND '2010-02-28')
GROUP BY 1,2,3,4

The explain plan is as follows:

Explanation
1) First, we do a single-AMP JOIN step from a single partition of bidw_sales.mil by way of the primary index "bidw_sales.mil.Business_Date = DATE '2010-02-01'" with a residual condition of ("bidw_sales.mil.Business_Date = DATE '2010-02-01'"), which is joined to a single partition of bidw_sales.gc by way of the primary index "bidw_sales.gc.Business_Date = DATE '2010-02-01'" with a residual condition of ("bidw_sales.gc.Business_Date = DATE '2010-02-01'"). bidw_sales.mil and bidw_sales.gc are joined using a sliding-window merge join (contexts = 15, 1), with a join condition of ("(bidw_sales.gc.LocationId = bidw_sales.mil.LocationId) AND ((bidw_sales.gc.GuestcheckId = bidw_sales.mil.GuestCheckId) AND (bidw_sales.gc.Business_Date = bidw_sales.mil.Business_Date))"). The input tables bidw_sales.mil and bidw_sales.gc will not be cached in memory. The result goes into Spool 3 (all_amps) (compressed columns allowed), which is built locally on that AMP. The size of Spool 3 is estimated with low confidence to be 2 rows (94 bytes). The estimated time for this step is 0.04 seconds.
2) Next, we do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, grouping by field1 (bidw_sales.mil.LocationId, bidw_sales.mil.MenuItemMasterId, bidw_sales.mil.Business_Date, bidw_sales.gc.OrderTypeId). Aggregate Intermediate Results are computed locally, then placed in Spool 4. The size of Spool 4 is estimated with low confidence to be 2 rows (154 bytes). The estimated time for this step is 0.01 seconds.
3) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (122 bytes). The estimated time for this step is 0.01 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.06 seconds.

=====================================================
Awesome, right? Wrong, the query does not return that fast! What is wrong? How can the plan be so far off from actual execution, which is >15 mins? DDL below for tables involved.
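One way to see just how far off the estimates are is to pull the query from DBQL, if query logging is enabled on your system. This is a sketch only: the column names are taken from DBC.QryLogV and their availability varies by Teradata release, so adjust to what your view actually exposes.

```sql
-- Compare optimizer estimates against actuals for recent runs of the query.
-- Columns assumed from DBC.QryLogV; verify against your release.
SELECT StartTime,
       EstProcTime,      -- optimizer's estimated processing time
       EstResultRows,    -- optimizer's estimated result rows
       NumResultRows,    -- rows actually returned
       AMPCPUTime,       -- CPU actually consumed
       (FirstRespTime - StartTime) DAY(4) TO SECOND AS Elapsed
FROM   DBC.QryLogV
WHERE  QueryText LIKE '%bidw_sales.mil%'
ORDER  BY StartTime DESC;
```

A 2-row estimate against a multi-minute elapsed time in this output confirms the optimizer is working from bad (or stale) information, not that the plan logic itself is broken.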

Re: Optimizer Lies :)

Let's assume both tables are large (because they are partitioned):

A PI on a column with a small number of distinct values is rather stupid.
Partitioning a single-column NUPI on business_date by business_date is ...
A SET table with a large number of rows per value is ...

Who created that table???

And for that kind of mess I'd expect: stats are probably defined, but not properly re-collected.
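Re-collecting is straightforward once stats are defined. A minimal sketch, using the table and column names from this thread (PARTITION stats apply only if the tables are PPI, as they are here):

```sql
-- Refresh the stats the optimizer needs for this query's plan
COLLECT STATISTICS ON bidw_sales.gc  COLUMN Business_Date;
COLLECT STATISTICS ON bidw_sales.gc  COLUMN PARTITION;
COLLECT STATISTICS ON bidw_sales.mil COLUMN Business_Date;
COLLECT STATISTICS ON bidw_sales.mil COLUMN PARTITION;

-- Check what is currently collected and how old it is
HELP STATISTICS bidw_sales.gc;
```

Stale stats are the usual cause of a "low confidence ... 2 rows" estimate on a 500-million-row table.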

Re: Optimizer Lies :)

Dieter - I could use your help, so I'll forget you called me stupid :) The single-column NUPI was used because business_date is the only column that is regularly used in a WHERE clause when accessing the table, and the skew value was still very low. Stats are collected 2x a week for all our tables. What would you suggest? Would a multi-column UPI be better even if it's never accessed in a query? The partitioning was placed there because that seems to get us the best response times. The tables are large, 500 million+ rows for one, 100 million for the other. Is that large enough for partitioning? Again, thanks for any advice you can offer.

Random - that's us trying to see how fast the query goes for one day vs. a range.

Re: Optimizer Lies :)

For the table you created you don't need that partition, it's totally useless (just creates a larger table).

Partitioning is best on columns which are used a lot in WHERE conditions, but a PI only on that column results in hundreds of thousands of rows per PI value. Maybe the table is not skewed, but the number of rows per PI value should be as low as possible, a few hundred, maybe a few thousand (and then you usually switch to a MULTISET table, avoiding duplicate-row checks).
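You can check the rows-per-value ratio of a PI candidate directly; this sketch uses the gc table and its current PI column from the thread:

```sql
-- Average and worst-case rows per distinct value of the PI column.
-- A healthy NUPI stays in the hundreds or low thousands,
-- not hundreds of thousands.
SELECT AVG(CAST(cnt AS FLOAT)) AS avg_rows_per_value,
       MAX(cnt)                AS max_rows_per_value
FROM ( SELECT Business_Date, COUNT(*) AS cnt
       FROM   bidw_sales.gc
       GROUP  BY 1 ) AS t;
```

With ~100 million rows spread over a few hundred business dates, this ratio lands in the hundreds of thousands, which is exactly the problem being described.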

PI is best on columns which are used a lot in joins or accessed a lot with equality constraints. If Business_Date is already your partitioning column, then GuestcheckId or (GuestcheckId, Business_Date) will be a better choice as PI if the join is always:

gc.guestcheckid = mil.guestcheckid
AND gc.locationid = mil.locationid
AND gc.business_date = mil.business_date
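A sketch of the suggested layout. The column types and the partition range are assumptions, since the original DDL isn't reproduced in this part of the thread:

```sql
-- Hypothetical revised DDL following the advice above: MULTISET to avoid
-- dup-row checks, NUPI on the join columns, daily PPI on Business_Date.
CREATE MULTISET TABLE bidw_sales.gc
( GuestcheckId  INTEGER NOT NULL,
  LocationId    INTEGER NOT NULL,
  Business_Date DATE    NOT NULL,
  OrderTypeId   INTEGER
)
PRIMARY INDEX ( GuestcheckId, Business_Date )
PARTITION BY RANGE_N ( Business_Date
                       BETWEEN DATE '2005-01-01' AND DATE '2015-12-31'
                       EACH INTERVAL '1' DAY );
```

This keeps single-date WHERE conditions fast via partition elimination while the PI now spreads rows evenly and matches the join columns, so the merge join no longer has to slide across a huge single-date partition.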

And then think about whether you really need UNIQUE INDEX idx_gc1 ( GuestcheckId ). The PI on GuestcheckId will be non-unique (unless you include Business_Date), but uniqueness is usually checked in your load job.

Btw, secondary indexes usually don't help in Teradata if you join a large amount of rows:

INDEX idx_perf ( GuestCheckId, LocationId, Business_Date );