Saturday, August 8, 2009

When I read the questions people ask about data modeling on OBIEE forums, one that seems to come up frequently is what to do when you have fact tables that do not join to all dimension tables.
This picture illustrates the problem. FactTable2 joins to two dimension tables, but FactTable1 only joins to only one.
The data for this example is very simple:
Here’s what typically happens when the naive metadata designer is finished. The first query results look good.
But when the query contains a column from the second dimension table, suddenly the data that was there before disappears, and the first reaction is that something is wrong.
From one perspective, these are correct result. The query asked for Fact1 aggregated by DimA and DimX. Since Fact1 does not join to Dim X, nulls are the correct answer.
Looking at the SQL generated gives you an idea of how OBIEE navigates this query. It determines that the query wants Fact1 aggregated by DimADesc and DimXDesc. The only fact table that can aggregate by those two dimensions is Fact2. Therefore, the SQL it generates uses FactTable2 in the FROM clause, not FactTable1, even though Fact1 does not map to any column in FactTable2. The BI Server is aware Fact1 does not map to FactTable2, so it returns Null (C3 in the outer query block below) as the value of Fact1.select distinct D1.C1 as C1, D1.C2 as C2, cast(null as doubleprecision) as C3 from (select distinct T1416.DIMADESC as C1, T1420.DIMXDESC as C2 from DIMX T1420, DIMA T1416, FACTTABLE2 T1429 where (T1416.DIMAKEY = T1429.DIMAFKEY and T1420.DIMXKEY = T1429.DIMXFKEY) ) D1 order by C1, C2
However, what if Fact1 is a level-based measure, always calculated at the Grand Total level for DimX? Then the BI Server knows it does not have to aggregate by the attribute values of DimX and returns these results.
In the SQL you can see that there is still a query block involving FactTable2. This query determines the values of DimADesc and DimXDesc that will be in the results. The query to FactTable1 aggregates Fact1 by DimADesc. The two result sets are then fully outerjoined – even null values, if they are returned, will be joined, Null to Null.

WITH SAWITH0 AS
(SELECT DISTINCT
T1420.DIMXDESC AS C1,
T1416.DIMADESC AS C2
FROM DIMX T1420,
DIMA T1416,
FACTTABLE2 T1429
WHERE (T1416.DIMAKEY = T1429.DIMAFKEY
AND T1420.DIMXKEY = T1429.DIMXFKEY)),
SAWITH1 AS
(SELECT
sum(T1424.FACT1) AS C1,
T1416.DIMADESC AS C2
FROM
DIMA T1416,
FACTTABLE1 T1424
WHERE (T1416.DIMAKEY = T1424.DIMAFKEY)
GROUP BY T1416.DIMADESC)
SELECT DISTINCT
CASE
WHEN SAWITH1.C2 IS NOT NULL THEN SAWITH1.C2
WHEN SAWITH0.C2 IS NOT NULL THEN SAWITH0.C2
END AS C1,
SAWITH0.C1 AS C2,
SAWITH1.C1 AS C3
FROM
SAWITH0
FULL OUTER JOIN SAWITH1
ON nvl(SAWITH0.C2,'q') = nvl(SAWITH1.C2,'q')
AND nvl(SAWITH0.C2,'z') = nvl(SAWITH1.C2,'z')
ORDER BY C1, C2

As the physical query shows, results will be determined by the foreign keys of DimX in FactTable2. If the rows where DimXFKey=2 are deleted, then the result rows where DimXDesc = Y will drop out.

In this Business Model, both fact table sources (for FactTable1 and FactTable2) have an aggregation content of Detail for both dimensions (Detail is the default level when all logical levels are null in the logical table source). However, FactTable1 will not suffice as a source since it does not physically join to DimX.

Changing the aggregation content for FactTable1 does not alter the results or the SQL generated. (Note here that when one dimension has a level that is specified and the other dimension(s) are unspecified, the meaning the BI Server ascribes to unspecified is Grand Total).

An alternate approach is to physically join DimX to FactTable1 using a complex join having the join condition 1=1. The aggregation content for both logical fact table sources can be set at Detail so that any query containing columns from DimX can use FactTable1 as a source. The measure, Fact1, no longer has to be set to Grand Total level for DimX. Since FactTable2 will not be involved in the query, the foreign key values in FactTable2 will not matter. Now the results are back to what we saw in the first query.

However, the SQL is quite different, and you can see why all the values of DimXDesc are returned.

The important points to remember are
that a physical fact table that does not join
to a dimension table can be made to join with
a complex join having the condition 1=1.
The logical fact table source that contains this
physical fact table can be set at the Detail level
for the dimension that is joined like this. When it is possible for the user to select more than one value from the dimension that has a 1=1 join, the facts should be set at the grand total for that dimension.