Wednesday, April 24, 2013

Breaking a single fact table into multiple small fact tables with their corresponded measures could dramatically improve performance, all these small fact tables shares same conformed dimensions. This approach is applicable for both relational and DMR with compatible mode, and it does not request to change report specification.

Best suitable case: the fact table is very big table with many different measures; and each measure is associated with part of records in the table.

To demonstrate this idea, assume there is a table F with 100 million records with measure M1, M2, and M3, associated with dimensions D1, D2 and D3. The solution is to break F into three different fact tables F1, F2 and F3, each fact table associated with their own measures M1, M2 and M3, respectively. For the sake of simplicity, assume each fact table has only 3 million records for F1, 3 million records for F2, and 4 Million records for F3

F – 100 millions records with M1, M2 and M3

F1 – 30 millions records with M1

F1 – 30 millions records with M2

F1 – 40 millions records with M3

Current Model

Table structure

CREATE TABLE [dbo].[F](

[D1] [int] NOT NULL,

[D2] [int] NOT NULL,

[D3] [int] NOT NULL,

[M1] [int] NULL,

[M2] [int] NULL,

[M3] [int] NULL

)

GO

CREATE TABLE [dbo].[D3](

[Key] [int] NULL

)

GO

CREATE TABLE [dbo].[D2](

[Key] [int] NULL

)

GO

CREATE TABLE [dbo].[D1](

[Key] [int] NULL

)

Framework manager

The framework manager at database tier and business tier is designed as follows

Queries

Case 1 query data for M1, query is generated as follows

SELECT

F.D1 AS D1,

F.D2 AS D2,

F.D3 AS D3,

SUM(F.M1) AS M1

FROM

dbo.F F

GROUP BY

F.D1,

F.D2,

F.D3

Case 2 query data for M1 and M2, query is generated as follows

SELECT

F.D1 AS D1,

F.D2 AS D2,

F.D3 AS D3,

SUM(F.M1) AS M1,

SUM(F.M2) AS M2

FROM

dbo.F F

GROUP BY

F.D1,

F.D2,

F.D3

Case 3 query data for M1, M2 and M3, query is generated as follows

SELECT

F.D1 AS D1,

F.D2 AS D2,

F.D3 AS D3,

SUM(F.M1) AS M1,

SUM(F.M2) AS M2,

SUM(F.M3) AS M3

FROM

dbo.F F

GROUP BY

F.D1,

F.D2,

F.D3

Regardless of what measure(s) selected in report, database needs to scan whole 100 million records.

New Model

Table structure

CREATE TABLE [dbo].[F1](

[D1] [int] NOT NULL,

[D2] [int] NOT NULL,

[D3] [int] NOT NULL,

[M1] [int] NULL

)

Go

CREATE TABLE [dbo].[F2](

[D1] [int] NOT NULL,

[D2] [int] NOT NULL,

[D3] [int] NOT NULL,

[M2] [int] NULL

)

Go

CREATE TABLE [dbo].[F3](

[D1] [int] NOT NULL,

[D2] [int] NOT NULL,

[D3] [int] NOT NULL,

[M3] [int] NULL

)

GO

CREATE TABLE [dbo].[D3](

[Key] [int] NULL

)

GO

CREATE TABLE [dbo].[D2](

[Key] [int] NULL

)

GO

CREATE TABLE [dbo].[D1](

[Key] [int] NULL

)

Framework manager

The framework manager at database tier and business tier is designed as follows

Please note that only F will be exposed, while F1, F2 and F3 is hidden.

Queries

Case 1 query data for M1, query is generated as follows

SELECT

D1.Key AS D1,

D2.Key AS D2,

D3.Key AS D3,

SUM(F1.M1) AS M1

FROM

dbo.D1 D1

INNER JOIN dbo.F1 F1

ON D1.Key = F1.D1

INNER JOIN dbo.D2 D2

ON D2.Key = F1.D2

INNER JOIN dbo.D3 D3

ON D3.Key = F1.D3

GROUP BY

D1.Key,

D2.Key,

D3.Key

Only F1 is used, meaning that only 30 million records will be scanned

Case 2 query data for M1 and M2, query is generated as follows

SELECT

COALESCE( FS1.D1, FS2.D1) AS D1,

COALESCE( FS1.D2, FS2.D2) AS D2,

COALESCE( FS1.D3, FS2.D3) AS D3,

FS1.M1 AS M1,

FS2.M2 AS M2

FROM

(

SELECT

D1.Key AS D1,

D2.Key AS D2,

D3.Key AS D3,

SUM(F1.M1) AS M1

FROM

dbo.D1 D1

INNER JOIN dbo.F1 F1 ON D1.Key = F1.D1

INNER JOIN dbo.D2 D2 ON D2.Key = F1.D2

INNER JOIN dbo.D3 D3 ON D3.Key = F1.D3

GROUP BY

D1.Key,

D2.Key,

D3.Key

) FS1

FULL OUTER JOIN

(

SELECT

D1.Key AS D1,

D2.Key AS D2,

D3.Key AS D3,

SUM(F2.M2) AS M2

FROM

dbo.D1 D1

INNER JOIN dbo.F2 F2 ON D1.Key = F2.D1

INNER JOIN dbo.D2 D2 ON D2.Key = F2.D2

INNER JOIN dbo.D3 D3 ON D3.Key = F2.D3

GROUP BY

D1.Key,

D2.Key,

D3.Key

) FS2

ON

FS1.D1 = FS2.D1 AND

FS1.D2 = FS2.D2 AND

FS1.D3 = FS2.D3

Only F1 and F2 are used, meaning that only 60 million records will be separately scanned

Case 3 query data for M1, M2 and M3, query is generated as follows

SELECT

COALESCE( FS1.D1, FS2.D1, FS3.D1) AS D1,

COALESCE( FS1.D2, FS2.D2, FS3.D2) AS D2,

COALESCE( FS1.D3, FS2.D3, FS3.D3) AS D3,

FS1.M1 AS M1,

FS2.M2 AS M2,

FS3.M3 AS M3

FROM

(

SELECT

D1.Key AS D1,

D2.Key AS D2,

D3.Key AS D3,

SUM(F1.M1) AS M1

FROM

dbo.D1 D1

INNER JOIN dbo.F1 F1 ON D1.Key = F1.D1

INNER JOIN dbo.D2 D2 ON D2.Key = F1.D2

INNER JOIN dbo.D3 D3 ON D3.Key = F1.D3

GROUP BY

D1.Key,

D2.Key,

D3.Key

) FS1

FULL OUTER JOIN

(

SELECT

D1.Key AS D1,

D2.Key AS D2,

D3.Key AS D3,

SUM(F2.M2) AS M2

FROM

dbo.D1 D1

INNER JOIN dbo.F2 F2 ON D1.Key = F2.D1

INNER JOIN dbo.D2 D2 ON D2.Key = F2.D2

INNER JOIN dbo.D3 D3 ON D3.Key = F2.D3

GROUP BY

D1.Key,

D2.Key,

D3.Key

) FS2

ON

FS1.D1 = FS2.D1 AND

FS1.D2 = FS2.D2 AND

FS1.D3 = FS2.D3

FULL OUTER JOIN

(

SELECT

D1.Key AS D1,

D2.Key AS D2,

D3.Key AS D3,

SUM(F3.M3) AS M3

FROM

dbo.D1 D1

INNER JOIN dbo.F3 F3 ON D1.Key = F3.D1

INNER JOIN dbo.D2 D2 ON D2.Key = F3.D2

INNER JOIN dbo.D3 D3 ON D3.Key = F3.D3

GROUP BY

D1.Key,

D2.Key,

D3.Key

) FS3

ON

COALESCE( FS2.D1, FS1.D1) = FS3.D1 AND

COALESCE( FS2.D2, FS1.D2) = FS3.D2 AND

COALESCE( FS2.D3, FS1.D3) = FS3.D3

Even three tables F1, F2 and F3 are scanned, the performance should be much better than scan the same table with 100 million records.

You can apply this approach for DMR with compatible mode as well, see Framework manager below, only F and three dimensions D1, D2 and D3 are exposed.

The major advantage of approach above is that there is no need to report specification. The changes occur only in database layer and business layer. This approach can be even more effective when fact is based on query with prompt.

2 comments:

Why would you have 3 fact tables if all your 3 measures share the same conformed dimensions?!Every row in your big fact will have non-zero values for ALL 3 measures (so you'd have to scan the same rowset for any calculation)This approach would have any merits only if the measures are completely independent and have non-conformed dimensions...

HI,I have 2 facts in my model with few dimension like year and etc.. i would take example of 1 dimension Year.Year Dimension has Key, same keys are mapped to fact1 and fact2. I have created 1:1 to 1:n joins between year-fact1 and year to fact2.Now i have created the Star Schema Grouping making the Q1 as Query Subject which has Shortcuts of Year and Fact1 table. And Q2 query Subject with shortcuts of Year and Fact2.Now out of Q1 query Subject i have created an hierarchy called Years. With Levels Year-to-Quarter-to-MonthThen i have created 2 Measure Dimension.M1 from Q1 which has Costs from fact1 table andM2 from Q2 which has Cost from fact2 table.i created a cross tab with Years at raw and Cost from M1 at column. i am Getting Costs for all years fine.But when i create a Drill Through on Cost, passing Year as data element in drill through parameters. Also in the target report i am slicing the Year level from same hierarchy. but the only difference is i am using Cost measure from M2 measure dimension now in target report.This gives me the errors. like to enable local processing at FM level whcih i dont want to do.So can i use the same hierarchy for slicing both measure dimension. or i have to create 2 separate hierarchy for each measure dimensions. If i have to create 2 separate hierarchy, what i will passing in drill through parameters and what can be my slicer expression in target report.and if the same hierarchy i can use for both M1 and M2, then what should i pass and what can be my slicer expression in target report so i dont receive the errors.Thank so much in advance.You can reach out to me on dmk.3678@gmail.com