
Hi Robin,
I can think of several differences.
1. In database configuration, the block size is typically set larger in a data warehouse than in an OLTP database.
2. Data warehouses and data marts are typically designed with a dimensional, also known as star schema, design: multiple dimension tables containing reference data (such as product, customer, time, and geography) surround a fact table that contains the generally quantifiable data (e.g., dollars, hours) along with foreign keys that are the primary keys of the dimension tables. By contrast, an OLTP database application is typically designed in 3rd normal form. (A rough schema sketch follows this list.)
3. Indexing is one of the typical ways to improve performance. However, for a dimensional data mart or data warehouse fact table, indexing is not very effective, because the fact table is mostly comprised of foreign keys and a few data columns. A fact table has millions or billions of rows, and an index on the fact table would be nearly as large as the fact table itself, providing little advantage. Instead, aggregate or summary fact tables have proven far more effective at improving performance where the query or report requirement can be met with summary information. Indexes on large dimension tables can improve performance, because a dimension table has many attributes, frequently long text attributes, and typically does not have as many rows as a fact table.
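
A minimal sketch of the star schema idea in point 2 above - all table and column names here are illustrative, not taken from any particular system:

    -- Dimension tables hold the descriptive reference data.
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(100),
        brand         VARCHAR(50)
    );

    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name VARCHAR(100),
        segment       VARCHAR(50)
    );

    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date DATE,
        month_name    VARCHAR(20),
        year_number   INTEGER
    );

    -- The fact table holds the quantifiable measures plus foreign keys
    -- that are the primary keys of the surrounding dimension tables.
    CREATE TABLE fact_sales (
        product_key   INTEGER REFERENCES dim_product (product_key),
        customer_key  INTEGER REFERENCES dim_customer (customer_key),
        date_key      INTEGER REFERENCES dim_date (date_key),
        sales_dollars DECIMAL(12,2),
        sales_units   INTEGER
    );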
I'm sure there are other differences. Perhaps other people can weigh in on what they are.
Bob Kaku

In a typical OLTP system you find many users performing relatively small concurrent transactions - small in terms of data volume. These transactions may span many tables due to normalised data models. Generally, data locks and transactions are essential for data consistency.
In a DWH, data is typically loaded by daily or monthly batch loads. Data models are far less normalised. Since loads are daily or even monthly, they may involve a lot of inserts and updates (sometimes even deletes, though those are typically rare). Transactions for these loads are sized according to performance requirements, not consistency. Often an entity in a DWH is consistent (in terms of the business definition of that entity) once the load has completed successfully, not once the data is committed.
In terms of user interaction, it's mostly read operations, which may be quite large. In terms of tuning queries, many OLTP developers need to drop their preconceived notions of what is fast. In OLTP you often optimise for first-row performance to return something to the user as fast as possible - in a DWH you usually need to tune for throughput / last row. This is something developers and troubleshooters definitely need to understand.
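
To make the first-row versus throughput point concrete, here is a rough sketch using Oracle-style optimizer hints; the table names are illustrative and other DBMSs expose the same trade-off through different mechanisms:

    -- OLTP goal: get the first screenful back to the user as fast as possible.
    SELECT /*+ FIRST_ROWS(25) */ order_id, order_date, status
    FROM   orders
    WHERE  customer_id = :cust_id
    ORDER BY order_date DESC;

    -- DWH goal: optimise for total throughput, i.e. the last row of a large scan.
    SELECT /*+ ALL_ROWS */ d.year_number, SUM(f.sales_dollars) AS total_sales
    FROM   fact_sales f
           JOIN dim_date d ON d.date_key = f.date_key
    GROUP  BY d.year_number;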
There are a number of features in modern DBMSs that are particularly useful for DWH (not restricted to DWH, but not used very often outside it), including bitmap indexes, star joins, and materialised views / aggregate awareness.
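
As a rough illustration of two of those features, an Oracle-style sketch - it assumes the illustrative fact_sales / dim_date tables above, and exact syntax and options vary by product and version:

    -- Bitmap index on a fact-table foreign key, useful for star joins.
    CREATE BITMAP INDEX fact_sales_cust_bix
        ON fact_sales (customer_key);

    -- Materialised view registered for query rewrite ("aggregate awareness"):
    -- the optimiser can answer matching summary queries from the view.
    CREATE MATERIALIZED VIEW mv_sales_by_month
        ENABLE QUERY REWRITE
    AS
    SELECT d.year_number, d.month_name, SUM(f.sales_dollars) AS sales_dollars
    FROM   fact_sales f
           JOIN dim_date d ON d.date_key = f.date_key
    GROUP  BY d.year_number, d.month_name;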

Hi Bobkaku, and thanks for the insights!
Would you be kind enough to roughly explain the "aggregate/summary fact table" concept? Is it simply another physical fact table using the same dimensions but at a different granularity level? Loaded every month/year from the initial fact table? And does it mean the model becomes a snowflake?
Thanks in advance for your input and potentially your example,
Julien.

An aggregate or summary fact table cannot have exactly the same dimensions as the detail fact table. The nature of summarization demands that the detail information be summarized either by removing one or more of the identifying dimensions or by summarizing according to a hierarchical summary value within a dimension.
Example of the first case: let's say you have a detail sales fact table with product, customer, geography, and time as dimensions. If you find that you have many reports or queries that want sales data, but not necessarily at the detail that requires product ID, then you can create a summary fact table that has customer, geography, and time as dimensions, but no product dimension.
For the second case, let's say you have many reports and queries that want sales data, but at the product brand level. Brand could be defined as a summary level within the product dimension. If the product dimension is very large, you may choose to create a summary fact table with customer, geography, time, and a new summary brand dimension table. The second case is probably uncommon, as most dimension tables are not that large and simple aggregation according to the brand attribute of the product dimension, with indexing, is adequate.
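
A minimal sketch of the first case - names are illustrative, and it assumes a detail fact table fact_sales_detail keyed by product, customer, geography, and time:

    -- Summary fact table: same measures, but the product dimension is removed.
    CREATE TABLE fact_sales_cust_geo_time (
        customer_key  INTEGER,
        geography_key INTEGER,
        date_key      INTEGER,
        sales_dollars DECIMAL(12,2),
        sales_units   INTEGER
    );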
As far as loading is concerned, you could load the summary fact table with an ETL process from the detail table, but that implies a sequence in which the detail table is completely loaded before you begin the summary processes. Another option is to load the summary tables directly from the staging or ODS tables, as ETL processes that run in parallel with the detail ETL processes.
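
A sketch of the first loading option, populating the summary table from the completed detail fact table (same illustrative table names as above):

    INSERT INTO fact_sales_cust_geo_time
        (customer_key, geography_key, date_key, sales_dollars, sales_units)
    SELECT customer_key, geography_key, date_key,
           SUM(sales_dollars), SUM(sales_units)
    FROM   fact_sales_detail
    GROUP  BY customer_key, geography_key, date_key;

The second option is an equivalent INSERT ... SELECT written against the staging or ODS tables, which can run in parallel with the detail load.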
This aggregate/summary table does not make the model a snowflake. A snowflake is just an attempt to normalize the denormalized model by hanging a subdimension table off a dimension table. There are some cases where it makes sense - for example, a bill of materials hierarchy that has an unpredictable number of levels. In that case, the hierarchy cannot be flattened into the main dimension table.
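
For reference, snowflaking just means normalising an attribute out of a dimension into its own subdimension table, roughly like this (illustrative names):

    CREATE TABLE dim_brand (
        brand_key  INTEGER PRIMARY KEY,
        brand_name VARCHAR(50)
    );

    -- The product dimension now carries a key to the subdimension
    -- instead of a flattened brand attribute.
    CREATE TABLE dim_product_snowflaked (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        brand_key    INTEGER REFERENCES dim_brand (brand_key)
    );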

Thanks a lot for all these explanations!
I would also add that such an aggregate/summary table could be built directly as a cube, to avoid duplicating data in the DWH. But this would require significant build time / CPU ...
Best Regards.

Yes, you could definitely use a cube. Most modern report tools can access both the cube and relational store in the same query.
But I think you'd have to play with it to see which is the best solution. It really depends upon your reporting needs.

Dear Jaymz,
In terms of tuning strategies, OLAP engines such as SQL Server Analysis Services and Hyperion use very different storage architectures. SSAS, in the processing of cubes, pre-calculates and stores aggregate and detail facts at all the intersections of dimensions at every level. For MOLAP (multidimensional OLAP) storage, SSAS uses in-memory proactive caching to return results orders of magnitude faster than an OLTP RDBMS. In addition, dimensional models are much simpler for typical business users to understand, reducing the spider-web complexity of normalized OLTP databases to star schemas, and they are faster to query. Typical queries against a 3rd normal form database often involve many complex joins and sub-queries, whereas star schemas have "one hop" joins from the dimensions to the facts.
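
For comparison, a typical star-schema query has each dimension just one join away from the fact table - illustrative names, following the earlier sketches in this thread:

    SELECT d.year_number, p.brand, SUM(f.sales_dollars) AS total_sales
    FROM   fact_sales f
           JOIN dim_date    d ON d.date_key    = f.date_key
           JOIN dim_product p ON p.product_key = f.product_key
    GROUP  BY d.year_number, p.brand;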