One of the difficulties of querying a normalized database is the
large number of tables you may need to join in a single query.
You can easily end up chaining 10 or more tables. This is difficult
for a user (who must know the tables really well), and it may perform
very poorly - or even crash the database.

The common approach to this problem is to restructure the
data. You denormalize your tables and rearrange them into a so-called
"star" configuration to avoid long chains. This means that you create one big "fact"
table (the center of the star) surrounded by 10-15 "dimension"
tables. Then you basically
query the central "fact" table - and
narrow your scope by joining it with some of the "dimension"
tables. Your chain length = 1. Sometimes you may add an extra
second layer (chain length = 2 - details) - this is called the Snowflake configuration
- see images:

Star Configuration

Snowflake Configuration

The Star Schema is also known as a "star-join schema", "data cube", and "multidimensional schema".

The main benefit of the Star schema configuration is that it makes
it easy for users to build reports/queries,
especially multi-dimensional views of the data
with different granularity for different dimensions. The applications (reports) become simpler and easier for the user to understand.

Fact table - usually contains "facts"
about events involving the dimensions. For example, a purchase may be considered
a fact, which is characterized by many dimensions (time, store, product,
promotion, etc.). Thus a row in the fact table corresponding to one
purchase will have foreign keys to all the corresponding dimension tables. The fact table stores data at the lowest level of granularity - for example, for the time dimension the granularity may be seconds. The other levels of granularity (hour, day, week, month, ...) are stored in the corresponding "dimension" table.
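Here is a minimal sketch of what such a fact table might look like in SQL; the table and column names (sales_fact, dim_time, id_time, etc.) are hypothetical, just for illustration:

-- One row per purchase, with a foreign key into each dimension table
create table sales_fact (
    id_time      integer not null references dim_time (id_time),
    id_store     integer not null references dim_store (id_store),
    id_product   integer not null references dim_product (id_product),
    id_promotion integer not null references dim_promotion (id_promotion),
    quantity     integer,        -- the measures (the "facts" themselves)
    amount       decimal(12,2)
);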

Dimensions - constitute the majority
of objects in your database. They are typically something by which you
would break your data into groups. For example, you could break your data
by time (minute, hour, day, week, month, quarter, year), by store (store, zip, city, state, country), by product (...), by promotion (...), etc. The multiple levels of granularity (for time - hour, day, week, month, ...) are stored in the corresponding dimension table.
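For example, a time dimension table might store one row at the lowest granularity (the second) together with columns for the coarser levels; again, the names here are hypothetical:

create table dim_time (
    id_time  integer primary key,
    ts       timestamp,   -- lowest granularity: the second itself
    hour     smallint,
    day      date,
    week     smallint,
    month    smallint,
    quarter  smallint,
    year     smallint
);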

Details (for example, the second layer
in the Snowflake) - some details associated with dimensions, separated
into their own tables. This makes sense if these details are not very important
(not required very frequently).
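For instance, rarely used store attributes could be split off into a detail table referenced from the store dimension (a sketch with hypothetical names):

create table store_details (
    id_store_details integer primary key,
    floor_area       integer,
    manager_name     varchar(100)
);

create table dim_store (
    id_store         integer primary key,
    city             varchar(50),
    state            varchar(50),
    id_store_details integer references store_details (id_store_details)
);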

To optimize queries, it is necessary to create the corresponding indexes. For example, for a report using data from 4 dimensions (d1, d2, d3, d4) you may use the following index:

create index i1 on MyFactTable (id_d1, id_d2, id_d3, id_d4);
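A report query like the following (hypothetical measure column) could then be answered largely through that composite index:

-- Filters on all four dimension keys match the index column order
select sum(f.amount)
from MyFactTable f
where f.id_d1 = 5
  and f.id_d2 = 12
  and f.id_d3 = 7
  and f.id_d4 = 3;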

Database servers may have special built-in optimizations for executing queries against a star schema. For example, in Oracle you can set STAR_TRANSFORMATION_ENABLED=TRUE and create bitmap indexes on the keys in the fact table.
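In Oracle that setup might look roughly like this (a sketch - the index names are made up, and the parameter can also be set at the instance level):

alter session set star_transformation_enabled = true;

create bitmap index fact_d1_bix on MyFactTable (id_d1);
create bitmap index fact_d2_bix on MyFactTable (id_d2);
create bitmap index fact_d3_bix on MyFactTable (id_d3);
create bitmap index fact_d4_bix on MyFactTable (id_d4);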

The fact table is usually very big (tens of millions of rows - and larger) and receives more and more new data on a daily basis. It makes sense to partition it by some time interval, for example by month or by quarter. To do this you add a column to the fact table (for example, month) by which you will do the partitioning.
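The exact syntax depends on the database; in Oracle, range partitioning by such a month column might look roughly like this (hypothetical table and column names):

create table sales_fact_part (
    id_time   integer,
    id_store  integer,
    amount    decimal(12,2),
    month_key integer          -- e.g. 200401 = January 2004
)
partition by range (month_key) (
    partition p_2004_01 values less than (200402),
    partition p_2004_02 values less than (200403),
    partition p_rest    values less than (maxvalue)
);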

There are ways to bulk-load data in parallel - the methods differ depending on the database brand (DB2, Oracle, ...). For example, for speed you can disable indexes and logging while doing the load - and then recreate the indexes after the load. Note: since rebuilding an index on a big table takes a lot of time, there are ways to speed this up as well (partition the index so it can be rebuilt locally, use parallel calculations, turn off logging while rebuilding the index).
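In Oracle, for example, that load pattern might look roughly like this (a sketch, assuming the index i1 from above):

-- Mark the index unusable so the load does not maintain it
alter index i1 unusable;
alter session set skip_unusable_indexes = true;

-- ... run the bulk load here ...

-- Rebuild in parallel with logging turned off
alter index i1 rebuild parallel 4 nologging;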

A Data mart is usually a particular set of data designed to meet some specific knowledge needs. It may be one set of tables in a Star or Snowflake
configuration - but it may be something else. A big database with thousands of tables can be represented (for analysis)
by tens of Data Marts.
Together these Data Marts constitute
a "Data Warehouse".

Note that several different star systems can re-use the same dimension
tables.

Example
- 4 dimensions: stores, time periods, products, measures.
Such data relationships cannot fit into a two-dimensional table; the data
is more like a multi-dimensional cube (hyper-cube).
You can slice this cube in different directions - and show a two-dimensional
slice (or a three-dimensional picture).
Note that some of the cube's dimensions correspond to the Dimensions of the
data mart (something you would group by - stores, time periods, products),
whereas others may be calculated measures (total, average, etc.).
Each dimension usually has some hierarchy
- multiple levels of granularity (for
example, for time: hour, day, week, month, etc.).
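A two-dimensional slice of such a cube is essentially a GROUP BY query; with the hypothetical tables sketched above it might look like this:

-- Sales amount by store city and by month: a stores x time slice of the cube
select s.city, t.month, sum(f.amount) as total_amount
from sales_fact f
    join dim_store s on s.id_store = f.id_store
    join dim_time  t on t.id_time  = f.id_time
group by s.city, t.month;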

OLAP has existed for ~30 years, but the term OLAP (On-Line Analytical
Processing) was introduced only in 1993 by Dr. Codd et al. in the paper "Providing
OLAP to User-Analysts: An IT Mandate". Yes, the same Codd who formulated
relational database principles in the 1970s. The work was sponsored by Arbor Software, the
creators of ESSBASE.

There are so many specialized software packages for data mining and
OLAP - more than enough to make your head spin. For a sample list go to
- www.knowledgestorm.com/
- and search for OLAP - you will get ~200 software solutions.
Here are some (not properly sorted, and I am sure that some important ones
are missing - please let me know at - selector@pipeline.com):