Data Warehouse

A"Data warehouse" is simply a "centralized place" where you can find all the data of your company. Data Warehouse software are very often simple database: Oracle, Teradata, SQL Server, MySQL,...

You usually use a "Data warehouse" in combination with a "OLAP tool" or a "reporting tool". Based on the content of the "Data warehouse", the reporting tool produces "standard reports" like:

Evolution of your sales over the years, month by month

Evolution of the number of Acquired customers, month by month

Evolution of the number of "lost customers", month by month

These classical questions (or requests) directly translates to
SQL queries that are executed by the Data Warehouse system. The OLAP/Reporting tools only displays in a fancy way, the results computed by the Data Warehouse. Computing those results usually take a large amount of computing-time (during which the Data Warehouse system might become irresponsive). So, if you hired good consultants to create your Data Warehouse, they normally created inside your database a "small, special table" that already stores all the answers to these "tactical" questions (so that there is no need to re-compute all time the same answers to your different "tactical" questions: you only need to consult the "special table" that already contains the answer). This "special table" is re-computed & updated every week or so...

Differences between tools

Basically, There now exists two major technologies when it comes to database:

Non-clustered databases(i.e. the old, standard approach).
These databases are running on one unique "big" server.Pro: Lower acquisition & maintenance price.Con: Limited storage (i.e. a few Terabyte at max.) and limited Computing power (i.e. complex sql queries
runs for a very long time and sometime do not execute at all).

Distributed (or clustered) databases
A clustered database is a database that is running simultaneously on many different servers connected together with high-speed data cables (e.g. infiny band).Pro: Potentiallly unimited storage and unlimited Computing power (i.e. If you need more computing power or
more storage-space, you simply add new servers inside the cluster).Con: Higher acquisition & maintenance price.

The differences between all these database systems are:

Speed, Scalability, Versatility, Cost of Ownership

All major database systems are equally able to store vast amounts of information. For Non-clustered databases, the storage space is limited to a few Terabytes. A complete analysis of the advantages of each database system is out of the scope of this simple document.

Some database systems are more focused on "data warehouse" usage than others. Typically, in a "data warehouse":

A large quantity of the submitted (SQL) queries are heavy "tactical queries" that require a large amount of computing power to execute (because they process a large amount of data).

Most of the time, the users are only performing queries. So, the databases that are optimized for "data warehouse purposes" have very fast queries. Since Distributed (or clustered) databases have more computing power, they are usually better suited for data warehousing purposes (because they can easily run the required tatical queries).

Although the differences between the different database tools are inexistant in terms of end-result, you can still observe large differences when it comes to: speed, cost of ownership, quality of the interface ("user-friendliness"), energy efficiency, etc.