Implementation costs

Implementing a Data Warehouse together with Data Marts is the most expensive option, because the complete system must be built, including the connections between the Data Warehouse and its Data Marts. A second ETL process is also needed to prepare the data for the Data Marts. When implementing only Data Marts or only a Data Warehouse, the ETL is implemented once, and the costs of these two options are roughly the same: the Data-Marts-only option requires somewhat more hardware and more network connections to the data sources, but since building the ETL is the most expensive part, these extra costs are relatively low. The virtual Data Warehouse may have the lowest implementation costs, because existing applications and infrastructure can be reused.

Administration costs

The Data-Warehouse-only solution minimizes administration costs best, thanks to its centralized design: only one central system has to be managed. Client management is normally no problem when web technology or centralized client deployment is used, which should be standard in all mid-size to large enterprises. A central backup can cover all data of the Data Warehouse. The Data-Marts-only solution is more expensive to administer because of its decentralized design: product updates and maintenance of the online connections cost more, and each Data Mart must be backed up separately, depending on its physical location. The process of filling a single Data Mart is also critical: errors during an update may cause loss of data, so the system administration must react immediately. Data Marts combined with a central Data Warehouse are more efficient, because all necessary data is stored in a single place.
When an error occurs during an update of a Data Mart, this is normally no problem, because the data is not lost and can be recovered directly from the Data Warehouse; it is even possible to rebuild a whole Data Mart from the Data Warehouse. The administration costs of a virtual Data Warehouse depend on the quality of the implementation. Problems with the connections to the online data sources may cause users to ask for support, even when the problem was caused by a broken connection or a failure in the data source itself; end users can usually not tell whether the data source or the application on their computer caused the problem.

Average data age

The virtual Data Warehouse presents the most current data, because the application connects directly to the data sources and fetches its information online; the retrieved information is always up to date. Information provided by Data-Marts-only or Data-Warehouse-only solutions is collected at specific times, typically once per day at night; the interval can vary from hourly to monthly or even longer. The chosen interval depends on the cost of retrieving and checking the information. A solution with one central Data Warehouse and additional Data Marts holds less current data than a Data-Warehouse-only solution, because the data of the Data Warehouse must also be converted and copied to the Data Marts, which takes time.

Performance

A virtual Data Warehouse has the poorest performance overall. All data is retrieved at runtime directly from the data sources and must be converted before it can be presented, so a large amount of time is spent on retrieval and conversion. Data Marts host information that is already optimized for the client applications: all data is stored in an optimal form in the database, and special indexes speed up information retrieval.

Page 15 of 56 Pages
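The effect of such indexes can be illustrated with a minimal sqlite3 sketch; the table and column names are hypothetical, not taken from the lesson:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical Data Mart table with pre-aggregated sales figures.
cur.execute("CREATE TABLE mart_sales (prod_id INTEGER, region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO mart_sales VALUES (?, ?, ?)",
    [(i, "EU" if i % 2 else "US", i * 1.5) for i in range(1000)],
)

# Without an index, the query plan shows a full table scan.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM mart_sales WHERE prod_id = 42"
).fetchone()
print(plan[-1])  # a SCAN over mart_sales

# A dedicated index turns the same query into a direct lookup.
cur.execute("CREATE INDEX idx_mart_sales_prod ON mart_sales (prod_id)")
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM mart_sales WHERE prod_id = 42"
).fetchone()
print(plan[-1])  # a SEARCH using idx_mart_sales_prod
```

The exact plan wording varies between SQLite versions, but the switch from a scan to an index search is what makes Data Mart retrieval fast.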

Implementation Time

Implementing a Data Warehouse with its Data Marts takes the longest, because complex networks and transformations must be created. Creating a Data Warehouse only or Data Marts only should take almost the same amount of time: most of the time (about 80%) is normally spent on the ETL, so the two options should not differ much. A virtual Data Warehouse can be implemented very quickly because of its simple structure; no central database with all its connectors has to be built.

Data Consistency

Data Warehouse and Data Mart technology achieve maximum data consistency: all provided information is checked for validity and consistency. A virtual Data Warehouse may have consistency problems, because all data is retrieved at runtime. When the data organization at the sources changes, new data may be consistent, but older data may no longer be represented in the current model.

Flexibility

The virtual Data Warehouse has the highest flexibility. The data preparation process can be changed very easily because only the clients are directly involved, and there are hardly any components that depend on each other. In a Data-Warehouse-only solution flexibility is poor, because different types of clients may depend on the data model of the Data Warehouse; changing a part of the data model requires intensive compatibility testing with the existing applications, or even updates of the client applications. A solution with Data Marts, with or without a central Data Warehouse, has medium flexibility, because client applications normally use the Data Marts as their point of information. When the central Data Warehouse or the data sources change, only the process of filling the Data Marts must be updated.
When the Data Marts themselves change, only the client applications that depend on them are affected, not all client applications.

Data Consistency

Data consistency is poor in a virtual Data Warehouse, although it also depends on the quality of the process that gathers information from the sources. Data Warehouses and Data Marts have very good data consistency, because the information stored in their databases has been checked during the ETL process.

Quality of information

The quality of information depends strongly on the quality of the data population (ETL) process and on how well the information is processed and filtered before it is stored in the Data Warehouse or presented to a user. Therefore no general statement is possible.

History

A virtual Data Warehouse has no history at all, because values and information are retrieved at runtime; without a central database there is no place to store a history. The other architectures provide a central point to store this information. The history provides a basis for analyzing business processes and their results, because current information can be compared with information from the past.

Exercises (+Solutions) to BA Lesson DWH&DM Chapter 3

Ad 2.:

create view Flights_to_Munich as
select f.fno as Flight_Number, a.name as Departure_Airport, f.time as Departure_Time
from flight f, airport a
where f.to = 'muc' and a.fid = f.from

Ad 3.:

insert into flight values (471, 'BER', 'HAN', '17:30')

Ad 4.:

update flight set time = '10:35' where fno = 181

Ad 5 (optional):

select name as Departure_Airport, count(*) as Departure_Count
from airport, flight
where fid = from
group by name
union
select name as Departure_Airport, 0 as Departure_Count
from airport
where not exists (select * from flight where from = fid)
order by departure_count

This delivers the following result in DB2:

DEPARTURE_AIRPORT    DEPARTURE_COUNT
Berlin                             0
Frankfurt                          0
Hannover                           1
Mannheim                           1
Stuttgart                          1
Muenchen                           2

6 record(s) selected.

Here is also a second, shorter solution by Stefan Seufert, which gives the same result as above:

SELECT Name as Departure_Airport, count(Flight.From) as Departure_Count
FROM Airport LEFT OUTER JOIN Flight ON Airport.FID = Flight.From
GROUP BY Name
ORDER BY Departure_Count

The idea is that count(field), in contrast to count(*), only counts values that are not NULL. Since the attribute in the count function comes from the flight table, only airports with departing flights are counted; all others get the value 0.

Exercise 4.2:

Consider the following Star Schema:

Sales_Fact (fact table): Prod_id, Time_id, Promo_id, Store_id, Dollar_Sales, Unit_Sales, Dollar_Cost, Cust_Count
Product: Prod_id, Brand, Subcategory, Category, Department, ...
Time: Time_id, Fiscal_Period, Quarter, Month, Year, ...
Store: Store_id, Name, Store_No, Store_Street, Store_City
Promotion: Promo_id, Promo_Name, Price_Reduct.

Build the SQL such that the result is the following report, where the time condition is Fiscal_Period = 4Q95:

Brand      Dollar Sales   Unit Sales
Axon
Framis
Widget
Zapper
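The count(column) versus count(*) behavior used in the second solution above can be verified with a small self-contained sqlite3 script. The table contents are hypothetical, and the departure column is renamed because FROM is a reserved word in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
c = con.cursor()

c.execute("CREATE TABLE airport (fid TEXT, name TEXT)")
# "dep" stands in for the lesson schema's "from" column.
c.execute("CREATE TABLE flight (fno INTEGER, dep TEXT, dest TEXT)")
c.executemany("INSERT INTO airport VALUES (?, ?)",
              [("han", "Hannover"), ("ber", "Berlin")])
c.execute("INSERT INTO flight VALUES (181, 'han', 'muc')")

# count(flight.dep) skips the NULLs produced by the outer join,
# so airports without departures get 0 instead of 1.
rows = c.execute(
    """
    SELECT name, count(flight.dep)
    FROM airport LEFT OUTER JOIN flight ON airport.fid = flight.dep
    GROUP BY name
    ORDER BY 2
    """
).fetchall()
print(rows)  # [('Berlin', 0), ('Hannover', 1)]
```

With count(*) instead, Berlin would be reported with one row (the NULL-padded join row) rather than zero departures.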

Solution with standard SQL (for example with DB2):

SELECT p.brand AS Brand,
       Sum(s.dollar_sales) AS Dollar_Sales,
       Sum(s.unit_sales) AS Unit_Sales
FROM sales_fact s, product p, time t
WHERE p.product_key = s.product_key
  AND s.time_key = t.time_key
  AND t.fiscal_period = '4Q95'
GROUP BY p.brand
ORDER BY p.brand

Using the SQL wizard (design view) in the Microsoft Access database, we see the following Access SQL:

SELECT Product.brand AS Brand,
       Sum([Sales Fact].dollar_sales) AS Dollar_Sales,
       Sum([Sales Fact].unit_sales) AS Unit_Sales
FROM ([Sales Fact] INNER JOIN [Time] ON [Sales Fact].time_key = Time.time_key)
     INNER JOIN Product ON [Sales Fact].product_key = Product.product_key
WHERE (((Time.fiscal_period) = "4Q95"))
GROUP BY Product.brand
ORDER BY Product.brand;

Exercises (+Solutions) to BA Lesson DWH&DM Chapter 5

Exercise 5.1:

Compare ER modelling (ER) with multidimensional data models (MDDM) like STAR or SNOWFLAKE schemas (see appendix page). Compare Chapter 6.3 (ER modelling) and Chapter 6.4 (MDDM) in the IBM Redbook Data Modeling Techniques for DWH (see DWH lesson homepage). Build a list of advantages and disadvantages for each of these two concepts, in the form of a table:

ER Model:    Criteria 1 ++ | Crit. 2 + | Crit. 3 - | Crit. 4 --
MDDM Model:  Criteria 5 ++ | Crit. 6 + | Crit. 7 - | Crit. 8 --

SOLUTION:

Entity-relationship

An entity-relationship logical design is data-centric in nature. In other words, the database design reflects the nature of the data to be stored in the database, as opposed to reflecting the anticipated usage of that data. Because an entity-relationship design is not usage-specific, it can be used for a variety of application types: OLTP and batch, as well as business intelligence. This same usage flexibility makes an entity-relationship design appropriate for a data warehouse that must support a wide range of query types and business objectives.

MDDM Model: Compare as examples the Star and Snowflake schemas, which are explained in the next solution (5.2).

Exercise 5.2:

Compare the MDDM model schemas STAR and SNOWFLAKE. Compare in the IBM Redbook Data Modeling Techniques for DWH (see DWH lesson homepage). Build a list of advantages and disadvantages for each of these two concepts, in the form of a table (compare Exercise 5.1).

First solution:

Star schema

The star schema logical design, unlike the entity-relationship model, is specifically geared towards decision support applications. The design is intended to provide very efficient access to information in support of a predefined set of business requirements. A star schema is generally not suitable for general-purpose query applications. A star schema consists of a central fact table surrounded by dimension tables and is frequently referred to as a multidimensional model. Although the original concept was to have up to five dimensions, as a star has five points, many stars today have more than five dimensions. The information in the star usually meets the following guidelines:

- A fact table contains numerical elements.
- A dimension table contains textual elements.
- The primary key of each dimension table is a foreign key of the fact table.
- A column in one dimension table should not appear in any other dimension table.

Snowflake schema

The snowflake model is a further normalized version of the star schema. When a dimension table contains data that is not always necessary for queries, too much data may be picked up each time the dimension table is accessed. To eliminate access to this data, it is kept in a separate table off the dimension, thereby making the star resemble a snowflake. The key advantage of a snowflake design is improved query performance.
This is achieved because less data is retrieved and joins involve smaller, normalized tables rather than larger, de-normalized tables. The snowflake schema also increases flexibility because of normalization, and can possibly lower the granularity of the dimensions. The disadvantage of a snowflake design is that it increases both the number of tables a user must deal with and the complexity of some queries. For this reason, many experts suggest refraining from using the snowflake schema. With entity attributes spread over multiple tables, the same amount of information is available whether a single table or multiple tables are used.

Expert opinion (from DM Review):

First, let's describe them. A star schema is a dimensional structure in which a single fact is surrounded by a single circle of dimensions; any dimension that is multileveled is flattened out into a single dimension. The star schema is designed for direct support of queries that have an inherent dimension-fact structure.

A snowflake is also a structure in which a single fact is surrounded by a single circle of dimensions; however, in any dimension that is multileveled, at least one dimension structure is kept separate. The snowflake schema is designed for flexible querying across more complex dimension relationships. The snowflake schema is suitable for many-to-many and one-to-many relationships among related dimension levels. However, and this is significant, the snowflake schema is required for many-to-many fact-dimension relationships. A good example is customer and policy in insurance: a customer can have many policies, and a policy can cover many customers. The primary justification for using the star is performance and understandability; the simplicity of the star has been one of its attractions. While the star is generally considered to be the better performing structure, that is not always the case. In general, one should select a star as first choice where feasible. However, there are some conspicuous exceptions, and the remainder of this response will address these situations. First, some technologies such as MicroStrategy require a snowflake, and others like Cognos require the star. This is significant. Second, some queries naturally lend themselves to a breakdown into fact and dimension. Not all do. Where they do, a star is generally a better choice. Third, there are some business requirements that just cannot be represented in a star. The relationship between customer and account in banking, and customer and policy in insurance, cannot be represented in a pure star because the relationship across these is many-to-many. You really do not have any reasonable choice but to use a snowflake solution. There are many other examples of this. The world is not a star and cannot be force-fit into it. Fourth, a snowflake should be used wherever you need greater flexibility in the interrelationship across dimension levels and components.
The main advantage of a snowflake is greater flexibility in the data. Fifth, let us take the typical example of order data in the DW. A dimensional designer would not bat an eyelash at collapsing the Order Header into the Order Item. However, consider this: say there are 25 attributes common to the order that belong to the Order Header. You sell consumer products, and a typical delivery averages 50 products, so you have 25 attributes with a ratio of 1:50. In this case it would be grossly cumbersome to collapse the header data into the line-item data as in a star: in a huge fact table of more than, say, 2 billion rows, you would be introducing a lot of redundancy. By the way, the Walmart model, which is one of the most famous of all time, does not collapse Order Header into Order Item. However, if you are a video store, with few attributes describing the transaction and an average ratio of 1:2, it would be best to collapse the two. Sixth, take the example of changing dimensions. Say your dimension, Employee, consists of some data that does not change (or, if it does, you do not care, i.e., Type 1) and some data that does change (Type 2). Say also that there are some important relationships to the employee data that does not change (always getting its current value only), and not to the changeable data. The dimensional modeler would always collapse the two, creating a Slowly Changing Dimension, Type 2; this means that the Type 1 data is absorbed into the Type 2 data. In some cases I have worked on, collapsing in this way caused more trouble than it was worth. It was far better to split the dimension into Employee (Type 1) and Employee History (Type 2). Thereby, in such more complex history situations, a snowflake can be better.

Seventh, whether the star schema is more understandable than the snowflake is entirely subjective. I have personally worked on several data warehouses where the user community complained that in the star, because everything was flattened out, they could not understand the hierarchy of the dimensions. This was particularly the case when the dimension had many columns. Finally, it would be nice to quit the theorizing and run some tests. So I did. I took a data model with a wide customer dimension and ran it as a star and as a snowflake. The customer dimension had many attributes, and we used about 150MM rows. I split the customer dimension into three tables, related 1:1:1. The result was that the snowflake performed faster. Why? Because with the wide dimension the DBMS could fit fewer rows into a page. DBMSs read by pre-fetching data, and with the wide rows they could pre-fetch less each time than with the skinnier rows. If you do this, make sure you split the table based on data usage: put data into each piece of the 1:1:1 that is used together. What is the point of all this? I think it is unwise to pre-determine what the best solution is. A number of important factors come into play, and these need to be considered. I have worked to provide some of that thought process in this response.

Second solution:
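As a concrete illustration of the star-versus-snowflake trade-off discussed in the expert response, here is a minimal sqlite3 sketch; all table names, columns, and sample values are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
c = con.cursor()

# Star: one flattened product dimension; the whole hierarchy sits in one row.
c.execute("""CREATE TABLE product_star (
    prod_id INTEGER PRIMARY KEY,
    brand TEXT, subcategory TEXT, category TEXT)""")

# Snowflake: the multilevel hierarchy is normalized into separate tables.
c.execute("CREATE TABLE category (cat_id INTEGER PRIMARY KEY, category TEXT)")
c.execute("""CREATE TABLE subcategory (
    subcat_id INTEGER PRIMARY KEY, subcategory TEXT,
    cat_id INTEGER REFERENCES category(cat_id))""")
c.execute("""CREATE TABLE product_snow (
    prod_id INTEGER PRIMARY KEY, brand TEXT,
    subcat_id INTEGER REFERENCES subcategory(subcat_id))""")

c.execute("INSERT INTO category VALUES (1, 'Food')")
c.execute("INSERT INTO subcategory VALUES (10, 'Snacks', 1)")
c.execute("INSERT INTO product_snow VALUES (100, 'Axon', 10)")

# The snowflake pays one extra join per normalized level for its
# smaller, non-redundant dimension rows.
row = c.execute("""
    SELECT p.brand, cat.category
    FROM product_snow p
    JOIN subcategory s ON p.subcat_id = s.subcat_id
    JOIN category cat ON s.cat_id = cat.cat_id
""").fetchone()
print(row)  # ('Axon', 'Food')
```

In the star variant the same question is a single-table lookup, which is exactly the simplicity-versus-redundancy trade-off described above.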

Exercise 5.3:

An enterprise wants to build an ordering system. The following objects should be administered by the new ordering system:

Supplier with attributes: name, postal code, city, street, post office box, telephone no.
Article with: description, measures, weight
Order with: order date, delivery date
Customer with: name, first name, postal code, city, street, telephone no.

Conditions: Each article can be delivered by one or more suppliers. Each supplier delivers 1 to 10 articles. An order consists of 2 to 10 articles. Each article can appear only once on an order form, but more than one piece of an article can be ordered. Each order is placed by a customer, and a customer can have more than one order (no limit). Good customers get a discount. The number of articles in the store should also be saved; it is not important who the supplier of an article is. For each object we need a technical key for identification.

Task: Create an ER model. Model the necessary objects and the relations between them. Define the attributes and the keys. Use the following notation: Entity, Attribute, Relation.
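One way to check such an ER model is to translate it into relational DDL. The sketch below (run through sqlite3) is not the official solution: all names, and the decision to resolve the two many-to-many relations through supplies and order_item tables, are my assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
c = con.cursor()
c.executescript("""
CREATE TABLE supplier (
    sup_id INTEGER PRIMARY KEY,      -- technical key
    name TEXT, postal_code TEXT, city TEXT, street TEXT,
    po_box TEXT, phone TEXT);

CREATE TABLE article (
    art_id INTEGER PRIMARY KEY,      -- technical key
    description TEXT, measures TEXT, weight REAL,
    stock INTEGER);                  -- number of articles in the store

CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,     -- technical key
    name TEXT, first_name TEXT, postal_code TEXT,
    city TEXT, street TEXT, phone TEXT,
    discount REAL);                  -- good customers get a discount

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,    -- technical key
    order_date TEXT, delivery_date TEXT,
    cust_id INTEGER REFERENCES customer(cust_id));  -- 1 customer : n orders

-- n:m relation between supplier and article
CREATE TABLE supplies (
    sup_id INTEGER REFERENCES supplier(sup_id),
    art_id INTEGER REFERENCES article(art_id),
    PRIMARY KEY (sup_id, art_id));

-- n:m relation between orders and article; the composite key enforces
-- that an article appears only once per order, quantity covers pieces
CREATE TABLE order_item (
    order_id INTEGER REFERENCES orders(order_id),
    art_id INTEGER REFERENCES article(art_id),
    quantity INTEGER,
    PRIMARY KEY (order_id, art_id));
""")
tables = [r[0] for r in c.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The cardinality limits (1 to 10 articles per supplier, 2 to 10 articles per order) cannot be expressed in plain DDL and would need triggers or application-level checks.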

write a SQL script such that you get the following target table:

We see the following conditions:

1. Map the names of the cities in the sources to a number 1 to 100; define this as City_Id.
2. Define the last digit of SKU in SALES as Product_Key.
3. Define the month of Transdate as Time_Id (range: 01 to 12).
4. Define Scenario_Id by cases: Year(Transdate) = 1997 as 1, Year(Transdate) = 1996 as 3, else 2.
5. Fill all columns of the target table with the same columns of the sources.
6. Define a new column: Ending_Inventory = (Opening_Inv. + Additions) - Items_Sold.

Solution:

Select
  Case SAMPLTBC.sales.city
    When 'Manhattan' then 1
    ...
    When 'Maui' then 100
  End As City_Id,
  Substr(SAMPLTBC.sales.SKU, 12, 1) as Product_Key,
  Case
    When Month(SAMPLTBC.sales.transdate) = 01 then 1
    ...
    When Month(SAMPLTBC.sales.transdate) = 12 then 12
  End As Time_Id,
  Case
    When Year(SAMPLTBC.sales.transdate) = 1997 then 1
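A complete, runnable version of the transformation can be sketched with sqlite3. The simplified source table, the two sample cities, and all sample values are my assumptions, and since sqlite3 lacks DB2's Month()/Year() scalar functions, strftime is used instead:

```python
import sqlite3

con = sqlite3.connect(":memory:")
c = con.cursor()

# Hypothetical, simplified stand-in for SAMPLTBC.sales.
c.execute("""CREATE TABLE sales (
    city TEXT, sku TEXT, transdate TEXT,
    opening_inv INTEGER, additions INTEGER, items_sold INTEGER)""")
c.execute(
    "INSERT INTO sales VALUES ('Manhattan', 'SKU-0000007', '1997-03-15', 50, 20, 30)"
)

row = c.execute("""
    SELECT
      CASE city WHEN 'Manhattan' THEN 1 WHEN 'Maui' THEN 100 END AS City_Id,
      substr(sku, -1, 1)                         AS Product_Key,  -- last digit of SKU
      CAST(strftime('%m', transdate) AS INTEGER) AS Time_Id,      -- month 01..12
      CASE CAST(strftime('%Y', transdate) AS INTEGER)             -- scenario by year
        WHEN 1997 THEN 1
        WHEN 1996 THEN 3
        ELSE 2
      END                                        AS Scenario_Id,
      opening_inv + additions - items_sold       AS Ending_Inventory
    FROM sales
""").fetchone()
print(row)  # (1, '7', 3, 1, 40)
```

Each output column corresponds to one of conditions 1-4 and 6 above; condition 5 (copying the remaining source columns) would simply add them to the SELECT list.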
