I am David Fombella and this is my place to share my little knowledge in Business Intelligence ( 100 % supporter of #Pentaho also on twitter: @pentaho_fan)

Main menu

Tag Archives: dwh

1. Viewing table data distribution across segment Servers To view the data distribution of a table’s rows (the number of rows on each segment), you can run a query such as:

SELECT gp_segment_id, count(*)
FROM abc
GROUP BY gp_segment_id;

To see the data distribution of a table’s rows on segment servers and the directory location of where the data is located, you can run a query such as:

SELECT
table_name.gp_segment_id,
hostname,
fselocation as datadir,
count(*)
FROM table_name,
pg_filespace_entry pgfse,
gp_segment_configuration gsc
WHERE
gsc.dbid=pgfse.fsedbid
and table_name.gp_segment_id= pgfse.fsedbid
GROUP BY
table_name.gp_segment_id,hostname,datadir
ORDER BY
gp_segment_id;

Balanced Distribution: A table is considered to have a balanced distribution if all of the segments have roughly the same number of rows. 2. Check the size of a table Answer: Table Level:

The “Slowly Changing Dimension” problem is a common and persistent issue in data warehousing. Simply this applies to records where attributes change over time. Here is an example; Sue is a customer with a major retailer. She first lived in Potter Heigham, Norfolk. So, the original entry in the customer lookup table has the following record:

lngCustomerURN

strName

Town

County

1001

Sue

Potter Heigham

Norfolk

Later she moved to Cambridge, Cambridgeshire in March, 2009. How should we modify its customer table to reflect the change? This is the “Slowly Changing Dimension” problem.

There are in general three ways to solve this type of problem, and they are categorized as follows:

Type 1: Replace the old record with a new one, the only problem is that no trace of the old record exists and any historical information is lost. If for example, we had sales for one region associated with her, when the replacement is done, the sale will move with her.

Type 2: We add a new record to the customer table. We retain the existing history of old transactions against the customer and new transactions are linked to the new customer record.

Type 3: The original record is modified to reflect the change, this generally involves having an additional column (or columns). This is a simplified Customer record to show how a type 3 record could look.

lngCustomerURN

strName

strPreviousTown

strPreviousCounty

strCurrentTown

strCurrentCounty

1001

Sue

Potter Heigham

Norfolk

Cambridge

Cambridgeshire

It is important to recognise that even with a type 2 change additional fields need to exist in the table in order to ensure that the data retain their integrity.

lngCustomerURN

strName

strTown

strCounty

blnActive

blnDateActive

1001

Sue

Potter Heigham

Norfolk

False

22/11/2003

1001

Sue

Cambridge

Cambridgeshire

True

12/03/2009

There is a type 6 change which basically incorporates all the combined functionality of types 1, 2 and 3 combined. Type 6 was first described by Ralph Kimball who called it a ‘hybrid approach’ combining the three basic techniques.

Business Intelligence is a growing field. The roots are in Data Warehousing (collecting data), data mining (doing analytics on data) and decision support (dashboards, reports and event notifications). That is why I am attracted to this field. During 2011 I worked as Business Intelligence consultant at CSC, focused in designing and developing BI solutions (ETL processes, ad –hoc reporting, OLAP) with Pentaho Business Intelligence and Analytics. Currently I am working at Stratebi in Madrid (Pº de la Castellana).

Nowadays, you need to provide the business decision makers
with the tools they need to make use of the data. In this
context, “tools” means much more than just software. It
means everything the business users need to understand
what information is available, find the subsets they need,
and structure the data to illuminate the underlying business
dynamics

In this blog I will try to share my experiences with Business Intelligence and open source software.