Thoughts on Open Source, Analytics

Monthly Archives: May 2006

I noticed someotherbloggers posted but I’ll encourage people as well. The more people that respond the more accurate the survey represents reality!“We would very much welcome your participation in The OLAP Survey 6. This is the largest independent survey of business intelligence users worldwide. The Survey will obtain input from a large number of users to better understand their buying decisions, the implementation cycle and the business success achieved. Both business and technical respondents are welcome.

The OLAP Survey is strictly independent. While Oracle, Microsoft, Cognos and other vendors assist by inviting users to participate in the Survey, the vendors do not sponsor the survey, nor influence the questionnaire design or survey results. As a participant, you will not only have the opportunity to ensure your experiences are included in the analyses, but you will also receive a summary of the results from the full survey. You will also have a chance of winning one of ten $50 Amazon vouchers.Click here to complete the survey on-line.”

Mark Rittman has worked up an excellent article on the new Oracle BI Landscape. He basically covers the whole range of products and technologies and how they’re being repackaged and rebranded into a new product set. Great read.
I thought the following was of interest:

The Analytic Server is effectively a ROLAP (Relational OLAP) engine, that connects to data sources but does not have any local storage in the traditional, database sense, although it does have a cache which holds copies of data previously retrieved, so that subsequent requests for the same data are returned faster.

What struck me about this statement is that from a technology perspective this is nearly identical to Mondrian‘s (Open Source OLAP engine) architecture. While proprietary vendors (Oracle included) had been touting the benefits of a true MOLAP solution instead of just ROLAP. However, with the processing headed to mid tier servers (instead of OLAP crunching happening on a Pentium I desktop) one can leverage the mid tier for caching, and analytical processing (MOLAP-esque stuff). In other words, you can get MOST of the benefits of MOLAP with a well designed and performant ROLAP server proactively caching/crunching on your behalf! Cool stuff!

It’s common to need to generate surrogate keys as part of a Data Warehouse / Business Intelligence solution. Protecting your users from souce system changes, or using time variant data are just a couple of the reasons why one needs to generate a surrogate key when loading data into a DW.

Here is a simple example. I’m loading Product data from an XML file into a MySQL table. The transformation in Kettle looks like this:

The output coming out of the XML file has been flattened using XQuery so that it looks like a flat table struture. Here’s a screen capture of a nifty feature of Kettle where you can preview the output of a step:

Ok, this is our source data. The ID you’re seeing is the OLTP key. We’ll refer to it as the Natural Key. This is the ORDER_ID, or CUSTOMER_ID, or well, you get the idea. This is the key that your source application uses in it’s relational structures.

What we want is to INSERT or UPDATE (aka UPSERT) this record into our DW. We’ll INSERT if we don’t already have this NATURAL key in the DW and generate a surrogate primary key. We’ll UPDATE if we already have this NATURAL key in the warehouse.
Our table looks like:

Notice that our PRIMARY KEY is the Surrogate we’ve set up to be an AUTO_INCREMENTING column. This means that MySQL will generate a surrogate when we INSERT a record. All we have to do is get Kettle to match our unique contrainst (INT_PRODUCT_NK) and either INSERT/UPDATE.

I’ve used the INSERT / UPDATE operator in Kettle. The configuration looks like this:
What we’re doing is configuring Kettle to do a lookup on the table to see if the natural key already exists. If it does, we’ll perform DML (insert or update) on the fields in the lower window.

After running the transformation my INT_PRODUCT table looks like this.
Now, I realized I didn’t choose very illustrative natural keys (all are 1,2,3) but the PRODUCT_ID comes from MySQL as a surrogate.

Subsequent executions of this Kettle transformation will INSERT records as they arrive or UPDATE. MySQL and Kettle in combination are making quick work out of generating these surrogates!

I get asked frequently by people who evaluate Pentaho: How do I get started? Many people (tens of thousands on a monthly basis) download the Pre Configured application and immediately get a sense for the benefits of the Pentaho open source project.

What people almost always want to do next, is get started with their OWN data. You the reader of this blog might find a pre canned demo interesting, but you’ll want to reserve judgement on the leading open source BI Project until you’ve seen your own data flowing into beautiful PDFs/Excel/OLAP Pivot Tables. I totally understand…

This is a very SIMPLE, BASIC, LIGHTWEIGHT guide on how to get the downloadable DEMO connected to YOUR DATABASE and executing YOUR SQL Query using the lynchpin of the Pentaho platform, an Action Sequence (refer to this blog for more on what Action Sequences are).

Disclaimer: Pentaho has development tools that once installed makes the development of action sequences more productive then editing xml as below. In other words, this is just rough and dirty to get you going and connect… I highly suggest downloading the Development Workbench to actually build the Action Sequences.

Step 6:
Let’s create your first action sequence!
Create (using notepad or vi) a file named MyFirst.xaction in $DEMO_BASE/pentaho-solutions/samples/datasources
Make sure and replace the select I have above with a select statement that you KNOW works in your database.

Step 8:
Open up Firefox (cough IE) and hit the following URL:
http://localhost:8080/pentaho/ViewAction?&solution=samples&path=datasources&action=MyFirst.xaction

You should see the basic output of the Action Sequence which, in this case, is the results of the SQL you provided.

Action Successful

id

title

1

Open

That’s it! You’re connected to your own datasource and now you can start to explore the immense power of Action Sequences with your data… There are approximately 50 example .xactions in the demo so use them as a reference when trying to exploring the features of the platform. I HIGHLY SUGGEST downloading the workbench as well. It’s a visual environment for authoring .xaction files. Trust me, you’ll appreciate the GUI!

I’ve made the switch from Movable Type to WordPress. I’ve used Movable Type for more than two years now, quite happily actually, but thought it high time to move to WordPress. I’m rather impressed with WordPress thus far; very competitive product all in all. Some things in the installation were even EASIER than MT.

I turned on comments/trackbacks on my MT blog approximately 7 weeks ago. In that 7 weeks, I had received more than 2,000 spam comments, trackbacks, etc. I plan to use a spam service on this blog to prevent such an abuse again.

If you are encountering any feed or viewing issues please do let me know.