Wednesday, December 18, 2013

I'm writing this post to share the presentation and resources from my talk at the second meetup of the Oporto MongoDB User Group. The talk demonstrates the integration between Pentaho and MongoDB in three areas: ETL, reporting and dashboarding.

Here is the presentation:

The data I used is just for demonstration; I didn't worry about performance, content or data quality. The goal is simply to show, with simple examples, how you can integrate Pentaho with MongoDB, and to show the potential and how easy it is to integrate with other systems, in this case the Google Maps API.

In summary, if you want to integrate Pentaho with MongoDB, there are two ways to do it:

ETL: With Pentaho Data Integration (aka Kettle) you can use the "MongoDB input" and "MongoDB output" steps to read and write data. The ETL transformations can then be used in Pentaho Reporting or in CDE through CDA.
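As a sketch of what the "MongoDB input" step expects, you give it a JSON query expression and, optionally, a fields (projection) expression. The collection and field names below are made up for illustration only:

```
Query expression (JSON):  { "address.country" : "Portugal" }
Fields expression (JSON): { "name" : 1, "location" : 1, "_id" : 0 }
```

The step then emits one Kettle row per matching document, with the projected fields as columns.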

Programming: This is the best choice for me because I'm familiar with Java development. I know that using the ETL steps is more intuitive, but as with any platform or product, you get better performance when you design exactly what you want. So, in ETL you can program with the "User Defined Java Class" step. In Report Designer you can use the Scriptable option or a MongoDB datasource. Finally, in dashboards built with CDE, which uses CDA for data access, you can use the "scriptable over scripting" datasource with a language such as Beanshell.
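As an illustration, a minimal Beanshell script for the scriptable datasource could look like the fragment below. This is only a sketch: the column names and the row are invented, and in a real dashboard you would fill the model with the results of a MongoDB Java driver query instead of a hard-coded row.

```java
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel;

// Define the columns the datasource will expose to CDA.
String[] columnNames = new String[] { "city", "total" };
Class[] columnTypes = new Class[] { String.class, Integer.class };
TypedTableModel model = new TypedTableModel(columnNames, columnTypes);

// In practice these rows would come from a MongoDB query.
model.addRow(new Object[] { "Porto", new Integer(42) });

return model;
```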

Here is the link with the resources of the presentation. Note: you need to be familiar with configuring those resources to make them work properly.

Thursday, September 26, 2013

Saiku Analytics is a great open source server (and Pentaho plugin) for exploring and visualizing data!
About Saiku: "Saiku was founded in 2008 by Tom Barber and Paul Stoellberger. Originally called the Pentaho Analysis Tool, it started life as a basic GWT based wrapper around the OLAP4J library. Over the years it has evolved, and after a complete rewrite in 2010, it was reborn as Saiku. Saiku offers a user friendly, web based analytics solution that lets users quickly and easily analyse corporate data and create and share reports. The solution connects to a range of OLAP Servers including Mondrian, Microsoft Analysis Services, SAP BW and Oracle Hyperion and can be deployed rapidly and cost effectively to allow users to explore data in real time." - Meteorite

About OpenShift: "OpenShift is a cloud computing platform as a service product from Red Hat. A version for private cloud is named OpenShift Enterprise. The software that runs the service is open-sourced under the name OpenShift Origin, and is available on GitHub. Developers can use Git to deploy web applications in different languages on the platform. OpenShift also supports binary programs that are web applications, so long as they can run on Red Hat Enterprise Linux. This allows the use of arbitrary languages and frameworks. OpenShift takes care of maintaining the services underlying the application and scaling the application as needed." - Wikipedia

In this post I'll demonstrate how you can publish a Saiku Analytics server on the OpenShift platform using a free account that provides 1GB of storage per gear. In other words (commercial words), it puts your business analytics in the sky (ok, in the cloud :) ) for free (or at low cost, depending on your business).

After creating your account on the OpenShift website, you need to install and configure the OpenShift RHC Client Tools. I'll describe the steps to install and configure them on Linux, but you can also follow the instructions in this link.
The steps on Linux Ubuntu are:

```
sudo apt-get install ruby-full rubygems git-core
sudo gem install rhc
rhc setup    # enter your credentials when prompted
```

The next step is to create your application on OpenShift. There are three ways to do that: using the OpenShift website with your account (as I'll demonstrate), using the RHC client tool, or using JBoss Developer Studio (you can find out more about these options in this link).
The steps using website are:

Saturday, July 27, 2013

A new book about Pentaho is out! Its name is Instant Pentaho Data Integration Kitchen.

"The book is about Kitchen and how to use the PDI's command line tools efficiently to help people in their day to day operations with Pentaho Data Integration. It is a practical book and it seemed relatively easy to write." - Sergio Ramazzina

I was the technical reviewer for this book, and I was very happy that the publisher chose me; it was another experience that I really enjoyed.

I also want to congratulate the author, Sergio Ramazzina, for writing this book and contributing to the open source community.

Thursday, June 6, 2013

Kettle has a small but useful feature for running ETL: it uses Apache VFS, which lets you access a set of files from inside an archive and use them directly in your processes. You can also use this to execute ETL stored somewhere on the web.

Run in file system

I created this little sample (a job that executes a transformation) and compressed the two files into a zip file.

So, I have this zip file at the following path on my own computer: C:\Users\latinojoel\Desktop\sample.zip
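With Apache VFS, Kitchen can point directly at a file inside that archive using a zip: URL. As a sketch (assuming the job inside the zip is called sample.kjb; adjust the entry name to your own files):

```
kitchen.bat /file:"zip:file:///C:/Users/latinojoel/Desktop/sample.zip!/sample.kjb"
```

The part before the `!` is the archive location and the part after it is the path of the entry inside the archive.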

Sunday, May 26, 2013

This is a little sample of how to create a notifier job that reports whether another job terminated with error or success.
The first step is to create a sample job that represents an ETL process (for example, a job responsible for populating a DW).

Create a job example

The first step is to create a transformation that reads the ${VALUE} variable: if the value matches 'Y', the transformation executes successfully; if ${VALUE} has any other value, the transformation fails with an error.
See the following transformation workflow:
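The rule the transformation implements can be sketched in plain Java (this is just an illustration of the logic, not the actual Kettle step code):

```java
public class EtlStatusCheck {

    // Mirrors the transformation's rule: 'Y' means success, anything else fails.
    public static boolean isSuccess(String value) {
        return "Y".equals(value);
    }

    public static void main(String[] args) {
        String value = args.length > 0 ? args[0] : "N";
        if (isSuccess(value)) {
            System.out.println("Transformation finished successfully");
        } else {
            // In the real transformation this is an "Abort" step.
            throw new RuntimeException("VALUE was not 'Y'; aborting");
        }
    }
}
```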

The second step is to create a job that executes the above transformation. See the following job workflow:

Create the main job

Create a transformation notifier

In this transformation you define how you want to be notified. In this sample you are notified by email, Android push notification (using PDI Manager) and Apple push notification (using a new plugin that will be available in a few days).

The transformation receives a parameter that indicates whether the ETL executed successfully or not, and based on that the notification message changes.
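In plain Java, the message selection could be sketched like this (the texts and the parameter values are hypothetical; the real messages live inside the transformation):

```java
public class NotificationMessage {

    // STATE comes from the calling job: "Success" on success, anything else on failure.
    public static String build(String state, String jobName) {
        if ("Success".equals(state)) {
            return "Job '" + jobName + "' finished successfully.";
        }
        return "Job '" + jobName + "' FAILED, please check the logs.";
    }

    public static void main(String[] args) {
        System.out.println(build("Success", "populate_dw"));
        System.out.println(build("Error", "populate_dw"));
    }
}
```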

See the following ETL workflow:

Create a job notifier

This job is responsible for executing any job in the same folder; you just need to pass the name of the job you want to execute as a parameter. If that job runs successfully, the notifier executes the above transformation, passing 'Success' in the STATE parameter.

See the following job workflow:

Here is how to execute the job:

You can download all the files from this link (note: you need to configure the kettle.properties file).

Wednesday, February 6, 2013

As I said in my first post on this blog, there are two things I want to share with you:

PDI Android Push Notifications Plugin: This is a plugin that lets you send push notifications from Pentaho Data Integration to any Android application with the GCM service enabled. This plugin is written in Scala!!! Check out: https://github.com/latinojoel/pdi-android-pushnotifications. Note: this plugin is still a release candidate.

PDI Manager Android App: This is an Android app that lets you receive push notifications from PDI.

Configuration

You need to be familiar with GCM (Google Cloud Messaging for Android); you can read more about this service in this link. As I said, this plugin is designed to send push notifications to any Android application, because all of the most important parameters are configurable.
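As an illustration of what happens under the hood, a push through the 2013-era GCM HTTP endpoint is a POST with the API key in the Authorization header and a JSON body; the payload below carries the four properties PDI Manager reads. This is only a hand-rolled sketch, not the plugin's actual code (the registration id and values are placeholders):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class GcmSender {

    // Builds the JSON body for the GCM HTTP endpoint with the four
    // properties PDI Manager expects: Status, Project, Date and Data.
    public static String buildPayload(String regId, String status,
                                      String project, String date, String data) {
        return "{\"registration_ids\":[\"" + regId + "\"],"
             + "\"data\":{\"Status\":\"" + status + "\","
             + "\"Project\":\"" + project + "\","
             + "\"Date\":\"" + date + "\","
             + "\"Data\":\"" + data + "\"}}";
    }

    // Posts the payload to GCM; needs a valid API key and registration id.
    public static int send(String apiKey, String payload) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://android.googleapis.com/gcm/send").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", "key=" + apiKey);
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        System.out.println(buildPayload("REG_ID_HERE", "Success",
                "populate_dw", "2013-02-06", "rows=10"));
    }
}
```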

Main Options screen

Properties screen

Sending push notifications to the PDI Manager Android App

At the moment, PDI Manager is very young :-), so it is only configured to receive four properties inside the push: Status, Project, Date and Data. Use the following API key:

AIzaSyAh7Nf-N7bE4xIwsVb7nk4mmls_yEQwZQA

See the image example below:

A little sample. In the next few days I will provide better documentation; for now, I think this post will help you test this new solution.

PS: Feel free to comment, and if you find a bug or have an idea for this solution, you can create an issue or a pull request on GitHub.

So, as you can see, you need to define a bin.xml file in the src/assembly/ path; you can choose whatever name and path you want. It only has to have a specific content structure, as you can see in the example below:
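A minimal Maven assembly descriptor with that structure might look like the following. This is an illustrative sketch only; the id, format and directories are guesses to be adapted to your project:

```xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <id>bin</id>
  <formats>
    <format>zip</format>
  </formats>
  <fileSets>
    <fileSet>
      <directory>${project.build.directory}</directory>
      <outputDirectory>/</outputDirectory>
      <includes>
        <include>*.jar</include>
      </includes>
    </fileSet>
  </fileSets>
</assembly>
```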

Saturday, February 2, 2013

This is my first post on my personal blog, so you may be asking why. The answer is that in the last few days I developed something that I think is very useful to the Pentaho community. At this moment I don't want to talk too much about it, but I can say it's meant to improve the quality of work of ETL developers.