I downloaded version 0.9 of Kafka. I happened to have a Cloudera Quickstart VM running (CDH 5.4), so I figured I’d run through the quick start of Kafka from that VM. I had no trouble starting up Kafka and sending and receiving basic messages via the console consumer and producer. Getting started with Kafka is very simple! Now on to Kettle.

Within Spoon (version 6.0), I installed the Kafka Marketplace plugins. After restarting, I created a very simple transformation: I dropped an “Apache Kafka Consumer” step onto the canvas, followed by a “Write to Log” step. It can’t get much simpler than that!

In the Kafka Consumer dialog, I specified the topic name as “test” to match what I did during the Kafka quick start. I then set the “zookeeper.connect” property to the address of the Zookeeper instance running on my Cloudera VM, “192.168.56.102:2181”. Finally, I set the “group.id” to “kettle-group”.
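Boiled down, the consumer step’s settings looked like this (the IP address is of course specific to my VM):

```properties
# Apache Kafka Consumer step settings (values specific to my setup)
topic             = test
zookeeper.connect = 192.168.56.102:2181
group.id          = kettle-group
```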

Now that I had things wired up, I figured it was time to run! A couple of questions came to mind at this point: which message does the consumer group start reading from in the Kafka topic, and how long does the step run before exiting? We’ll get to those answers in a few minutes. First, let’s run it and see what happens…

Fun with Java classes: the step promptly failed with a NoClassDefFoundError. I’m not exactly sure why Kettle can’t find the Kafka class here, but I quickly resolved it by placing all of the plugin’s lib jar files on Spoon’s main classpath:

cp plugins/pentaho-kafka-consumer/lib/* lib

Note that this was a hammer of a solution. I renamed all the jar files to start with “kafka” so that I could quickly undo the change if necessary. I’ve also created the following issue over on GitHub; maybe there’s a better approach to fixing this one that I haven’t thought of yet.

Once I restarted Spoon, I re-ran the transformation and … got no results from Kafka. I tried a bunch of different configurations and sent additional messages to Kafka, but no luck. So I did what any developer would do and checked out the latest source code.

From there I ran “mvn package” and got a fresh build. I replaced plugins/steps/pentaho-kafka-consumer with the new target/pentaho-kafka-consumer-TRUNK-SNAPSHOT.zip. After running it and seeing a similar NoClassDefFoundError, I repeated my earlier steps with the new plugin jars, moving them to the main classpath.

Another thing I ran into was on the Kafka configuration side. Kafka was advertising the hostname of my VM for connections, which my host OS couldn’t resolve. I fixed this by setting the advertised.host.name property in config/server.properties to the public IP address of the VM.
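For reference, the change in config/server.properties was a one-liner (the IP being my VM’s address):

```properties
# config/server.properties
# Advertise an address the outside world can actually reach,
# instead of the VM's internal hostname
advertised.host.name=192.168.56.102
```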

After restarting Spoon, I successfully read in the messages from Kafka! Note that at this time you can’t reset the message offset for a specified group, so the only way to re-read messages is to change the “group.id”. This is a feature Ruckus is considering adding, and it would be a great way to contribute to the open source plugin!

After getting the Consumer working, I went ahead and tried out the Producer. Note that the Producer step needs Binary data to feed a topic. All I had to do was feed in Binary data, specify the topic name (I used “test” again), and set “metadata.broker.list” to the correct IP and port, and it worked like a charm! Note that I didn’t have to rebuild the producer plugin like I did the consumer, but without the consumer jars placed in the lib folder, the producer wouldn’t function either.
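For completeness, here’s roughly what my producer step settings boiled down to (the broker IP is my VM’s address, and 9092 is Kafka’s default broker port):

```properties
# Apache Kafka Producer step settings (values specific to my setup)
topic                = test
metadata.broker.list = 192.168.56.102:9092
```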

So how might you use Kettle and Kafka together? Kafka is becoming the de facto big data message queue, and it can be used in combination with Spark and other Hadoop technologies for data ingestion and streaming. Kettle can populate a Kafka topic via the Apache Kafka Producer, or consume messages from a topic via the Apache Kafka Consumer for downstream processing. Ruckus Wireless, the company that contributed the steps, uses Pentaho Data Integration to ingest data into Vertica and then visualizes the data with Pentaho Business Analytics. You can learn more about the Ruckus Wireless use case here:

I just finished reading Packt Publishing’s Pentaho Business Analytics Cookbook by Sergio Ramazzina. This is a great, up-to-date guide to utilizing the full Pentaho Analytics Suite, covering a mix of both enterprise and community components. Useful details around configuring data sources, building your first set of reports, parameterization, dashboarding, and a whole lot more are covered step by step to make sure you walk away with a good understanding of what tools are available and how to get started with them. This is the first definitive guide I have seen around Pentaho Mobile, and I really appreciate Chapter 11 on customizing the Pentaho experience for your business.

If you are looking to get up to speed on Pentaho Analytics very quickly, I highly recommend this book!

During the development of 5.1, Pentaho has taken steps to integrate Mondrian 4 into our business analytics platform. This article goes over what we have accomplished so far and where we are headed, and provides instructions for getting Mondrian 4 working with Pentaho 5.1 Community Edition.

Pentaho Enterprise Edition has Mondrian 4 bundled for a specific reason: we’ve now introduced native MongoDB support as a plugin to Mondrian 4. This allows customers to slice and dice data from MongoDB collections in Pentaho Analyzer. You can learn more about the capability here: http://www.pentaho.com/request-analyzer-mongodb

As we continue to evolve the Pentaho Platform, we need a more flexible plugin architecture for driving innovation. To allow both Mondrian 3 and Mondrian 4 runtime environments, we’ve introduced OSGi as a core part of the platform. Mondrian 4 is our first use case, but we’ll be introducing many others in future versions.

Once Mondrian 4 is installed as an OSGi bundle, it is available as an OLAP4J resource to the platform via Pentaho’s proxy system driver, aptly named “PentahoSystemDriver”. The steps below walk you through getting Mondrian 4 up and running within Pentaho CE 5.1. Note that these instructions won’t work against previous versions of Pentaho, and they aren’t necessary in Pentaho EE 5.1, where Mondrian 4 is already configured and installed.

Using this utility code is nice because the MDXOlap4jConnection will manage mapping Pentaho’s roles to Mondrian’s.

So how does all of this work?

Pentaho has bundled Apache Felix into the Pentaho Platform. Felix is an OSGi container, and it now manages Mondrian 4 and its dependencies. The core bundles that make up Pentaho’s OSGi container can be found in pentaho-solutions/system/osgi/core_bundles; there you’ll find a number of utility OSGi jars, including Gemini Blueprint, which we use for wiring OSGi components (Blueprint is similar to the Spring Framework). The Mondrian 4 jar also contains metadata that registers it with the Pentaho platform as an available OLAP4J driver with the JDBC prefix “mondrian4”. You can check out the metadata file OSGI-INF/blueprint/beans.xml to see the specific XML that declares the driver. To see how the internal wiring is done, and how PentahoSystemDriver is involved, check out the pentaho-platform package org.pentaho.platform.osgi.
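To give a flavor of what that metadata looks like, here is a hypothetical sketch of a Blueprint descriptor that registers a driver as an OSGi service. The real OSGI-INF/blueprint/beans.xml in the Mondrian 4 bundle is the authoritative version; the exact structure below is an illustrative assumption, not a copy of the shipped file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of a Blueprint descriptor; see the bundle's
     OSGI-INF/blueprint/beans.xml for the real declaration -->
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">
  <!-- Instantiate the Mondrian OLAP4J driver inside the bundle -->
  <bean id="mondrian4Driver" class="mondrian.olap4j.MondrianOlap4jDriver"/>
  <!-- Publish it as a java.sql.Driver service the platform can discover -->
  <service ref="mondrian4Driver" interface="java.sql.Driver"/>
</blueprint>
```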

There is still a lot of work to do!

Here are some of the areas we will need to complete in future versions to make this a seamless experience:

It’s been four years since I published Pentaho Reporting 3.5 for Java Developers. A lot has changed in Pentaho Reporting since then, so it’s great to see a new book now available from Packt, Pentaho 5.0 Reporting by Example: Beginner’s Guide, co-authored by Mariano Mattio and Dario Bernabeu. This book has a different purpose than the Java Developers book; its focus is a deeper dive into examples to quickly bring folks up to speed on the various capabilities of Pentaho Reporting.

For those who are already familiar with the basics of Pentaho Reporting, I would still recommend this book for a couple of reasons. First, Chapter 12 covers both content linking and sparklines, very useful features for your everyday reports. Second, one of the newest features in Pentaho Reporting 5.0 is stylesheets, and Chapter 13 does a great job of introducing this powerful capability.

I’m proud to announce that, with today’s release of Pentaho BA Server 4.8 and Pentaho Data Integration 4.4 to SourceForge, we’ve bundled the first version of the Pentaho Marketplace as a plugin! With the Marketplace, it is now easy to download and install cool plugins developed by the community.

To get started with the BA Server Marketplace, log in as an administrator and click the Marketplace toolbar icon or select Tools -> Marketplace. In Spoon, click Help -> Marketplace. Find the plugin you want to install or upgrade, and click!

I would like to say thank you for all the hard work done by the many folks who helped this project reach 1.0! The Pedros @ WebDetails worked around the clock to deliver the BA Server Marketplace; their team did a great job with the UI experience, as well as a telemetry capability so the community can see which plugins are popular and how the Marketplace is used (more on that in a future blog!). Also, Wes Brown, Pentaho’s head of pixel management, provided a lot of great feedback and ideas for the UI, helping make the user experience fantastic! On the PDI side, Matt Casters did a great job putting together the first version of the Marketplace within Spoon, with assistance from Sean Flatley and Matt Burgess, two of our core Kettle developers at Pentaho. I hear Matt B. is also brewing up a number of new plugins; go check them out on his blog :-).

It’s great to see so many helping hands on a project like this, all done out of passion for the product, and the goal of opening up the product to even more capabilities and contributions!!

If you do find any issues with the Marketplace, please let us know. We already have many plans for future versions, so keep an eye out … in the Marketplace … for Marketplace updates :-).

I’m pleased to announce that Pentaho’s Engineering Team will be hosting IRC Office Hours each week. IRC is a great place to go and chat with Pentaho’s developers, but sometimes we’re too busy traveling the world or hacking away at the next release to catch up with folks in the irc.freenode.net ##pentaho channel. We’re hoping that hosting office hours will allow for more collaboration, so as a community we can continue to expand and build on the #1 open source business analytics and data integration platform.

This weekend I had the pleasure of reading Maria Roldan and Adrian Pulvirenti’s Pentaho Data Integration 4 Cookbook, published by Packt Publishing. I was one of the reviewers for Maria’s first Packt book, Pentaho 3.2 Data Integration: Beginner’s Guide, as well as a Packt author myself, so when I was asked if I’d be willing to write about the most recent addition to the Pentaho collection of books, I happily obliged.

I highly recommend this book to all those out there looking to learn more about PDI. The book has many great recipes for specific situations, but throughout you also learn many important swiss-army-knife-type skills that will aid you in your daily use of Pentaho Data Integration. The book includes everything from dealing with unstructured text files to working with fuzzy logic. As a Java developer, I especially appreciate the many uses of the User Defined Java Class step for the more advanced scenarios. The book also introduces the many uses of Pentaho Data Integration within Pentaho’s BI Suite, allowing power BI developers to create a flow of information from a transformation to a report or dashboard.

Chapter 6, Understanding Data Flows, may be the most important chapter in this book. Managing the merging and splitting of data within a transformation requires key insights that this book covers in detail. Having this information will allow you to take your transformation building skills to the next level.

Thanks, Maria and Adrian, for the wonderful piece of work! The copy I received will reside in the bullpen at Pentaho’s headquarters here in Orlando; I’m sure many of the engineers here will use and learn from it! Now don’t waste any more time, get your own copy today!

I’m a big fan of Google Analytics; I use it for all my personal websites to see what type of traffic I get. One of my colleagues, also impressed with their reports, wanted to know if you could make a Pentaho report look as good as Google’s output. I quickly threw together the following report to show that you can design just about anything in Pentaho Report Designer!

Check out the PDF and HTML rendering of the report. Feel free to use the PRPT as a template for your own reports.

Here are my top 5 recommendations for folks when designing reports like this:

Don’t be tempted to use the line and rectangle elements. Instead, use the padding and borders of bands and elements.

Inline subreports allow you to lay out pretty much anything, so use them!

The message-field report element is very powerful; you can specify number and date formats as part of the message: $(field, date, MMM yyyy).

Make sure to test rendering in the output formats that you care about. HTML renders as a set of tables, so you can’t have overlapping objects in your report.

Take advantage of the “Paste Formatting” option. It lets you copy colors, font sizes, etc., and will save you a lot of time.
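As a quick illustration of the message-field tip above, a single message element can mix literal text with formatted fields. The field names here are made up for the example:

```text
Sales for $(reportDate, date, MMM yyyy): $(totalSales, number, #,##0.00)
```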

And of course, don’t forget to get a copy of Pentaho Reporting 3.5 for Java Developers :-). The book covers many topics; you can learn a lot about formula functions, chart options, shortcut keys, and much more.