Thoughts on Open Source, Analytics

Monthly Archives: January 2006

Nanoblog (which has nothing to do with nanotechnology) has good advice for Commercial Open Source Companies (COSC). I know the Nanoblog author; he works in an IT organization that is trying to make OSS work in their environment, so he’s formed these opinions working with multiple OSS companies.

Some comments:

“Let go of the control on product vision.”
Product vision is what helps COSC make money… If there is zero revenue in feature X then the company shouldn’t fund that. Now, flip side, the company should not stand in the way if someone who cares nothing about revenue wants to build feature X themselves. There is nothing evil with COSC prioritizing their contribution to OSS according to their revenue as long as they aren’t dictating standards/interfaces/etc that exclude the collective community will.

“Play well with other open source products.”
This is easier said than done. It’s tough enough to assemble products when you have expertise on call (companies) and thorough and complete documentation and integration “road maps.” However, the “integrator” in open source almost always bears the burden on both sides of the projects… Someone wanting to play nicely with other OSS projects would have to become quite knowledgeable about t’other to be successful in that integration.

“The community is your number one customer”
I think this makes a good sound bite, but it just cannot be true for a company. Right or wrong, a company’s purpose is to generate profit. It’s not a company (it’s a trade org, or co-op, or association, or non-profit) if it serves another interest (perfectly valid, right? Mozilla Foundation, yes?).

I think Nanoblog omits some of the important, mutually beneficial aspects of COSC. Open Source “projects,” at their core, are interesting and useful to engineers as they put together solutions. However, it is the application or solution that is of actual value to society, people, and business. For example, the Mozilla XUL package is valuable to the engineer, but its value to the world is actually as Firefox. TCP/IP is cool to the engineer, but email changes the world. Thinking of money as a currency of reward for value, people/societies/biz aren’t willing to pay for pure engineering genius, unless of course it’s an act of charity. They ARE willing to pay for something that makes their life better, easier, less expensive or provides an emotional experience ($20 designer toilet brush at Target).

The commercialization of open source provides the actual benefit of the inventor’s work to society… it validates the inventor’s passion with mass adoption or value; I can think of nothing more gratifying to the inventor’s spirit than to “change the world” for billions of people. This actual benefit is what people are willing to contribute back to the company/community for. This, in theory, should provide the needed capital for the inventor to continue the pursuit of passion to provide the world his/her talent. This is not always the case; some productizations of open source don’t necessarily reward the actual inventions for their worth. How much profit should the person bringing technology X to the market reap, when they did nothing more than “apply it” instead of “invent it”?

Herein lies the paradox of Commercial Open Source that is the most interesting “tech” debate of the 21st century. Commercialization of open source projects brings sustaining capital as reward to these projects (in terms of committers, paid OSS developers, testing, QA, conferences, etc). However, commercial direction in Open Source is considered in opposition to the freedoms of OSS and the inventor’s spirit. That is, commercial application of OSS (whether for internal engineers or commercial open source applications) pays the bills of the inventors while they invent. This has always been the case (ie, Bell Labs was able to function b/c of huge profits from the commercialization of technology at AT&T).

The tsunami of open source “VALUE” is just beginning. Microsoft didn’t invent the mouse; they made it valuable to you and me. I think it’s very much “black and white with 1000 shades of grey” in how companies are balancing this. Some do well, others do poorly. Time will tell what makes up the real secret sauce of “commercial open source.”

Oracle Warehouse Builder 10gR2: Late, but PACKED with Features!
Oracle Warehouse Builder 10gR2 is packed with useful features and not just for the data warehouse professional. Improved ETL, expanded metadata capabilities, and advanced dimensional editors will mean a great deal to data warehouse developers. Features like model-based streams integration and the data profiling/cleansing features will even make DBAs sing Oracle Warehouse Builder praises.

As a consultant working with companies and people throughout the world, I need easy ways to connect with people to be effective. That means presentations to potential customers, work with collaborators on open source projects, and of course day to day work with customer staff to do things “together” albeit offsite.

I can honestly say the 50 USD charge for the gotomeeting.com service is money WELL SPENT and I never look upon it as anything but good value. It’s quick, user friendly, and relatively robust meeting software that allows me to screen share with up to 10 people anywhere in the world within about 30 seconds. Their pricing is awesome, and allows for unlimited meetings (including phone conferencing) for just 50 USD.

I highly recommend it for anyone working with disparate groups of collaborators and organizations (ie, not all at the same company using the “corporate” eMeeting software) or any consultant working on a national or international basis.

Today was a bit of a whirlwind…. Had a bunch of constructive conversations with the Pentaho folks about their solution, their license, where their company is headed, etc. I’m most impressed with their company, and engineering staff. Their product is coming together, they’re getting used by real customers (big ones with tough problems). They haven’t announced names on many of these, but they’ll be recognizable brands and companies.

I just started having a look at Kettle this week… I’m quite impressed with it, actually. I won’t be able to go into detail on the matter right now so I’ll defer that to some later posts. Suffice it to say that I was able to get a simple ETL transformation squared away in about 10 minutes. For anything open source, this is impressive so Kudos to Matt Casters at Kettle! 🙂

Speaking of full posts, I had some interesting discussions with the Pentaho folks about their license (PPL 1.0). I want to write a more comprehensive post on the matter so I’ll defer on that as well.

We covered some of the portal integration today, and WOWSERs. Building portals in open source tools right now is not very, ummmm, user friendly. This has nothing much to do with Pentaho per se, as they are providing pretty cool portlet reference implementations, etc. It doesn’t appear that there are any good open source portal development tools (visual development anyone?) at the mo’. This might just be because I haven’t seen them, so if I’m wrong please do let me know. btw, this is one area where I’d LOVE to actually BE wrong. Email me if you know of a good open source portal product.

We also looked at some of the UI customization provided. There’s some cool stuff in this… It can be used to generate say, lists of values on pages. This is necessary for creating these dynamic dashboards.
From the screenshot you can see some radio buttons and a select box that was built using the Pentaho provided “Action Sequences.” Pretty cool stuff all in all.

I had to leave a bit early (there are some people staying on till tomorrow for a bit more) but thoroughly enjoyed my time with the Pentaho team. James, Bill, Lance, Richard, and the rest of the tribe: many thanks for the hospitality and all the great information on your evolving product!

This is the first time that Pentaho has engaged partners and customers directly in this sort of classroom training. Needless to say, today was a bit of a challenge in terms of getting the examples to work properly, making tweaks, and doing some very rough exercises to get to know the platform a bit better. I’ve met with many engineering teams across organizations, nationalities, geographies, and industries, and suffice it to say many have lots of issues and politics going way back. It is always nice to see a team that gets along and has a sense for the right balance of “right” versus “ship it” mentality, and I think Pentaho has that balance. Their engineers have helped move Open Source BI along further than I would have thought six months ago…

On that note… the product is still quite technical. They have a workbench (refer to yesterday’s post) that provides just one layer of abstraction on the XML document solution. It’s an earnest effort, but falls short of any product based “BI Solution Builder” that I’ve seen. Time of course… Time, and money. Like any product still in its maturing phase it will improve… Version 1.0 shares the vision and builds the evangelists among the early adopters. Version 2.0 is the robust product, right? (Did anyone actually find Windows version 1.0 intuitive and mature?)

We created our own “Action Sequence” today… These are the composite pieces of a reporting solution (run a query, iterate and print reports, and email). While the engine appears to execute the XML based documents properly and well, building these files using the workbench took some effort. Many in the class found it easier to pop behind the scenes and just edit the documents directly. That being said, the power behind the architecture selected is compelling. With an XML solution of about 50 lines we were able to define a solution that:

Received a parameter request (customer_id) from JBoss

Queried a web service to determine which business unit that account is managed by

Built a query based on the specifics of the business unit (Canada needs this query, US needs this one)

Executed that query

Built a PDF report based on the results using a JFreeReport template

Delivered it back to the browser
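
To make the flow of those six steps concrete, here is a rough sketch in Python rather than in Pentaho's XML. Everything in it is an assumption made for illustration: the function names, the lookup tables standing in for the web service and the database, and the plain-text "report" standing in for the JFreeReport PDF. It only shows the shape of the sequence (parameter in, branch on business unit, query, render, return), not how Pentaho actually implements it.

```python
# Hypothetical stand-ins; none of these names come from Pentaho.
BUSINESS_UNIT_BY_CUSTOMER = {"1001": "CA", "2002": "US"}

QUERY_BY_UNIT = {
    "CA": "SELECT * FROM orders_ca WHERE customer_id = ?",
    "US": "SELECT * FROM orders_us WHERE customer_id = ?",
}


def lookup_business_unit(customer_id):
    """Stand-in for the web service call that maps an account to a business unit."""
    return BUSINESS_UNIT_BY_CUSTOMER[customer_id]


def build_query(unit):
    """Pick the query that fits the business unit (Canada vs. US)."""
    return QUERY_BY_UNIT[unit]


def execute_query(sql, customer_id):
    """Stand-in for the database call; returns one canned row describing the request."""
    return [{"customer_id": customer_id, "sql": sql}]


def render_report(rows):
    """Stand-in for PDF rendering; produces plain-text bytes instead of a PDF."""
    lines = [f"{row['customer_id']}: {row['sql']}" for row in rows]
    return "\n".join(lines).encode("utf-8")


def run_sequence(request):
    """Run the steps in order and return the bytes that go back to the browser."""
    customer_id = request["customer_id"]        # 1. receive the parameter
    unit = lookup_business_unit(customer_id)    # 2. ask the web service
    sql = build_query(unit)                     # 3. build the unit-specific query
    rows = execute_query(sql, customer_id)      # 4. execute it
    return render_report(rows)                  # 5 and 6. render and deliver
```

Calling `run_sequence({"customer_id": "1001"})` walks the Canadian branch, while `"2002"` walks the US one. The point is that the branching lives in the sequence definition rather than in the report itself.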

The core of their product is pretty robust, and I think they have significant competitive advantage (not even just Open Source) with their report delivery intelligence. Pretty cool stuff… Does anyone know of any other BI product with intelligent workflow report processing (flows, cases, loops, web services, etc)?

We covered scheduling components. This is a needed feature in the platform, but I wasn’t all that jazzed about it. It uses Quartz, and follows a pretty basic schedule, restart, pause, etc. I think it’s a value add to the project, but not exactly anything I get excited about.

We also discussed OLAP, Mondrian integration, and what that looks like under the Pentaho umbrella. I wasn’t expecting much, because Pentaho is clearly focused on delivering reporting features. There really isn’t much in there in this regard, apart from the already good project work to date by the Mondrian/JPivot projects. There were some simple integrations into the platform, but it really isn’t that different from what you get if you use these separately. For most of the other pieces of the platform there is a clear roadmap (we plan on building feature X in Q2); for the OLAP/Crosstab pieces, however, there was some uncertainty beyond “we know it’s not great, but we’ll continue to improve.” I’m hopeful that I might be able to make a contribution here, but I don’t exactly know how.

Heading from the hotel to Pentaho this morning, it was a bright, sunny, beautiful day in Orlando. News from Seattle was 23 days straight of rain, but that doesn’t dampen my desire to head home at the end of the week. I really love Seattle and always look forward to returning.

That being said, I rather enjoyed today working with the folks at Pentaho. Today we got into some of the details of their solution, and much of the material and documentation started to make more sense based on their plain English explanations of which pieces of the platform fit where.

At a higher level, nearly everything in the Pentaho platform executes as an “Action Sequence.” This has some significant architectural benefits that we won’t belabor here, but suffice it to say that this allows for great flexibility in deployment options. Actually, the three products below all interacted with these Action Sequences using a different “application” method (Eclipse plugin, standalone Java app, and a JBoss deployed web app), all drawing from the same core libraries. At a fundamental level, the Pentaho server is metadata driven (not in database metadata) in that the Pentaho base simply implements solutions defined by a variety of XML Documents. Nothing new here (this is common I’d say, anyone feel differently?) but a good choice all the same.

What is an Action Sequence? A sequenced and parameter-driven set of Action Definitions (ie, run report, generate PDF, email to Joe).

What is an Action Component? An implementation of an activity in the system (ie, an Email Sender component, PDF Generation Component, etc).

The AS is the driving class (for x in resultSet, do AD1, AD2, AD3). The AD is the specific instance of a call to a component (AD1 with Email=Joe, smtpserver=mail.bayontechnologies.com, etc). The AC is the Java class that implements the interface for components (public class EmailSenderComponent implements WhateverPentahoInterfaceItActuallyIs).
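
The relationship between the three pieces can be sketched in a few lines. This is an illustration only, written in Python for brevity even though Pentaho itself is Java; the class and method names (`ActionComponent.execute`, `ActionDefinition.settings`, `ActionSequence.run`) are my own assumptions, not Pentaho's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ActionComponent(Protocol):
    """AC: the reusable implementation of one kind of activity (hypothetical interface)."""

    def execute(self, settings: dict, row: dict) -> str: ...


class PdfGeneratorComponent:
    def execute(self, settings: dict, row: dict) -> str:
        # A real component would render a file; this one just describes the work.
        return f"pdf for {row['report']}"


class EmailSenderComponent:
    def execute(self, settings: dict, row: dict) -> str:
        return f"email to {settings['email']} about {row['report']}"


@dataclass
class ActionDefinition:
    """AD: one configured call to a component (AD1 with Email=Joe, and so on)."""

    component: ActionComponent
    settings: dict = field(default_factory=dict)


@dataclass
class ActionSequence:
    """AS: the driver (for x in resultSet, do AD1, AD2, AD3)."""

    definitions: list

    def run(self, result_set: list) -> list:
        log = []
        for row in result_set:
            for definition in self.definitions:
                log.append(definition.component.execute(definition.settings, row))
        return log
```

A sequence built from `ActionDefinition(PdfGeneratorComponent())` followed by `ActionDefinition(EmailSenderComponent(), {"email": "Joe"})` would, for each row in the result set, generate the PDF and then email Joe. The component classes are written once and reused; only the definitions change from solution to solution.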

OK… now that the basics are out there, let’s talk specifics. Today we got to dig into three major pieces of the system:

Report Server
This is the piece that runs on a server somewhere, that executes Pentaho solutions. It has a web interface for interacting with the Pentaho server, it has a runtime repository for running these solutions, it is able to schedule Action Sequences to be run. This is similar to Crystal Enterprise Server that receives the reports and schedules, runs, distributes them. The demo installation of this is wicked easy on Windows, as I alluded to yesterday. The installation on Linux did require some coaxing, as do many things in Linux. Clearly, the out of the box implementation works best with Windows and Linux requires some effort so I’ll ding Pentaho for that. However, I can’t really fault them because 90% of evaluators will be giving them their 10 minutes of eval time with Windows boxes so it’s a good decision for the project/company.

Some screenshots of the working server, with a handful of the reports that come with the installation:
Notice the Parameter Driven selections. We’re getting into those tomorrow, but it looks promising.
The Crosstab JFreeReport is quite limited.
There is some cool dashboard stuff too… We’ll be getting into that later on.

Pentaho Report Wizard
This is a standalone Java application that makes the basic sequence of getting a basic report “running” pretty quick. It’s in its infancy; in fact, I don’t believe that it has been released to sourceforge.net yet but I might be mistaken on that. Pentaho is planning to release this on sourceforge in a few days.

Step 1: Define your data location and your query.

Notice the lack of a query builder, which will deter some from even considering this a useful wizard. Pentaho acknowledged they need that piece in here and will work at improving the wizard incrementally. However, there’s little you can’t copy and paste into the SQL area, so it makes Step 1 quite powerful in getting at your data. In Oracle for example, consider the power of ‘SELECT MONTH, VALUE, GRP, LAG(VALUE) OVER (PARTITION BY GRP ORDER BY MONTH) PRIOR_PERIOD FROM MY_VALUES_TABLE’ (GRP rather than GROUP, since GROUP is a reserved word). Good for the SQL Gurus.
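
For anyone who hasn't used analytic functions, here is a small Python sketch of what that LAG query computes: for each row, the value from the previous month within the same group. The function name and the tuple layout are assumptions for illustration. Unlike the SQL, which returns rows in no guaranteed order without an outer ORDER BY, this sketch sorts its output by group and then month to keep it readable.

```python
from collections import defaultdict


def lag_by_group(rows):
    """rows: iterable of (month, value, group) tuples.

    Returns (month, value, group, prior_period) tuples, where prior_period is
    the value of the previous month in the same group, or None for the first
    month of each group.
    """
    by_group = defaultdict(list)
    for month, value, group in rows:
        by_group[group].append((month, value))

    result = []
    for group in sorted(by_group):
        prior = None
        # Sorting the (month, value) pairs orders each partition by month.
        for month, value in sorted(by_group[group]):
            result.append((month, value, group, prior))
            prior = value
    return result
```

Writing this by hand for every report is exactly the work the one-line SQL saves, which is why a free-form SQL box is more useful than it first looks.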

Step 2: Select which columns are your items, which are to be “grouped” in the report.

Nothing really special here. At this point I notice that nothing in this application actually requires a fat Java Client. It’s all check boxes, select boxes, arrow buttons etc similar to some of the JSF ADF Faces components that Oracle puts out. This is a prime candidate for a community built AJAX wizard! Any takers on that one?

Step 3: Report Format Options
Missed that screenshot, oops.
This is where you can set page breaks at group boundaries etc. There are also some formatting options here with background color, justification, etc.

Step 4: Formatting Options.

Now, this is not really that advanced… It provides the “formatting options” for the wizard, which doesn’t actually use a template. I believe Mike on the Pentaho team coined it the “Non-Template Template.” Basically, because you’re using a wizard you give it basic formatting things (group heading font, color, etc) and it will generate a template for you. You want more than that, you gotta build your own. Incidentally, underneath the covers this wizard is building a JFreeReport definition. Pentaho can build reports using JFreeReport, BIRT, and Jasper; however, I think JFreeReport is what the team is using for the wizard.

Step 5: Preview

The options are generated by JFreeReport: Excel, PDF, and HTML. The PDF on Linux blew up for me. The HTML worked all right. Didn’t try the Excel. I’ll wait for Open Document (just kidding). You’ll see, the JFreeReport actually generates a pretty decent report. Clearly this is not the “pixel perfect” solution some commercial offerings have, but it’s really not bad.

Step 6: NICK’s BONUS STEP

You also need to save the report. 🙂 The natural last step of the process is to save the report (four files that constitute the report definition) to be used by either the REPORT SERVER (above) or the workbench (below).

Pentaho Workbench:

The workbench is an Eclipse plugin that edits Pentaho solutions. Since their solutions are, for all intents and purposes, plain XML documents this is a good fit. My initial impressions are, well, just OK. Clearly this beats writing XML by hand based on their spec but it’s a pretty rough GUI when it comes to making sense out of the whole thing. This GUI will be efficient for those that know the underlying XML structures and the specifics of their Pentaho Action (Components/Sequences/Descriptors). However, for the person coming onto the platform it will be kind of daunting. Again, it should improve but for now it’s not exactly user friendly.

We used the workbench to take that report we created in the wizard and “parameterize” it. We made one of the where clauses in the SQL parameter driven and bound it to a “request.PARAM1” item that Pentaho will set in its server environment. It was a little difficult understanding the context of what we did, but I have to say that when we copied it up to the server it worked brilliantly.
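
The general idea of driving a where clause from a request parameter can be sketched with nothing but Python's built-in sqlite3 module. To be clear about the assumptions: the `sales` table, its columns, and the dictionary standing in for the request are all made up for illustration, and this uses a standard bind variable rather than whatever substitution mechanism Pentaho applies to “request.PARAM1” under the covers.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database to run the report query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 10), ("West", 20), ("East", 5)],
    )
    return conn


def run_report_query(conn, request):
    """Run the report query with its where clause bound to the request's PARAM1."""
    sql = "SELECT region, total FROM sales WHERE region = ? ORDER BY total"
    # The value is passed as a bind variable, never pasted into the SQL text.
    return conn.execute(sql, (request["PARAM1"],)).fetchall()
```

With this in place, `run_report_query(make_demo_db(), {"PARAM1": "East"})` returns only the East rows. The same report definition serves every region, and only the incoming request changes.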

More tomorrow… Email me at the usual (on the right column)… I’ll also offer to bring any questions, especially those from the Oracle community, to see if I can’t get those answered.

Those that regularly visit the site know that I focus my professional consulting hours on Data Warehousing, and specifically with Oracle and Oracle Warehouse Builder. However, much of my R&D time I spend researching, downloading, and kicking the tires of Open Source projects I find cool and interesting. Pentaho is clearly one that exists at the intersection between what puts bread on the table and what stimulates the mind, so I accepted their invitation for a week of training in Orlando, FL.

Day 1, like many first days, is mostly introductory, architecture blueprints, team, background, etc. We start getting into some technical details in the subsequent days, but today was mostly an overview. I am continually impressed by this company; not the product per se, because it is a 1.0 product in a very mature market that can only now start to be compared with the “big boys.” However, the company has the mojo to pull this off, I think. They are on a first name basis with CEOs of other key open source companies. Their ranks are filled with former Business Objects and Hyperion recruits. Their board members were senior VPs at Oracle, and on and on. They are building a solid company that is healthy and, from what I can see, can deliver on their vision of an open source BI stack.

What is Pentaho? Pentaho is an open source BI stack that provides the full stack of BI components: Reports, Bursting, OLAP analysis, Dashboards, BAM, etc. Lofty goal indeed… CEO Richard Daley puts it quite simply (paraphrased): We don’t want to be a disruptive technology in just Open Source BI, we want to disrupt the entire BI market place with our technology… It’s a lot of fun… It’s process centric (workflow driven) and has conceded the fact that it won’t be a silo, as the center of the universe. It pragmatically embraces the idea that BI should be part of an overall business process, and that if it is not, then you’re not getting the full value of your BI assets.

This makes profound sense, yes? If your business process of analyzing order fulfillment efficiency is the end of your process (ok, looks like the warehouse in Toronto is 3 times slower than anyone else) then you’re hosed. The process must continue to notify someone of this result, and collaborate on a solution if the intelligence is actionable.
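
A toy sketch of that point, with analysis and follow-up living in the same process. The function names, the “3 times the median” rule, and the sample numbers are all my own assumptions for illustration; in a real deployment the `notify` callable would be an email or workflow step rather than a list append.

```python
from statistics import median


def find_slow_warehouses(hours_by_warehouse, factor=3.0):
    """Return warehouses whose fulfillment time is at least `factor` times the median."""
    typical = median(hours_by_warehouse.values())
    return sorted(
        name
        for name, hours in hours_by_warehouse.items()
        if hours >= factor * typical
    )


def analyze_and_notify(hours_by_warehouse, notify, factor=3.0):
    """Run the analysis, then continue the process by notifying someone of each finding."""
    slow = find_slow_warehouses(hours_by_warehouse, factor)
    for name in slow:
        notify(f"Warehouse {name} is at least {factor:g}x slower than the median")
    return slow
```

The first function on its own is the "silo" version of BI: it produces an answer and stops. The second is the process-centric version, where the answer triggers the next step.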

I continue to be critical of their regard for ETL/Data Warehouse as secondary to their platform. I think they have BI covered, and are comprehensive in this regard. What I see, like others, is that the other key piece of “doing BI” is the information integration, cleansing, and transformation. If the data is unintegrated, or the business context is difficult to infuse from a “straight SQL query,” then you’re in the ETL and Data Warehousing business. That being said, the architecture I’ve seen put forth (haven’t been in the details yet) allows for this relatively easily. Perhaps this will be more robust than it first appears as the details of their product are run through this week.

They released a 1.0 GA in December. If you haven’t checked it out, DO! It installs in about 2 minutes on Windows (no kidding, it’s one of the best open source demo installations I’ve ever seen).