Monday, October 05, 2009

Pentaho Analyzer

This is great news for Pentaho customers, the community, and the BI world at large. While Pentaho Analysis (Mondrian) is one of its strongest components, the current OLAP viewer (based on JPivot) has been one of its weakest.

The new viewer puts Pentaho at the top of the heap, in competition with best-of-breed OLAP viewers. It is designed to be intuitive for business users (yes, those people who don't speak MDX!), is built using the latest web technologies, and integrates seamlessly with Mondrian and the rest of the Pentaho suite.

It is going to revolutionize the experience of using OLAP within the Pentaho suite.

Naturally, there are concerns. First, the new viewer is only part of Pentaho's Enterprise Edition (EE) suite. If Pentaho is committed to open source BI, why not release it open source? Second, what will happen to Pentaho Analysis Tool (PAT), the successor to JPivot being developed by the Pentaho community? I'd like to take the opportunity to answer these concerns, because I think this is news that everyone should be celebrating.

Why is the new Analyzer not open source?

There's been a lot of talk about open source business models, 'open core', good and evil, and all that. Releasing ClearView as part of Enterprise Edition is perfectly in sync with Pentaho's business model and with my intuitions about what makes sense for open source. Here's my rationale.

If you release a piece of software open source out of sheer, 'I love the world!' altruism, you won't necessarily see much benefit. Pentaho is a for-profit business, and they are savvy about leveraging the benefits of open source software. And let's not kid ourselves, there are considerable downsides to releasing something open source. Your competitors can pick up the software and incorporate your hard work into their suite. And your customers may decide that the free version is so good that they aren't going to give you any of their money.

Open source allows you to bring a component to a wider audience, an audience that will test, document and improve the component, and will support each other on the forums. Only the Community Edition (CE) components get that boost. Therefore, Pentaho's strategy is to release the core functionality in CE. That means the high-performance core of the system, the code paths that get run trillions of times an hour, and that means all the components that are necessary to build a functional and useful BI application.

In particular, people ask me whether there is a high-performance 'Mondrian on steroids' in EE. No, there isn't. None of us wants to maintain alternative code paths, because the extra complexity would slow down future development. If I were to create a performance optimization in EE, the community would probably replicate that optimization in CE within a few weeks. Improving the core Mondrian system for everyone brings more people into the community, and that brings more people to EE.

And by the way, this doesn't just apply to the Pentaho Analysis part of the suite. Pentaho adds major new functionality to the suite each release, and most of that goes into open source components.

So, what's left to go into EE? Bells and whistles, things that make the product easier to use, easier to manage, and things that make your boss want to reach for his or her checkbook. And of course support, plus releases that are certified, indemnified, and more regular. I don't think that's a bad deal, however you look at it.

It also helps if the components are delivered under a business-friendly license like LGPL or EPL. Otherwise you will not attract contributions from OEM vendors, who are the companies with the skills to extend components as complex as Mondrian or Pentaho Data Integration (Kettle). Once again, Pentaho is taking a risk by using business-friendly licenses, because there is always a chance that Pentaho's competitors will scoop up the fruits of its labors. (As in fact they do.)

But Pentaho's faith in the open source process pays off. ClearView is proof of that. If Mondrian had not been available under a business-friendly open source license, LucidEra would probably have written it on top of another vendor's engine, and Pentaho would not have been able to use it. Incidentally, LucidEra has contributed many important enhancements to Mondrian in areas of both performance and functionality over the past three years. This has improved Mondrian for everyone, and we know that ClearView performs very well against Mondrian.

What will happen to PAT?

To restate what I said above, there is a network effect when you make a component open source. The more people that use a component, the more people are going to contribute to it. We want as many people to use Mondrian as possible, and in particular we want the right people to use it (the people who are going to make major improvements).

So, for Mondrian's continuing health as an open source component, we need the Community Edition of Mondrian to be good enough to build business applications on. For that, we need to make PAT successful.

I personally have been laying the groundwork for PAT for a number of years. I spearheaded the olap4j API, knowing that the community would be more likely to write the next-generation OLAP viewer if it was guaranteed to be portable across OLAP engines. Then I kicked off the halogen project, a collaboration between Pentaho developers and the community to build a viewer using olap4j and GWT. Pentaho developers contributed code and user interface design to that project, even working in their spare time when the current Pentaho sprint used up all of their 'official' cycles. And the PAT project used the halogen code, and the knowledge of the halogen developers, as a starting point.

It's not healthy to have too close a relationship between an OLAP server and viewer. There should always be room for competition, an opportunity to use a new viewer or (gasp!) different OLAP server if the 'standard' one isn't ideal. I created olap4j with competition in mind, and the experiment seems to be working: PAT can run against Mondrian's native interface, Mondrian's XMLA server, and against SQL Server Analysis Services via XMLA.

I want to make it easier to build alternative front-ends on top of olap4j, so I have been encouraging PAT developers to contribute to olap4j's query model and library of transforms. I would like to see Analyzer move to olap4j internally (it currently uses Mondrian's native API), and perhaps migrate some of the logic in Analyzer to olap4j so that we can share the costs of maintaining it with the community.

Lastly, as I realized at the recent community meetup in Barcelona, we have a great team, and we need to harness their energy. After a beer or two with PAT developers Tom and Paul, and some inspiring demos from Pedro and Daniel, we hatched ideas of incorporating sparklines and writeback into PAT, and I'm sure the ideas will keep on flowing. With this much inspiration and hard work coming from the community, how can we possibly fail?

Hi Julian, great explanation, and I hope PAT keeps going so the community can have a good analysis tool. I hope they are also inspired by Tableau.

On another note, looking at Jasper Analysis, which seems to be built on Mondrian, they seem to have connected it to Excel, à la Pentaho Spreadsheet Services. Does this work? Is it possible to connect Excel's dynamic tables to Mondrian?

This would solve a lot of the problems our clients have working with JPivot.

Sebastian, you should nag Jaspersoft about that. As far as I know, their extensions to Mondrian are not released open source. A lot of hot air about open source goodness comes from that company, but all of the code they release seems to be proprietary.

Jaspersoft ODBO Connect can be used for Mondrian, Microsoft Analysis Services and JasperAnalysis. Like most (all?) ODBO drivers, it uses XML/A to communicate to the OLAP server. It is a commercial product, and is not related to Pentaho Spreadsheet Services at all. I don't think Pentaho Spreadsheet Services exists anymore.

JasperAnalysis uses an unmodified version of Mondrian. Any mods or bug fixes Jaspersoft has done to Mondrian have been contributed back to Mondrian. The OLAP UI in JasperAnalysis is based on JPivot.

"A lot of hot air about open source goodness comes from that company, but all of the code they release seems to be proprietary." This is dead wrong. Jaspersoft has the JasperReports, iReport, JasperServer/JasperAnalysis open source projects. We have commercial editions of these, with a variety of functional enhancements, which come as "visible source" - you get the source for the Jaspersoft commercial components with the commercial subscription.

I said "All the code they release is proprietary" because I haven't seen Jaspersoft create any new features in the open source versions of their products in the last two or three years. Sure, their releases contain a lot of open source code, but it's either old code, minor bug fixes, or code written by someone else. New features written by Jaspersoft are released under a proprietary license, not open source.

JasperAnalysis is a flagship product of a self-proclaimed "open source" company, yet Jaspersoft has never contributed anything beyond bug-fixes. The code contribution log speaks for itself. Over the last two years the contributions have been no more than minor bug-fixes. Meanwhile, Pentaho continues to invest in all of its core projects, including Mondrian, releasing a lot of features, the vast majority of them under a permissive open source license.

For two companies that both call themselves "leaders in open source BI", the contrast could not be more striking.

Quote: "All the code they release is proprietary" because I haven't seen Jaspersoft create any new features in the open source versions of their products in the last two or three years.

This is patently not true. JasperReports, iReport and JasperServer/JasperAnalysis open source versions have changed dramatically over that time.

JasperAnalysis uses Mondrian and JPivot, adding value with improved UI, integration, repository, security etc. We have not had a need to add additional features into core Mondrian because it works so well, though I will be providing a major lift to the workbench soon.

I'd say the contrasts between Jaspersoft and Pentaho in terms of business model and approach to community are reducing, as Pentaho is developing a wider split between their open source and commercial versions. It is not just the Pentaho Analyzer - Pentaho v3.0 has many commercial only features.

Thanks for the insight. We're just getting into a Pentaho, and thus Mondrian, deployment. I've downloaded Pentaho 3.5RC2, but do not see this new enhanced OLAP browser. Your post hints that it will replace JPivot. Do you know when or how that will happen? Thanks!

Hi Julian, I wanted to know a little bit more about the Analyzer, especially about its filtering capabilities. Our customers find it very difficult to search for a specific member to use as a filter when there are thousands of them in a dimension. Is there a better way to do this in the Analyzer than browsing an endless list, as JPivot requires?

Analyzer supports many filters, including equals, not equals, contains and not contains. You can also combine them: for example, give me all products that contain "Hardware" (case insensitive) but exclude "Trial Hardware". When creating equals/not equals filters, the filter editor dialog shows at most 200 members, but the user can use a "Find" box to search for other members.
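To make the combination concrete, here is a minimal Python sketch of the behaviour described above. It is not Analyzer's code and does not use any Pentaho or Mondrian API; `MemberFilter`, `filter_members` and `listed_members` are hypothetical names invented for illustration, and the only details taken from the description are the case-insensitive contains/not-contains matching, the 200-member cap, and the idea of a "Find" search.

```python
from __future__ import annotations

from dataclasses import dataclass

# Cap on members listed in the dialog, as mentioned in the comment above.
MAX_LISTED_MEMBERS = 200


@dataclass(frozen=True)
class MemberFilter:
    """Hypothetical filter: keep members whose name contains `contains`,
    unless the name also contains one of the `excludes` terms."""

    contains: str
    excludes: tuple[str, ...] = ()

    def matches(self, member: str) -> bool:
        # casefold() gives a case-insensitive comparison.
        name = member.casefold()
        if self.contains.casefold() not in name:
            return False
        return not any(term.casefold() in name for term in self.excludes)


def filter_members(members, member_filter):
    """Return the members that pass the combined filter, in original order."""
    return [m for m in members if member_filter.matches(m)]


def listed_members(members, find=None, limit=MAX_LISTED_MEMBERS):
    """Members a picker dialog might list: optionally narrowed by a
    search term (the "Find" box), then capped at `limit` entries."""
    if find:
        needle = find.casefold()
        members = [m for m in members if needle in m.casefold()]
    return list(members)[:limit]
```

With a product list such as `["Server Hardware", "Trial Hardware Kit", "HARDWARE Bundle", "Software Suite"]`, calling `filter_members(products, MemberFilter("hardware", ("trial hardware",)))` keeps the first and third entries. In a real deployment the search would presumably run against the OLAP server rather than an in-memory list.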

Analyzer has many other types of date-based filters, such as Current, Next, Previous, 90 Days Ago, 4 Weeks Ahead, YAGO, QAGO, Before, Between, etc. However, these won't make the current release.

Hi Julian -- just checking all your good work now. I have to say it's very impressive.

I don't think you're being fair to Jaspersoft, however. Lots of companies build product on top of the Mondrian community-based project, and Jaspersoft (Sherman Wood exceptionally so) contributes back. You also know that Pentaho doesn't own all the copyrights, and therefore doesn't and cannot "own" Mondrian (though they are good stewards under your administration).

I don't think Jaspersoft provides "hot air" about open source. Your justifications for commercial open source are exactly the same as Jaspersoft's (and Pentaho's and many others'). Enterprise and OEM customers appreciate the commercial open source model -- they know we're capable of supporting them, that we're going to be in business for the long haul AND they can get the source code without legal delays. (Enterprise customers commonly request "closed" source code be put into escrow in case the software vendor goes out of business or forces an unwanted upgrade, for example. This is a hassle for everyone.)

Jaspersoft has smart, experienced, dedicated people working to make the community edition releases robust, stable, and easy to install. This software is delivered by senior people and they are GOOD.

So please don't denigrate the efforts of our engineers, QA, tech pubs, and the execs who insist on delivering high-quality open source community projects at the expense of closed-source bells-and-whistles. Believe me -- we invest in the community.

You are absolutely, 100% correct. The Jaspersoft subscription agreement is not open source. It is very clearly a proprietary license. Our community edition builds are released (full binary installations and open source code) under the GNU GPL. Why? We can do this because we own the copyrights. Why don't ALL open source companies follow this model? Because they can't -- they don't own all the rights. And yes, there are opinions on this: http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#ReleaseUnderGPLAndNF. I'm not going to defend either point of view. It is what it is.

Jaspersoft is a Commercial Open Source company. Supporting our community is a fundamental aspect of our business. We're not religious about who is "better" or "more" open source. We just are. And that's it. As for the CSV exporter, you've got me stumped.

Mary Flynn, Jaspersoft

P.S. Here's the code you wanted: http://jasperforge.org/scm/viewvc.php/tags/js-3.5.0/jasperserver-war-jar/src/main/java/com/jaspersoft/jasperserver/war/CSVServlet.java?root=jasperserver&view=markup. It looks like the header is quite wrong, but the source is clearly open and accessible. I'll enter a bug report so it uses the correct GPL header.

You're not making much sense to me. You say you're not religious about who is "better" or "more" open source. Are you saying that anyone who points out that an awful lot of what Jaspersoft -- a self-proclaimed open-source company -- releases is not actually open source is just being pedantic? Whether something is open source is actually very important.

And the open source community agrees. That's why they went to the trouble of creating the Open Source Definition, to spell out precisely what can be called open source.

I see you're trying to whip up some FUD about the fact that Pentaho doesn't own copyright. Copyright is not all it's cracked up to be. It gives the owning company power -- mainly the power to release under a license, in particular under a commercial license -- but also gives that company power to do things not in the interests of the community, such as taking the whole project closed source, or refusing to grant commercial licenses to partners that it now considers competitors. A commercial open source company can be successful without owning copyright: for instance, Red Hat does not own the copyright on the Linux kernel. Linux is in good shape because its copyright is spread among thousands of contributors. Contrast that with MySQL, whose future is in doubt because Oracle now owns 100% of its copyright.

Regarding CSVServlet.java: it sounds like there was an honest mistake in the header, and thanks for fixing it. But your remark "the source is clearly open and accessible" betrays your ignorance of software licensing and open source licensing in particular. Just because a piece of code is released to the public and is accessible on a web site does not mean that it can be freely used. (Sun's JDK ships with full source code but was not, until recently, open source.) It would be irresponsible for a developer to use any of that code in an application. That is why the license is so important.

The point I was trying to make in my original post was that an open source project has significant advantages over one that is not open source -- in that it attracts contributions in the form of features, bug fixes, testing, documentation and support from the community -- but that those advantages dry up pretty fast if important parts of the project are taken closed source. Commercial open source companies, including Pentaho, have to deal with that dilemma.

I am not "denigrating" Jaspersoft's engineers, QA and tech pubs. Jaspersoft's management have made the decision to put less and less code into open source, and they have every right to do that, but as less of value is placed in the community edition there will inevitably be fewer contributions from the community. As you put it, "it is what it is", or to paraphrase, "you reap what you sow".

First, I want to reiterate my original comment -- I am very impressed with the work you do and am disappointed we're in an argument. Frankly, I'm trying to get out of the tit-for-tat.

I suppose my objection is that you are a very respected architect and your words have a lot of weight. Because of that, when you put out a strong opinion about Jaspersoft, I feel there needs to be a voice to represent the other point of view. I'm not trying to convince you or anyone else that you're wrong or I'm right -- we just take different approaches.

I was a contractor working independently with several different vendors for a number of years. I took on full time work with Jaspersoft in March because I believe in the products and the company. And yes, the open source business model is difficult. You yourself made a justification for close-sourcing some of your recent code. It's no different.

What FUD did I put out there re: Pentaho? I try to be respectful. I just mentioned that Pentaho does not own all the Mondrian copyrights, which is absolutely significant when you look at the licensing options. And no, I'm not ignorant. That was a cheap shot.

Anyway. I honestly don't think this level of vendor bickering is good for the community. I'm not going to go out of my way to complain about other products or companies, but I will continue to provide another point of view when I feel it's necessary.