SPOTLIGHT: Commercial Open Source Business Intelligence - A Q&A with Richard Daley of Pentaho

BeyeNETWORK Spotlights focus on news, events and products in the business intelligence ecosystem that are poised to have a significant impact on the industry as a whole; on the enterprises that rely on business intelligence, analytics, performance management, data warehousing and/or data governance products to understand and act on the vital information that can be gleaned from their data; or on the providers of these mission-critical products.

Presented as a Q&A-style article, these interviews with leading voices in the industry, including software vendors, end users and independent consultants, are conducted by the BeyeNETWORK and present the behind-the-scenes view that you won’t read in press releases.

This BeyeNETWORK spotlight features Ron Powell's interview with Richard Daley, founder and CSO of Pentaho. Ron and Richard discuss the advantages of commercial open source business intelligence and user-driven BI.

Richard, Pentaho has been a leader in business intelligence [BI] from an open source perspective. Why should an enterprise consider an open source solution versus one of the many traditional BI solutions on the market today?

Richard Daley: That's a reasonable question, especially based on how many product offerings there are out there. I could boil it down to three or four categories pretty easily, but value is the number one reason people come to Pentaho. They’re looking for a great value proposition. They also come to us because of our innovation, for community in terms of community contributions, and then also for the customer support and service that we provide. Looking at value, we're 90% less expensive than many of the traditional BI vendors. If we look at innovation on the direct side of the business, we were out there leading in visualization. We brought iPad and iPhone support to market faster than any of the big traditional vendors. We brought support for big data. If you think about big data being high performance data warehouses as well as Hadoop, we brought that to market faster than traditional vendors. Additionally, we provide broader and deeper support for big data initiatives. A lot of the innovation comes out of our commercial open source environment. And then for OEMs, our modern architecture makes it much easier for them to embed our technology than some of the traditional vendors.

The other thing I mentioned, and I'll touch on it again, is our customer support. In several different surveys, we were rated best in service and support over our traditional BI competitors. The bottom line is that we get our customers up and running as fast as possible and once they're in production, we keep them up and running.

Those are the key reasons why people choose commercial open source and Pentaho.

Can you tell us a little bit about the Pentaho community? How big is it now?

Richard Daley: That's a great question, and it depends on the definition of community, but I'll give you ours. The community is very vibrant in terms of actual members. The way that we track members is people who have been active within the last 60 days with some type of contribution or activity, not merely downloading something. That's strong with over 45,000 active community members. If you look at people who've signed up in our community over the last couple of years, then you're looking at a couple hundred thousand, but I think realistically you have to look at who's contributing. We provide end-to-end business intelligence from data integration through visualization, and we find more of the contributions in terms of code and QA contributions coming from, if you will, below the cover – more on the platform and the data integration side. We tend to get more contributions from that side because there are more technologists working with those pieces of product. But people who are above the platform – the analysts, end users, the administrators – provide a lot of QA, translations and different things like that. The community also provides a lot of viral sales and marketing. People find out about us by word of mouth, Google searches and things like that, and it helps build out the whole ecosystem a lot faster than a traditional enterprise company could.

The term business intelligence came into vogue back in the middle to late '90s, and since then the industry has successfully evangelized its benefits. But business intelligence has matured quite a bit and is no longer just the “territory” of power users. Can you give us some reasons why BI has moved as fast as it has and what Pentaho has done to make it more user friendly?

Richard Daley: That's a reasonable question. In terms of BI, if you look at all the surveys that were done over the last decade, whether from Gartner or Forrester or whomever, business intelligence has always been in the top three or at least in the top five of projects and priorities for CIOs. There's an obvious reason for that. More and more companies are realizing that data is a valuable asset. How are they going to get any value out of the mass quantities of data? And even if it's not mass quantities, how are they going to get value out of whatever data they have? Data sitting inside of an RDBMS or a data warehouse isn’t going to do anything. Business intelligence is the whole process of accessing, integrating and optimizing the data so people can use the BI tools to find out, for example, where their customers are coming from, why they are coming, what are the trends and patterns, how can they get more customers, how can they drive up average selling prices, or how can they make the sales reps more productive. So in terms of why BI continues to grow, it's because data is becoming a bigger and bigger asset for all companies of all sizes.

Now, what have we been doing to help that move along? The whole reason we founded this company was to push business intelligence out to more and more people vertically as well as horizontally, not only across organizations but across geographies. Open source and commercial open source have allowed us to do that. We wanted to disrupt the space. We wanted to be able to push BI out there. The number one reason why BI projects stall after their first phase is cost, and commercial open source allows us to be more cost effective. From a tech standpoint, and I'll try to keep my answer short here, you have to make the onboarding and the ease of use of your products top-notch. That has to be first and foremost. We've invested heavily in visualization capabilities and ease of use, and that has helped us hit the growth trajectory that we're on currently.

That sounds great. What are you hearing from your customer base now? What are their biggest needs?

Richard Daley: Our customers are a mix of direct customers as well as a lot of OEMs. More and more parts of our business revenue-wise are coming from other software companies, SaaS-based companies, cloud-based software providers. I'll try to hit those two things separately.

The biggest needs our direct customers have again relate to value. I hate to say recession is a good thing, but it has forced people to look hard at receiving value for dollars spent. When the recession hit late in 2008, we had about a three- to six-month lag, but after that our leads, opportunities and business have just taken off. Companies were looking to reduce costs but still retain their functionality and capabilities. That continues on even through today, and with the recent downgrading of the US AAA credit rating, I think it's just going to continue to push people to look for value.

That's on the direct side. The other thing that we hear on the direct side is that people like the fact that we provide everything from data to reporting to analytics, and then cover all of those capabilities across a wide variety of data sources. We look at this as a pyramid of atomic data at the bottom, stored in something like Hadoop, then relational is the next level up through high performance and all the way up through in-memory cache. We provide access to all those layers and very broad coverage within those layers.

If you take, for example, high performance analytics, we have native drivers and support for HP Vertica, EMC Greenplum, Teradata Aster Data, VectorWise, Infobright – the whole list. We've done a lot of work and made investments for years to make sure that our customers can choose one tool that's going to span functionality and data access.

The other part is our indirect customers. The OEMs love our tech. Cost is obviously one thing, but they also love our tech because of the modern architecture and the flexibility that allows them to go through and integrate with their own environments. Especially in the last 3 quarters, we've seen a lot of SaaS-based software companies, cloud-based, flocking over to us now because of our architecture and the fact that we run great in that kind of environment. That's why they're coming to us.

So let's talk a little bit about the technology. You’ve recently released Pentaho BI 4 and you added a lot of new data visualization features. What's the reason for that and what are the benefits of visualization that make it superior to standard reporting methods?

Richard Daley: Today’s end users are looking for a very rich, interactive environment where they can quickly spot trends and visualize multiple dimensions without having to go through training and without having somebody build these things for them. For a long time, people called this self-service BI, but now it's more referred to as user driven. All these visualization capabilities are aimed at making it easier and faster for non-IT people to do these types of analysis. Why is it better than the typical reporting? Typical reporting is production reporting, operational, and even interactive reporting. But it doesn’t have that ad hoc exploration mode. If you're really going to do some discovery around your data sources, the type of visualization that we came out with in Pentaho BI 4 is geared toward exactly that capability. We handle all the reporting like I mentioned before, and now our new visualization capabilities give our users that true discovery and exploration capability.

Most executives are very visual, and BI is rapidly becoming more entrenched at the executive level. When most executives look at a standard report, they always want more. With visualization, they have the ability to see more. Trends just pop out that you can't easily spot in a technical report, don’t you think?

Richard Daley: You're absolutely right. I can tell you whenever I'm looking at reports or going into meetings, whatever it might be, if people provide me with a simple grid or even a chart of data, it's completely out of context unless I know the trend. Is this number going up or is it going down? How was it compared to the same time last year? Is something seasonal? What if I slice it geographically? Everything has to be put into context. Simply putting numbers onto a grid just doesn’t do it anymore, and even having some traffic lighting doesn’t do it either. You have to be able to really see more depth and more dimensions to that data than we've ever had before.

In the new release, you also talk about data integration-as-a-service. Obviously, back in the old days, we had a data warehouse, everything was integrated, and we had what we called the single version of the truth. But over the last three to five years, data virtualization has become very important. Is this something to answer the whole move toward data virtualization?

Richard Daley: It definitely is, and history repeats itself. A long time ago we used to just go directly against operational data stores and then came data marts and data warehouses, but now there is the need to have a mixture of the different types of data storage and data retrieval technologies as well as methodology. With data virtualization, you're not going to go through and build an entire BI platform on top of that. There are going to be certain apps, certain use cases, certain reports, things like that, where you want to pull data in from multiple sources, more in a real-time environment and feed that live into reports and analysis. And the fact that we've got the data integration technology that we've now extended, we can do these things in terms of virtualization on the fly. It provides great flexibility for our customers and our community members to choose how they want to go through and access that data – direct or in some type of a staged environment. So it's flexibility.

The other trend we're seeing too is BI has really pushed out to what we call pervasive BI or operational BI. It’s extending out into the enterprise and it almost necessitates doing things from a virtual perspective. Would you agree with that?

Richard Daley: Yes, I think the further you push it out, the more use cases you're going to find. And most likely the further out it gets, the more likely you're going to need some access for real-time data because people outside of certain use cases are going to need this in context to whatever applications they're currently working with. And if you want to have things that are in context and want to have a real-time environment, then the virtualization and the data service that we provide is definitely going to be a better fit than trying to stage data and even update it once an hour, overnight or things like that, which are great for dashboards and great for traditional BI needs. But you hit on it, as it gets pushed out, and especially closer and closer to the operational type of a work environment, you have to have access to these things in real time.

I agree that response times are key for ad hoc query. What are you doing in the areas of in-memory? Is there anything in the new release in that area?

Richard Daley: We've always had in-memory capability within our ROLAP engine in terms of being able to cache mass quantities of data. The fact is a lot of our customers are deploying larger sets of information, larger data marts and things like that, and it’s very easy for us to go through there and keep those things in cache. You'll see some future announcements coming about how we handle that. But, specifically, a lot of what we've seen there has been through our partnerships with, as I mentioned before, HP Vertica, EMC Greenplum, Teradata Aster Data, Infobright and VectorWise.

When people are looking for great high performance environments, there are, as I said earlier, different levels of that data pyramid where use cases will drive what data warehouse technology they should use. While we can go through and provide in-memory via cache, there are also great ways we can speed the performance based on what their data set needs are. So based on how big those data sizes are, they can go through and address their performance issues and have scalability and better admin and things like that around high performance queries, much more so than with some purpose-built, in-memory technologies that are out there today.

Well, one last question for you. What do you see over the next three to five years in business intelligence? What's coming?

Richard Daley: So the crystal ball question.

The crystal ball, yes!

Richard Daley: I think what we've found is it all revolves around how we as a company or community, or even industry for that matter, can on-ramp and onboard people faster and faster into BI technology. I won't even put Pentaho on the pedestal here. I'll talk about this as an industry. We all have a lot to do in terms of how we can make it easier for people to go through the whole process of installing and/or creating an instance in a cloud and get data, access the data, whether that's on-premises or moved into the cloud or access other cloud data sources that are already there. That's where we need to go as an industry to break down the barriers and speed up that onboarding process.

We already do those things. I said I wouldn’t make this a platform for me to talk about Pentaho, but you can just tell – based on my answer about where I think the industry needs to go – that's where we're going to be pushing for the future. Obviously, our cloud technology is going to help that, our data discovery and visualization is also going to help that, and then there’s mobile as a deployment and access device. Those are the key things I see moving forward that are going to be big for the space. Those are the things that we're looking at and where we plan to break down the barriers for our customers.

That’s great, Richard. I really appreciate all your input for our readers. Thank you very much.

Ron Powell

Ron, an independent analyst and consultant, has an extensive technology background in business intelligence, analytics and data warehousing. In 2005, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to the founding of the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). Ron also has a wealth of consulting expertise in business intelligence, business management and marketing. He may be contacted by email at rpowell@wi.rr.com.