Open source data modelling

The open source movement continues to ripple into the business intelligence field. Now you can get hold of a data modelling tool that is open source rather than having to buy Erwin, thanks to a Canadian consulting firm called SQL Power Group. I am not familiar with this company, but for a consulting firm this seems a sensible move: after all, they would not be maintaining the tool as a proper software product anyway, but adapting it to each client’s needs on site. By making it open source they gain publicity, and may encourage others to improve a tool in which they have skills. So now you can have an open source data modelling tool, a database (MySQL), and assorted ETL and BI tools (Pentaho, Greenplum, Jaspersoft etc.).

The penetration of open source is a matter of some debate. While Aberdeen Group reckon 18% of firms are now using open source BI tools, it is less clear how deep that penetration goes within companies. It is one thing to have a small departmental pilot running, another to commit wholeheartedly to open source tools on an enterprise-wide scale. Clearly it would be a brave company that went aggressively down this path, as you are essentially taking on a major customisation and support project. Certainly you will save some licence fees, and that is not to be sneezed at, but it is less clear how the customisation costs weigh against these savings.

I suspect that in the short term, at least, the main effect will be for enterprises to use these tools as a stick to beat traditional reporting vendors with when it comes to price negotiations. This will certainly have some negative consequences for the profitability of the likes of Business Objects and Cognos if the movement really takes hold and becomes a credible threat. So far I have not really seen this happening in my own experience.

I would invite anyone who has direct experience of an open source BI project to comment here on their experiences, good or bad. I could be wrong, but I am not expecting a blizzard of replies, despite the emerging interest. Interest is not the same as deployment.

Great article. I agree with another poster that most companies want some form of customization. Whether they use an expensive vendor-specific product or a ‘free’ open source product really doesn’t matter. Some companies might want fewer custom features than others, but I’ve yet to work on a project where it was just plug and play. Like the other person said, most consultants wouldn’t be working if nobody wanted custom options.

I think the bigger problem in IT is a lot of people start to assume they have the “be all end all” one size fits all packages. But if you’ve worked in IT long enough, you’d know it never works like that and in the end, either the company gets pissed and goes back to what they know or picks another vendor or open source choice. Or they try to find a way to customize it themselves.

Whether it’s open source or from a vendor, there really is no such thing as one size fits all in IT. If you assume that’s the case, you might be in for a rude awakening when people start proclaiming your package or application isn’t very good. The end users want what they want, and just because Department A from Company XYZ likes something doesn’t mean Department B from Company ABC likes the same thing. Yeah, you can force it on them, but that will only result in a product nobody likes or might not even use. And if nobody uses it or likes it, companies sure won’t push to use it, and employees or contractors who move around a lot in IT sure won’t push to use it elsewhere.

If you can’t customize, essentially the product becomes useless for what most people need.

I agree that going open source is sometimes the best option in the long run, but it’s not always the best option. It really depends on the company, the end users, and the skill sets.

As for Jonathan’s comment, he makes a lot of good points, but I disagree with his theory that end users rarely customize their applications. I’ve yet to work on a project where something wasn’t customized. Whether it’s BI, data warehousing, web applications, CMS, tracking software, data mining software, inventory applications, mapping software, or whatever else, every single end user and company wanted something customized.

A lot of these open source companies make their money because of customization. I’ve worked on a lot of projects as a consultant over the years and it really hasn’t made a difference whether the end users went with vendor specific or open source software. They all wanted something customized to fit their needs.

There is a reason why thousands of people in IT have made careers out of being consultants. If Commercial Off the Shelf or Open Source Software were just plug and play, most consultants wouldn’t have a job. But luckily for me and the thousands of others, we customize, build, fix, and do whatever it takes to make sure the end users’ application works the way they want it to. Like I said, if nobody needed things customized, many people in IT wouldn’t have a job.

Maybe in your company they don’t need to customize, but I’ve worked on projects for banks, insurance companies, aerospace companies, defense contractors, government organizations, retail companies, wireless companies, and so on and every single one of them needed some kind of customization. It would be nice to think the world of IT could really be plug and play, but the reality is, the majority of applications need to be customized to fit whatever the company needs. And since that requires experts in the various technologies, whether it’s open source or vendor specific, it means I and many others keep working.

I’m the product manager for the Power*Architect tool at SQL Power. Not being on the consulting side of things here, I can’t give you any first-hand stories about BI projects based on open source software, but I know some of my colleagues on the consulting side of things could.

Although I agree with much of what you’ve written, I would challenge your assertion that fully adopting an open source toolchain amounts to committing to a major customisation and support project. In reality, the end users of open source products rarely customise them, because the need rarely exists. At the same time, it’s comforting to know that the option exists, and if the need arises, you are not locked into purchasing custom features from a single vendor.

As for support, guaranteed support never comes for free. Every open source project that I can think of (including ours) does offer best-effort forum-based support for free, and this is sufficient for most users (commercial and otherwise). However, if an organisation requires guaranteed support for all the tools they use, there is often somewhere to purchase it from. This is true for MySQL, PostgreSQL, various Linux distributions, the Pentaho suite, and even our tool.

In my own experience, the real advantage in using open source tools is that they are often planned and implemented by people who are also heavy users of the tools themselves. You can see evidence of this in the finer details of usability and convenience, which were put in by various developers of the tool over the years for their own convenience. I don’t normally quote Eric S. Raymond, but I agree with his statement: “Every good work of software starts by scratching a developer’s personal itch.”

Contrast this with typical proprietary commercial products, whose planning and development phase is carried out separately from the end users. Often there is more emphasis placed on filling up the feature list supplied by the sales or marketing departments than on the finer points of usability, which is only natural given the environment the product is developed in.

Finally, I’d like to say thank you for mentioning us in your blog. We’re really happy to see people from around the world taking note of our new open source licensing on this tool. It’s something I’ve been looking forward to for some time now!

It seems like this could be the thin edge of the wedge when it comes to commoditization of BI and data warehouse tools. Your post is also timely in that, at least according to Boris Evelson of Forrester, there seems to be quite a lot of innovation in these areas of information management (http://www.forrester.com/Research/Document/Excerpt/0,7211,42637,00.html – caveat, I have only read the executive summary).

One other thing that I would mention is how, over the past decade, open source advocates have often sold the risk management virtues of adopting open source (i.e. less dependency on a single vendor). Given my view of the market, I don’t know that this is as much of a concern for well-established BI shops, but I have no doubt that there are plenty of organizations out there which would see this as a significant advantage.

The open source BI tools have a way to go in terms of out-of-the-box usability.

Where I think we might see some relatively fast uptake is in the arena of ETL tools, particularly Talend and, to a somewhat lesser extent, Pentaho Kettle. These are good and more than sufficient for many deployments. We’ve done a couple of implementations in organizations using those two.

Of the two, I think Talend has the edge at the moment. It is not only very slick, with a good development and debugging UI, but also has quite a sophisticated architecture. I cannot speak to ultimate scalability in terms of data volume, but there are an awful lot of organizations where transactional volumes are not big enough for that to be more than a minor concern.

Alex

Andy Hayler

Andy Hayler is a passionate and outspoken commentator on the enterprise software market. A 20-year veteran of data modelling, warehousing and integration projects, he was named a Red Herring Top 10 Innovator in 2002 for founding Kalido – an innovative information management company that provides customers with the ability to dynamically view the impact of business changes. The views expressed on this blog are Andy’s own, and do not necessarily reflect the views of The Information Difference.