My Review of the ETL Vendor Comparison Report from WWW.ETLTOOL.COM

Taking a look at the ETL product survey for 2009 from Dutch firm Passionned International – it costs 425 Euros but you can read my review for free.

There are precious few ETL tool comparisons available online because they are very hard to compile and keep up to date. Normally companies narrow it down to a handful and do either proof of concept or presentations from each vendor to work out what they want. Passionned take a systematic approach to comparing each ETL tool – though they rely on some honesty from vendors and the product users they interview. It's also a great place to get a fix on DataStage versus Informatica or Pentaho versus Talend or SAS versus Business Objects in data integration.

The report is available at www.etltool.com for 425 Euros, and if you are wondering about the exchange rate that works out to be about 90,000 Icelandic krónur. That gets you a copy of the report that is up to date:

Publication Date 5 May 2009

Revision Date 21 July 2009

With a product review comparing the following products and versions:

| No. | List of ETL Tools | Version | ETL Vendors |
|---|---|---|---|
| 1 | Oracle Warehouse Builder (OWB) | 11gR1 | Oracle |
| 2 | Data Integrator & Services | XI 3.0 | Business Objects, SAP |
| 3 | IBM Information Server (DataStage) | 8.1 | IBM |
| 4 | SAS Data Integration Studio | 4.2 | SAS Institute |
| 5 | PowerCenter | 8.5.1 | Informatica |
| 6 | Elixir Repertoire | 7.2.2 | Elixir (newest version examined) |
| 7 | Data Migrator | 7.6 | Information Builders |
| 8 | SQL Server Integration Services | 10 | Microsoft |
| 9 | Talend Open Studio | 3.1 | Talend (newest version examined) |
| 10 | DataFlow Manager | 6.5 | Pitney Bowes Business Insight |
| 11 | Data Integrator | 8.12 | Pervasive |
| 12 | Transformation Server | 5.4 | IBM DataMirror |
| 13 | Transformation Manager | 5.2.2 | ETL Solutions Ltd. |
| 14 | Data Manager/Decision Stream | 8.2 | IBM Cognos |
| 15 | Clover ETL | 2.5.2 | Javlin (newest version examined) |
| 16 | ETL4ALL | 4.2 | IKAN |
| 17 | DB2 Warehouse Edition | 9.1 | IBM |
| 18 | Pentaho Data Integration | 3 | Pentaho |
| 19 | Adeptia Integration Server | 4.9 | Adeptia |

This has a mix of premium (expensive) ETL tools like DataStage and Informatica PowerCenter, cheap tools like Pervasive or IKAN and open source tools like Pentaho, Clover ETL and Talend.

Oracle Data Integrator was in a previous version of the report but has bailed out of this one – although Oracle Warehouse Builder remains in there. I’m sure the other data integration vendors love seeing Oracle sit out on the sidelines! Expressor Software has been invited to participate but has not yet responded, which is kind of odd for a company that posts a lot on social network sites.

The product evaluations were done via vendor demonstrations and a vendor questionnaire and presentation. There was also feedback from developers and users although if you want feedback on the open source tools you can just Twitter a question.

The tools were evaluated based on:

Functionality

Ease-of-use

Connectivity

Platform support

Architecture

Market stability

Growth potential

What did I learn?

There is more information in this report than I could have compiled myself off the vendor websites. There is more information than is in the Gartner Magic Quadrant for data integration tools – there is useful information about open source products for a start! I cannot show you the results of the report; if I blabbed all over the internet they wouldn’t be able to charge money for it any more! I will release some interesting facts from the report to give you a flavour of what it holds.

The report shows some of the differences between premium ETL tools that cost a lot of money and those that don’t. There was a survey question sent to each vendor on whether the ETL function supported Symmetric Multiprocessing (SMP) and Massively Parallel Processing (MPP), along with some categories for Data Pipelining and Partitioning. The three most expensive ETL products on the list were the only ones to answer Yes to all these questions, so you do get some value for your license price when it comes to scalability.
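For readers who have not met the scalability jargon before, here is a rough sketch of what pipelining and partitioning mean in practice. It is not taken from the report or from any vendor's engine: the function names, the sample rows and the modulo partitioning scheme are all assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(rows):
    """Stage 1: yield rows one at a time (stands in for a source reader)."""
    for row in rows:
        yield row


def transform(rows):
    """Stage 2: consume rows as they arrive and yield transformed rows."""
    for row in rows:
        yield {"id": row["id"], "amount_cents": round(row["amount"] * 100)}


def pipeline(rows):
    """Pipelining: the stages are chained generators, so each row moves
    through every stage without the full data set being staged in between."""
    return list(transform(extract(rows)))


def partition(rows, n):
    """Partitioning: split rows into n buckets by key (id modulo n, an
    assumed scheme chosen only to keep the sketch short)."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[row["id"] % n].append(row)
    return buckets


def run_partitioned(rows, n=4):
    """Run the same pipeline on each partition concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(pipeline, partition(rows, n)))
    merged = [row for part in results for row in part]
    return sorted(merged, key=lambda row: row["id"])


# Hypothetical sample data.
sample_rows = [{"id": i, "amount": i * 1.5} for i in range(10)]
loaded = run_partitioned(sample_rows)
```

Threads in CPython will not give real CPU parallelism for this kind of work, so treat this purely as a picture of the idea. The premium tools spread the partitions across processes on one SMP server or across the nodes of an MPP cluster.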

I found the platform, connectivity and parallel processing sections quite interesting along with the commentary about each vendor. There are a number of tables of scores and criteria and a score breakdown and a couple of paragraphs about each of the main vendors.

Possibly the most complimentary or damning measure in the report is user friendliness – if a tool doesn’t get a good score here I can’t see how a reader would consider it for further evaluation.

There is one tool that was recently given very high praise from an analyst report that gets the worst usability score of any tool in the survey.

There are cheap or open source tools that do not rate on the Gartner Magic Quadrant that rate well in this survey.

Cognos Transformer gets some bad scores and you can see why Cognos and now IBM are not too fussed about pumping money into it when they can send customers in the direction of the Information Server.

How Accurate is it?

Not all the marks are driven by vendor answers – there is some tool evaluation and user survey information in there for balance.

There is a category called “Native connections” which is a count of the connectivity to databases via a native connection or client interface (i.e. not ODBC). I think some vendors have a creative way of counting in this category – for example you can have several ways to connect to Oracle – API, JDBC or Load. There are also several ways to connect to Teradata. You could also count connectors for different versions of databases. I think this category should be split three ways: the number of databases that support native insert/update connectivity, the number of databases that have built-in support for bulk load, and the number of databases that have built-in parallel insert/update support. It doesn’t have a big impact on the overall ratings as you don’t score bonus points beyond a certain threshold.
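To show why the counting method matters, here is a small sketch that scores one made-up tool both ways: as a raw connector count, and as distinct databases per capability along the lines of the three-way split suggested above. The connector list, the method labels and the category names are all hypothetical and do not come from the report.

```python
from collections import defaultdict

# Hypothetical connector list for one tool: (database, connection method).
connectors = [
    ("Oracle", "native_api"),
    ("Oracle", "jdbc"),
    ("Oracle", "bulk_load"),
    ("Teradata", "native_api"),
    ("Teradata", "bulk_load"),
    ("Teradata", "parallel"),
    ("DB2", "native_api"),
]

# Which connection methods count towards each of the three proposed
# categories (assumed mapping, for illustration only).
CATEGORIES = {
    "native_insert_update": {"native_api"},
    "bulk_load": {"bulk_load"},
    "parallel_insert_update": {"parallel"},
}


def score_connectivity(connectors):
    """Count distinct databases per category rather than raw connectors."""
    databases = defaultdict(set)
    for database, method in connectors:
        for category, methods in CATEGORIES.items():
            if method in methods:
                databases[category].add(database)
    return {category: len(databases[category]) for category in CATEGORIES}


naive_count = len(connectors)                  # every connector counts once
split_scores = score_connectivity(connectors)  # distinct databases per category
```

With this sample data the raw count is 7, while the split view shows three databases with native insert/update, two with bulk load and one with parallel support, which is a lot harder to inflate.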

I’m not sure I believe some of the vendor answers and I can spot a couple of them that look fudged. The functionality and usability scoring makes up for that where it comes from non-vendor sources.

There are some honest evaluations of the tools; the authors do not shy away from pumping up or dumping on a product:

Product X is a very complete ETL tool; but it appears to be relatively expensive

A reasonable product that doesn’t excel at anything except the price.

Product Y continues to be the most user-friendly ETL tool on the market today.

Open Source déjà vu

A few months ago I posted ETL Benchmark Favours DataStage and Talend about how an ETL benchmark report from an independent tester had open source data integration vendor Talend trouncing fellow open source vendor Pentaho on performance in just about every category. In this report Talend and Clover ETL score better marks in most categories than Pentaho Data Integration (Kettle project). There are problems in Pentaho land. They are an open source BI and data integration vendor, and it’s rare for a BI vendor to maintain a good data integration suite – the focus is usually on putting your best people onto the more lucrative BI development. The Pentaho KETTLE project has data discovery, profiling and cleansing on the wish list for a future release while Talend are delivering those functions right now.

Most Improved Players

Clover ETL and Talend would both head up a most improved player list for 2009 enhancements. It’s interesting to see that the five IBM software products on the list on the Information Server shows any improvement between the Spring 2007 report and the Summer 2009 report. Being acquired by IBM does not seem to do much for these data integration products and the SPSS data integration tools are now in the same basket.

Is the report worth buying?

If you have narrowed down your ETL search to two or three products you might not get much out of this report – but if you have an open mind this can help give you a rundown of the main players. It is handy to have all the information gathered together in one place and to have open source compared against premium ETL – especially the list of supported platforms and ETL functions.

If you are weighing up open source versus premium ETL it’s a good comparison report as the feature comparison and usability scores give you an idea of what to look for when you run your own evaluation.

What is missing?

Not much in this report about product support – but then that’s dependent on what region and country you are in and what training and resources you can find to help you out with it. On top of reading this report you want to validate whether the tool and architecture can handle your data volumes, whether it supports your data latency (batch versus real time / micro batch / SOA), whether it is compatible with the versions of your databases and operating system, and whether you have the right support staff for the underlying repository and transformation coding language (if there is one).

I would like to see more categories around ETL control and auditing and ETL governance such as automated restart, canned operational metadata reports (row counts and job statuses tracked and graphed over time). I would move the scheduler and automatic documentation into this category. I would like to see some type of score around the ease of administration – the setting of things like user ids and passwords, the verification of error and warning messages, the deployment from DEV to QA to PROD etc.
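As a rough picture of what automated restart and operational metadata mean here, the sketch below records a status and row count for every step and, on a rerun, skips the steps that already succeeded. It is a toy in-memory version built on assumptions: the class and function names are invented, and a real ETL tool would keep this audit trail in its repository.

```python
from dataclasses import dataclass, field


@dataclass
class StepRun:
    step: str
    status: str   # "ok" or "failed"
    rows: int = 0


@dataclass
class JobAudit:
    """Hypothetical in-memory audit log of step runs."""
    runs: list = field(default_factory=list)

    def record(self, step, status, rows=0):
        self.runs.append(StepRun(step, status, rows))

    def completed_steps(self):
        return {run.step for run in self.runs if run.status == "ok"}

    def row_counts(self):
        """Operational metadata: rows processed by each successful step."""
        return {run.step: run.rows for run in self.runs if run.status == "ok"}


def run_job(steps, audit):
    """Run steps in order, skipping any that already succeeded
    (a very simple form of automated restart)."""
    done = audit.completed_steps()
    for name, func in steps:
        if name in done:
            continue
        try:
            rows = func()
        except Exception:
            audit.record(name, "failed")
            return False
        audit.record(name, "ok", rows)
    return True


# Hypothetical job: the load step fails on its first attempt only.
attempts = {"load": 0}


def extract_step():
    return 100


def load_step():
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("target database unavailable")
    return 100


steps = [("extract", extract_step), ("load", load_step)]
audit = JobAudit()
first_run = run_job(steps, audit)    # stops at the failed load
second_run = run_job(steps, audit)   # skips extract, reruns load
```

The same log that drives the restart is what a canned row count or job status report would be built from, which is why these features arguably belong in one category.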

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Vincent McBurney is an IBM Information Champion for Information Integration.

21 Comments

Interesting article, but I wouldn't call Cognos Transformer an integration product - it doesn't even describe itself as an ETL tool. It's designed to be an OLAP modelling tool.
I'm not sure what your comment was regarding the IBM acquired products - are there some words missing from that sentence (under Most Improved Players)?

Ab Initio refused to participate in the comparison. They are notoriously secretive and I doubt we will ever see them in an open vendor evaluation or benchmark. If you are a potential customer you need to sign a non-disclosure before learning about the product. They pitch themselves as an application processing platform - more than an ETL tool - so they may not like being compared to ETL tools. I am surprised Syncsort and CoSort were not in the report given lesser known tools were included. I think it is a challenge for those vendors to promote themselves as more than a sort tool!

SAS got an excellent review in the report. No mention was made of data federation/EII but from memory the vendors who have EII tools are IBM with Federation Server and I think Sybase has one. I get the impression EII has been overtaken by change data capture with IBM acquiring DataMirror and Oracle acquiring GoldenGate.

"Oracle Data Integrator was in a previous version of the report but has bailed out from this one – although Oracle Data Warehouse remains in there. I'm sure the other data integration vendors love seeing Oracle sit out on the sidelines!"

No but I am sure they love the misleading rumors!

Integrator will be part of OWB in future releases. It's never really been a serious ETL tool by itself.

The best ETL tool is the one for which you can most easily find experienced developers. Beyond that, all other comparisons are nonsense.

For TALEND - The whole Open source stuff is all CHEATING and a LIE !! You just get the tool, which is greatly incomplete. You can't even schedule jobs, which is a must have. And if you want a scheduler then you gotta pay 15K a year. ARE YOU KIDDING ME ?? 15K a year ???

We started using it and then at the end of the project we came to know the High Price of the tool, Our clients are really unhappy. One client even asked us to move everything to SSIS and without any pay. we had to do it.

I have worked with DataStage (7.5 to 8.1 parallel), Informatica, and Ab Initio. Hands down, Ab Initio is better in every way. It has excellent performance, stable as a rock, very quick to implement medium to complex programs, maintenance of code is easier than the other tools as well. I have no idea how much more it costs than DataStage or Informatica, but the savings in the number of developers that you need to build the same application is huge! Informatica and DataStage have clunky interfaces that make it quick to build basic applications, but as soon as there is any complexity then they are a nightmare to work with. Ab Initio relies on its own scripting language (very easy to learn), but you can implement more complex logic in no time.

Personally, I have tested a few of the top 19. I am glad that Centerprise from Astera Software made the recent lists because it is the easiest to use and the one that gave me the fewest headaches when using it.

If you guys are interested, test out their demo. After I did, I talked to a sales agent and me and two other business analysts learned how to use it quickly and we finished our data migration project in one year instead of the one year and eight months goal we had.

Thank you for the information. I am trying to figure out which ETL tools are best for supporting real-time (or near-real-time) data warehouse requirements. I would imagine that ideally the tool would be able to read the database transaction log and trickle-feed the changes to the DW. Does the report from etltool.com contain this information? If not, any idea where I can find this kind of comparison?

These are normally treated as two separate products - the ETL/ELT tool and the database replication tool. I doubt you will find a good comparison of database replication in any ETL comparison. IBM has InfoSphere Change Data Capture for reading database logs and supplying real time transactions. It has a plugin for DataStage so that CDC can read transactions and pass them through to DataStage to transform and load them to your Warehouse. There are not many ETL tools that can share data in memory in real time with a database replication tool for a real time Data Warehouse.

Disclaimer: Blog contents express the viewpoints of their independent authors and are not reviewed for correctness or accuracy by Toolbox for IT. Any opinions, comments, solutions or other commentary expressed by blog authors are not endorsed or recommended by Toolbox for IT or any vendor.

Vincent McBurney is an IBM Champion for Information Integration and has been blogging for many years on InfoSphere software and competitors in Information Management, Governance, Data Integration and Data Warehousing.