A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA’s enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even sits behind Tableau.* When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

*If you believe that Splunk is a data integration company, that changes these observations only a little.

On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was “Combines data from multiple sources on the same screen” and, in some cases, even “Joins data from multiple sources in a single view”. In the last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both of course have a “collect all your data in one place” part to their stories (which I call data mustering), and Hadoop is a data transformation tool as well.

Time for another catch-all post. First and saddest: one of the earliest great commenters on this blog, and a beloved figure in the Boston-area database community, was Dan Weinreb, whom I had known since some Symbolics briefings in the early 1980s. He passed away recently, much much much too young. Looking back for a couple of examples (even if you’ve never heard of him before), I see that Dan’s 2009 comment on Tokutek is still interesting today, and so is a post on his own blog disagreeing with some of my choices in terminology.

2. When I relayed Cloudera’s comments on Hadoop adoption, I left out a couple of categories. One Cloudera called “mobile”; when I probed, that was about HBase, with an example being messaging apps.

The other was “phone home” — i.e., the ingest of machine-generated data from a lot of different devices. This is something that’s obviously been coming for several years — but I’m increasingly getting the sense that it’s actually arrived.

Informatica, Splunk, and IBM are all public companies, and correspondingly reluctant to talk about product futures. Hence, anything I might suggest about product futures from any of them won’t be terribly detailed, and even the vague generalities are “the Good Lord willin’ an’ the creek don’ rise”.

Never let a rising creek overflow your safe harbor.

Anyhow:

1. Hadoop can be an awesome ETL (Extract/Transform/Load) execution engine; it can handle huge jobs and perform a great variety of transformations. (Indeed, MapReduce was invented to run giant ETL jobs.) Thus, if one offers a development-plus-execution stack for ETL processes, it might seem appealing to make Hadoop an ETL execution option. And so:

I’ve already posted that BI-plus-light-ETL vendors Pentaho and Datameer are using Hadoop in that way.

Informatica will be using Hadoop as an execution option too.

Informatica told me about other interesting Hadoop-related plans as well, but I’m not sure my frieNDA allows me to mention them at all.

IBM, however, is standing aside. Specifically, IBM told me that it doesn’t see the point of doing the same thing, as its ETL engine — presumably derived from the old Ascential product line — is already parallel and performant enough.

Enterprises should each have a variety of different analytic data stores.

Vendors — especially but not only IBM and Teradata — are acknowledging and marketing around the point that enterprises should each have a number of different analytic data stores.

In addition to having multiple analytic data management technology stacks, it is also desirable to have an agile way to spin out multiple virtual or physical relational data marts using a single RDBMS. Vendors are addressing that need.

Some observers think that the real essence of analytic data management will be in data integration, not the actual data management.

Both Pervasive Software and Cast Iron Systems told me recently of fairly pure cloud offerings. In this, they’re joining Informatica, which started offering Salesforce.com integration-as-a-service back in 2006. So far as I can tell, the three vendors are doing somewhat different things.

Lots of heuristics for automatic mapping and quick set-up. E.g., Cast Iron claims that 70% of a typical SAP-Salesforce.com connection can be done straight out of the box.

The absence of data cleaning/transformation features that might complicate things.

Cast Iron still believes in all that.

Even so, its messaging has changed a bit. Cast Iron now bills itself, in the first sentence of its press release boilerplate, as “the fastest growing SaaS integration appliance vendor.” And when I talked with marketing chief Simon Peel today, the only use cases we discussed were connections between SaaS and on-premises apps.

I’m getting a flood of press releases today, because many of the companies I write about were selected to Intelligent Enterprise’s list of 12 most influential vendors plus 36 more to watch in the areas Intelligent Enterprise covers (which seems to be pretty much the analytics-related parts of what I write about here and on Text Technologies). It looks like a pretty reasonable list, although I think they forced the issue in some of the small analytics vendors they selected, and of course anybody can quibble with some of the omissions.

Among the companies they cited, you can find topical categories here for IBM (and Cognos), Informatica, Microsoft, Netezza, Oracle, SAP/Business Objects (both), SAS, and Teradata; QlikTech; Cast Iron, Coral8, DATAllegro, HP, ParAccel, and StreamBase; and Software AG. On Text Technologies you’ll find categories for some of the same vendors, plus Attensity, Clarabridge, and Google. There also are categories for some of these vendors on the Monash Report.

Business Objects recently lost a patent lawsuit to Informatica in the area of data integration. While I was at the Business Objects conference, I asked about it, and was told in effect “It’s no big deal. In fact, the monetary award was reduced. Anyhow, we shipped a non-infringing version within 12 days after the decision, and sales are rolling along.” I then reflected that answer back to Informatica’s stellar analyst relations guy Chas Kielt. He checked with corporate counsel, and sent back the detailed clarification below. Since I got my Business Objects answers from a couple of caught-off-guard non-lawyer French guys, while Chas got a careful explanation of an American court’s judgment from an American lawyer, I’m inclined to think that in any details where they might conflict, Chas’ version is more likely to be accurate.

There’s a more substantive disagreement as to whether the features deleted from BOBJ’s product due to the injunction are actually important in the marketplace. I’m looking into that subject, and hope to post about it in the near future.

I recently talked with Pervasive Software about their data integration line. A large part of Pervasive’s new business is Salesforce.com integration, including at some big-name software vendors as customer/partner switch-hitters.

I just rechecked my notes from my January talk with Cast Iron Systems. A large part of Cast Iron’s new business is also integration with Salesforce.com, NetSuite, and other SaaS vendors.