Have Acquisitions Changed the Data Quality Market?

There’s been tumult aplenty in the once-insulated data quality market.

In just the last three years, no fewer than four best-of-breed data quality players (the former Trillium Software, Group 1 Software, Similarity Systems, and Firstlogic) have been snapped up by larger, non-data-quality-centric vendors. That translates into a significant reordering of the data quality status quo.

Even so, how much has really changed, and just which companies emerge as the new data quality thoroughbreds?

According to market watcher Gartner Inc., the newest data quality power rankings look a lot like older data quality power rankings. The names might have changed, but the technologies—or the players associated with those technologies—have remained the same.

Consider the four former best-of-breeds: Trillium, Group 1, Similarity, and Firstlogic. All but Group 1—which was acquired by Pitney Bowes, a mail and document management specialist that doesn’t pretend to specialize in data integration—are positioned in the Leaders quadrant of Gartner’s latest data quality market survey. (Group 1, for its part, is firmly in Gartner’s “Challengers” quadrant. Trillium, like Group 1, was acquired by a non-data-integration specialist: Harte-Hanks.)

The other data quality market leaders consist of DataFlux—acquired six years ago by what is now its parent company, SAS Institute Inc.—and IBM Corp., which acquired the data quality assets of the former Ascential Software Corp. two and a half years ago.

One upshot, industry watchers indicate, is that acquisition by a larger, non-specialty vendor doesn’t necessarily foretell a reduction in the acquired vendor’s status. In some cases, in fact, acquisition means just the opposite.

Consider Business Objects—soon to be a subsidiary of SAP AG—which acquired Firstlogic more than 18 months ago. As an independent data quality vendor, Firstlogic enjoyed an enormous market presence (the company had about 2,500 customers), as well as OEM relationships with Business Objects and Informatica, among others. Firstlogic specialized mostly in customer data integration (CDI)—or customer data quality.

According to research from The Data Warehousing Institute (TDWI), customer data is still the largest overall data quality segment. However, that’s changing, says TDWI Research senior manager Philip Russom, such that other kinds of data—and particularly product data—are fast coming to the fore.

“Although customer data tasks are still the bread-and-butter of data quality, the need for quality product data continues to grow,” said Russom, in an August interview. “Many organizations are at a point where they’re moving from name/address cleansing and house-holding—the most common starting points, which focus on customer data—to product data tasks, and procurement/supplier data is a common place to start in this new area.”

Russom was speaking about Universal Data Cleanse (UDC), a new option Business Objects developed for its Data Quality XI product. While the original Data Quality XI was brimming with CDI goodies—all (or nearly all) of which were inherited from Firstlogic—it left a lot to be desired when it came to product data and financial data. UDC helps redress that. It’s also almost completely homegrown, according to Business Objects officials.

Firstlogic also had rather limited internationalization capabilities, according to Russom and other industry watchers. That was one reason Business Objects last month acquired data quality software specialist Fuzzy! Informatik, a German vendor that—with a thriving presence in the EU and footholds in both the Middle East and Africa—extends Business Objects’ international reach. Fuzzy! helps recast Data Quality XI as a formidable international competitor, according to Russom.

“The acquisition gives Business Objects a lot more functionality for handling natural languages and postal addresses in European, Middle-Eastern, and African nations,” he explains.

Informatica

Then there’s Informatica, which acquired Similarity Systems shortly after Business Objects snatched up Firstlogic. Informatica is currently prepping an update of its own data quality offering; officials promise enhancements to its product data and financial data capabilities, as well as internationalization improvements.

Industry watchers say Informatica appears to have done right by Similarity. Not only does it continue to field a best-of-breed data quality toolset, but—as a result of its careful stewardship of the Similarity assets—it has been able to successfully sell Informatica Data Quality into its own customer base (which was once rife with Firstlogic and Trillium licenses, thanks to Informatica’s OEM deals with both companies).

Moreover, Informatica is one of just four vendors—and the only data integration pure-play—to tout a combined data integration and data quality platform. Only Business Objects, IBM, and SAS can make the same claim.

“Through [its] acquisitions [of not only Similarity, but Evoke, which Similarity acquired in 2005], Informatica now has all the key pieces of profiling, standardization, matching, and cleansing,” write Gartner analysts Ted Friedman and Andreas Bitterer in the market watcher’s most recent Magic Quadrant survey.

“The data quality arena represents a big market opportunity for Informatica, and [its] strategy of cross-selling the [Similarity] data quality products into its large installed base appears to be working. In addition, the data quality products further enhance Informatica’s strategy for developing a comprehensive data integration and data quality offering.”

DataFlux

DataFlux is among the oldest of the best-of-breed data quality vendors. It was also one of the first data quality pure-plays to get snapped up. SAS has been careful to cultivate the appearance of DataFlux’s independence, however, and—with surging sales and a burgeoning international presence—its efforts appear to be paying off.

“DataFlux has been moving out of the large shadow of its parent company, SAS. It is demonstrating high growth and has begun to be seen as a major competitor, most recently in Europe,” Friedman and Bitterer write.

DataFlux has been seen mostly as a stand-alone data quality technology provider, the Gartner duo notes, but that, too, is changing. “[T]hrough the new customer data integration … solution, and a broader data integration positioning with its Integration Server product, DataFlux will increasingly try to address needs beyond its core market and into the world of MDM,” Friedman and Bitterer say.

This could lead to conflict with parent company SAS, which—with its vaunted Enterprise ETL tool—is also cultivating data integration opportunities. This is a necessary evil, according to the Gartner pair: “To become a data quality brand name independent of SAS, DataFlux will need to strike partnerships and build a channel with other vendors in the business intelligence and data integration space, possibly companies that compete directly with SAS.”

IBM

Big Blue inherited an enormous (and still largely disparate) data quality and data integration stack from Ascential. (Ascential had built out its data quality, data profiling, and even parallel ETL processing capabilities largely by dint of acquisition.) Since that acquisition, however, Big Blue has executed on Ascential’s former Hawk project (which outlined a strategy for reconciling and integrating its disparate information integration assets) and cobbled together the makings of a formidable—and market-leading—data integration stack.

On the data quality front, too, Big Blue has been impressively thorough.

“As one of the best known brands with worldwide consulting, service, and support functions, IBM is well equipped to position its vision of data quality in organizations worldwide,” Friedman and Bitterer write.

The Gartner analysts also praise the effort IBM put into updating and enhancing the former Ascential ProfileStage (which Big Blue has rebranded Information Analyzer) and QualityStage products.

“The rearchitected products that were delivered with IBM’s release of Information Server underwent significant development to harmonize the user interfaces and administration functionality, and increase ease of use and developer productivity, which were known challenges for previous versions,” Friedman and Bitterer point out.

The irony, of course, is that even as Big Blue has integrated and enhanced the somewhat chockablock Ascential data integration and data quality portfolio, it’s seeing slower-than-expected sales on the data quality front, Gartner points out. “[W]hen [Ascential] was acquired by IBM its strong focus on data profiling and cleansing was somewhat diluted, as it fell under the large umbrella of WebSphere and was subsequently repositioned within the new Information Server,” Friedman and Bitterer observe.

“The Information Server versions of the data quality products are seeing slow adoption in the market, with virtually no customer references available that use these versions of the products in a production environment.” While disappointing, this isn’t entirely unexpected, Gartner indicates: “This is mostly because the new product version is still very new and customers with prior versions are still early in their planning and analysis stages of the upgrade effort.”

Group 1

It’s business as usual for the other once-and-future data quality best-of-breeds, too. Consider Group 1, which—as a “Challenger” in Gartner’s Magic Quadrant rankings—falls just under the “Leaders” segment. One reason for this, according to the analysis firm, is that parent company Pitney Bowes continues to focus largely on CDI.

“[Group 1] specializes in global name and address standardization and validation, matching-related capabilities—including linking and deduplication—and geocoding,” Friedman and Bitterer write. “[Group 1] has significant strength in each of these areas, and although the underlying technology can be considered domain-agnostic, customer data quality applications are Group 1’s sole focus.”

Notwithstanding its focus on CDI, Group 1 is well-positioned relative to DataFlux, IBM, Informatica, and other data quality market leaders. It boasts 2,400 customers overall (most of which are in North America), but has established a foothold in the Asia-Pacific region, with several hundred customers, according to Gartner.

More to the point, Pitney Bowes continues to do right by its Group 1 assets. “With the significant financial resources of Pitney Bowes, Group 1 continues to expand its capabilities through acquisitions—such as the 2007 addition of MapInfo, which brings further geospatial and mapping services to the portfolio,” Friedman and Bitterer point out.

“Also, [Group 1] continues to fund organic development of its core data quality technology, as evidenced by the recent addition of service-oriented capabilities, SaaS delivery models, and increased depth of integration with the Group 1 Data Flow data integration tools.”

Trillium

Erstwhile standalone player Trillium, which Harte-Hanks acquired more than two years ago, still ranks as a “Leader” in Gartner’s Magic Quadrant. That vendor—with approximately 1,600 customers—has also grown by leaps and bounds since it first came under the Harte-Hanks umbrella, Friedman and Bitterer say.

“Trillium continues to enjoy strong brand recognition and remains a market share leader,” they write. “Trillium continues to expand its focus on alternative delivery models for its data quality capabilities.” Among other Harte-Hanks-era enhancements, Friedman and Bitterer cite Trillium’s Diamond Data IS offering, which provides data quality functionality in a SaaS hosted model. “Trillium also markets and sells directly its TS On-Demand solution, giving customers a choice of on-premise or hosted deployment of TS Quality,” they point out.

Like Group 1, Trillium focuses largely on CDI applications. This could become a problem going forward, if—as expected—the data quality market continues to embrace other domains (such as product and financial data) in addition to customer data. “While customer data will remain a mainstay of market demand, Trillium will be increasingly challenged by newer domain-agnostic competitors and pushed by its customers for greater extensibility to other data types,” Friedman and Bitterer comment.

In any event, they note, Harte-Hanks is unlikely to be caught napping: Trillium plans to introduce Universal Data Libraries—i.e., pre-built functionality for common data attributes, including units of measure, currencies, and package types—by the end of this year.