Paxata Debuts Data Quality Tools at Strata

Alex Woodie

One of the vendors hoping to use this week’s Strata + Hadoop World conference as a springboard to big data fame and fortune is Paxata, a California startup that is applying social media technologies, like pattern recognition and graph analysis, to the problem of data quality. Paxata officially came out of stealth mode today by announcing a partnership with Tableau and an $8 million Series B round of funding.

Paxata was founded in 2012 by four tech veterans with the idea of creating a new generation of tools that make it easier to prepare big data sets for analysis by end users. Its founders think there’s a big gap in the market for data preparation and data quality tools–what you might think of loosely as the master data management (MDM) and ETL markets–which Paxata intends to fill with its cloud-based Adaptive Data Preparation offering.

There’s a yawning lack of functionality between the user-facing data discovery and visualization tools from the likes of Tableau and QlikTech on the one hand, and the data repositories like Hadoop and parallel in-memory databases on the other, Paxata says. The goal of its Adaptive Data Preparation offering is to give customers the same kind of free, uninhibited, and intuitive user experience that data discovery tools like Tableau and QlikTech enable, but to do so for the data quality and data prep processes.

The current crop of ETL and MDM products takes too long to deliver results, is too inflexible, and is too IT-heavy, the company says. The end-user analysts who search through data looking for gems do not have time to wait for new data sources to be procured and prepped before adding them to the ad-hoc mix, the thinking goes. They should be able to bring new data sets to bear as they see fit, and IT ought to support that process instead of controlling it.

Paxata says its Adaptive Data Preparation software allows analysts to find, combine, and shape raw data as they need it, and to blend multiple data sources as part of their “stream of thought querying process.”

So, what exactly is in the product? In its corporate data sheet, the company says its R&D has focused on using a combination of technologies commonly found in the consumer search and social media spaces to perform “intelligent indexing, textual pattern recognition, and statistical graph analysis” upon raw data.

“By applying proprietary algorithms to the linguistic content of both structured and unstructured data, the Paxata solution automatically builds a comprehensive and flexible data model in the form of a graph, reflecting similarities and associations amongst data items,” the company says in its brief. “Paxata automatically detects and highlights patterns and anomalies within the data so analysts have a visual map to resolve both syntactic and semantic data quality issues, rapidly improving the quality of large data sets.”
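Paxata has not published its algorithms, so the following is only a rough illustration of the general idea of a similarity graph over data items, not the company’s method. It assumes a simple string-similarity measure from Python’s standard library, a hypothetical threshold of 0.8, and made-up sample values; the function names are likewise invented for this sketch.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_graph(values, threshold=0.8):
    """Adjacency map: an edge links two values whose ratio meets the threshold.

    The threshold is an assumption chosen for this example.
    """
    graph = {v: set() for v in values}
    for a, b in combinations(graph, 2):
        if similarity(a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Connected components: each is a group of probable variants of one value."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up sample column with spelling and casing variants.
names = [
    "Pabst Brewing Company",
    "pabst brewing company",
    "Pabst Brewing Compny",
    "Dannon",
    "Danon",
    "Box",
]
groups = clusters(build_graph(names))
for group in groups:
    print(group)
```

In a sketch like this, each multi-item group is a candidate set of spellings for an analyst to merge, while a value that links to nothing else stands alone. A production system would presumably use far richer signals than character-level similarity.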

But the real head-turner may be the way it wants to use the combined intelligence of Facebook, Twitter, and LinkedIn to drive quality into its customers’ big data sets. Customers can “crowdsource” data from third parties, including the “semantic Web, social media sites, and specialist data service providers,” and bring it to bear on their data quality challenges. As more data is added, the refinements get more accurate, which is where the machine learning algorithms come into play.

The use of pattern recognition, graph analysis, and crowdsourcing is a different, but potentially promising, approach to the data quality problem. One analyst applauding Paxata’s early work is R “Ray” Wang, the principal analyst and founder of Constellation Research.

“New data management systems like Hadoop and business-centric ad-hoc analytic tools like Tableau have removed the traditional constraints around raw data retention and rapid discovery and decisions,” Wang says in a Paxata press release. “The missing link of the analytics triangle are tools that do for data preparation and data quality what Tableau and others have done for data visualization.”

Paxata has been in business for less than two years, so it’s too soon to tell how this approach will play out. The company, which has 28 employees, says it has several customers already using its software in production, including Dannon, Pabst Brewing Company, and Box. Its approach is evidently promising enough to have attracted $8 million in a second round of funding, which was led by Accel Partners. In addition to the new co-selling partnership with Tableau announced today, it has existing partnerships in place with Cloudera and QlikTech.

The company is ready to enter the next stage of growth, and is using this week’s Strata conference as its coming-out party. To that end, Paxata’s co-founder and CEO, Prakash Nanduri, will be participating in a panel discussion with representatives from Cisco, Cloudera, Dannon, and UBS on Wednesday. As part of the soft launch, the company’s website at www.paxata.com is also expected to be updated with useful information.