The assault on the enterprise data warehouse market carries on, with the latest challenge coming from a Mountain View, Calif.-based startup called Treasure Data. The company, which offers a combined Hadoop and data warehouse service hosted on the Amazon Web Services cloud, has raised $5 million from Sierra Ventures. Sierra’s past investments in the data warehouse space include Teradata and Greenplum, and new Treasure Data board member Tim Guleri was responsible for the Greenplum deals.

However, while the cloud angle immediately distinguishes Treasure Data from the legacy vendors, some might wonder how it compares with another hot data warehouse service on AWS: Amazon’s own Redshift. According to Treasure Data co-founder and CTO Kazuki Ohta, the difference is clear. Whereas AWS’s various big data services (S3, Redshift and Elastic MapReduce) are like Lego blocks that customers must piece together themselves, Treasure Data offers the whole package on a single platform.


Data enters the Treasure Data platform either straight from a relational database or through a combination of two open source tools, Fluentd and MessagePack. Treasure Data Chief Architect Sadayuki Furuhashi created both, and the company claims they’re in use at thousands of web companies, including Pinterest, Facebook and Yahoo. Once the data is inside Treasure Data, users can query it using SQL or Pig, run MapReduce jobs on it, and then push it out to business intelligence tools or even the “golden image” database.
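The article doesn’t describe Treasure Data’s actual wire format. As a rough illustration of what MessagePack does, here is a minimal, standard-library-only sketch that hand-encodes a subset of the MessagePack format (nil, booleans, integers, 64-bit floats, UTF-8 strings, arrays and maps) and packs a log event in the three-part [tag, time, record] shape that Fluentd uses for events. The function name `packb` only echoes the msgpack-python library; this is a hypothetical stand-in, not that library. The tag, the plain integer Unix timestamp and the record fields are made-up assumptions.

```python
import json
import struct

# (type byte, struct format, upper bound) for the sized length prefixes
_STR_SIZES = ((0xD9, ">B", 0xFF), (0xDA, ">H", 0xFFFF), (0xDB, ">I", 0xFFFFFFFF))
_ARR_SIZES = ((0xDC, ">H", 0xFFFF), (0xDD, ">I", 0xFFFFFFFF))
_MAP_SIZES = ((0xDE, ">H", 0xFFFF), (0xDF, ">I", 0xFFFFFFFF))


def _len_header(n, fix_base, fix_limit, sizes):
    """Use the one-byte 'fix' form when n fits, else the smallest sized prefix."""
    if n <= fix_limit:
        return bytes([fix_base | n])
    for tag, fmt, limit in sizes:
        if n <= limit:
            return bytes([tag]) + struct.pack(fmt, n)
    raise ValueError("object too large for MessagePack")


def packb(obj):
    """Hypothetical encoder for a MessagePack subset: None, bool, int, float, str, list/tuple, dict."""
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):  # check before int, since bool is a subclass of int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])  # positive fixint: the value is the byte itself
        if -32 <= obj < 0:
            return struct.pack(">b", obj)  # negative fixint: 0xe0..0xff
        if obj > 0:
            sizes = ((0xCC, ">B", 2**8 - 1), (0xCD, ">H", 2**16 - 1),
                     (0xCE, ">I", 2**32 - 1), (0xCF, ">Q", 2**64 - 1))
            for tag, fmt, limit in sizes:
                if obj <= limit:
                    return bytes([tag]) + struct.pack(fmt, obj)
        else:
            sizes = ((0xD0, ">b", -2**7), (0xD1, ">h", -2**15),
                     (0xD2, ">i", -2**31), (0xD3, ">q", -2**63))
            for tag, fmt, low in sizes:
                if obj >= low:
                    return bytes([tag]) + struct.pack(fmt, obj)
        raise OverflowError("integer out of MessagePack range")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float 64, big-endian
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _len_header(len(data), 0xA0, 31, _STR_SIZES) + data
    if isinstance(obj, (list, tuple)):
        body = b"".join(packb(x) for x in obj)
        return _len_header(len(obj), 0x90, 15, _ARR_SIZES) + body
    if isinstance(obj, dict):
        body = b"".join(packb(k) + packb(v) for k, v in obj.items())
        return _len_header(len(obj), 0x80, 15, _MAP_SIZES) + body
    raise TypeError(f"cannot pack {type(obj).__name__}")


# A Fluentd-style event: [tag, unix time, record] (all values are made up)
event = ["app.access", 1374624000, {"path": "/", "status": 200}]
packed = packb(event)
as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))  # 34 51
```

The point of the comparison is only that a binary encoding with one-byte type and length prefixes is more compact than even minified JSON for small log records; production code would use a maintained MessagePack library rather than a hand-rolled subset like this one.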

Co-founder and CEO Hiro Yoshikawa describes the company as offering a different take on the traditional extract-transform-load, or ETL, process. It allows for what he calls extract-visualize-load and extract-process-load, which essentially means users can look at and analyze their data before shipping it on to another system, ensuring they send over only the relevant data.
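Yoshikawa’s terms aren’t defined further in the article. Read loosely, “extract-process-load” means the analysis happens before the load step, so the downstream system receives a reduced result rather than every raw row. The minimal Python sketch below illustrates that idea under made-up assumptions: the `extract`, `process` and `load` functions, the log fields and the “count server errors per path” rule are all hypothetical, and a plain list stands in for the downstream business intelligence database.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def extract(source: Iterable[dict]) -> List[dict]:
    """Pull raw records from a source (here, just an in-memory iterable)."""
    return list(source)


def process(records: List[dict]) -> Dict[str, int]:
    """Analyze before loading: keep only server errors (5xx) and count them per path."""
    return dict(Counter(r["path"] for r in records if r["status"] >= 500))


def load(summary: Dict[str, int], target: List[Tuple[str, int]]) -> None:
    """Ship only the reduced result to the downstream system (a list stands in for it)."""
    target.extend(sorted(summary.items()))


raw_log = [
    {"path": "/", "status": 200},
    {"path": "/checkout", "status": 502},
    {"path": "/checkout", "status": 500},
    {"path": "/search", "status": 503},
    {"path": "/", "status": 200},
]
warehouse: List[Tuple[str, int]] = []
load(process(extract(raw_log)), warehouse)
print(warehouse)  # [('/checkout', 2), ('/search', 1)]
```

Five raw rows go in and two summary rows come out, which is the “only send over the relevant data” idea in miniature; a conventional ETL pipeline would instead transform and load the full row set and leave the analysis to the target system.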

Treasure Data raised a $2.8 million seed round in November 2012 from angel investors that include Jerry Yang and several current and past Heroku employees. It’s also a member of Heroku co-founder James Lindenbaum’s new Heavybit accelerator for enterprise IT startups.

Yoshikawa said Treasure Data didn’t really need to raise money, since it already has more than 80 customers running about 200,000 queries on 700 billion rows of data daily, but the funds will help it extend its reach beyond its sweet spot of Japan and into the United States. Its current customers include Salesforce.com. He also said the company plans to close a follow-up funding deal next month.

That’s probably a good idea. Treasure Data’s delivery model and technologies might be disruptive, but it’s also taking on some serious incumbents, such as Teradata, IBM and Oracle, as well as newer but still daunting rivals such as AWS, Cloudera and Hortonworks. Getting heard above their considerable noise, and keeping up with them in product development and support, is going to take some serious cash.

This article was corrected at 6:00 p.m. on July 24 to remove a reference to Toyota as a Treasure Data customer. Treasure Data had mentioned Toyota as a user, but has since informed GigaOM that Toyota is testing its product but is not yet an official customer.

I don’t see anything here that is new or novel. All the components here already exist and are widely deployed.

* Hive is the standard data warehouse built on top of Hadoop, and it can be queried using SQL (HiveQL) and Pig, just as in the architecture picture above.
* Flume, Kafka and other log collection packages are already widely deployed, and Fluentd is just another offering with identical functionality.
* Sqoop is the gold standard for exchanging data between relational databases and Hadoop, and MessagePack is a newcomer in this territory with no discernible advantages.

So what’s new, let alone better, about Treasure Data? To me, it seems there is no treasure there.