Impala – A New Era for BI on Hadoop

With the recent announcement of Impala, also known as Cloudera Enterprise RTQ (Real Time Query), I expect interest in and adoption of Hadoop to go from merely intense to crazy. We applaud Cloudera’s investment in creating Impala, as it takes a huge step toward making Hadoop accessible to existing BI tools.

What is Impala? Simply put, it lets the SQL-based BI and business analytics tools built over the past couple of decades work directly on top of Hadoop. It provides interactive response times not previously attainable with Hadoop and runs many times faster than Hive, the existing SQL-like alternative. Impala also offers fairly complete SQL support, including joins and aggregate functions, which are must-haves for analytics.

For enterprises this analytic query speed and expressiveness is huge – it means they are now much less likely to need to extract data out of Hadoop and load it into a data mart or warehouse for interactive visualization. Instead they can use their favorite business analytics tool directly against Hadoop. But of course only Pentaho provides the integrated end-to-end data integration and business analytics capability for both ingesting and processing data inside of Hadoop, as well as interactively visualizing and analyzing Hadoop data.

Over the past few months Cloudera and Pentaho have been partnering closely at all levels, including marketing, sales and engineering. We are proud of the role we played in helping Cloudera validate and test Impala against realistic BI workloads and use cases. Judging by the extremely strong interest we’ve seen, including the lines at our booth at the recent Strata big data conference in New York City, the combination of Pentaho’s visual development and interactive visualization for Hadoop with the breakthrough performance of Cloudera Impala is very compelling for a huge number of enterprises.

– Ian Fyfe, Chief Technology Evangelist, Pentaho

This entry was posted on Friday, November 30th, 2012 at 11:20 pm and is filed under Uncategorized.

Impala is certainly interesting, but it has some challenges in its current state. Correct me if I’m wrong, but I believe it is nowhere near supporting SQL-92, has minimal metadata, and does not leverage Hadoop workflows or HDFS failover. (It’s a separate engine embedded in Hadoop that you query directly.) With YARN, HCatalog, and other projects, these gaps will be addressed, but I wonder whether any companies are successfully using Impala right now, and whether they might be better served by parsing the data, loading it into a DBMS, and analyzing it there. There is a huge “Hadoop is a silver bullet” mentality out there right now. At the same time, I do see DW managers questioning whether it makes sense to move data into a DW if it’s already in Hadoop. That’s certainly a legitimate question, and one that could turn our industry on its head.