Cloudera Makes Hadoop Real-Time with Impala

Today at the Strata + Hadoop World event in New York City, Cloudera unveiled Impala, an open-source engine that extends Hadoop beyond batch analytics.

The big data platform is capable of crunching through massive data-driven workloads in petabyte-scale environments, but the MapReduce parallel computing framework on which it is based only allows for batch analytics. Hadoop helps enterprises extract business insight in minutes as opposed to days, and now Cloudera has taken it to the next level with real-time processing.

Impala is the product of a two-year-long internal R&D cycle; it’s available under the Apache license for the open-source community and as Real-Time Query (RTQ) for commercial users. RTQ acts as an extension of Cloudera Enterprise, the company’s flagship Hadoop distro, and unifies batch and real-time analytics in one system.

“Mainstream enterprise adoption of Hadoop will inevitably raise expectations,” said Tony Baer, Principal Analyst for Ovum. “Enterprises have grown accustomed to interactive querying and on-the-spot analytics with their existing data warehousing and BI infrastructures and will expect no less of Hadoop. With a real-time query capability powered by its new Impala engine, Cloudera is striving to level the playing field in performance and accessibility with massively parallel SQL platforms.”

Cloudera is putting a lot of weight behind Impala, and wants customers to get in on the action right off the bat. The company’s partners – Tableau, Capgemini, Karmasphere and a few others – have all integrated the engine with their offerings, which makes it immediately usable in production environments.

The launch of Impala falls in line with Ask Bigger Questions, a new slogan that Cloudera CEO Mike Olson touted at the conference. This new ‘corporate identity’ covers a broader commitment to the channel, the open-source community and, most importantly, real-time analytics, the chief executive says.