Ibis on Impala: Python at Scale for Data Science

Co-founded by the respective architects of the Python pandas toolkit and Impala and now incubating in Cloudera Labs, Ibis is a new data analysis framework with the goal of enabling advanced data analysis on a 100% Python stack with full-fidelity data. With Ibis, for the first time, developers and data scientists will be able to utilize the last 15 years of advances in high-performance Python tools and infrastructure in a Hadoop-scale environment—without compromising user experience for performance. It’s exactly the same Python you know and love, only at scale!

…

In this initial (unsupported) Cloudera Labs release, Ibis offers comprehensive support for the analytical capabilities presently provided by Impala, enabling Python users to run Big Data workloads in a manner similar to that of “small data” tools like pandas. Next, we’ll extend Impala and Ibis in several ways to make the Python ecosystem a seamless part of the stack:

First, Ibis will enable more natural data modeling by leveraging Impala’s upcoming support for nested types (expected by end of 2015).

Second, we’ll add support for Python user-defined logic so that Ibis will integrate with the existing Python data ecosystem—enabling custom Python functions at scale.

Finally, we’ll accelerate performance further through low-level integrations between Ibis and Impala with a new Python-friendly, in-memory columnar format and Python-to-LLVM code generation. These updates will accelerate Python to run at native hardware speed.

This entry was posted
on Tuesday, July 21st, 2015 at 7:24 pm and is filed under Cloudera, Python.
You can follow any responses to this entry through the RSS 2.0 feed.
Both comments and pings are currently closed.