Accelerating Data Science with Spark and Zeppelin

Posted on February 17th, 2016

Two presenters talked about tools designed to work alongside Spark and other data analysis software.

Oleg @Hortonworks spoke about #Apache #NiFi, an interactive GUI for orchestrating the flow of data. Its drag-and-drop interface specifies how cases are streamed across applications, with limited capabilities to filter or edit the data streams. The tool handles asynchronous processes and can be used to document how each case passes from process to process.

—

Next, Vinay Shukla talked about #Zeppelin, a notebook similar to #Jupyter (IPython) that displays simple graphics (such as bar, line, and pie charts) in a browser window. It supports multiple languages on the back end (such as #Python) and is integrated with #Spark and #Hadoop.

Support for #R and visualizations at the level of #ggplot are scheduled for introduction in the second half of 2016.
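
As a rough illustration of the simple charts mentioned above (not something shown in the talk): Zeppelin's display system treats paragraph output that begins with %table as tabular data, with columns separated by tabs and rows by newlines, and lets the reader switch that table to a bar, line, or pie chart in the browser. The sketch below builds such a string in plain Python. The helper name to_zeppelin_table and the sample data are made up for this example.

```python
def to_zeppelin_table(header, rows):
    """Build a string in Zeppelin's %table display format.

    Hypothetical helper: the first line holds the column names, each later
    line holds one row, and cells are separated by tabs.
    """
    lines = ["\t".join(str(cell) for cell in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Made-up sample data, for illustration only.
fruit_counts = [("apples", 3), ("pears", 5)]
output = to_zeppelin_table(("fruit", "count"), fruit_counts)

# Printed from a Python paragraph in a Zeppelin notebook, this output would be
# rendered as a table; run anywhere else, it is just tab-separated text.
print(output)
```

Outside a notebook the script simply prints the raw text, which makes the format easy to inspect before pasting it into a Zeppelin paragraph.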

Several audience members asked who would use NiFi and Zeppelin, since the tools lack the analytic power of R or other analysis tools, yet require more data sophistication than Excel or other business presentation tools.