Before you start developing applications on MapR’s Converged Data Platform, consider how you will get the data onto the
platform, the format it will be stored in, the type of processing or modeling that is required, and how the data will
be accessed.


MapR supports public APIs for MapR Filesystem, MapR Database, and MapR Event Store For Apache Kafka. These APIs are available for application development purposes.

Zeppelin on MapR

The MapR Data Science Refinery includes a preconfigured Apache Zeppelin notebook,
packaged as a Docker container. Apache Zeppelin is an open source web-based data science
notebook. You can use it with MapR components to conduct data discovery, ETL, machine learning,
and data visualization.

You can run the Zeppelin container either on your laptop or on MapR edge nodes. Out of the box,
the Zeppelin container image is integrated with open source data processing engines like
Apache Spark, Apache Drill, and Apache Hive, as well as with native MapR engines (MapR Filesystem, MapR Database, and MapR Event Store For Apache Kafka). Using the notebook simply requires running the Docker
image and connecting to the container through your browser.

Zeppelin provides the following benefits for your data engineering and data science use
cases:

An interactive development environment for writing, testing, and sharing data processing
code snippets

The ability to run the notebooks in a local client environment, such as on a laptop

Support for a variety of interpreters for integrating with different backend
components

Support for extensible visualization libraries

The Zeppelin notebook included with the Data Science Refinery provides additional
benefits:

A small-footprint, pre-built, certified data science container that is easy to deploy and
run

An isolated environment where you can experiment with libraries and packages without
affecting other users' work

Authentication at the container level over a secure web connection

Running the Zeppelin Container
To run the Apache Zeppelin container, you must access the Zeppelin Docker image from MapR’s public repository, run the Docker image, and access the deployed container from your web browser. From your browser, you can create Zeppelin notebooks.
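The pull-and-run workflow can be sketched as follows. The image name, environment variables, and port below are assumptions based on typical MapR Data Science Refinery deployments (check MapR's repository documentation for the exact tag and the settings your cluster requires); to stay side-effect free, the script only prints the commands it would run.

```shell
# Sketch of pulling and running the Zeppelin container. Values marked
# "placeholder" or "assumed" must be replaced with your own; the script
# prints the docker commands rather than executing them.
IMAGE="maprtech/data-science-refinery"   # assumed image name
CLUSTER="my.cluster.com"                 # placeholder cluster name
HOST_IP="10.10.1.100"                    # placeholder host address

echo "docker pull ${IMAGE}"
echo "docker run -it \
  -e MAPR_CLUSTER=${CLUSTER} \
  -e HOST_IP=${HOST_IP} \
  -p 9995:9995 \
  ${IMAGE}"
echo "Then browse to https://${HOST_IP}:9995 to open the Zeppelin UI."
```

Once the container reports that Zeppelin has started, connecting to the printed URL from your browser is all that is needed to begin creating notebooks.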

Understanding Zeppelin Interpreters
Apache Zeppelin interpreters enable you to access specific languages and data processing backends. This section describes the interpreters you can use with MapR and the use cases they serve.
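Concretely, each paragraph in a Zeppelin note begins with an interpreter directive such as `%sh`, `%spark`, `%pig`, or `%jdbc(drill)`; the rest of the paragraph is handed to that backend. As a minimal sketch, the body of a hypothetical `%sh` paragraph is ordinary shell (the directive itself is consumed by Zeppelin, not by the shell):

```shell
# Body of a hypothetical %sh paragraph. The Shell interpreter runs this
# text as plain shell commands inside the Zeppelin container.
echo "interpreter check:"
uname -s        # report the container's kernel name
```

The other interpreters work the same way: the directive routes the paragraph body to the matching engine, so a `%jdbc(drill)` paragraph would contain a SQL query rather than shell commands.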

Configuring Zeppelin Interpreters
Out of the box, the interpreters in Apache Zeppelin on MapR are preconfigured to run against different backend engines. You may need to perform manual steps to configure the Livy, Spark, and JDBC interpreters. No additional steps are needed to configure and run the Pig and Shell interpreters. You can configure the idle timeout threshold for interpreters.
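For the idle timeout, upstream Apache Zeppelin controls interpreter shutdown through lifecycle-manager properties; a sketch of the relevant `zeppelin-site.xml` entries is below (property names are from the Apache Zeppelin documentation — confirm that they apply to the Zeppelin version in your MapR container):

```xml
<!-- Use the timeout lifecycle manager so idle interpreter
     processes are shut down automatically. -->
<property>
  <name>zeppelin.interpreter.lifecyclemanager.class</name>
  <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
</property>

<!-- Idle threshold in milliseconds (here: 1 hour). -->
<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
  <value>3600000</value>
</property>
```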

Troubleshooting Zeppelin
This section describes how to resolve common problems you may encounter when using Apache Zeppelin.

Using Visualization Packages in Zeppelin
Apache Zeppelin supports the Helium framework. Using visualization packages, you can view your data through area charts, bar charts, scatter charts, and other displays. To use a visualization package, you must enable it through the Helium repository browser in the Zeppelin UI. Like Zeppelin interpreters, Helium is automatically installed in your Zeppelin container.

Using Zeppelin to Access Different Backend Engines
This section contains examples of how to use Apache Zeppelin interpreters to access the different backend engines. This includes running Apache Pig scripts, Apache Drill queries, Apache Hive queries, and Apache Spark jobs, as well as accessing MapR Database and MapR Event Store For Apache Kafka.
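The same backend engines can also be driven without the UI through Zeppelin's notebook REST API (the endpoints below are from upstream Apache Zeppelin; the host, port, note ID, table name, and `%jdbc(hive)` prefix are placeholders or assumptions about your configuration). To stay side-effect free, this sketch only prints the requests it would send:

```shell
# Sketch: running a note against a backend engine via the Zeppelin REST
# API. Nothing is sent over the network; the requests are only printed.
ZEPPELIN="https://localhost:9995"   # placeholder container address

# A note paragraph is plain text: an interpreter directive plus the code
# for that backend -- here a hypothetical Hive query via JDBC.
PARAGRAPH='%jdbc(hive)
SELECT COUNT(*) FROM web_logs'

echo "POST ${ZEPPELIN}/api/notebook               # create a note"
echo "POST ${ZEPPELIN}/api/notebook/job/<noteId>  # run all paragraphs"
echo "paragraph text:"
echo "$PARAGRAPH"
```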

Sharing Zeppelin Notebook Content
By default, Zeppelin stores notebooks in the local file system in your container. An alternative is to store them in MapR Filesystem. This allows you to share the notebooks with other users.
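Switching to shared storage amounts to pointing Zeppelin's notebook directory at a path on MapR Filesystem. The sketch below assumes the cluster is mounted under `/mapr` and that the location is controlled by the `ZEPPELIN_NOTEBOOK_DIR` setting from upstream Apache Zeppelin (verify the exact variable name for the MapR image); the path and user are placeholders, and the script only prints the command it would run.

```shell
# Sketch of storing notebooks in MapR Filesystem so other users can
# reach them. Values are placeholders; the docker command is printed,
# not executed.
CLUSTER="my.cluster.com"                            # placeholder
SHARED_DIR="/mapr/${CLUSTER}/user/alice/notebooks"  # hypothetical path

echo "docker run -it \
  -e ZEPPELIN_NOTEBOOK_DIR=${SHARED_DIR} \
  maprtech/data-science-refinery"
```

Because every container pointed at the same directory sees the same notebook files, colleagues launching their own Zeppelin containers against the cluster can open and edit the shared notes.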