The MapR Data Science Refinery includes a preconfigured Apache Zeppelin notebook, packaged as a Docker container. Apache
Zeppelin is an open source web-based data science notebook. You can use it with MapR components to conduct data discovery,
ETL, machine learning, and data visualization.

This section contains examples of how to use Apache Zeppelin interpreters to access the different backend engines. This
includes running Apache Pig scripts, Apache Drill queries, Apache Hive queries, and Apache Spark jobs, as well as accessing
MapR Database and MapR Event Store For Apache Kafka.

This section contains code samples for different types of Apache Spark jobs that you can run in your Apache Zeppelin notebook.
You can run these examples using either the Livy or Spark interpreter. The Spark interpreter is available starting in
the 1.1 release of the MapR Data Science Refinery.

Before you start developing applications on MapR’s Converged Data Platform, consider how you will get the data onto the
platform, the format it will be stored in, the type of processing or modeling that is required, and how the data will
be accessed.

To run the Apache Zeppelin container, pull the Zeppelin Docker image from MapR’s public repository, run
the image, and access the deployed container from your web browser. From your browser, you can create Zeppelin
notebooks.
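
As a rough sketch, pulling and running the image might look like the following. The image name, tag, port, and environment variables here are assumptions based on a typical deployment, not values taken from this document; check the release documentation for the exact image name and required settings for your release.

```shell
# Pull the Zeppelin image from MapR's public Docker repository
# (image name and tag are assumed; verify against the release notes).
docker pull maprtech/data-science-refinery:latest

# Run the container, publishing the Zeppelin UI port and pointing the
# container at the cluster (cluster name, hosts, user, and IP are placeholders).
docker run -it \
  -p 9995:9995 \
  -e MAPR_CLUSTER=my.cluster.com \
  -e MAPR_CLDB_HOSTS=cldb-host1,cldb-host2 \
  -e MAPR_CONTAINER_USER=mapruser \
  -e HOST_IP=10.10.1.10 \
  maprtech/data-science-refinery:latest
```

Once the container is running, browse to the published port on the Docker host (for example, https://&lt;host&gt;:9995, assuming the default Zeppelin port) to reach the Zeppelin UI.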

Out of the box, the interpreters in Apache Zeppelin on MapR are preconfigured to run against different backend engines. You
may need to perform manual steps to configure the Livy, Spark, and JDBC interpreters; no additional steps are needed to
configure and run the Pig and Shell interpreters. You can also configure the idle timeout threshold for interpreters.
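
For the idle timeout, a configuration sketch follows. The property names are those of the upstream Apache Zeppelin interpreter lifecycle manager and may vary by Zeppelin version; the one-hour threshold is an example value, not a documented default.

```xml
<!-- zeppelin-site.xml: shut down interpreter processes after an idle period.
     Property names follow upstream Apache Zeppelin; the threshold value
     (in milliseconds) is an illustrative example. -->
<property>
  <name>zeppelin.interpreter.lifecyclemanager.class</name>
  <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
</property>
<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
  <value>3600000</value>
</property>
```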

Apache Zeppelin supports the Helium framework. Using visualization packages, you can view your data through area charts,
bar charts, scatter charts, and other displays. To use a visualization package, you must enable it through the Helium
repository browser in the Zeppelin UI. Like Zeppelin interpreters, Helium is automatically installed in your Zeppelin
container.

This section contains an example of an Apache Spark job that uses the MapR Database Binary Connector for Apache Spark to write and read a MapR Database Binary table. You can run this example using either the Livy or Spark interpreter. The Spark interpreter is available
starting in the 1.1 release of the MapR Data Science Refinery.

This section contains examples of Apache Spark jobs that use the MapR Database OJAI Connector for Apache Spark to read and write MapR Database JSON tables. The examples use the Spark Python interpreter. The Spark interpreter is available starting in the 1.1 release
of the MapR Data Science Refinery. The Python API in the MapR Database OJAI Connector is available starting in the MEP 4.1 release.

This section contains a MapR Event Store For Apache Kafka streaming example that you can run in your Apache Zeppelin notebook using the Spark interpreter. The Spark interpreter
is available starting in the 1.1 release of the MapR Data Science Refinery.

MapR supports public APIs for MapR Filesystem, MapR Database, and MapR Event Store For Apache Kafka. These APIs are available for application development purposes.

Running Spark Jobs in Zeppelin

This section contains code samples for different types of Apache Spark jobs that you
can run in your Apache Zeppelin notebook. You can run these examples using either the Livy or
Spark interpreter. The Spark interpreter is available starting in the 1.1 release of the MapR
Data Science Refinery.

Before running these examples, make sure you have configured whichever interpreter you are using, Livy or Spark.

Note: The examples in this section use Hadoop commands
to access files in MapR Filesystem. If you have a MapR Filesystem mount point in your container, you can
replace the Hadoop commands with standard shell commands. Refer to Running Shell Commands in Zeppelin for an example of how to do this.
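
For example, the following two commands copy the same local file into MapR Filesystem. The first uses a Hadoop command; the second assumes a MapR Filesystem mount point under /mapr. All paths, the cluster name, and the user name are illustrative placeholders.

```shell
# Using a Hadoop command (works without a filesystem mount point)
hadoop fs -put /tmp/data.csv /user/mapruser/data.csv

# Using a standard shell command via a MapR Filesystem mount point
# (assumes the cluster is mounted at /mapr/<cluster-name>)
cp /tmp/data.csv /mapr/my.cluster.com/user/mapruser/data.csv
```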

Running a Spark Job Using PySpark

The following example shows how to run a Spark job using Python. Make sure you have installed Python on your MapR cluster.

Before running the sample PySpark code, copy the files that the code references to
MapR Filesystem:
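
A minimal sketch of such a job, written as a Zeppelin notebook paragraph, is shown below. The input path and the word-count logic are illustrative, not the code sample shipped with the product; the path stands in for a file you copied to MapR Filesystem beforehand. Use the %spark.pyspark directive instead if you are running the Spark interpreter, and note that sc is the SparkContext Zeppelin predefines for the paragraph.

```python
%livy.pyspark

# Word count over a file previously copied to MapR Filesystem
# (the path is a placeholder for your own input file).
lines = sc.textFile("/user/mapruser/input.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Print the ten most frequent words.
for word, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, count)
```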

Running a Spark Job Using SparkR

The following SparkR code example creates a table and queries it using HiveQL. Make sure
you have installed SparkR on your MapR cluster. Set your interpreter to either
%livy.sparkr or %spark.r, depending on whether you are
using Livy or Spark.
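
A sketch of such a notebook paragraph follows. The table name, data path, and HiveQL statements are illustrative rather than the documented sample; the LOAD DATA path is a placeholder for a file available to the cluster.

```r
%livy.sparkr

# Create a Hive table and load sample data into it (path is a placeholder).
sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql("LOAD DATA INPATH '/user/mapruser/kv1.txt' INTO TABLE src")

# Query the table with HiveQL; results come back as a SparkDataFrame.
results <- sql("SELECT key, value FROM src WHERE key < 10")
head(results)
```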

Querying Hive Tables Using Spark SQL

The following two examples query Hive tables using Spark SQL. Make sure the
hive-site.xml configuration file from your Hive cluster is available in
your Zeppelin container; see Hive Tables for the detailed steps.

If the code snippets in these examples do not specify an interpreter, specify
%livy.spark to use the Livy interpreter or %spark to use
the Spark interpreter.

Example 1

Run the following code to create Hive tables and issue various select statements against them:
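
A minimal sketch along those lines is shown below as a PySpark notebook paragraph; the table name, schema, data path, and select statements are illustrative placeholders, not the tables from the product documentation. The paragraph assumes spark is the Hive-enabled SparkSession Zeppelin predefines; switch the directive to %spark.pyspark for the Spark interpreter.

```python
%livy.pyspark

# Create a Hive table and load data into it (name, schema, and path are placeholders).
spark.sql("CREATE TABLE IF NOT EXISTS employees (id INT, name STRING, salary DOUBLE)")
spark.sql("LOAD DATA INPATH '/user/mapruser/employees.txt' INTO TABLE employees")

# Issue various select statements against the table.
spark.sql("SELECT COUNT(*) FROM employees").show()
spark.sql("SELECT name, salary FROM employees WHERE salary > 50000").show()
spark.sql("SELECT AVG(salary) AS avg_salary FROM employees").show()
```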