Before you start developing applications on MapR’s Converged Data Platform, consider how you will get the data onto the platform, the format it will be stored in, the type of processing or modeling that is required, and how the data will be accessed.

MapR provides public APIs for MapR Filesystem, MapR Database, and MapR Event Store For Apache Kafka that you can use for application development.

Managing Third-Party Libraries

Any third-party library that is required by a MapReduce program must be accessible to
the data node that processes the application.

A data node is a node in the cluster that includes the NodeManager role. You can provide
the third-party libraries when you submit the program, or you can install the
third-party libraries on each node that processes the application.

Include the third-party libraries with each program

Including the third-party libraries with each program is the preferred method.

Perform one of the following operations to include the third-party jars when you submit
the program:

Package the third-party libraries with the MapReduce jar file. The benefit of this
method is that the node from which you submit the program and the node that runs
the program are not required to have the library files.

Use the -libjars parameter to specify the
third-party libraries on the command line. With this option, the library files are
submitted to the data node along with the program. The benefit of this method is
that the node that runs the program does not need to have the library files
installed. However, the node that submits the program must have the library files
installed.
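
As a rough sketch of the -libjars method, the following shell fragment builds a
submit command. The jar paths, the driver class com.example.MyDriver, and the
/input and /output arguments are hypothetical placeholders, not values from this
documentation. The sketch also assumes that the driver parses Hadoop generic
options (for example, by running through ToolRunner); otherwise -libjars is
ignored.

```shell
# Hypothetical jar locations on the submitting node; -libjars expects a
# comma-separated list with no spaces.
LIBJARS="/home/mapr/libs/commons-csv.jar,/home/mapr/libs/guava.jar"

# Generic options such as -libjars go after the main class and before the
# application's own arguments. The leading "echo" only prints the command;
# remove it to submit the job from a cluster client node.
echo hadoop jar /home/mapr/myapp.jar com.example.MyDriver \
  -libjars "$LIBJARS" /input /output
```

Printing the command first is a simple way to check the jar list before
submitting the job.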

Install the third-party libraries on each node that runs the program

You can also install the third-party libraries on each data node. However,
this method is not preferred because library versions or library files
installed on the nodes can conflict with one another.

To install the third-party libraries on each data node, perform one of the following
operations:

Install the third-party libraries in the following directory on each NodeManager
node: /opt/mapr/hadoop/hadoop-2.x/share/hadoop/common

On each node with the NodeManager role, install the required third-party libraries
and then specify the location(s) of the third-party libraries with the
HADOOP_CLASSPATH environment variable in the env_override.sh file. The
env_override.sh file is located in the following directory:
/opt/mapr/conf. For more information about the file, see About env_override.sh.
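
For the HADOOP_CLASSPATH method, an entry in env_override.sh might look like the
following sketch. The directory /opt/thirdparty/lib is a hypothetical install
location chosen for illustration; substitute the directory where the libraries
actually reside on each NodeManager node.

```shell
# Hypothetical directory that holds the third-party jars on this node.
THIRD_PARTY_LIB_DIR=/opt/thirdparty/lib

# Append to any existing value instead of overwriting it. The quoted /* is
# kept literal so that the JVM, not the shell, expands it to the jars in
# the directory.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${THIRD_PARTY_LIB_DIR}/*"
```

Appending rather than assigning preserves any classpath entries that other
settings have already added.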