Before you start developing applications on MapR’s Converged Data Platform, consider how you will get the data onto the platform, the format it will be stored in, the type of processing or modeling that is required, and how the data will be accessed.

The mapr-client package must be installed on each node where you will build and run your applications. This package installs all of the MapR libraries needed for application development, regardless of programming language or type of MapR-DB table (binary or JSON).
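On a yum-based node, the installation is a single package. This is a sketch only: it assumes the MapR package repository has already been configured on the node, and the package manager differs by operating system.

```shell
# Assumes the MapR repository is already configured on this node.
# On Ubuntu/Debian nodes, use apt-get instead of yum.
yum install mapr-client
```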

This topic describes the methods for passing a MapR-DB table name. Binary table names can be passed by either specifying the table path in the API or by setting the table path in the core-site.xml file. JSON table names are passed by specifying the table path in the API.
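For binary tables, the core-site.xml approach uses the hbase.table.namespace.mappings property to map HBase-style table names to MapR-DB table paths. A minimal sketch follows; the table name and paths shown are assumptions for illustration:

```xml
<!-- core-site.xml: map HBase-style names to MapR-DB binary table paths -->
<property>
  <name>hbase.table.namespace.mappings</name>
  <!-- "mytable" resolves to /apps/mytable; all other names resolve under /tables/ -->
  <value>mytable:/apps/mytable,*:/tables/</value>
</property>
```

With a mapping like this in place, an application can pass the short table name and MapR-DB resolves it to the mapped path.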

As part of its support for JSON tables, MapR-DB implements the OJAI API. The OJAI API provides methods for creating, reading, updating, and deleting JSON documents in MapR-DB JSON tables. It is available in Java. MapR-DB also provides a MapR-DB JSON Client API for managing JSON tables and a MapR-DB JSON REST API for performing basic operations using HTTP calls.
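The basic OJAI flow in Java looks like the following sketch. The table path /apps/user_profiles and the document fields are assumptions for illustration; the ojai:mapr: connection URL selects the MapR-DB OJAI driver. Running this requires a MapR client and the OJAI driver on the classpath.

```java
import org.ojai.Document;
import org.ojai.store.Connection;
import org.ojai.store.DocumentStore;
import org.ojai.store.DriverManager;

public class OjaiSketch {
    public static void main(String[] args) {
        // Connect to MapR-DB through the OJAI driver
        Connection connection = DriverManager.getConnection("ojai:mapr:");

        // Open a JSON table by its filesystem path (hypothetical path)
        DocumentStore store = connection.getStore("/apps/user_profiles");

        // Create and insert a JSON document
        Document doc = connection.newDocument()
                .set("_id", "user0001")
                .set("firstName", "Sam");
        store.insertOrReplace(doc);

        // Read the document back by its _id
        Document found = store.findById("user0001");
        System.out.println(found.asJsonString());

        store.close();
        connection.close();
    }
}
```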

Performing Bulkloads with MapReduce

You can use the HFileOutputFormat.configureIncrementalLoad() method when writing custom MapReduce applications that perform bulk loads. Although the name of the method implies that you can use it only for incremental bulk loads, the method also works for full bulk loads, provided that the -bulkload, BULKLOAD, or Bulkload parameter for a table is set to true, as described in Bulk Loading and MapR-DB Tables.

If you have a custom MapReduce application that does not use HFileOutputFormat.configureIncrementalLoad(), simply use the path to the MapR-DB table that you want to load. However, using HFileOutputFormat.configureIncrementalLoad() provides at least two advantages:

This method performs a number of tasks that your application would otherwise need to do explicitly:

- Inspects the table to configure a total order partitioner
- Uploads the partitions file to the cluster and adds it to the DistributedCache
- Sets the number of reduce tasks to match the current number of regions
- Sets up the reducer to perform the appropriate sorting (either KeyValueSortReducer or PutSortReducer)

This method also turns off Speculative Execution automatically. For details, see the note below.
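A sketch of a driver that wires configureIncrementalLoad() into a custom job. MyBulkLoadMapper and the table path are hypothetical placeholders; for MapR-DB binary tables, the table name passed to the client is a filesystem path. Running this requires the Hadoop and HBase client libraries on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "mapr-db-bulkload");

        // Hypothetical mapper that emits (ImmutableBytesWritable, Put) pairs
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(MyBulkLoadMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        // For MapR-DB binary tables, the table name is a path (hypothetical here)
        HTable table = new HTable(conf, "/apps/my_binary_table");

        // Configures the partitioner, partitions file, reduce-task count, and
        // sort reducer, and turns off Speculative Execution
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```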

Warning: Turning off Speculative Execution

Speculative Execution of MapReduce
tasks is on by default. For custom applications that load MapR-DB binary tables, it is
recommended to turn Speculative Execution off. When it is on, the tasks that import data
might run multiple times. Multiple tasks for an incremental bulkload could insert one or
more versions of a record into a table. Multiple tasks for a full bulkload could cause loss
of data if the source data continues to be updated during the load.

If your custom MapReduce application uses
HFileOutputFormat.configureIncrementalLoad(), you do not have to turn off
Speculative Execution manually.
HFileOutputFormat.configureIncrementalLoad() turns it off automatically.
Speculative Execution is automatically turned off for MapReduce utilities such as
CopyTable and ImportTsv.

If you are writing a custom MapReduce application that does not use the HFileOutputFormat.configureIncrementalLoad() method for bulk loading, you must turn off Speculative Execution manually.

Turn off Speculative Execution by setting the following MapReduce version 2 parameter to
false: mapreduce.map.speculative
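As a sketch, the equivalent setting in mapred-site.xml (the same property can also be passed per-job with -D on the command line):

```xml
<!-- mapred-site.xml: disable Speculative Execution for map tasks -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
```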

If the job is written programmatically, you can turn off Speculative Execution at the code level: job.setSpeculativeExecution(false);
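For example, in a hypothetical driver, the call goes on the Job object before submission. Note that Job.setSpeculativeExecution(false) disables speculation for both map and reduce tasks; the job name and structure here are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculationOff {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "custom-bulkload"); // hypothetical job name

        // Disable Speculative Execution for both map and reduce tasks
        job.setSpeculativeExecution(false);

        // ... set the mapper, input/output formats, and so on, then submit
    }
}
```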