Administration of MapR-DB is done primarily via the command line (maprcli) or with the MapR Control System (MCS).
Regardless of whether a MapR-DB table is used for binary files or JSON documents, the same types of commands are used,
with slightly different parameter options. MapR-DB administration is associated with tables, columns and column families,
and table regions.

Best Practices

Disk Setup

It is not necessary to set up RAID (Redundant Array of
Independent Disks) on disks used by MapR-FS. MapR uses a script
called disksetup to set up
storage pools. In most cases, you should let MapR calculate storage
pools using the default stripe width
of two or three disks. If you anticipate a high volume of
random-access I/O, you can use the -W
option with disksetup to specify
larger storage pools of up to 8 disks each.
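
For example, a run of disksetup with a wider stripe might look like the
following, where /tmp/disks.txt is a placeholder for a file listing the
disks to format (flags can vary by MapR version, so check the disksetup
documentation for your release):

/opt/mapr/server/disksetup -W 4 -F /tmp/disks.txt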

Setting Up MapR NFS

MapR uses version 3 of the NFS protocol. NFS version 4
bypasses the port mapper and attempts to connect to the default
port only. If you are running NFS on a non-standard port, mounts
from NFS version 4 clients time out. Use the
-o nfsvers=3 option to specify NFS version 3.
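
For example, a client mount forcing NFS version 3 might look like this,
where mapr-nfs-node and the /mapr mount point are placeholders for your
own NFS server node and local directory:

mount -o nfsvers=3 mapr-nfs-node:/mapr /mapr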

NIC Configuration

For high performance clusters, use more than one network
interface card (NIC) per node. MapR can detect multiple IP
addresses on each node and load-balance throughput
automatically.
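
If you need to restrict which interfaces MapR uses, one common approach
is to list the permitted subnets in the MAPR_SUBNETS environment
variable in /opt/mapr/conf/env.sh; the subnets shown here are
placeholders:

export MAPR_SUBNETS=10.10.0.0/16,192.168.2.0/24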

Isolating CLDB Nodes

In a large cluster (100 nodes or more), create CLDB-only nodes to ensure high
performance. This configuration also provides additional control over the placement of
the CLDB data, for load balancing, fault tolerance, or high availability (HA). Setting
up CLDB-only nodes involves restricting the CLDB volume to its own topology and making
sure all other volumes are on a separate topology. Because both the CLDB-only path and
the non-CLDB path are children of the root topology path, new non-CLDB volumes are not
guaranteed to stay off the CLDB-only nodes. To avoid this problem, set a default volume
topology. See Setting Default Volume Topology.
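
A minimal sketch of these steps with maprcli, assuming a /cldbonly
topology for the CLDB nodes and a /defaultRack topology for everything
else (both topology names, and the server ID, are placeholders; the
CLDB data lives in the mapr.cldb.internal volume):

maprcli node move -serverids <serverid> -topology /cldbonly
maprcli volume move -name mapr.cldb.internal -topology /cldbonly
maprcli config save -values "{\"cldb.default.volume.topology\":\"/defaultRack\"}"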

Isolating ZooKeeper Nodes

For large clusters (100 nodes or more), isolate ZooKeeper on
nodes that do not perform any other function. Isolating the
ZooKeeper node enables it to perform its functions without
competing for resources with other processes. Installing a
ZooKeeper-only node is similar to a typical node installation,
but with a specific subset of packages.

Warning:
To prevent MapR from using the node for data storage, do not install
the FileServer package on an isolated ZooKeeper node.
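
A sketch of what a ZooKeeper-only installation might look like on a
Red Hat-style node, assuming the MapR package repository is already
configured (the CLDB and ZooKeeper host lists are placeholders):

yum install mapr-zookeeper
/opt/mapr/server/configure.sh -C cldb1,cldb2 -Z zk1,zk2,zk3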

Setting Up RAID on the Operating System Partition

You can set up RAID on the operating system partition(s) or
drive(s) at installation time to provide, for example, higher
operating system performance (RAID 0), disk mirroring for failover
(RAID 1), or both (RAID 10). See the instructions on your operating
system vendor's website.

ExpressLane

MapR provides an express path (called ExpressLane) that works in
conjunction with the Fair Scheduler. ExpressLane lets small MapReduce jobs run when all
slots are occupied by long-running tasks. Small jobs are given this special treatment
only when the cluster is busy, and only if they meet the criteria specified by the
following parameters in mapred-site.xml:

Parameter: mapred.fairscheduler.smalljob.schedule.enable
Value: true
Description: Enables fast scheduling of small jobs inside the Fair
Scheduler. TaskTrackers reserve a slot, called an ephemeral slot,
which is used for small jobs when the cluster is busy.

Parameter: mapred.fairscheduler.smalljob.max.maps
Value: 10
Description: Small job definition. Maximum number of map tasks allowed
in a small job.

Parameter: mapred.fairscheduler.smalljob.max.reducers
Value: 10
Description: Small job definition. Maximum number of reduce tasks
allowed in a small job.

Parameter: mapred.fairscheduler.smalljob.max.inputsize
Value: 10737418240
Description: Small job definition. Maximum input size, in bytes,
allowed for a small job. The default is 10 GB.

Parameter: mapred.fairscheduler.smalljob.max.reducer.inputsize
Value: 1073741824
Description: Small job definition. Maximum estimated input size, in
bytes, for a reducer allowed in a small job. The default is 1 GB per
reducer.

Parameter: mapred.cluster.ephemeral.tasks.memory.limit.mb
Value: 200
Description: Small job definition. Maximum memory, in MB, reserved for
an ephemeral slot. The default is 200 MB. This value must be the same
on JobTracker and TaskTracker nodes.

MapReduce jobs that appear to fit the small job definition but
are in fact larger than anticipated are killed and re-queued for
normal execution.
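
As a concrete illustration, enabling ExpressLane and tightening the
small-job input limit could look like this fragment of mapred-site.xml
(the 1 GB value is an arbitrary example, not a recommendation):

<property>
  <name>mapred.fairscheduler.smalljob.schedule.enable</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.smalljob.max.inputsize</name>
  <value>1073741824</value>
</property>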

HBase

The HBase write-ahead log (WAL) writes many tiny records, and
compressing it would cause massive CPU load. Before using HBase,
turn off MapR compression for directories in the HBase volume
(normally mounted at /hbase). Example:

hadoop mfs -setcompression off /hbase

You can check whether compression is turned off in a directory
or mounted volume by using hadoop mfs to list the file
contents. Example:

hadoop mfs -ls /hbase

The letter Z in the output indicates compression is
turned on; the letter U indicates compression is
turned off. See hadoop mfs for more information.

On any node where you plan to run both HBase and MapReduce, give more memory to the
FileServer than to the RegionServer so that the node can handle high throughput. For
example, on a node with 24 GB of physical memory, it might be desirable to limit the
RegionServer to 4 GB, give 10 GB to MapR-FS, and give the remainder to TaskTracker.
To change the memory allocated to each service, edit the
/opt/mapr/conf/warden.conf file. See Resource Allocation for Jobs and Applications for more information.
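
For illustration, the memory split described above would be expressed
through per-service heapsize settings in warden.conf along these lines
(a hypothetical excerpt; key names and sensible values vary by MapR
version and installed services):

# Hypothetical excerpt; key names vary by MapR version.
# Give MapR-FS roughly 10 GB of a 24 GB node:
service.command.mfs.heapsize.percent=40
# Cap the HBase RegionServer at roughly 4 GB:
service.command.hbregion.heapsize.percent=17
service.command.hbregion.heapsize.max=4000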