Split existing RpcMetrics into RpcMetrics and RpcDetailedMetrics. The new RpcDetailedMetrics has per method usage details and is available under context name “rpc” and record name “detailed-metrics”

MAPREDUCE-927 | Major | Cleanup of task-logs should happen in TaskTracker instead of the Child

Moved Task log cleanup into a separate thread in TaskTracker. Added configuration “mapreduce.job.userlog.retain.hours” to specify the time(in hours) for which the user-logs are to be retained after the job completion.

Specific exceptions are thrown from HDFS implementation and protocol per the interface defined in AbstractFileSystem. The compatibility is not affected as the applications catch IOException and will be able to handle specific exceptions that are subclasses of IOException.

new config: hadoop.security.service.user.name.key this setting points to the server principal for RefreshUserToGroupMappingsProtocol. The value should be either NN or JT principal depending if it is used in DFAdmin or MRAdmin. The value is set by the application. No need for default value.

Does not currently provide anything but uniform distribution. Uses some older depreciated class interfaces (for mapper and reducer) This was tested on 0.20 and 0.22 (locally) so it should be fairly backwards compatible.

Incremental enhancements to the JobTracker include a no-lock version of JT.getTaskCompletion events, no lock on the JT while doing i/o during job-submission and several fixes to cut down configuration parsing during heartbeat-handling.

Added a configuration property “stream.map.input.ignoreKey” to specify whether to ignore key or not while writing input for the mapper. This configuration parameter is valid only if stream.map.input.writer.class is org.apache.hadoop.streaming.io.TextInputWriter.class. For all other InputWriter’s, key is always written.

MAPREDUCE-572 | Minor | If #link is missing from uri format of -cacheArchive then streaming does not throw error.

Improved streaming job failure when #link is missing from uri format of -cacheArchive. Earlier it used to fail when launching individual tasks, now it fails during job submission itself.

Processing of concatenated gzip files formerly stopped (quietly) at the end of the first substream/“member”; now processing will continue to the end of the concatenated stream, like gzip(1) does. (bzip2 support is unaffected by this patch.)

This jira introduces backward incompatibility. Existing pipes applications MUST be recompiled with new hadoop pipes library once the changes in this jira are deployed.

HDFS-1315 | Major | Add fsck event to audit log and remove other audit log events corresponding to FSCK listStatus and open calls

When running fsck, audit log events are not logged for listStatus and open are not logged. A new event with cmd=fsck is logged with ugi field set to the user requesting fsck and src field set to the fsck path.

Added mapreduce.cluster.acls.enabled as the single configuration property in mapred-default.xml that enables the authorization checks for all job level and queue level operations.

To enable authorization of users to do job level and queue level operations, mapreduce.cluster.acls.enabled is to be set to true in JobTracker’s configuration and in all TaskTrackers’ configurations.

To get access to a job, it is enough for a user to be part of one of the access lists i.e. either job-acl or queue-admins-acl(unlike before, when, one has to be part of both the lists).

Queue administrators(configured via acl-administer-jobs) of a queue can do all view-job and modify-job operations on all jobs submitted to that queue.

ClusterOwner(who started the mapreduce cluster) and cluster administrators(configured via mapreduce.cluster.permissions.supergroup) can do all job level operations and queue level operations on all jobs on all queues in that cluster irrespective of job-acls and queue-acls configured.

JobOwner(who submitted job to a queue) can do all view-job and modify-job operations on his/her job irrespective of job-acls and queue-acls.

Since aclsEnabled flag is removed from queues configuration files, “refresh of queues configuration” will not change mapreduce.cluster.acls.enabled on the fly. mapreduce.cluster.acls.enabled can be modified only when restarting the mapreduce cluster.

MAPREDUCE-1517 | Major | streaming should support running on background

Adds -background option to run a streaming job in background.

MAPREDUCE-2147 | Trivial | JobInProgress has some redundant lines in its ctor

Remove some redundant lines from JobInProgress’s constructor which was re-initializing things unnecessarily.

This provides an option to store fsimage compressed. The layout version is bumped to -25. The user could configure if s/he wants the fsimage to be compressed or not and which codec to use. By default the fsimage is not compressed.

Moved the api public Counter getCounter(Enum<?> counterName), public Counter getCounter(String groupName, String counterName) from org.apache.hadoop.mapreduce.TaskInputOutputContext to org.apache.hadoop.mapreduce.TaskAttemptContext

The permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) now default to 0700. Upon startup, the datanode will automatically change the permissions to match the configured value.

Removed references to the older fs.checkpoint.* properties that resided in core-site.xml

HADOOP-6949 | Major | Reduces RPC packet size for primitive arrays, especially long[], which is used at block reporting

Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5. Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.

When Hadoop’s Kerberos integration is enabled, it is now required that either {{kinit}} be on the path for user accounts running the Hadoop client, or that the {{hadoop.kerberos.kinit.command}} configuration option be manually set to the absolute path to {{kinit}}.

Confirmed that problem of finding ivy file occurs w/o patch with ant 1.7, and not with patch (with either ant 1.7 or 1.8). Other unit tests are still failing the test steps themselves on my laptop, but that is not due not finding the ivy file.