Configure the ResourceManager for Work-preserving Restart

Configure YARN to preserve the work of running applications in the event of a
ResourceManager restart.

Work-preserving ResourceManager restart ensures that applications continue to function
during a ResourceManager restart with minimal impact on end users. The overall concept
is that the ResourceManager preserves application queue state in a pluggable state
store and reloads that state on restart. While the ResourceManager is down,
ApplicationMasters and NodeManagers keep polling the ResourceManager until it
restarts. When the ResourceManager comes back online, the ApplicationMasters and
NodeManagers re-register with the newly started ResourceManager. On restart, the
ResourceManager also recovers container information by absorbing the container statuses
sent from all NodeManagers. Thus, no work is lost due to a ResourceManager
crash-reboot event.

To configure work-preserving restart for the ResourceManager, set the following
properties in the yarn-site.xml file.

Property:

yarn.resourcemanager.recovery.enabled

Value:

true

Description:

Enables ResourceManager state recovery on restart. The default value is false. If
this property is set to true, running applications resume when the
ResourceManager is restarted.
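
As a minimal sketch, the property above could appear in yarn-site.xml as shown
here. The enclosing configuration element is the usual root of a Hadoop
configuration file; any other properties already present in the file are omitted.

```xml
<configuration>
  <!-- Turn on ResourceManager recovery; the default is false. -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>
```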

FileSystemRMStateStore Configuration

The following properties apply only if
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
has been specified as the state-store in the
yarn.resourcemanager.store.class property.

Property:

yarn.resourcemanager.fs.state-store.uri

Value:

<hadoop.tmp.dir>/yarn/system/rmstore

Description:

The URI of the file system location where the ResourceManager state is stored
(for example, hdfs://localhost:9000/rmstore). The default value is
<hadoop.tmp.dir>/yarn/system/rmstore.
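
The fragment below sketches how the file system state store might be selected
and pointed at a storage location in yarn-site.xml. The HDFS URI is only the
illustrative value from the description above; substitute the NameNode address
and path used by your cluster.

```xml
<configuration>
  <!-- Select the file-system-based state store implementation. -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
  </property>
  <!-- Location of the stored state; example URI only, adjust for your cluster. -->
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>hdfs://localhost:9000/rmstore</value>
  </property>
</configuration>
```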

LeveldbRMStateStore Configuration

The following properties apply only if
org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore
has been specified as the state-store in the
yarn.resourcemanager.store.class property.

The ZooKeeper session timeout in milliseconds. The ZooKeeper server uses this value
to determine when the session expires. The session expires when the server does not
hear from the client (that is, receives no heartbeat) within the session timeout
period specified by this property. The default value is 10000 milliseconds (10 seconds).