A job service level agreement (SLA) defines the terms and conditions under which a job will be processed.
A job carries two distinct SLAs: one defines a contract between the job and the JPPF server, the other a contract between the job and the JPPF client.

Server and client SLAs have common attributes, which specify:

the characteristics of the nodes it can run on (server side), or of the channels it can be sent through (client side): the job execution policy

the time at which a job is scheduled to start

an expiration date for the job

The attributes specific to the server side SLA are:

the priority of a job

whether it is submitted in suspended state

the maximum number of nodes it can run on

whether the job is a standard or broadcast job

whether the server should immediately cancel the job, if the client that submitted it is disconnected

The attributes specific to the client side SLA are:

the maximum number of channels it can be sent through

A job SLA is represented by the interface JobSLA for the server side SLA, and by the interface JobClientSLA for the client side SLA.
Each SLA can be accessed from a job with the corresponding getter, job.getSLA() or job.getClientSLA().

1.1 Execution policy

An execution policy is an object that determines whether a particular set of JPPF tasks can be executed on a JPPF node (for the server-side SLA) or if it can be sent via a communication channel (for the client-side).
It does so by applying the set of rules (or tests) it is made of, against a set of properties associated with the node or channel.

For a fully detailed description of how to create and use execution policies, please read the Execution policies section of this development guide.

2 Server side SLA attributes

public class JobSLA extends JobCommonSLA<JobSLA> {
  // Job priority
  public int getPriority()
  public JobSLA setPriority(int priority)
  // Maximum number of nodes the job can run on
  public int getMaxNodes()
  public JobSLA setMaxNodes(int maxNodes)
  // max number of groups of master/slaves nodes the job can run on at any given time
  public int getMaxNodeProvisioningGroups()
  public JobSLA setMaxNodeProvisioningGroups(int maxNodeProvisioningGroups)
  // Whether the job is initially suspended
  public boolean isSuspended()
  public JobSLA setSuspended(boolean suspended)
  // whether the job is a broadcast job
  public boolean isBroadcastJob()
  public JobSLA setBroadcastJob(boolean broadcastJob)
  // whether the job should be canceled by the server when the client is disconnected
  public boolean isCancelUponClientDisconnect()
  public JobSLA setCancelUponClientDisconnect(boolean cancelUponClientDisconnect)
  // expiration schedule for any subset of the job dispatched to a node
  public JPPFSchedule getDispatchExpirationSchedule()
  public JobSLA setDispatchExpirationSchedule(JPPFSchedule schedule)
  // number of times a dispatched task can expire before it is finally cancelled
  public int getMaxDispatchExpirations()
  public JobSLA setMaxDispatchExpirations(int max)
  // class path associated with the job
  public ClassPath getClassPath()
  public JobSLA setClassPath(ClassPath classpath)
  // max number of task resubmits and whether to apply it upon node error
  public int getMaxTaskResubmits()
  public JobSLA setMaxTaskResubmits(int maxResubmits)
  public boolean isApplyMaxResubmitsUponNodeError()
  public JobSLA setApplyMaxResubmitsUponNodeError(boolean applyMaxResubmitsUponNodeError)
  // global grid execution policy
  public ExecutionPolicy getGridExecutionPolicy()
  public JobSLA setGridExecutionPolicy(ExecutionPolicy policy)
  // desired node configuration
  public JPPFNodeConfigSpec getDesiredNodeConfiguration()
  public JobSLA setDesiredNodeConfiguration(JPPFNodeConfigSpec nodeConfigurationSpec)
}

2.1 Job priority

The priority of a job determines the order in which the job will be executed by the server.
It can be any integer value, such that if jobA.getPriority() > jobB.getPriority() then jobA will be executed before jobB.
There are situations where both jobs may be executed at the same time, for instance if there remain any available nodes for jobB after jobA has been dispatched.
Two jobs with the same priority will have an equal share (as much as is possible) of the available grid nodes.

The priority attribute is also manageable, which means that it can be dynamically updated, while the job is still executing, using the JPPF administration console or the related management APIs. The default priority is zero.
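
The ordering rule can be illustrated with a small standalone model. This is only a sketch of the "higher priority first" rule, not JPPF code: the QueuedJob class and the dispatchOrder() method are hypothetical names introduced for the illustration, and the model ignores how nodes are shared between jobs of equal priority.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Standalone model of the priority ordering rule, not JPPF code. */
class JobPriorityOrderSketch {

  /** Hypothetical stand-in for a queued job: a name and a priority. */
  static final class QueuedJob {
    final String name;
    final int priority;

    QueuedJob(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }

    int getPriority() {
      return priority;
    }
  }

  /** Returns the job names in the order they would be considered, highest priority first. */
  static List<String> dispatchOrder(List<QueuedJob> queue) {
    List<QueuedJob> sorted = new ArrayList<>(queue);
    // List.sort is stable, so jobs with equal priority keep their submission order
    sorted.sort(Comparator.comparingInt(QueuedJob::getPriority).reversed());
    List<String> names = new ArrayList<>();
    for (QueuedJob job : sorted) {
      names.add(job.name);
    }
    return names;
  }
}
```

With a queue holding jobB (priority 0, the default) followed by jobA (priority 10), dispatchOrder() returns jobA before jobB, which matches the rule stated above.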

2.2 Maximum number of nodes

The maximum number of nodes attribute determines how many grid nodes a job can run on, at any given time.
This is an upper bound: it does not guarantee that this number of nodes will always be used, only that no more than this number of nodes will be assigned to the job.
This attribute is also non-specific, in that it does not specify which nodes the job will run on.
The default value of this attribute is equal to Integer.MAX_VALUE, i.e. 2^31-1.

The resulting assignment of nodes to the job is influenced by other attributes, especially the job priority and the execution policy, if one is set.

The maximum number of nodes is a manageable attribute, which means it can be dynamically updated, while the job is still executing, using the JPPF administration console or the related management APIs.

Example usage:

JPPFJob job = new JPPFJob();
// this job will execute on a maximum of 10 nodes
job.getSLA().setMaxNodes(10);

2.3 Maximum number of node provisioning groups

A node provisioning group designates a set of nodes made of one master node and its provisioned slave nodes, if it has any.
The SLA allows restricting a job execution to a maximum number of master node groups.
This SLA attribute is useful whenever you want to take advantage of the fact that, by definition, a master and its slave nodes all run on the same machine, for instance to exploit data locality properties.
This attribute's default value is Integer.MAX_VALUE (2^31-1).

Note that this attribute does not specifically restrict the total number of nodes the job can run on, since each master node can have any number of slaves. For this, you also need to set the maximum number of nodes attribute.
Additionally, this attribute has no effect on the selection of nodes that are neither master nor slave, such as offline nodes.

Example usage:

JPPFJob job = new JPPFJob();
// only execute on a single group of master/slaves at a time
job.getSLA().setMaxNodeProvisioningGroups(1);
// further restrict to only the slave nodes in the provisioning group
job.getSLA().setExecutionPolicy(new Equal("jppf.node.provisioning.slave", true));

2.4 Initial suspended state

A job can be initially suspended. In this case, it will remain in the server's queue until it is explicitly resumed or canceled, or until it expires (if a timeout was set), whichever happens first.
A job can be resumed and suspended again any number of times via the JPPF administration console or the related management APIs.

Example usage:

JPPFJob job = new JPPFJob();
// this job will be submitted to the server and will remain suspended until
// it is resumed or cancelled via the admin console or management APIs
job.getSLA().setSuspended(true);

2.5 Broadcast jobs

A broadcast job is a specific type of job, for which each task will be executed on all the nodes currently present in the grid.
This opens new possibilities for grid applications, such as performing maintenance operations on the nodes or drastically reducing the size of a job that performs identical tasks on each node.

With regard to the job SLA, a job is set in broadcast mode via a boolean indicator, for which JobSLA provides the accessors isBroadcastJob() and setBroadcastJob(boolean).

With respect to the dynamic aspect of a JPPF grid, the following behavior is enforced:

a broadcast job is executed on all the nodes connected to the driver, at the time the job is received by the JPPF driver. This includes nodes that are executing another job at that time

if a node dies or disconnects while the job is executing on it, the job is canceled for this node

if a new node connects while the job is executing, the broadcast job will not execute on it

a broadcast job does not return any results, i.e. it returns the tasks in the same state as they were submitted

Additionally, if local execution of jobs is enabled for the JPPF client, a broadcast job will not be executed locally. In other words, a broadcast job is only executed on remote nodes.

2.6 Canceling a job upon client disconnection

By default, if the JPPF client is disconnected from the server while a job is executing, the server will automatically attempt to cancel the job's execution on all nodes it was dispatched to, and remove the job from the server queue.
You may disable this behavior on a per-job basis, for example if you want to let the job execute until completion but do not need the execution results.

This property is set once for each job, and cannot be changed once the job has been submitted to the server, i.e. it is not dynamically manageable.

2.7 Expiration of job dispatches

Definition: a job dispatch is the whole or part of a job that is dispatched by the server to a node.

The server-side job SLA enables specifying whether a job dispatch will expire, along with the behavior upon expiration. This is done with a combination of two attributes: a dispatch expiration schedule, which specifies when the dispatch will expire, and a maximum number of expirations after which the tasks in the dispatch will be cancelled instead of resubmitted. By default, a job dispatch will not expire and the number of expirations is set to zero (tasks are cancelled upon the first expiration, if any).

One possible use for this mechanism is to prevent resource-intensive tasks from bogging down slow nodes, without having to cancel the whole job or set timeouts on individual tasks.

Example usage:

JPPFJob job = new JPPFJob();
// job dispatches will expire if they execute for more than 5 seconds
job.getSLA().setDispatchExpirationSchedule(new JPPFSchedule(5000L));
// dispatched tasks will be resubmitted at most 2 times before they are cancelled
job.getSLA().setMaxDispatchExpirations(2);
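
The interplay between the expiration count and the maximum number of expirations can be summarized with a standalone model. This is a sketch of the rule as it is described in this section, not JPPF code, and the class and method names are hypothetical.

```java
/** Standalone model of the dispatch expiration rule, not JPPF code. */
class DispatchExpirationSketch {

  /**
   * Decides what happens to the tasks of a dispatch that has just expired.
   *
   * @param expirationCount how many times these tasks have expired so far,
   *        including this time (1 for the first expiration)
   * @param maxDispatchExpirations the value of the SLA attribute (0 by default)
   * @return true if the tasks are resubmitted, false if they are cancelled
   */
  static boolean resubmitAfterExpiration(int expirationCount, int maxDispatchExpirations) {
    return expirationCount <= maxDispatchExpirations;
  }
}
```

With the default value of 0, the first expiration already cancels the tasks. With a value of 2, as in the example above, the first two expirations lead to a resubmit and the third one cancels the tasks.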

2.8 Setting a class path onto the job

The classpath attribute of the job SLA allows sending library files along with the job and its tasks.
Out of the box, this attribute is notably used with offline nodes, to work around the fact that offline nodes do not have remote class loading capabilities.
The class path attribute, by default empty but not null, is accessed with the following methods:

public class JobSLA extends JobCommonSLA<JobSLA> {
  // get / set the class path associated with the job
  public ClassPath getClassPath()
  public JobSLA setClassPath(ClassPath classpath)
}

We can see that a class path is represented by the ClassPath interface, defined as follows:

Note that one of the add(...) methods uses a ClassPathElement as parameter,
while the others use one or two Location objects (see the Location API section).
These methods are equivalent. For the last two, JPPF will internally create instances of a default implementation of ClassPathElement (class ClassPathElementImpl).
It is preferable to avoid creating ClassPathElement instances directly, as this keeps the code less cumbersome and independent of any specific implementation.

The add(...) method which takes a boolean parameter copyToExistingFile allows you to specify whether the target location should be downloaded and/or copied from the source location, if it already exists on the node's file system.
As the parameter name indicates, this only applies to target locations that are files, that is, either instances of FileLocation
or URLLocation instances with a "file" URL protocol (e.g. "file:/home/user/mylib.jar").

public interface ClassPathElement extends Serializable {
  // get the source (relative to the client) location of this element
  Location<?> getSourceLocation();
  // get the target (relative to the node) location of this element, if any
  Location<?> getTargetLocation();
  // whether to copy to an already existing file target
  boolean isCopyToExistingFile();
  // perform a validation of this classpath element
  boolean validate();
}

JPPF provides a default implementation ClassPathElementImpl which does not perform any validation, that is, its validate() method always returns true.

Finally, here is an example of how this can all be put together:

JPPFJob myJob = new JPPFJob();
ClassPath classpath = myJob.getSLA().getClassPath();
// wrap a jar file into a FileLocation object
Location jarLocation = new FileLocation("libs/MyLib.jar");
// copy the jar file in memory
Location location = jarLocation.copyTo(new MemoryLocation(jarLocation.size()));
// or another way to do this:
location = new MemoryLocation(jarLocation.toByteArray());
// add it as classpath element
classpath.add(location);
// add another jar to download from Maven Central,
// which will be copied onto the node's local file system
Location<URL> source = new MavenCentralLocation("org.jppf:jppf-common:6.0");
Location<String> target = new FileLocation("templib/jppf-common-6.0.jar");
// don't download from maven central and copy to a file if the jar file already exists
classpath.add(source, target, false);
// tell the node to reset the tasks classloader with this new class path
classpath.setForceClassLoaderReset(true);

2.9 Maximum number of task resubmits

As we have seen in the "resubmitting a task" section, tasks have the ability to schedule themselves for resubmission by the server.
The job server-side SLA allows you to set the maximum number of times this can occur, with the following accessors:

public class JobSLA extends JobCommonSLA<JobSLA> {
  // get the maximum number of times a task can resubmit itself
  // via AbstractTask.setResubmit(boolean)
  public int getMaxTaskResubmits()
  // set the maximum number of times a task can resubmit itself
  public JobSLA setMaxTaskResubmits(int maxResubmits)
  // Determine whether the max resubmits limit for tasks is also applied
  // when tasks are resubmitted due to a node error
  public boolean isApplyMaxResubmitsUponNodeError()
  // Specify whether the max resubmits limit for tasks should also be applied
  // when tasks are resubmitted due to a node error
  public JobSLA setApplyMaxResubmitsUponNodeError(boolean applyMaxResubmitsUponNodeError)
}

The default value for the maxTaskResubmits attribute is 1, which means that by default a task can resubmit itself at most once.
Additionally, this attribute can be overridden by setting the maxResubmits attribute of individual tasks.

The applyMaxResubmitsUponNodeError flag is set to false by default. This means that, when the tasks are resubmitted due to a node connection error, the resubmit will not count toward the limit.
To change this behavior, setApplyMaxResubmitsUponNodeError(true) must be called explicitly.

Example usage:

public class MyTask extends AbstractTask<String> {
  @Override public void run() {
    // unconditional resubmit could lead to an infinite loop
    setResubmit(true);
    // the result will only be kept after the max number of resubmits is reached
    setResult("success");
  }
}
JPPFJob job = new JPPFJob();
job.add(new MyTask());
// tasks can be resubmitted 4 times, meaning they can execute up to 5 times total
job.getSLA().setMaxTaskResubmits(4);
// resubmits due to node errors are also counted
job.getSLA().setApplyMaxResubmitsUponNodeError(true);
// ... submit the job and get the results ...
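
The counting rules described in this section can be condensed into a standalone model. This is a sketch under the assumptions stated in the text above, not JPPF code: the class and method names are hypothetical, and per-task overrides of the limit are not modeled.

```java
/** Standalone model of the task resubmit limit, not JPPF code. */
class TaskResubmitSketch {

  /**
   * Upper bound on the number of executions of a task that requests
   * a resubmit every time it runs: the initial execution plus each allowed resubmit.
   */
  static long maxExecutions(int maxTaskResubmits) {
    return (long) maxTaskResubmits + 1L;
  }

  /**
   * Whether a resubmit is counted against the limit.
   * Resubmits requested by the task itself always count; resubmits caused by
   * a node error only count when the applyMaxResubmitsUponNodeError flag is set.
   */
  static boolean countsTowardLimit(boolean causedByNodeError, boolean applyMaxResubmitsUponNodeError) {
    return !causedByNodeError || applyMaxResubmitsUponNodeError;
  }
}
```

For the job configured above, maxExecutions(4) yields 5, and countsTowardLimit(true, true) is true because the job explicitly asks for node errors to be counted.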

2.10 Disabling remote class loading during job execution

Jobs can specify whether remote class loader lookups are enabled during their execution in a remote node.
When remote class loading is disabled, lookups are only performed in the local classpath of each class loader in the class loader hierarchy, and no remote resource requests are sent to the server or client. This is done with the following accessors:

Note 1: when remote class loading is disabled, the classes that the JPPF node normally loads from the server cannot be loaded remotely either.
It is thus required to have these classes in the node's local classpath, which is usually done by adding the "jppf-server.jar" and "jppf-common.jar" files to the node's classpath.

Note 2: if a class is not found while remote class loading is disabled, it will remain not found, even if the next job specifies that remote class loading is enabled.
This is due to the fact that the JPPF class loaders maintain a cache of classes not found to avoid unnecessary remote lookups. To avoid this behavior, the task class loader should be reset before the next job is executed.

2.11 Grid policy

Jobs can also specify an execution policy that will be evaluated against the server and the totality of its nodes, instead of just against individual nodes as for the SLA's execution policy attribute we saw earlier in this documentation.

This grid policy is defined as a normal execution policy with two differences:

For example, to express and set the policy "execute the job when the server has at least 2 GB of available heap memory and at least 3 nodes with more than 4 processing threads each", we would code something like this:

2.12 Specifying the desired node configuration

It is possible for a job to specify the configuration of the nodes it needs to run on, and to force eligible nodes to update their configuration accordingly and restart so that the configuration changes take effect.
The specified configuration includes all existing JPPF properties, in particular "jppf.java.path" and "jppf.jvm.options", which allow specifying the JVM and its options for running the node after restart.
It also includes any custom, application-defined property that can be expressed in a configuration file.

public class JobSLA extends JobCommonSLA<JobSLA> {
  // Get the configuration of the node(s) this job should be executed on
  public JPPFNodeConfigSpec getDesiredNodeConfiguration()
  // Set the configuration of the node(s) this job should be executed on
  public JobSLA setDesiredNodeConfiguration(JPPFNodeConfigSpec nodeConfigurationSpec)
}

The desired node configuration is specified as a JPPFNodeConfigSpec object, defined as follows:

public class JPPFNodeConfigSpec implements Serializable {
  // Initialize this object with a desired configuration and a restart flag set to true
  public JPPFNodeConfigSpec(TypedProperties desiredConfiguration)
    throws IllegalArgumentException
  // Initialize this object with a desired configuration and restart flag
  public JPPFNodeConfigSpec(TypedProperties desiredConfiguration, boolean forceRestart)
    throws IllegalArgumentException
  // Get the desired JPPF configuration of each node
  public TypedProperties getConfiguration()
  // Determine whether to force the restart of a node after reconfiguring it
  public boolean isForceRestart()
}

The configuration attribute specifies the properties that will be overridden or added to the node configuration. In terms of node selection, the JPPF server will prioritize the nodes whose configuration most closely matches the desired one,
by computing a similarity score which relies on the distances between the string values of the desired and actual properties. Only the properties specified in the configuration attribute are compared.

The forceRestart flag determines whether a node should be restarted when it matches exactly the desired configuration. If set to true, the nodes will always be restarted.
Otherwise, nodes that exactly match the desired configuration will not be restarted.
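
The effect of the forceRestart flag can be written as a one-line decision. The following is a standalone sketch of the rule as described above, not JPPF code; the class and method names are hypothetical, and it assumes that a node which does not exactly match the desired configuration is always reconfigured and restarted.

```java
/** Standalone model of the restart decision, not JPPF code. */
class NodeRestartSketch {

  /**
   * @param exactMatch whether the node's configuration already matches
   *        the desired configuration exactly
   * @param forceRestart the value of the forceRestart flag
   * @return true if the node is restarted before it runs the job
   */
  static boolean restartNeeded(boolean exactMatch, boolean forceRestart) {
    // a non-matching node must restart to apply the changes;
    // a matching node only restarts when the restart is forced
    return forceRestart || !exactMatch;
  }
}
```

In other words, the flag only makes a difference for nodes that already match: setting it to false spares those nodes a restart.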

It is important to note that this SLA attribute is evaluated in combination with the other attributes of the job SLA. In particular, it should not be confused with the execution policy, which is used to first filter eligible nodes,
whereas the desired node configuration is applied to eligible nodes and triggers a configuration change and restart in those nodes.

There are restrictions as to the kind of nodes that can be affected by this SLA attribute: since a configuration change and restart of the node is triggered, this can only be done with manageable nodes, which excludes offline nodes and Android nodes.
Furthermore, it does not apply to server-local nodes, since the node restart would also cause the server to be restarted.

Lastly, it is strongly advised to use this SLA attribute in combination with the maximum number of nodes and a job expiration:
since the reconfiguration and restart is very disruptive for the nodes, it has a non-trivial impact on performance, so you might want to limit the number of nodes that are restarted.
Also, between the request for the node reconfiguration and the time the node becomes available after restart, the server reserves the node for the specific job involved.
Setting an expiration timeout on the job ensures that the node can be reused for other jobs, should anything go wrong. In effect, the server will remove all reservations for this job whenever it is cancelled or expires.

2.13 Specifying the job persistence

Job persistence in the driver is specified via the persistenceSpec attribute of the SLA:

public class JobSLA extends JobCommonSLA<JobSLA> {
  // Get the specification of the job persistence in the driver
  public PersistenceSpec getPersistenceSpec()
}

This attribute is an instance of the class PersistenceSpec, defined as follows:

public class PersistenceSpec implements Serializable {
  // Determine whether the job is persisted in the driver. Defaults to false
  public boolean isPersistent()
  // Specify whether the job is persisted in the driver
  public PersistenceSpec setPersistent(boolean persistent)
  // Whether the driver should automatically execute the persisted job upon restart.
  // Defaults to false
  public boolean isAutoExecuteOnRestart()
  // Specify whether the driver should automatically execute the job after a restart
  public PersistenceSpec setAutoExecuteOnRestart(boolean autoExecuteOnRestart)
  // Whether the job should be deleted from the store upon completion. Defaults to true
  public boolean isDeleteOnCompletion()
  // Specify whether the job should be deleted from the store upon completion
  public PersistenceSpec setDeleteOnCompletion(boolean deleteOnCompletion)
}

Instances of this class manage three boolean flags:

the "persistent" flag determines whether the job is persisted at all. By default, it is set to false.

the "delete on completion" flag determines whether the job should be removed from the store when it completes. This flag is set to true by default.

the "auto execute on restart" flag tells a driver that, upon restart, it should automatically resubmit the job's unexecuted tasks until the job completes. This flag is set to false by default.

The following example shows how we would configure a persistent job that should be automatically executed upon driver restart and deleted from the store upon completion:
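
As a minimal sketch of such a configuration, the block below uses a hypothetical stand-in class, PersistenceSpecSketch, which mirrors the PersistenceSpec signatures and defaults listed above so that the snippet is self-contained. It is not the JPPF class, and the helper method name is made up for the illustration.

```java
/** Minimal stand-in mirroring the PersistenceSpec signatures listed above; not the JPPF class. */
class PersistenceSpecSketch {
  private boolean persistent = false;           // default stated in the text
  private boolean autoExecuteOnRestart = false; // default stated in the text
  private boolean deleteOnCompletion = true;    // default stated in the text

  boolean isPersistent() {
    return persistent;
  }

  PersistenceSpecSketch setPersistent(boolean persistent) {
    this.persistent = persistent;
    return this;
  }

  boolean isAutoExecuteOnRestart() {
    return autoExecuteOnRestart;
  }

  PersistenceSpecSketch setAutoExecuteOnRestart(boolean autoExecuteOnRestart) {
    this.autoExecuteOnRestart = autoExecuteOnRestart;
    return this;
  }

  boolean isDeleteOnCompletion() {
    return deleteOnCompletion;
  }

  PersistenceSpecSketch setDeleteOnCompletion(boolean deleteOnCompletion) {
    this.deleteOnCompletion = deleteOnCompletion;
    return this;
  }

  /** Persistent job, executed automatically upon driver restart, deleted from the store upon completion. */
  static PersistenceSpecSketch persistentAutoExecuted() {
    return new PersistenceSpecSketch()
      .setPersistent(true)
      .setAutoExecuteOnRestart(true)
      .setDeleteOnCompletion(true); // already the default, set here for clarity
  }
}
```

With the actual API, the same chain of setters would presumably start from job.getSLA().getPersistenceSpec() rather than from a new object, assuming each setter returns the instance it was called on, as the PersistenceSpec return types suggest.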

3 Client side SLA attributes

A client-side SLA is described by the interface JobClientSLA, defined as:

public interface JobClientSLA extends JobCommonSLA<JobClientSLA> {
  // The maximum number of channels the job can be sent through,
  // including the local executor if any is configured
  public int getMaxChannels();
  public JobClientSLA setMaxChannels(int maxChannels);
}

Note: since JPPF clients do not have a management interface, none of the client-side SLA attributes are manageable.

3.1 Maximum number of execution channels

The maximum number of channels attribute determines how many server connections a job can be sent through, at any given time. This is an upper bound limit, and does not guarantee that this number of channels will always be used. This attribute is also non-specific, since it does not specify which channels will be used.

Using more than one channel for a job enables faster I/O between the client and the server, since the job can be split into multiple chunks and sent to the server via multiple channels in parallel.

Note 1: when the JPPF client is configured with a single server connection, this attribute has no effect.

Note 2: when local execution is enabled in the JPPF client, the local executor counts as one (additional) channel.

Note 3: the resulting assignment of channels to the job is influenced by other attributes, especially the execution policy.

Example usage:

JPPFJob job = new JPPFJob();
// use 2 channels to send the job and receive the results
job.getClientSLA().setMaxChannels(2);