Summary

When starting a job, Upstart does not currently take the state of the system into account, and will start all jobs simultaneously. This specification proposes a method of adding resource locks that prevent contending jobs from starting together and instead start them in serial.

Rationale

Jobs frequently consume a large amount of a particular resource, the most common example being the CPU. It is often faster to start these jobs in serial than run them in parallel, especially where the resource is something with a seek penalty, for example a disk.

Use cases

Various jobs consume large amounts of CPU time; they should be able to declare this and lock out other CPU-consuming jobs until they are complete.

File system checks are performed per partition or mapped block device, yet there is a massive performance penalty for attempting to check multiple partitions on the same disk simultaneously.

Scope

The scope of this specification is limited to the implementation of resource locks within the existing process state machine.

Design

Resources will be named by a simple string, and will have an associated floating point value.

cpu 2.0

The cpu resource will be built-in, and set to the number of processors available on the system.

The available amount of other resources may be configured by the system administrator in init.conf:

resource bacon 2.0

Otherwise the availability of every resource defaults to 1.0.
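The availability rules above can be sketched as follows. This is only an illustration in Python, not the proposed implementation: the function names load_resources and available, and the idea of passing init.conf as a list of lines, are assumptions made for the example.

```python
import os

DEFAULT_AMOUNT = 1.0  # availability of any resource that was never configured


def load_resources(conf_lines, cpu_count=None):
    """Build the table of available amounts, keyed by resource name.

    conf_lines is a hypothetical stand-in for the contents of init.conf;
    cpu_count lets a caller override the detected processor count.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # The cpu resource is built in and set to the number of processors.
    resources = {"cpu": float(cpu_count)}
    for line in conf_lines:
        words = line.split()
        # Only lines of the form "resource NAME AMOUNT" are handled here.
        if len(words) == 3 and words[0] == "resource":
            resources[words[1]] = float(words[2])
    return resources


def available(resources, name):
    """Return the amount available, falling back to the 1.0 default."""
    return resources.get(name, DEFAULT_AMOUNT)
```

With the example line above, load_resources(["resource bacon 2.0"]) would hold 2.0 for bacon, while available() would report 1.0 for any name that was never configured.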

A new uses job definition stanza will be added to the configuration file syntax. This defines how much of a resource the job consumes.

uses cpu 0.5

The amount of resource is optional; if not present, it defaults to 1.0.

uses cpu

This stanza undergoes environment variable expansion:

uses $DEVICE
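As a rough sketch of how the three forms of the stanza shown above might be read, the following Python separates parsing from expansion, since expansion is only possible once a job instance and its environment exist. The names parse_uses and expand_uses are hypothetical, and expanding only the resource name (not the amount) is an assumption; the specification does not say either way.

```python
from string import Template

DEFAULT_USE = 1.0  # amount consumed when the stanza omits it


def parse_uses(stanza):
    """Parse "uses NAME [AMOUNT]" into an unexpanded (name, amount) pair."""
    words = stanza.split()
    if len(words) not in (2, 3) or words[0] != "uses":
        raise ValueError("expected: uses NAME [AMOUNT]")
    amount = float(words[2]) if len(words) == 3 else DEFAULT_USE
    return words[1], amount


def expand_uses(use, env):
    """Expand $VARIABLE references in the resource name for one instance.

    Assumption: only the name is expanded. An undefined variable raises
    KeyError rather than silently expanding to an empty name.
    """
    name, amount = use
    return Template(name).substitute(env), amount
```

For instance, expand_uses(parse_uses("uses $DEVICE"), {"DEVICE": "sda"}) gives the pair ("sda", 1.0), so two jobs checking partitions of the same disk would contend for the same resource name.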

Upstart will not move the job from the waiting state if it consumes more of a resource than is available.

Implementation

Code

The JobConfig structure will gain a new array of resource names that it consumes and the amount it will consume; likewise the Job structure will gain an array of resource names that it is actually going to consume.

A global Resources list will exist, holding the current level of resource available for each known resource.

On creating a new job, the resource names will be expanded and stored in the new instance structure.

Before changing the state, each of these resources will be checked to ensure that either it does not yet exist in the global list or, if the amount required is subtracted, it does not become less than zero. If this is true for all of them, any missing resources are added with 1.0, then the amount required of each is subtracted from its global resource amount and the job state is changed.

If false (insufficient resource exists), the job will remain in the waiting state.
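The check-then-subtract step described above might look like the following. This is a minimal sketch in Python rather than the eventual C code, and try_acquire, resources and needs are hypothetical names. It follows the wording literally, so a resource that is not yet in the global list always passes the check.

```python
DEFAULT_AMOUNT = 1.0  # amount given to a resource seen for the first time


def try_acquire(resources, needs):
    """Reserve every resource in needs, or none of them.

    resources maps a name to the amount currently available (the global
    list); needs maps a name to the amount the job instance consumes.
    Returns True if the job may change state, False if it must wait.
    """
    # First pass: check only, so a failure leaves the global list untouched.
    for name, amount in needs.items():
        if name in resources and resources[name] - amount < 0:
            return False
    # Second pass: add missing resources with 1.0, then subtract.
    # Read literally, a missing resource may go negative if amount > 1.0.
    for name, amount in needs.items():
        resources.setdefault(name, DEFAULT_AMOUNT)
        resources[name] -= amount
    return True
```

Checking everything before subtracting anything means a job that needs two resources never ends up holding one of them while waiting for the other.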

When a job reaches the waiting state again, the resources it consumed will be added back to the global resource amounts. The jobs stuck in the waiting state will then be iterated, and any job whose required resources are now free will have its state changed.

This will likely require a function to determine whether a job can change state, *sigh*
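A possible shape for that function, together with the release step, is sketched below in Python. Everything here is illustrative: can_change_state, acquire and release_and_wake are hypothetical names, and representing the waiting jobs as a list of (job name, needs) pairs is an assumption made to keep the example short.

```python
DEFAULT_AMOUNT = 1.0  # amount given to a resource seen for the first time


def can_change_state(resources, needs):
    """True if every needed resource is missing or has enough left."""
    return all(
        name not in resources or resources[name] - amount >= 0
        for name, amount in needs.items()
    )


def acquire(resources, needs):
    """Subtract the needed amounts, adding missing resources with 1.0."""
    for name, amount in needs.items():
        resources.setdefault(name, DEFAULT_AMOUNT)
        resources[name] -= amount


def release_and_wake(resources, finished_needs, waiting):
    """Return a finished job's resources, then retry the waiting jobs.

    waiting is a list of (job_name, needs) pairs in arrival order; it is
    updated in place. Returns the names of the jobs allowed to proceed.
    """
    for name, amount in finished_needs.items():
        resources[name] += amount
    started, still_waiting = [], []
    for job_name, needs in waiting:
        if can_change_state(resources, needs):
            acquire(resources, needs)
            started.append(job_name)
        else:
            still_waiting.append((job_name, needs))
    waiting[:] = still_waiting
    return started
```

Note that this sketch lets a small job behind a blocked larger one proceed; whether waiting jobs should instead be released strictly in order is not something the specification settles.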