checkjob

checkjob displays detailed job
state information and diagnostic output for
a specified job. Detailed information is available for queued, blocked, active,
and recently completed jobs.
The checkjob command shows the master job of an array as well as a summary of array sub-jobs, but does not display all sub-jobs. Use checkjob -v to display all job-array sub-jobs.

Access

This command can be run by level 1-3 Moab
administrators for any job. Also, end users can use checkjob to view the
status of their own jobs.

Arguments

--flags

Format:

--flags=future

Default:

---

Description:

Evaluates future eligibility of the job (ignores current
resource state and usage limitations).

-n (Node)

Description:

Checks job access to the specified node and preemption status with regard to jobs located on that
node.

Example:

> checkjob -n node113 6235

-q (QoS)

Format:

<QOSID>

Default:

---

Description:

Checks job access to specified QoS <QOSID>.

Example:

> checkjob -q special 6235

-r (Reservation)

Format:

<RSVID>

Default:

---

Description:

Checks job access to specified reservation
<RSVID>.

Example:

> checkjob -r orion.1 6235

-v (Verbose)

Format:

Default:

N/A

Description:

Sets verbose mode. If the job is part of an array, the -v option shows pertinent array information before the job-specific information (see Example 2 and Example 3 for differences between standard output and -v output).

Specifying double verbose ("-v -v") displays additional information about the job. See the Output table for details.

Example:

> checkjob -v 6235
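
The arguments above can also be assembled from a script. The following is a minimal Python sketch, not part of Moab: the helper name build_checkjob_cmd and its parameter names are hypothetical, and it only builds an argument list from the flags documented in this section. Whether every combination of flags is meaningful for a given job is an assumption that should be checked against your Moab installation.

```python
from typing import List, Optional


def build_checkjob_cmd(
    job_id: str,
    qos: Optional[str] = None,
    reservation: Optional[str] = None,
    node: Optional[str] = None,
    verbosity: int = 0,
    future: bool = False,
) -> List[str]:
    """Assemble an argument list for checkjob (hypothetical helper)."""
    cmd = ["checkjob"]
    # -v may be repeated: "-v -v" requests the double-verbose output.
    cmd += ["-v"] * verbosity
    if future:
        cmd.append("--flags=future")
    if qos is not None:
        cmd += ["-q", qos]
    if reservation is not None:
        cmd += ["-r", reservation]
    if node is not None:
        cmd += ["-n", node]
    # The job ID goes last, as in the examples in this section.
    cmd.append(job_id)
    return cmd


print(build_checkjob_cmd("6235", qos="special"))
print(build_checkjob_cmd("6235", node="node113", verbosity=2))
```

On a host where checkjob is installed, the returned list could be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the report as text.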

Details

This command allows any Moab administrator
to check the detailed status and resource requirements of an active, queued, or recently
completed job. Additionally,
this command performs numerous diagnostic checks and determines if and where the job
could potentially run. Diagnostic checks include policy violations, reservation constraints,
preemption status, and job-to-resource mapping.

Job Eligibility

If a job cannot run, a text reason is provided along with a summary of
how many nodes are and are not available. If the -v flag is specified, a
node-by-node summary of resource availability will be displayed for idle jobs.
For job-level eligibility issues, one of the following reasons will be
given:

Reason                                 Description
job has hold in place                  one or more job holds are currently in place
insufficient idle procs                there are currently not adequate processor resources available to start the job
idle procs do not meet requirements    adequate idle processors are available but these do not meet job requirements
start date not reached                 job has specified a minimum start date which is still in the future
expected state is not idle             job is in an unexpected state
state is not idle                      job is not in the idle state
dependency is not met                  job depends on another job reaching a certain state
rejected by policy                     job start is prevented by a throttling policy
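
A script that post-processes checkjob output may want to turn a reason string into its description. The sketch below is a hedged illustration in Python: the lookup table copies the reasons listed above, but the function name explain_reason and the assumption that the reason appears verbatim somewhere in a line of output are hypothetical. Longer reasons are tried first because "state is not idle" is a substring of "expected state is not idle".

```python
from typing import Optional

# Job-level reasons and descriptions, copied from the table above.
JOB_REASONS = {
    "job has hold in place": "one or more job holds are currently in place",
    "insufficient idle procs": "there are currently not adequate processor "
                               "resources available to start the job",
    "idle procs do not meet requirements": "adequate idle processors are available "
                                           "but these do not meet job requirements",
    "start date not reached": "job has specified a minimum start date which is "
                              "still in the future",
    "expected state is not idle": "job is in an unexpected state",
    "state is not idle": "job is not in the idle state",
    "dependency is not met": "job depends on another job reaching a certain state",
    "rejected by policy": "job start is prevented by a throttling policy",
}


def explain_reason(line: str) -> Optional[str]:
    """Return the description of the first known reason found in line, or None."""
    text = line.lower()
    # Check longer reasons first so that "expected state is not idle"
    # is not mistaken for its substring "state is not idle".
    for reason in sorted(JOB_REASONS, key=len, reverse=True):
        if reason in text:
            return JOB_REASONS[reason]
    return None


# The sample line is invented for illustration; real output may be worded differently.
print(explain_reason("job cannot run (expected state is not idle)"))
```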

If a job cannot run on a particular node, one of the following per-node
reasons will be given:

Reason      Description
Class       Node does not allow required job class/queue
CPU         Node does not possess required processors
Disk        Node does not possess required local disk
Features    Node does not possess required node features
Memory      Node does not possess required real memory
Network     Node does not possess required network interface
State       Node is not Idle or Running
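
When many nodes reject a job, a tally per reason shows which constraint dominates. The following Python sketch assumes the per-node reasons have already been extracted into (node, reason) pairs; the function name summarize_node_rejections, the pair layout, and the sample node names are hypothetical, and the sketch does not parse checkjob output itself.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Per-node reasons and descriptions, copied from the table above.
NODE_REASONS = {
    "Class": "Node does not allow required job class/queue",
    "CPU": "Node does not possess required processors",
    "Disk": "Node does not possess required local disk",
    "Features": "Node does not possess required node features",
    "Memory": "Node does not possess required real memory",
    "Network": "Node does not possess required network interface",
    "State": "Node is not Idle or Running",
}


def summarize_node_rejections(rejections: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many nodes were rejected for each per-node reason."""
    counts = Counter(reason for _node, reason in rejections)
    unknown = set(counts) - set(NODE_REASONS)
    if unknown:
        raise ValueError(f"unrecognized reason(s): {sorted(unknown)}")
    return dict(counts)


# Invented sample data for illustration only.
sample = [("node001", "Memory"), ("node002", "State"), ("node003", "Memory")]
for reason, count in summarize_node_rejections(sample).items():
    print(f"{count} node(s): {NODE_REASONS[reason]}")
```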

Reservation Access

The -r flag can be used to provide detailed
information about job access to a specific reservation.

Preemption Status

If a job is marked as a preemptor and the -v and -n flags are specified,
checkjob will perform a job-by-job analysis of all jobs on the specified node
to determine if they can be preempted.

Output

The checkjob command displays the
following job attributes:

Attribute

Value

Description

Account

<STRING>

Name of account associated with job

Actual Run Time

[[[DD:]HH:]MM:]SS

Length of time job actually ran.

This info is only displayed in simulation mode.

Allocated Nodes

Square-bracket-delimited list of node and processor
IDs

List of nodes and processors allocated to job

Applied Nodeset**

<STRING>

Nodeset used for job's node allocation

Arch

<STRING>

Node architecture required by job

Attr

Square-bracket-delimited list of job attributes

Job attributes (e.g.
[BACKFILL][PREEMPTEE])

Available Memory**

<INTEGER>

The available memory requested by job. Moab displays the relative or exact value by returning a comparison symbol (>, <, >=, <=, or ==) with the value (e.g. Available Memory <= 2048).

Available Swap**

<INTEGER>

The available swap requested by job. Moab displays the relative or exact value by returning a comparison symbol (>, <, >=, <=, or ==) with the value (e.g. Available Swap >= 1024).

Average Utilized Procs*

<FLOAT>

Average load balance for a job

Avg Util Resources Per Task*

<FLOAT>

BecameEligible

<TIMESTAMP>

The date and time when the job moved from Blocked to Eligible.

Bypass

<INTEGER>

Number of times a lower priority job with a later submit
time ran before the job

CheckpointStartTime**

[[[DD:]HH:]MM:]SS

The time the job was first checkpointed

Class

[<CLASS NAME> <CLASS COUNT>]

Name of class/queue required by job and number of class
initiators required per task.

Reservation

RESID specifies the reservation ID, TIME1 is the
relative start time, TIME2 the relative end time, and TIME3 the duration of the
reservation

Req

[<INTEGER>] TaskCount: <INTEGER> Partition:
<partition>

A job
requirement for a single type of resource followed by the number of task
instances required and the appropriate partition

StartCount

<INTEGER>

Number of times job has been started by Moab

StartPriority

<INTEGER>

Start priority of job

StartTime

<TIME>

Time job was started by the resource management
system

State

One of Idle, Starting, Running, etc.

Current Job State

SubmitTime

<TIME>

Time job was submitted to resource management
system

Swap

<INTEGER>

Amount of swap disk required by job (in MB)

Task Distribution*

Square-bracket-delimited list of nodes

Time Queued

Total Requested Nodes**

<INTEGER>

Number of nodes the job requested

Total Requested Tasks

<INTEGER>

Number of tasks requested by job

User

<STRING>

Name of user submitting job

Utilized Resources Per Task*

<FLOAT>

WallTime

[[[DD:]HH:]MM:]SS of [[[DD:]HH:]MM:]SS

Length of time the job has been running, out of the specified
limit

In the above table, fields marked with an asterisk (*) are only displayed
when set or when the -v flag is specified. Fields marked with two asterisks (**) are only displayed when set or when -v -v is specified.
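
Several values in the Output table follow small textual formats that are easy to convert for further processing. The Python sketch below illustrates three of them: the [[[DD:]HH:]MM:]SS duration, the WallTime pair written as "<used> of <limit>", and the comparison form used by Available Memory and Available Swap. The function names are hypothetical, and the exact strings that checkjob prints are assumed from the format descriptions in the table rather than taken from real output.

```python
import re
from typing import Tuple


def parse_duration(text: str) -> int:
    """Convert a [[[DD:]HH:]MM:]SS string to a number of seconds."""
    parts = [int(p) for p in text.strip().split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 colon-separated fields: {text!r}")
    # The last field is always seconds, so pair fields with units right to left.
    multipliers = [1, 60, 3600, 86400]
    return sum(value * unit for value, unit in zip(reversed(parts), multipliers))


def parse_walltime(text: str) -> Tuple[int, int]:
    """Split '<used> of <limit>' into (used_seconds, limit_seconds)."""
    used, limit = text.split(" of ")
    return parse_duration(used), parse_duration(limit)


# Two-character operators are listed first so ">=" is not read as ">".
_COMPARISON = re.compile(r"^(?P<name>.+?)\s*(?P<op>>=|<=|==|>|<)\s*(?P<value>\d+)$")


def parse_comparison(text: str) -> Tuple[str, str, int]:
    """Split e.g. 'Available Memory <= 2048' into (name, operator, value)."""
    match = _COMPARISON.match(text.strip())
    if match is None:
        raise ValueError(f"not a comparison: {text!r}")
    return match["name"], match["op"], int(match["value"])


print(parse_duration("02:30"))
print(parse_walltime("00:10:00 of 1:00:00"))
print(parse_comparison("Available Memory <= 2048"))
```

Converting durations to seconds keeps later arithmetic simple, for example computing the fraction of the wall-time limit a job has used.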