
Nitro

Overview

Nitro is designed to schedule thousands to millions of tasks very quickly. It works in conjunction with Torque: you create a file listing the tasks to perform, then submit a single PBS job that runs Nitro, which in turn works through the list of tasks.

From Adaptive Computing’s website: “Nitro facilitates the execution of small compute tasks on a very large scale and without the overhead of individual scheduler jobs. Instead of creating individual jobs, Nitro combines all of the compute tasks into a single file. The file is then sent to Nitro as part of a job, and Nitro distributes the compute tasks across the allocated nodes. Tasks are executed on multiple threads on each compute node. Since the overhead of managing these tasks is small, most of the allocated compute resources can be spent executing the desired tasks.”

How to Use Nitro

To run jobs using Nitro, you need a PBS script and a Nitro Task File. In your PBS script, you will define job resource requirements and specify the location of your Nitro Task File. Then you will execute Nitro.
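As a rough illustration of building a task list, the sketch below writes one command per line to a file. It assumes that a plain command line is an acceptable task definition; the task files in the example repository are the authority on the real format. The program ./process_sample, the input file names, and the output path are all hypothetical.

```shell
# Sketch only: generate a task list with one command per line.
# ASSUMPTION: Nitro accepts one plain command line per task; check the
# example task file in the repository before relying on this format.
# ./process_sample and input_NNNN.dat are hypothetical names.
make_task_file() {
    out=$1      # path of the task file to create
    count=$2    # number of tasks to write
    : > "$out"  # create or truncate the file
    i=1
    while [ "$i" -le "$count" ]; do
        printf './process_sample input_%04d.dat\n' "$i" >> "$out"
        i=$((i + 1))
    done
}

task_file="${TMPDIR:-/tmp}/nitro_tasks.txt"   # hypothetical location
make_task_file "$task_file" 1000
echo "wrote $(wc -l < "$task_file" | tr -d ' ') tasks to $task_file"
```

Generating the list with a loop keeps the task file reproducible when the number of inputs changes.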

Example Files

To get started, you can download an example PBS script and Nitro Task File with:

git clone https://bitbucket.org/umarcts/nitro-examples.git

PBS Script

The PBS script for a Nitro job is almost identical to a normal PBS script, with three differences. First, request processors in groups of four. This lets Nitro run efficiently while allowing you to submit to a known number of processors in our heterogeneous environment. Second, request the generic resource “nitro” for each processor. Third, specify your memory request using pmem (per-processor memory) rather than mem, which ensures that each processor has enough memory.
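The processor-count rule above can be sketched as a small shell helper that rounds a desired count up to the next multiple of four. The directive lines it prints are an assumption based on common Torque/Moab syntax (procs, pmem, and gres in a -l resource list); the nitro.pbs file in the example repository shows the form actually used here. The values 10 and 2gb are hypothetical.

```shell
# Round a desired processor count up to the next multiple of four.
round_up_to_four() {
    echo $(( ($1 + 3) / 4 * 4 ))
}

wanted=10                             # hypothetical: processors you would like
procs=$(round_up_to_four "$wanted")
pmem=2gb                              # hypothetical per-processor memory

# ASSUMPTION: directive spelling follows common Torque/Moab syntax;
# verify it against nitro.pbs in the example repository.
echo "#PBS -l procs=${procs},pmem=${pmem}"
echo "#PBS -l gres=nitro:${procs}"
```

Requesting one “nitro” resource per processor means the gres count tracks the rounded processor count, so computing both from the same variable avoids a mismatch.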

When you submit your PBS job ($ qsub nitro.pbs), you’ll have a single PBS job for all of your Nitro tasks.

Monitoring Your Nitro Job

nitrostat

You can monitor the status of your Nitro job using the command nitrostat ####### (where ####### is your PBS job id). You must use the full job id (1234.nyx.arc-ts.umich.edu, not just the numeric part). The output gives statistics for the run, including the number of tasks, completion percentage, number of successes and failures, and the load average of the nodes running the job. A job that is not computationally intensive will show a low load average; expect a high load average if your tasks are fully utilizing the compute nodes.
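Because nitrostat needs the full job id, a small wrapper can save retyping the suffix. The sketch below is a hypothetical helper, not part of Nitro: it appends the .nyx.arc-ts.umich.edu suffix from the example above when only the numeric part is given, and it only calls nitrostat if the command is available.

```shell
# Hypothetical helper: expand a numeric job id to the full job id.
# An id that already contains a dot is returned unchanged.
full_jobid() {
    case "$1" in
        *.*) printf '%s\n' "$1" ;;
        *)   printf '%s.nyx.arc-ts.umich.edu\n' "$1" ;;
    esac
}

jobid=$(full_jobid 1234)

# Run nitrostat only where it is installed (e.g., on a login node).
if command -v nitrostat >/dev/null 2>&1; then
    nitrostat "$jobid"
else
    echo "nitrostat not found; would run: nitrostat $jobid"
fi
```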

Joblog and Tasklog

Unless you specify otherwise, Nitro writes its logs to $HOME/nitro/full_pbs_jobid/ (e.g., /home/uniqname/nitro/17435106.nyx.arc-ts.umich.edu). There you will find the joblog (which looks exactly like the output of nitrostat), the tasklog (which shows the stats of each task), and a directory of logs.
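To locate the logs for a given job, the default path can be assembled from the full job id. This is a minimal sketch that assumes the default location described above is in use; the helper name nitro_logdir is hypothetical, and the job id is the one from the example path.

```shell
# Hypothetical helper: build the default Nitro log directory for a job.
# ASSUMPTION: the default location ($HOME/nitro/<full job id>) was not overridden.
nitro_logdir() {
    printf '%s/nitro/%s\n' "$HOME" "$1"
}

logdir=$(nitro_logdir 17435106.nyx.arc-ts.umich.edu)

# List the joblog, tasklog, and log directory if the job has written them.
if [ -d "$logdir" ]; then
    ls -l "$logdir"
else
    echo "no Nitro logs found at $logdir"
fi
```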

Advanced Options

Job Recovery

If your job exited before completing all of its tasks, you can restart it from the last task completed. Set the NITROJOBID environment variable to the full job id of the original job and resubmit.

$ export NITROJOBID=16725656.nyx.arc-ts.umich.edu
$ qsub nitro.pbs

Setting Options in Your Nitro Task File

You can set additional options in your task file if your tasks need more cores, memory, etc.