= AdvancedSlurm =
== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know whether you need a particular resource, you should use the default. A list of valid gres options can be displayed with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand, is a high-speed host-to-host communication layer. Again, it is used most often with MPI. Most of our nodes are ROCE enabled, but requesting this will guarantee that the nodes allocated to your job can communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet; this option simply allows you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1 Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40 Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single-stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles, which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt;, for example, to request 1 GPU for your job. You can also request a given type of GPU ('kstat -g -l' shows the types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; partition that has priority on Dwarf38-39.<br />
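<br />
For example, a minimal GPU job script might look like the sketch below; the module and program names are placeholders for whatever GPU-enabled software you actually use.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --job-name=gpu_test<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --gres=gpu:1<br />
<br />
# Load your GPU-enabled application (placeholder module name)<br />
module load CUDA<br />
<br />
# Run the application (placeholder program name)<br />
./my_gpu_program<br />
&lt;/syntaxhighlight&gt;<br />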
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
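<br />
As a minimal sketch (the program name is a placeholder), a multi-threaded job script might look like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=4:00:00<br />
<br />
# Tell an OpenMP program how many threads to use based on what was allocated<br />
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE<br />
<br />
# Run the multi-threaded program (placeholder name)<br />
./my_threaded_program<br />
&lt;/syntaxhighlight&gt;<br />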
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
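<br />
Putting this together, a minimal MPI submit script might look like the following sketch; the module name and program name are placeholders.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=4<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=12:00:00<br />
<br />
# Load an MPI toolchain (placeholder module name)<br />
module load OpenMPI<br />
<br />
# mpirun normally picks up the allocated nodes and cores from Slurm<br />
mpirun ./my_mpi_program<br />
&lt;/syntaxhighlight&gt;<br />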
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory in total.<br />
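<br />
As a sketch, the equivalent directives in a submit script would be:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --ntasks=20<br />
#SBATCH --mem-per-cpu=20G   # 20 tasks x 20G = 400GB total<br />
&lt;/syntaxhighlight&gt;<br />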
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs, aside from resource requests, is to have Slurm email you when a job changes its status. This may require two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
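<br />
For example, the two directives together in a submit script might look like this (the address is a placeholder):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --mail-type=BEGIN,END,FAIL<br />
#SBATCH --mail-user=someone@somecompany.com<br />
&lt;/syntaxhighlight&gt;<br />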
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
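<br />
For instance, to send the two streams to separately named files, a sketch of the directives would be:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --output=MyJobTitle.o%j<br />
#SBATCH --error=MyJobTitle.e%j<br />
&lt;/syntaxhighlight&gt;<br />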
<br />
=== Running from the Current Directory ===<br />
By default, Slurm runs your job from the directory you were in when you submitted it (the &quot;current working directory&quot; at submission time), which is what most programs expect. If you need the job to start in a different directory, you can use the '&lt;tt&gt;--chdir=''directory''&lt;/tt&gt;' sbatch directive.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogeneous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to set up your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course, the values of these variables will differ based on many factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job; you can check SLURM_JOB_NODELIST for that. There are lots of useful environment variables here, and I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly used variables are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
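<br />
Here is a small sketch of how a couple of these variables might be used inside a job script:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Expand the compact node list into one hostname per line<br />
scontrol show hostnames $SLURM_JOB_NODELIST<br />
<br />
# Record where the job ran, in a file named after the job id<br />
echo &quot;Job $SLURM_JOB_ID ran on $HOSTNAME with $SLURM_CPUS_ON_NODE cores&quot; &gt; job_${SLURM_JOB_ID}.log<br />
&lt;/syntaxhighlight&gt;<br />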
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you'll get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a line beginning with a '#' is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. The default is:<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored, so you don't need to<br />
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system may be faster than /bulk or /homes since each file written will access fewer disks.<br />
In order to use scratch, you first need to make a directory for yourself. <br />
Scratch is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system may get purged after 30 days. <br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to the local disk. You will then need to copy any files you want to keep off the local disk before<br />
the job finishes, since Slurm will remove all files in your job's directory on /tmp when the job<br />
completes or aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk, so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run in the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with 'wc -l dataset.txt'; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via the backticks, and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5,000 jobs, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the checkpoint directory name<br />
like &lt;B&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/B&gt;.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
export DMTCP_CHECKPOINT_DIR=$ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
<br />
You will need to run several tests to verify that DMTCP is working properly with your application.<br />
First, run a short test without DMTCP and another with DMTCP with the checkpoint interval set to 5 minutes<br />
by adding the line &lt;B&gt;export DMTCP_CHECKPOINT_INTERVAL=300&lt;/B&gt; to your script. Then use &lt;B&gt;kstat -d 1&lt;/B&gt; to<br />
check that the memory in both runs is close to the same. Also use this information to calculate the time <br />
that each checkpoint takes. In most cases I've seen times less than a minute for checkpointing that will normally<br />
be done once each hour. If your application is taking more time, let us know. Sometimes this can be sped up<br />
by simply turning off compression by adding the line &lt;B&gt;export DMTCP_GZIP=0&lt;/B&gt;. Make sure to remove the<br />
line where you set the checkpoint interval to 300 seconds so that the default time of once per hour will be used.<br />
<br />
After verifying that your code completes using DMTCP and does not take significantly more time or memory, you<br />
will need to start a run then &lt;B&gt;scancel&lt;/B&gt; it after the first checkpoint, then resubmit the same script to make <br />
sure that it restarts and runs to completion. If you are working with an array job script, the last step is to try a few<br />
array tasks at once to make sure there is no conflict between the jobs.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will time out after your allotted time has passed.<br />
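<br />
As a sketch, an interactive session with specific resource requests might be started like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Ask for 1 node, 4 cores, 4 GB of memory, and 2 hours, then drop into a shell<br />
srun --nodes=1 --cpus-per-task=4 --mem=4G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />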
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the correct parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
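<br />
For example, inside a submit script it is just one more directive:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --gres=killable:1<br />
&lt;/syntaxhighlight&gt;<br />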
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When a group has contributed nodes, the group gets access to a &quot;partition&quot; giving its members priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and have need of the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Graphical Applications ==<br />
Some applications are graphical and need to have some graphical input/output. We currently accomplish this with X11 forwarding<br />
=== Connecting with an X11 client ===<br />
==== Windows ====<br />
If you are running Windows, we recommend MobaXTerm as your file/ssh manager, because it is one relatively simple tool that does everything. MobaXTerm also automatically connects with X11 forwarding enabled.<br />
==== Linux/OSX ====<br />
Both Linux and OSX can connect in an X11 forwarding mode. Linux will have all of the tools you need installed already, OSX will need [https://www.xquartz.org/ XQuartz] installed.<br />
<br />
Then you will need to change your 'ssh' command slightly:<br />
<br />
ssh -Y eid@headnode.beocat.ksu.edu<br />
<br />
The '''-Y''' argument tells ssh to setup X11 forwarding.<br />
=== Starting a Graphical job ===<br />
All graphical jobs, by design, must be interactive, so we'll use the srun command. On a headnode, we run the following:<br />
# load an X11 enabled application<br />
module load Octave<br />
# start an X11 job, sbatch arguments are accepted for srun as well, 1 node, 1 hour, 1 gb of memory<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 octave --gui<br />
<br />
Because these jobs are interactive, they may not be able to run at all times, depending on how busy the scheduler is at any point in time. '''--pty --x11''' are required arguments setting up the job, and '''octave --gui''' is the command to run inside the job.<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. We then look at the AllocTRES column to see our allocated resources and see that 1MB of memory was granted. Combine that with the &quot;MaxRSS&quot; column and we see that the memory we tried to use exceeded the memory granted, thus the job was &quot;CANCELLED&quot;.
= Installed software =
== Drinking from the Firehose ==<br />
For a complete list of all installed modules, run &lt;tt&gt;module avail&lt;/tt&gt;<br />
<br />
Alternatively, we update our [[ModuleList]] whenever we get a chance.<br />
<br />
== Toolchains ==<br />
A toolchain is a set of compilers, libraries and applications that are needed to build software. Some software functions better when using specific toolchains.<br />
<br />
We provide a good number of toolchains and versions of toolchains to make sure your applications will compile and/or run correctly.<br />
<br />
These toolchains include (you can run 'module keyword toolchain'):<br />
; foss: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; fosscuda: GNU Compiler Collection (GCC) based compiler toolchain based on FOSS with CUDA support.<br />
; gmvapich2: GNU Compiler Collection (GCC) based compiler toolchain, including MVAPICH2 for MPI support. '''DEPRECATED'''<br />
; gompi: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.<br />
; goolfc: GCC based compiler toolchain ''with CUDA support'', and including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK. '''DEPRECATED'''<br />
; iomkl: Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL &amp; OpenMPI.<br />
<br />
You can run 'module spider $toolchain/' to see the versions we have:<br />
$ module spider iomkl/<br />
* iomkl/2017a<br />
* iomkl/2017b<br />
* iomkl/2017beocatb<br />
<br />
If you load one of those (module load iomkl/2017b), you can see the other modules and versions of software that it loaded with the 'module list':<br />
$ module list<br />
Currently Loaded Modules:<br />
1) icc/2017.4.196-GCC-6.4.0-2.28<br />
2) binutils/2.28-GCCcore-6.4.0<br />
3) ifort/2017.4.196-GCC-6.4.0-2.28<br />
4) iccifort/2017.4.196-GCC-6.4.0-2.28<br />
5) GCCcore/6.4.0<br />
6) numactl/2.0.11-GCCcore-6.4.0<br />
7) hwloc/1.11.7-GCCcore-6.4.0<br />
8) OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
9) iompi/2017b<br />
10) imkl/2017.3.196-iompi-2017b<br />
11) iomkl/2017b<br />
<br />
As you can see, toolchains can depend on each other. For instance, the iomkl toolchain depends on iompi, which depends on iccifort, which depends on icc and ifort, which in turn depend on GCCcore. Hence it is very important that the correct versions of all related software are loaded.<br />
<br />
With software we provide, the toolchain used to compile is always specified in the &quot;version&quot; of the software that you want to load.<br />
<br />
If you mix toolchains, inconsistent things may happen.<br />
== Most Commonly Used Software ==<br />
=== [http://www.open-mpi.org/ OpenMPI] ===<br />
We provide lots of versions, you are most likely better off directly loading a toolchain or application to make sure you get the right version, but you can see the versions we have with 'module spider OpenMPI/':<br />
<br />
* OpenMPI/2.0.2-GCC-6.3.0-2.27<br />
* OpenMPI/2.0.2-iccifort-2017.1.132-GCC-6.3.0-2.27<br />
* OpenMPI/2.1.1-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-GCC-7.2.0-2.29<br />
* OpenMPI/2.1.1-gcccuda-2017b<br />
* OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-iccifort-2018.0.128-GCC-7.2.0-2.29<br />
<br />
=== [http://www.r-project.org/ R] ===<br />
We currently provide (module spider R/):<br />
* R/3.4.0-foss-2017beocatb-X11-20170314<br />
<br />
==== Packages ====<br />
We provide a small number of R modules installed by default, these are generally modules that are needed by more than one person.<br />
<br />
==== Installing your own R Packages ====<br />
To install your own module, login to Beocat and start R interactively<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load R<br />
R<br />
&lt;/syntaxhighlight&gt;<br />
Then install the package using<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
install.packages(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as &quot;USA (KS)&quot;.<br />
<br />
After installing you can test before leaving interactive mode by issuing the command<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
library(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
==== Running R Jobs ====<br />
<br />
You cannot submit an R script directly. '&lt;tt&gt;sbatch myscript.R&lt;/tt&gt;' will result in an error. Instead, you need to make a bash [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
<br />
# Now lets do some actual work. This starts R and loads the file myscript.R<br />
module load R<br />
R --no-save -q &lt; myscript.R<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Now, to submit your R job, you would type<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sbatch submit-R.sbatch<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://www.java.com/ Java] ===<br />
We currently provide (module spider Java/):<br />
* Java/1.8.0_131<br />
* Java/1.8.0_144<br />
<br />
=== [http://www.python.org/about/ Python] ===<br />
We currently provide (module spider Python/)<br />
* Python/2.7.13-foss-2017beocatb<br />
* Python/2.7.13-GCCcore-7.2.0-bare<br />
* Python/2.7.13-iomkl-2017a<br />
* Python/2.7.13-iomkl-2017beocatb<br />
* Python/3.6.3-foss-2017b<br />
* Python/3.6.3-foss-2017beocatb<br />
* Python/3.6.3-iomkl-2017beocatb<br />
<br />
If you need modules that we do not have installed, you should use [https://virtualenv.pypa.io/en/stable/userguide/ virtualenv] to set up a virtual Python environment in your home directory. This will let you install Python modules as you please.<br />
<br />
==== Setting up your virtual environment ====<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Load Python<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, Python is loaded. It will not stay loaded after you log off, so you must rerun this command every time you log on.)<br />
* Create a location for your virtual environments (optional, but helps keep things organized)<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that &lt;code&gt;virtualenv --help&lt;/code&gt; has many more useful options.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
virtualenv test<br />
&lt;/syntaxhighlight&gt;<br />
* Lets look at our virtual environments (the virtual environment name should be in the output):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
ls ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Activate one of these<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
source ~/virtualenvs/test/bin/activate<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, your virtual environment is activated. It will not stay active after you log off, so you must rerun this command every time you log on.)<br />
* You can now install the python modules you want. This can be done using &lt;tt&gt;pip&lt;/tt&gt;.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
pip install numpy biopython<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==== Using your virtual environment within a job ====<br />
Here is a simple job script using the virtual environment test<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
source ~/virtualenvs/test/bin/activate<br />
export PYTHONDONTWRITEBYTECODE=1<br />
python ~/path/to/your/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==== Using MPI with Python within a job ====<br />
Here is a simple job script using MPI with Python<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
export PYTHONDONTWRITEBYTECODE=1<br />
PYTHON_BINARY=$(which python)<br />
mpirun ${PYTHON_BINARY} ~/path/to/your/mpi/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://spark.apache.org/ Spark] ===<br />
<br />
Spark is a framework for large-scale data processing.<br />
It can be used in conjunction with Python, R, Scala, Java, and SQL.<br />
Spark can be run on Beocat interactively or through the Slurm queue.<br />
<br />
To run interactively, you must first request a node or nodes from the Slurm queue.<br />
The line below requests 1 node and 1 core for 24 hours and if available will drop<br />
you into the bash shell on that node.<br />
<br />
srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash<br />
<br />
We have some sample python based Spark code you can try out that came from the <br />
exercises and homework from the PSC Spark workshop. <br />
<br />
mkdir spark-test<br />
cd spark-test<br />
cp -rp /homes/daveturner/projects/PSC-BigData-Workshop/Shakespeare/* .<br />
<br />
You will need to set up a python virtual environment and load the &lt;B&gt;nltk&lt;/B&gt; package <br />
before you run the first time.<br />
<br />
module load Python<br />
mkdir -p ~/virtualenvs<br />
cd ~/virtualenvs<br />
virtualenv spark-test<br />
source ~/virtualenvs/spark-test/bin/activate<br />
pip install nltk<br />
pip install numpy<br />
deactivate<br />
<br />
To run the sample code interactively, load the Python and Spark modules,<br />
source your python virtual environment, change to the sample directory, fire up pyspark, <br />
then execute the sample code.<br />
<br />
module load Python<br />
source ~/virtualenvs/spark-test/bin/activate<br />
module load Spark<br />
cd ~/spark-test<br />
pyspark<br />
&gt;&gt;&gt; exec(open(&quot;shakespeare.py&quot;).read())<br />
<br />
You can work interactively from the pyspark prompt (&gt;&gt;&gt;) in addition to running scripts as above.<br />
<br />
The Shakespeare directory also contains a sample sbatch submit script that will run the <br />
same shakespeare.py code through the Slurm batch queue. <br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=shakespeare<br />
#SBATCH --mem=10G<br />
#SBATCH --time=01:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
# Load Spark and Python (version 3 here)<br />
module load Spark<br />
module load Python<br />
source ~/virtualenvs/spark-test/bin/activate<br />
<br />
spark-submit shakespeare.py<br />
<br />
When you run interactively, pyspark initializes your spark context &lt;B&gt;sc&lt;/B&gt;.<br />
You will need to do this manually as in the sample python code when you want<br />
to submit jobs through the Slurm queue.<br />
<br />
# If there is no Spark Context (not running interactively from pyspark), create it<br />
try:<br />
    sc<br />
except NameError:<br />
    from pyspark import SparkConf, SparkContext<br />
    conf = SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;App&quot;)<br />
    sc = SparkContext(conf = conf)<br />
<br />
=== [http://www.perl.org/ Perl] ===<br />
The system-wide version of perl tracks the stable perl releases. Unfortunately, there are some features that we do not include in the system distribution, namely threads.<br />
<br />
If you need a newer version (or threads), just load one we provide in our modules (module spider Perl/):<br />
* Perl/5.26.0-foss-2017beocatb<br />
* Perl/5.26.0-iompi-2017beocatb<br />
<br />
==== Submitting a job with Perl ====<br />
Much like R (above), you cannot simply '&lt;tt&gt;sbatch myProgram.pl&lt;/tt&gt;', but you must create a [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|submit script]] which will call perl. Here is an example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
# Now let's do some actual work.<br />
module load Perl<br />
perl /path/to/myProgram.pl<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== Octave for MatLab codes ===<br />
<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
The 64-bit version of Octave can be loaded using the command above. Octave can then be used to work with MatLab codes on the head node and to submit jobs to the compute nodes through the Slurm scheduler with sbatch. Octave is designed to run MatLab code, but it has limitations and does not support everything that MatLab itself does.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=octave<br />
#SBATCH --output=octave.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
octave &lt; matlab_code.m<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== MatLab compiler ===<br />
<br />
Beocat also has a &lt;B&gt;single-user license&lt;/B&gt; for the MatLab compiler and the most common toolboxes, including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox, Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox, Global Optimization Toolbox, and the Bioinformatics Toolbox.<br />
<br />
Since we only have a &lt;B&gt;single-user license&lt;/B&gt;, you will be expected to develop your MatLab code with Octave or elsewhere, such as on a laptop or departmental server. Once you're ready to do large runs, move your code to Beocat, compile the MatLab code into an executable, and submit as many jobs as you want to the scheduler. To use the MatLab compiler, load the MATLAB module to compile code and load the mcr module to run the resulting MatLab executable.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
If you have addpath() commands in your code, you will need to wrap them in an &quot;if ~isdeployed&quot; block and tell the compiler to include that path via the -I flag.<br />
<br />
&lt;syntaxhighlight lang=&quot;MATLAB&quot;&gt;<br />
% wrap addpath() calls like so; they are skipped when running the compiled (deployed) executable<br />
if ~isdeployed<br />
    addpath('./another/folder/with/code/')<br />
end<br />
&lt;/syntaxhighlight&gt;<br />
<br />
NOTE: The license manager checks the mcc compiler out for a minimum of 30 minutes, so if another user compiles a code<br />
you unfortunately may need to wait for up to 30 minutes to compile your own code.<br />
<br />
Compiling with additional paths:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Any directories added with addpath() will need to be added to the list of compile options as -I arguments. You<br />
can have multiple -I arguments in your compile command.<br />
<br />
Here is an example job submission script. Modify time, memory, tasks-per-node, and job name as you see fit:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=matlab<br />
#SBATCH --output=matlab.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load mcr<br />
<br />
./matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
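<br />
If your main MatLab function takes input arguments, you can usually pass them on the command line of the compiled executable; inside MatLab they arrive as strings (char arrays), so convert them with str2double() or similar in your code. The values below are placeholders for illustration only.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Placeholder example: the compiled executable receives '100' and 'results.mat' as strings<br />
./matlab_executable_name 100 results.mat<br />
&lt;/syntaxhighlight&gt;<br />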
<br />
If you make use of mex files (compiled C and C++ code with MatLab bindings), you will need to add these files to the compiled archive via the -a flag. See the behavior of this flag in the [https://www.mathworks.com/help/compiler/mcc.html compiler documentation]. You can either target specific .mex files or entire directories.<br />
<br />
Because codes often require adding several directories to the Matlab path as well as mex files from several locations,<br />
we recommend writing a script to preserve and help document the steps to compile your Matlab code. Here is an<br />
abbreviated example from a current user:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
<br />
module load MATLAB<br />
<br />
cd matlabPyrTools/MEX/<br />
<br />
# compile mex files<br />
mex upConv.c convolve.c wrap.c edges.c<br />
mex corrDn.c convolve.c wrap.c edges.c<br />
mex histo.c<br />
mex innerProd.c<br />
<br />
cd ../..<br />
<br />
mcc -m mongrel_creation.m \<br />
-I ./matlabPyrTools/MEX/ \<br />
-I ./matlabPyrTools/ \<br />
-I ./FastICA/ \<br />
-a ./matlabPyrTools/MEX/ \<br />
-a ./texturesynth/ \<br />
-o mongrel_creation_binary<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Again, we only have a &lt;B&gt;single-user license&lt;/B&gt; for MatLab, so the model is to develop and debug your MatLab code elsewhere or with Octave on Beocat, then compile the MatLab code into an executable and run it without limits on Beocat.<br />
<br />
For more info on the mcc compiler see: https://www.mathworks.com/help/compiler/mcc.html<br />
<br />
=== COMSOL ===<br />
Beocat has no license for COMSOL. If you want to use it, you must provide your own.<br />
<br />
module spider COMSOL/<br />
----------------------------------------------------------------------------<br />
COMSOL: COMSOL/5.3<br />
----------------------------------------------------------------------------<br />
Description:<br />
COMSOL Multiphysics software, an interactive environment for modeling<br />
and simulating scientific and engineering problems<br />
<br />
This module can be loaded directly: module load COMSOL/5.3<br />
<br />
Help:<br />
<br />
Description<br />
===========<br />
COMSOL Multiphysics software, an interactive environment for modeling and <br />
simulating scientific and engineering problems<br />
You must provide your own license.<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
*OR*<br />
export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME<br />
e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu<br />
<br />
More information<br />
================<br />
- Homepage: https://www.comsol.com/<br />
==== Graphical COMSOL ====<br />
Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Connect to the cluster with X11 forwarding (ssh -Y or mobaxterm)<br />
# load the comsol module on the headnode<br />
module load COMSOL<br />
# export your comsol license as mentioned above, and tell the scheduler to run the software<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw<br />
&lt;/syntaxhighlight&gt;<br />
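<br />
==== Batch COMSOL ====<br />
For production runs it is better to run COMSOL in batch mode from a submit script. The sketch below is only an outline: the model file names and resource requests are placeholders, and you should check the COMSOL documentation for the exact batch flags your version supports.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=comsol<br />
#SBATCH --time=4:00:00<br />
#SBATCH --mem=16G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module load COMSOL<br />
# Point COMSOL at your own license, as described above<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
<br />
# Run the model in batch mode (file names are placeholders)<br />
comsol batch -inputfile mymodel.mph -outputfile mymodel_solved.mph<br />
&lt;/syntaxhighlight&gt;<br />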
<br />
=== .NET Core ===<br />
==== Load .NET ====<br />
mozes@[eunomia] ~ $ module load dotNET-Core-SDK<br />
==== Create an application ====<br />
Following instructions from [https://docs.microsoft.com/en-us/dotnet/core/tutorials/using-with-xplat-cli here], we'll create a simple 'Hello World' application<br />
mozes@[eunomia] ~ $ mkdir Hello<br />
<br />
mozes@[eunomia] ~ $ cd Hello<br />
<br />
mozes@[eunomia] ~/Hello $ export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet new console<br />
The template &quot;Console Application&quot; was created successfully.<br />
<br />
Processing post-creation actions...<br />
Running 'dotnet restore' on /homes/mozes/Hello/Hello.csproj...<br />
Restoring packages for /homes/mozes/Hello/Hello.csproj...<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.props.<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.targets.<br />
Restore completed in 358.43 ms for /homes/mozes/Hello/Hello.csproj.<br />
<br />
Restore succeeded.<br />
<br />
==== Edit your program ====<br />
mozes@[eunomia] ~/Hello $ vi Program.cs<br />
==== Run your .NET application ====<br />
mozes@[eunomia] ~/Hello $ dotnet run<br />
Hello World!<br />
==== Build and run the built application ====<br />
mozes@[eunomia] ~/Hello $ dotnet build<br />
Microsoft (R) Build Engine version 15.8.169+g1ccb72aefa for .NET Core<br />
Copyright (C) Microsoft Corporation. All rights reserved.<br />
<br />
Restore completed in 106.12 ms for /homes/mozes/Hello/Hello.csproj.<br />
Hello -&gt; /homes/mozes/Hello/bin/Debug/netcoreapp2.1/Hello.dll<br />
<br />
Build succeeded.<br />
0 Warning(s)<br />
0 Error(s)<br />
<br />
Time Elapsed 00:00:02.86<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet bin/Debug/netcoreapp2.1/Hello.dll<br />
Hello World!<br />
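<br />
==== Run your application through Slurm ====<br />
To run the built application on a compute node rather than the head node, you can wrap the same commands in a submit script. This is just a sketch using the example paths from above; adjust the resource requests for your own application.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=hello-dotnet<br />
#SBATCH --time=0-0:15:00<br />
#SBATCH --mem=1G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module load dotNET-Core-SDK<br />
# Run the dll built in the example above<br />
dotnet ~/Hello/bin/Debug/netcoreapp2.1/Hello.dll<br />
&lt;/syntaxhighlight&gt;<br />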
<br />
== Installing my own software ==<br />
Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.<br />
<br />
In many cases, the software vendor or support site will incorrectly assume that you are installing the software system-wide or that you need 'sudo' access.<br />
<br />
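For a typical autotools-style package, installing into your home directory usually just means giving the build a prefix you can write to. The sketch below uses a hypothetical package name; the exact steps vary by package, so check its build instructions.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Build and install a (hypothetical) package under ~/software instead of system-wide<br />
tar xzf mytool-1.0.tar.gz<br />
cd mytool-1.0<br />
./configure --prefix=$HOME/software<br />
make<br />
make install<br />
# Add the install location to your PATH so the new binaries are found<br />
export PATH=$HOME/software/bin:$PATH<br />
&lt;/syntaxhighlight&gt;<br />
<br />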
As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Installed_software&diff=454Installed software2019-03-01T20:43:24Z<p>Daveturner: /* Spark */</p>
<hr />
<div>== Drinking from the Firehose ==<br />
For a complete list of all installed modules, run &lt;tt&gt;module avail&lt;/tt&gt;<br />
<br />
Alternatively, we update our [[ModuleList]] whenever we get a chance.<br />
<br />
== Toolchains ==<br />
A toolchain is a set of compilers, libraries and applications that are needed to build software. Some software functions better when using specific toolchains.<br />
<br />
We provide a good number of toolchains and versions of toolchains to make sure your applications will compile and/or run correctly.<br />
<br />
These toolchains include (you can run 'module keyword toolchain'):<br />
; foss: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; fosscuda: GNU Compiler Collection (GCC) based compiler toolchain based on FOSS with CUDA support.<br />
; gmvapich2: GNU Compiler Collection (GCC) based compiler toolchain, including MVAPICH2 for MPI support. '''DEPRECATED'''<br />
; gompi: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.<br />
; goolfc: GCC based compiler toolchain __with CUDA support__, and including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK. '''DEPRECATED'''<br />
; iomkl: Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL &amp; OpenMPI.<br />
<br />
You can run 'module spider $toolchain/' to see the versions we have:<br />
$ module spider iomkl/<br />
* iomkl/2017a<br />
* iomkl/2017b<br />
* iomkl/2017beocatb<br />
<br />
If you load one of those (module load iomkl/2017b), you can see the other modules and versions of software that it loaded with the 'module list':<br />
$ module list<br />
Currently Loaded Modules:<br />
1) icc/2017.4.196-GCC-6.4.0-2.28<br />
2) binutils/2.28-GCCcore-6.4.0<br />
3) ifort/2017.4.196-GCC-6.4.0-2.28<br />
4) iccifort/2017.4.196-GCC-6.4.0-2.28<br />
5) GCCcore/6.4.0<br />
6) numactl/2.0.11-GCCcore-6.4.0<br />
7) hwloc/1.11.7-GCCcore-6.4.0<br />
8) OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
9) iompi/2017b<br />
10) imkl/2017.3.196-iompi-2017b<br />
11) iomkl/2017b<br />
<br />
As you can see, toolchains can depend on each other. For instance, the iomkl toolchain depends on iompi, which depends on iccifort, which depends on icc and ifort, which in turn depend on GCCcore, which depends on GCC. Hence it is very important that the correct versions of all related software are loaded.<br />
<br />
With software we provide, the toolchain used to compile is always specified in the &quot;version&quot; of the software that you want to load.<br />
<br />
If you mix toolchains, you may end up with inconsistent or broken builds.<br />
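For example, a safe way to switch from one toolchain to another is to unload everything first. A minimal sketch (the foss version here is just an example):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Unload everything from the old toolchain<br />
module purge<br />
# Load the new toolchain, then check what came with it<br />
module load foss/2017beocatb<br />
module list<br />
&lt;/syntaxhighlight&gt;<br />
<br />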
== Most Commonly Used Software ==<br />
=== [http://www.open-mpi.org/ OpenMPI] ===<br />
We provide lots of versions. You are most likely better off directly loading a toolchain or application to make sure you get the right version, but you can see the versions we have with 'module spider OpenMPI/':<br />
<br />
* OpenMPI/2.0.2-GCC-6.3.0-2.27<br />
* OpenMPI/2.0.2-iccifort-2017.1.132-GCC-6.3.0-2.27<br />
* OpenMPI/2.1.1-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-GCC-7.2.0-2.29<br />
* OpenMPI/2.1.1-gcccuda-2017b<br />
* OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-iccifort-2018.0.128-GCC-7.2.0-2.29<br />
<br />
=== [http://www.r-project.org/ R] ===<br />
We currently provide (module spider R/):<br />
* R/3.4.0-foss-2017beocatb-X11-20170314<br />
<br />
==== Packages ====<br />
We provide a small number of R packages installed by default; these are generally packages that are needed by more than one person.<br />
<br />
==== Installing your own R Packages ====<br />
To install your own package, log in to Beocat and start R interactively<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load R<br />
R<br />
&lt;/syntaxhighlight&gt;<br />
Then install the package using<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
install.packages(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as &quot;USA (KS)&quot;.<br />
<br />
After installing you can test before leaving interactive mode by issuing the command<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
library(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
==== Running R Jobs ====<br />
<br />
You cannot submit an R script directly. '&lt;tt&gt;sbatch myscript.R&lt;/tt&gt;' will result in an error. Instead, you need to make a bash [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
<br />
# Now lets do some actual work. This starts R and loads the file myscript.R<br />
module load R<br />
R --no-save -q &lt; myscript.R<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Now, to submit your R job, you would type<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sbatch submit-R.sbatch<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://www.java.com/ Java] ===<br />
We currently provide (module spider Java/):<br />
* Java/1.8.0_131<br />
* Java/1.8.0_144<br />
<br />
=== [http://www.python.org/about/ Python] ===<br />
We currently provide (module spider Python/)<br />
* Python/2.7.13-foss-2017beocatb<br />
* Python/2.7.13-GCCcore-7.2.0-bare<br />
* Python/2.7.13-iomkl-2017a<br />
* Python/2.7.13-iomkl-2017beocatb<br />
* Python/3.6.3-foss-2017b<br />
* Python/3.6.3-foss-2017beocatb<br />
* Python/3.6.3-iomkl-2017beocatb<br />
<br />
If you need modules that we do not have installed, you should use [https://virtualenv.pypa.io/en/stable/userguide/ virtualenv] to set up a virtual python environment in your home directory. This will let you install python modules as you please.<br />
<br />
==== Setting up your virtual environment ====<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Load Python<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, Python is loaded. Python will not stay loaded after you log off, so you must rerun this command every time you log on.)<br />
* Create a location for your virtual environments (optional, but helps keep things organized)<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that &lt;code&gt;virtualenv --help&lt;/code&gt; has many more useful options.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
virtualenv test<br />
&lt;/syntaxhighlight&gt;<br />
* Let's look at our virtual environments (the virtual environment name should be in the output):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
ls ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Activate one of these<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
source ~/virtualenvs/test/bin/activate<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, your virtual environment is activated. It will not stay active after you log off, so you must rerun this command every time you log on.)<br />
* You can now install the python modules you want. This can be done using &lt;tt&gt;pip&lt;/tt&gt;.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
pip install numpy biopython<br />
&lt;/syntaxhighlight&gt;<br />
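* To check that the packages actually went into the virtual environment, here is a quick sanity check (a small sketch; run it while the environment is still activated):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
pip list<br />
python -c &quot;import numpy; print(numpy.__version__)&quot;<br />
&lt;/syntaxhighlight&gt;<br />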
<br />
==== Using your virtual environment within a job ====<br />
Here is a simple job script using the virtual environment test<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
source ~/virtualenvs/test/bin/activate<br />
export PYTHONDONTWRITEBYTECODE=1<br />
python ~/path/to/your/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==== Using MPI with Python within a job ====<br />
Here is a simple job script using MPI with Python<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
export PYTHONDONTWRITEBYTECODE=1<br />
PYTHON_BINARY=$(which python)<br />
mpirun ${PYTHON_BINARY} ~/path/to/your/mpi/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
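The script above does not request any resources itself, so you would normally add #SBATCH directives (or sbatch command-line options) asking for the number of MPI tasks you want. A minimal sketch, with placeholder task counts:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=0-1:00:00<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
export PYTHONDONTWRITEBYTECODE=1<br />
PYTHON_BINARY=$(which python)<br />
# With our OpenMPI builds, mpirun normally picks up the task count from the Slurm allocation<br />
mpirun ${PYTHON_BINARY} ~/path/to/your/mpi/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />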
<br />
=== [http://spark.apache.org/ Spark] ===<br />
<br />
Spark is a framework for large-scale data processing.<br />
It can be used in conjunction with Python, R, Scala, Java, and SQL.<br />
Spark can be run on Beocat interactively or through the Slurm queue.<br />
<br />
To run interactively, you must first request a node or nodes from the Slurm queue.<br />
The line below requests 1 node and 1 core for 24 hours and if available will drop<br />
you into the bash shell on that node.<br />
<br />
srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash<br />
<br />
We have some sample python based Spark code you can try out that came from the <br />
exercises and homework from the PSC Spark workshop. <br />
<br />
mkdir spark-test<br />
cd spark-test<br />
cp -rp /homes/daveturner/projects/PSC-BigData-Workshop/Shakespeare/* .<br />
<br />
You will need to set up a python virtual environment and load the &lt;B&gt;nltk&lt;/B&gt; package <br />
before you run the first time.<br />
<br />
module load Python<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
virtualenv spark-test<br />
source ~/virtualenvs/spark-test/bin/activate<br />
pip install nltk<br />
deactivate<br />
<br />
<br />
To run the sample code interactively, load the Python and Spark modules,<br />
source your python virtual environment, change to the sample directory, fire up pyspark, <br />
then execute the sample code.<br />
<br />
module load Python<br />
source ~/virtualenvs/spark-test/bin/activate<br />
module load Spark<br />
cd ~/spark-test/Shakespeare<br />
pyspark<br />
&gt;&gt;&gt; exec(open(&quot;shakespeare.py&quot;).read())<br />
<br />
You can work interactively from the pyspark prompt (&gt;&gt;&gt;) in addition to running scripts as above.<br />
<br />
The Shakespeare directory also contains a sample sbatch submit script that will run the <br />
same shakespeare.py code through the Slurm batch queue. <br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=shakespeare<br />
#SBATCH --mem=10G<br />
#SBATCH --time=01:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
# Load Spark and Python (version 3 here)<br />
module load Spark<br />
module load Python<br />
source ~/virtualenvs/spark-test/bin/activate<br />
<br />
spark-submit shakespeare.py<br />
<br />
When you run interactively, pyspark initializes your spark context &lt;B&gt;sc&lt;/B&gt;.<br />
You will need to do this manually as in the sample python code when you want<br />
to submit jobs through the Slurm queue.<br />
<br />
# If there is no Spark Context (not running interactive from pyspark), create it<br />
try:<br />
sc<br />
except NameError:<br />
from pyspark import SparkConf, SparkContext<br />
conf = SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;App&quot;)<br />
sc = SparkContext(conf = conf)<br />
<br />
=== [http://www.perl.org/ Perl] ===<br />
The system-wide version of perl is tracking the stable releases of perl. Unfortunately there are some features that we do not include in the system distribution of perl, namely threads.<br />
<br />
If you need a newer version (or threads), just load one we provide in our modules (module spider Perl/):<br />
* Perl/5.26.0-foss-2017beocatb<br />
* Perl/5.26.0-iompi-2017beocatb<br />
<br />
==== Submitting a job with Perl ====<br />
Much like R (above), you cannot simply '&lt;tt&gt;sbatch myProgram.pl&lt;/tt&gt;', but you must create a [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|submit script]] which will call perl. Here is an example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
# Now lets do some actual work. <br />
module load Perl<br />
perl /path/to/myProgram.pl<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== Octave for MatLab codes ===<br />
<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
The 64-bit version of Octave can be loaded using the command above. Octave can then be used<br />
to work with MatLab codes on the head node and to submit jobs to the compute nodes through the<br />
sbatch scheduler. Octave is made to run MatLab code, but it does have limitations and does not support<br />
everything that MatLab itself does.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=octave<br />
#SBATCH --output=octave.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
octave &lt; matlab_code.m<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== MatLab compiler ===<br />
<br />
Beocat also has a &lt;B&gt;single-user license&lt;/B&gt; for the MatLab compiler and the most common toolboxes<br />
including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox,<br />
Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox, <br />
Global Optimization Toolbox, and the Bioinformatics Toolbox.<br />
<br />
Since we only have a &lt;B&gt;single-user license&lt;/B&gt;, this means that you will be expected to develop your MatLab code<br />
with Octave or elsewhere on a laptop or departmental server. Once you're ready to do large runs, then you<br />
move your code to Beocat, compile the MatLab code into an executable, and you can submit as many jobs as<br />
you want to the scheduler. To use the MatLab compiler, you need to load the MATLAB module to compile code and<br />
load the mcr module to run the resulting MatLab executable.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
If you have addpath() commands in your code, you will need to wrap them in an &quot;if ~isdeployed&quot; block and tell the<br />
compiler to include that path via the -I flag.<br />
<br />
&lt;syntaxhighlight lang=&quot;MATLAB&quot;&gt;<br />
% wrap addpath() calls like so:<br />
if ~isdeployed<br />
addpath('./another/folder/with/code/')<br />
end<br />
&lt;/syntaxhighlight&gt;<br />
<br />
NOTE: The license manager checks the mcc compiler out for a minimum of 30 minutes, so if another user compiles a code<br />
you unfortunately may need to wait for up to 30 minutes to compile your own code.<br />
<br />
Compiling with additional paths:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Any directories added with addpath() will need to be added to the list of compile options as -I arguments. You<br />
can have multiple -I arguments in your compile command.<br />
<br />
Here is an example job submission script. Modify time, memory, tasks-per-node, and job name as you see fit:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=matlab<br />
#SBATCH --output=matlab.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load mcr<br />
<br />
./matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
For those who make use of mex files - compiled C and C++ code with matlab bindings - you will need to add these<br />
files to the compiled archive via the -a flag. See the behavior of this flag in the [https://www.mathworks.com/help/compiler/mcc.html compiler documentation]. You can either target specific .mex files or entire directories.<br />
<br />
Because codes often require adding several directories to the Matlab path as well as mex files from several locations,<br />
we recommend writing a script to preserve and help document the steps to compile your Matlab code. Here is an<br />
abbreviated example from a current user:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
<br />
module load MATLAB<br />
<br />
cd matlabPyrTools/MEX/<br />
<br />
# compile mex files<br />
mex upConv.c convolve.c wrap.c edges.c<br />
mex corrDn.c convolve.c wrap.c edges.c<br />
mex histo.c<br />
mex innerProd.c<br />
<br />
cd ../..<br />
<br />
mcc -m mongrel_creation.m \<br />
-I ./matlabPyrTools/MEX/ \<br />
-I ./matlabPyrTools/ \<br />
-I ./FastICA/ \<br />
-a ./matlabPyrTools/MEX/ \<br />
-a ./texturesynth/ \<br />
-o mongrel_creation_binary<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Again, we only have a &lt;B&gt;single-user license&lt;/B&gt; for MatLab, so the model is to develop and debug your MatLab code<br />
elsewhere or using Octave on Beocat; you can then compile the MatLab code into an executable and run it without<br />
limits on Beocat. <br />
<br />
For more info on the mcc compiler see: https://www.mathworks.com/help/compiler/mcc.html<br />
<br />
=== COMSOL ===<br />
Beocat has no license for COMSOL. If you want to use it, you must provide your own.<br />
<br />
module spider COMSOL/<br />
----------------------------------------------------------------------------<br />
COMSOL: COMSOL/5.3<br />
----------------------------------------------------------------------------<br />
Description:<br />
COMSOL Multiphysics software, an interactive environment for modeling<br />
and simulating scientific and engineering problems<br />
<br />
This module can be loaded directly: module load COMSOL/5.3<br />
<br />
Help:<br />
<br />
Description<br />
===========<br />
COMSOL Multiphysics software, an interactive environment for modeling and <br />
simulating scientific and engineering problems<br />
You must provide your own license.<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
*OR*<br />
export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME<br />
e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu<br />
<br />
More information<br />
================<br />
- Homepage: https://www.comsol.com/<br />
==== Graphical COMSOL ====<br />
Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Connect to the cluster with X11 forwarding (ssh -Y or mobaxterm)<br />
# load the comsol module on the headnode<br />
module load COMSOL<br />
# export your comsol license as mentioned above, and tell the scheduler to run the software<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw<br />
&lt;/syntaxhighlight&gt;<br />
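For non-interactive runs, a batch-mode submit script is usually the better approach. A rough sketch (the model file names, resource requests, and license path are all placeholders):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=comsol<br />
#SBATCH --time=4:00:00<br />
#SBATCH --mem=16G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module load COMSOL<br />
# Point LM_LICENSE_FILE at your own license, as described above<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
<br />
# Run the model without the GUI (input/output file names are placeholders)<br />
comsol batch -inputfile mymodel.mph -outputfile mymodel_out.mph<br />
&lt;/syntaxhighlight&gt;<br />
<br />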
<br />
=== .NET Core ===<br />
==== Load .NET ====<br />
mozes@[eunomia] ~ $ module load dotNET-Core-SDK<br />
==== create an application ====<br />
Following instructions from [https://docs.microsoft.com/en-us/dotnet/core/tutorials/using-with-xplat-cli here], we'll create a simple 'Hello World' application<br />
mozes@[eunomia] ~ $ mkdir Hello<br />
<br />
mozes@[eunomia] ~ $ cd Hello<br />
<br />
mozes@[eunomia] ~/Hello $ export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet new console<br />
The template &quot;Console Application&quot; was created successfully.<br />
<br />
Processing post-creation actions...<br />
Running 'dotnet restore' on /homes/mozes/Hello/Hello.csproj...<br />
Restoring packages for /homes/mozes/Hello/Hello.csproj...<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.props.<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.targets.<br />
Restore completed in 358.43 ms for /homes/mozes/Hello/Hello.csproj.<br />
<br />
Restore succeeded.<br />
<br />
==== Edit your program ====<br />
mozes@[eunomia] ~/Hello $ vi Program.cs<br />
==== Run your .NET application ====<br />
mozes@[eunomia] ~/Hello $ dotnet run<br />
Hello World!<br />
==== Build and run the built application ====<br />
mozes@[eunomia] ~/Hello $ dotnet build<br />
Microsoft (R) Build Engine version 15.8.169+g1ccb72aefa for .NET Core<br />
Copyright (C) Microsoft Corporation. All rights reserved.<br />
<br />
Restore completed in 106.12 ms for /homes/mozes/Hello/Hello.csproj.<br />
Hello -&gt; /homes/mozes/Hello/bin/Debug/netcoreapp2.1/Hello.dll<br />
<br />
Build succeeded.<br />
0 Warning(s)<br />
0 Error(s)<br />
<br />
Time Elapsed 00:00:02.86<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet bin/Debug/netcoreapp2.1/Hello.dll<br />
Hello World!<br />
<br />
== Installing my own software ==<br />
Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.<br />
<br />
In many cases, the software vendor or support site will incorrectly assume that you are installing the software system-wide or that you need 'sudo' access.<br />
<br />
As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Installed_software&diff=449Installed software2019-02-27T21:52:56Z<p>Daveturner: /* Spark */</p>
<hr />
<div>== Drinking from the Firehose ==<br />
For a complete list of all installed modules, see [[ModuleList]]<br />
<br />
== Toolchains ==<br />
A toolchain is a set of compilers, libraries and applications that are needed to build software. Some software functions better when using specific toolchains.<br />
<br />
We provide a good number of toolchains and toolchain versions to make sure your applications will compile and/or run correctly.<br />
<br />
These toolchains include (you can run 'module keyword toolchain'):<br />
; foss: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; gcccuda: GNU Compiler Collection (GCC) based compiler toolchain, along with CUDA toolkit.<br />
; gmvapich2: GNU Compiler Collection (GCC) based compiler toolchain, including MVAPICH2 for MPI support.<br />
; gompi: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.<br />
; gompic: GNU Compiler Collection (GCC) based compiler toolchain along with CUDA toolkit, including OpenMPI for MPI support with CUDA features enabled.<br />
; goolfc: GCC based compiler toolchain __with CUDA support__, and including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; iomkl: Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL &amp; OpenMPI.<br />
<br />
You can run 'module spider $toolchain' to see the versions we have:<br />
$ module spider iomkl<br />
* iomkl/2017a<br />
* iomkl/2017b<br />
* iomkl/2017beocatb<br />
<br />
If you load one of those (module load iomkl/2017b), you can see the other modules and versions of software that it loaded with the 'module list':<br />
$ module list<br />
Currently Loaded Modules:<br />
1) icc/2017.4.196-GCC-6.4.0-2.28<br />
2) binutils/2.28-GCCcore-6.4.0<br />
3) ifort/2017.4.196-GCC-6.4.0-2.28<br />
4) iccifort/2017.4.196-GCC-6.4.0-2.28<br />
5) GCCcore/6.4.0<br />
6) numactl/2.0.11-GCCcore-6.4.0<br />
7) hwloc/1.11.7-GCCcore-6.4.0<br />
8) OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
9) iompi/2017b<br />
10) imkl/2017.3.196-iompi-2017b<br />
11) iomkl/2017b<br />
<br />
As you can see, toolchains can depend on each other. For instance, the iomkl toolchain depends on iompi, which depends on iccifort, which depends on icc and ifort, which in turn depend on GCCcore, which depends on GCC. Hence it is very important that the correct versions of all related software are loaded.<br />
<br />
With software we provide, the toolchain used to compile is always specified in the &quot;version&quot; of the software that you want to load.<br />
<br />
If you mix toolchains, you may end up with inconsistent or broken builds.<br />
== Most Commonly Used Software ==<br />
=== [http://www.open-mpi.org/ OpenMPI] ===<br />
We provide lots of versions. You are most likely better off directly loading a toolchain or application to make sure you get the right version, but you can see the versions we have with 'module spider OpenMPI':<br />
<br />
* OpenMPI/2.0.2-GCC-6.3.0-2.27<br />
* OpenMPI/2.0.2-iccifort-2017.1.132-GCC-6.3.0-2.27<br />
* OpenMPI/2.1.1-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-GCC-7.2.0-2.29<br />
* OpenMPI/2.1.1-gcccuda-2017b<br />
* OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-iccifort-2018.0.128-GCC-7.2.0-2.29<br />
<br />
=== [http://www.r-project.org/ R] ===<br />
We currently provide (module -r spider '^R$'):<br />
* R/3.4.0-foss-2017beocatb-X11-20170314<br />
<br />
==== Packages ====<br />
We provide a small number of R packages installed by default; these are generally packages that are needed by more than one person.<br />
<br />
==== Installing your own R Packages ====<br />
To install your own package, log in to Beocat and start R interactively<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load R<br />
R<br />
&lt;/syntaxhighlight&gt;<br />
Then install the package using<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
install.packages(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as &quot;USA (KS)&quot;.<br />
<br />
After installing you can test before leaving interactive mode by issuing the command<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
library(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
==== Running R Jobs ====<br />
<br />
You cannot submit an R script directly. '&lt;tt&gt;sbatch myscript.R&lt;/tt&gt;' will result in an error. Instead, you need to make a bash [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
<br />
# Now lets do some actual work. This starts R and loads the file myscript.R<br />
module load R<br />
R --no-save -q &lt; myscript.R<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Now, to submit your R job, you would type<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sbatch submit-R.sbatch<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://www.java.com/ Java] ===<br />
We currently provide (module spider Java):<br />
* Java/1.8.0_131<br />
* Java/1.8.0_144<br />
<br />
=== [http://www.python.org/about/ Python] ===<br />
We currently provide (module spider Python)<br />
* Python/2.7.13-foss-2017beocatb<br />
* Python/2.7.13-GCCcore-7.2.0-bare<br />
* Python/2.7.13-iomkl-2017a<br />
* Python/2.7.13-iomkl-2017beocatb<br />
* Python/3.6.3-foss-2017b<br />
* Python/3.6.3-foss-2017beocatb<br />
* Python/3.6.3-iomkl-2017beocatb<br />
<br />
If you need modules that we do not have installed, you should use [https://virtualenv.pypa.io/en/stable/userguide/ virtualenv] to set up a virtual python environment in your home directory. This will let you install python modules as you please.<br />
<br />
==== Setting up your virtual environment ====<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Load Python<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, Python is loaded. Python will not stay loaded after you log off, so you must rerun this command every time you log on.)<br />
* Create a location for your virtual environments (optional, but helps keep things organized)<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that &lt;code&gt;virtualenv --help&lt;/code&gt; has many more useful options.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
virtualenv test<br />
&lt;/syntaxhighlight&gt;<br />
* Let's look at our virtual environments (the virtual environment name should be in the output):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
ls ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Activate one of these<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
source ~/virtualenvs/test/bin/activate<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, your virtual environment is activated. It will not stay active after you log off, so you must rerun this command every time you log on.)<br />
* You can now install the python modules you want. This can be done using &lt;tt&gt;pip&lt;/tt&gt;.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
pip install numpy biopython<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==== Using your virtual environment within a job ====<br />
Here is a simple job script using the virtual environment test<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
source ~/virtualenvs/test/bin/activate<br />
export PYTHONDONTWRITEBYTECODE=1<br />
python ~/path/to/your/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://spark.apache.org/ Spark] ===<br />
<br />
Spark is a framework for large-scale data processing.<br />
It can be used in conjunction with Python, R, Scala, Java, and SQL.<br />
Spark can be run on Beocat interactively or through the Slurm queue.<br />
<br />
To run interactively, you must first request a node or nodes from the Slurm queue.<br />
The line below requests 1 node and 1 core for 24 hours and if available will drop<br />
you into the bash shell on that node.<br />
<br />
srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash<br />
<br />
We have some sample python based Spark code you can try out that came from the <br />
exercises and homework from the PSC Spark workshop. <br />
<br />
mkdir spark-test<br />
cd spark-test<br />
cp -rp /homes/daveturner/projects/PSC-BigData-Workshop/Shakespeare/* .<br />
<br />
The sample code requires 'nltk' and 'numpy' packages, so the first time you run it, you need to create the virtualenv and install these packages.<br />
<br />
module load Python<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
virtualenv spark-test<br />
source ~/virtualenvs/spark-test/bin/activate<br />
pip install nltk<br />
pip install numpy<br />
<br />
On any subsequent runs, you can then just enter that virtualenv without running all of the above commands:<br />
<br />
module load Python<br />
source ~/virtualenvs/spark-test/bin/activate<br />
<br />
Then load the Spark module (Python should already be loaded from above), change to the sample directory, fire up pyspark, and run the sample code.<br />
<br />
module load Spark<br />
cd ~/spark-test/Shakespeare<br />
pyspark<br />
&gt;&gt;&gt; exec(open(&quot;shakespeare.py&quot;).read())<br />
<br />
You can work interactively from the pyspark prompt (&gt;&gt;&gt;) in addition to running scripts as above.<br />
<br />
The Shakespeare directory also contains a sample sbatch submit script that will run the <br />
same shakespeare.py code through the Slurm batch queue. <br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=shakespeare<br />
#SBATCH --mem=10G<br />
#SBATCH --time=01:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
# Load Spark and Python (version 3 here)<br />
module load Spark<br />
module load Python<br />
<br />
spark-submit shakespeare.py<br />
<br />
When you run interactively, pyspark initializes your spark context &lt;B&gt;sc&lt;/B&gt;.<br />
You will need to do this manually as in the sample python code when you want<br />
to submit jobs through the Slurm queue.<br />
<br />
# If there is no Spark Context (not running interactive from pyspark), create it<br />
try:<br />
sc<br />
except NameError:<br />
from pyspark import SparkConf, SparkContext<br />
conf = SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;App&quot;)<br />
sc = SparkContext(conf = conf)<br />
<br />
=== [http://www.perl.org/ Perl] ===<br />
The system-wide version of perl is tracking the stable releases of perl. Unfortunately there are some features that we do not include in the system distribution of perl, namely threads.<br />
<br />
If you need a newer version (or threads), just load one we provide in our modules (module spider Perl):<br />
* Perl/5.26.0-foss-2017beocatb<br />
* Perl/5.26.0-iompi-2017beocatb<br />
<br />
==== Submitting a job with Perl ====<br />
Much like R (above), you cannot simply '&lt;tt&gt;sbatch myProgram.pl&lt;/tt&gt;', but you must create a [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|submit script]] which will call perl. Here is an example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
# Now lets do some actual work. <br />
module load Perl<br />
perl /path/to/myProgram.pl<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== Octave for MatLab codes ===<br />
<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
The 64-bit version of Octave can be loaded using the command above. Octave can then be used<br />
to work with MatLab codes on the head node and to submit jobs to the compute nodes through the<br />
sbatch scheduler. Octave is made to run MatLab code, but it does have limitations and does not support<br />
everything that MatLab itself does.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=octave<br />
#SBATCH --output=octave.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
octave &lt; matlab_code.m<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== MatLab compiler ===<br />
<br />
Beocat also has a &lt;B&gt;single-user license&lt;/B&gt; for the MatLab compiler and the most common toolboxes<br />
including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox,<br />
Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox, <br />
Global Optimization Toolbox, and the Bioinformatics Toolbox.<br />
<br />
Since we only have a &lt;B&gt;single-user license&lt;/B&gt;, this means that you will be expected to develop your MatLab code<br />
with Octave or elsewhere on a laptop or departmental server. Once you're ready to do large runs, then you<br />
move your code to Beocat, compile the MatLab code into an executable, and you can submit as many jobs as<br />
you want to the scheduler. To use the MatLab compiler, you need to load the MATLAB module to compile code and<br />
load the mcr module to run the resulting MatLab executable.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
If you have addpath() commands in your code, you will need to wrap them in an &quot;if ~isdeployed&quot; block and tell the<br />
compiler to include that path via the -I flag.<br />
<br />
&lt;syntaxhighlight lang=&quot;MATLAB&quot;&gt;<br />
% wrap addpath() calls like so:<br />
if ~isdeployed<br />
addpath('./another/folder/with/code/')<br />
end<br />
&lt;/syntaxhighlight&gt;<br />
<br />
NOTE: The license manager checks the mcc compiler out for a minimum of 30 minutes, so if another user compiles a code<br />
you unfortunately may need to wait for up to 30 minutes to compile your own code.<br />
<br />
Compiling with additional paths:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Any directories added with addpath() will need to be added to the list of compile options as -I arguments. You<br />
can have multiple -I arguments in your compile command.<br />
<br />
Here is an example job submission script. Modify time, memory, tasks-per-node, and job name as you see fit:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=matlab<br />
#SBATCH --output=matlab.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load mcr<br />
<br />
./matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
For those who make use of mex files - compiled C and C++ code with matlab bindings - you will need to add these<br />
files to the compiled archive via the -a flag. See the behavior of this flag in the [https://www.mathworks.com/help/compiler/mcc.html compiler documentation]. You can either target specific .mex files or entire directories.<br />
<br />
Because codes often require adding several directories to the Matlab path as well as mex files from several locations,<br />
we recommend writing a script to preserve and help document the steps to compile your Matlab code. Here is an<br />
abbreviated example from a current user:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
<br />
module load MATLAB<br />
<br />
cd matlabPyrTools/MEX/<br />
<br />
# compile mex files<br />
mex upConv.c convolve.c wrap.c edges.c<br />
mex corrDn.c convolve.c wrap.c edges.c<br />
mex histo.c<br />
mex innerProd.c<br />
<br />
cd ../..<br />
<br />
mcc -m mongrel_creation.m \<br />
-I ./matlabPyrTools/MEX/ \<br />
-I ./matlabPyrTools/ \<br />
-I ./FastICA/ \<br />
-a ./matlabPyrTools/MEX/ \<br />
-a ./texturesynth/ \<br />
-o mongrel_creation_binary<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Again, we only have a &lt;B&gt;single-user license&lt;/B&gt; for MatLab, so the model is to develop and debug your MatLab code<br />
elsewhere or using Octave on Beocat; you can then compile the MatLab code into an executable and run it without<br />
limits on Beocat. <br />
<br />
For more info on the mcc compiler see: https://www.mathworks.com/help/compiler/mcc.html<br />
<br />
=== COMSOL ===<br />
Beocat has no license for COMSOL. If you want to use it, you must provide your own.<br />
<br />
module spider COMSOL<br />
----------------------------------------------------------------------------<br />
COMSOL: COMSOL/5.3<br />
----------------------------------------------------------------------------<br />
Description:<br />
COMSOL Multiphysics software, an interactive environment for modeling<br />
and simulating scientific and engineering problems<br />
<br />
This module can be loaded directly: module load COMSOL/5.3<br />
<br />
Help:<br />
<br />
Description<br />
===========<br />
COMSOL Multiphysics software, an interactive environment for modeling and <br />
simulating scientific and engineering problems<br />
You must provide your own license.<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
*OR*<br />
export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME<br />
e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu<br />
<br />
More information<br />
================<br />
- Homepage: https://www.comsol.com/<br />
==== Graphical COMSOL ====<br />
Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Connect to the cluster with X11 forwarding (ssh -Y or mobaxterm)<br />
# load the comsol module on the headnode<br />
module load COMSOL<br />
# export your comsol license as mentioned above, and tell the scheduler to run the software<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== .NET Core ===<br />
==== Load .NET ====<br />
mozes@[eunomia] ~ $ module load dotNET-Core-SDK<br />
==== create an application ====<br />
Following instructions from [https://docs.microsoft.com/en-us/dotnet/core/tutorials/using-with-xplat-cli here], we'll create a simple 'Hello World' application<br />
mozes@[eunomia] ~ $ mkdir Hello<br />
<br />
mozes@[eunomia] ~ $ cd Hello<br />
<br />
mozes@[eunomia] ~/Hello $ export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet new console<br />
The template &quot;Console Application&quot; was created successfully.<br />
<br />
Processing post-creation actions...<br />
Running 'dotnet restore' on /homes/mozes/Hello/Hello.csproj...<br />
Restoring packages for /homes/mozes/Hello/Hello.csproj...<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.props.<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.targets.<br />
Restore completed in 358.43 ms for /homes/mozes/Hello/Hello.csproj.<br />
<br />
Restore succeeded.<br />
<br />
==== Edit your program ====<br />
mozes@[eunomia] ~/Hello $ vi Program.cs<br />
==== Run your .NET application ====<br />
mozes@[eunomia] ~/Hello $ dotnet run<br />
Hello World!<br />
==== Build and run the built application ====<br />
mozes@[eunomia] ~/Hello $ dotnet build<br />
Microsoft (R) Build Engine version 15.8.169+g1ccb72aefa for .NET Core<br />
Copyright (C) Microsoft Corporation. All rights reserved.<br />
<br />
Restore completed in 106.12 ms for /homes/mozes/Hello/Hello.csproj.<br />
Hello -&gt; /homes/mozes/Hello/bin/Debug/netcoreapp2.1/Hello.dll<br />
<br />
Build succeeded.<br />
0 Warning(s)<br />
0 Error(s)<br />
<br />
Time Elapsed 00:00:02.86<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet bin/Debug/netcoreapp2.1/Hello.dll<br />
Hello World!<br />
<br />
== Installing my own software ==<br />
Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.<br />
<br />
In many cases, the software vendor or support site will incorrectly assume that you are installing the software system-wide or that you need 'sudo' access.<br />
<br />
As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Main_Page&diff=446Main Page2019-02-25T22:55:16Z<p>Daveturner: /* What is Beocat? */</p>
<hr />
<div>== What is Beocat? ==<br />
Beocat is the [[wikipedia:High-performance_computing|High-Performance Computing (HPC)]] cluster at [http://www.ksu.edu Kansas State University]. It is run by the Institute for Computational Research, which is a function of the [http://www.cs.ksu.edu/ Computer Science] department. Beocat is available to any educational researcher in the state of Kansas (and his or her collaborators) without cost. Priority access is given to those researchers who have contributed resources.<br />
<br />
Beocat is actually composed of several different cluster computing systems<br />
* &quot;Beocat&quot;, as used by most people is a [[wikipedia:Beowulf cluster|Beowulf cluster]] of CentOS Linux servers coordinated by the [https://slurm.schedmd.com/ Slurm] job submission and scheduling system. Our [[Compute Nodes]] (hardware) and [[installed software]] have separate pages on this wiki. The current status of this cluster can be monitored by visiting [http://ganglia.beocat.ksu.edu/ http://ganglia.beocat.ksu.edu/].<br />
* A small [[wikipedia:Openstack|Openstack]] cloud-computing infrastructure<br />
<br />
== How Do I Use Beocat? ==<br />
First, you need to get an account by visiting [https://account.beocat.ksu.edu/ https://account.beocat.ksu.edu/] and filling out the form. In most cases approval for the account will be granted in less than one business day, and sometimes much sooner. When your account has been approved, you will be added to our [[LISTSERV]], where we announce any changes, maintenance periods, or other issues.<br />
<br />
Once you have an account, you can access Beocat via SSH and can transfer files in or out via SCP or SFTP (or [https://www.globus.org/ Globus Connect] using the endpoint ''beocat#beocat''). If you don't know what those are, please see our [[LinuxBasics]] page. If you are familiar with these, connect your client to headnode.beocat.ksu.edu and use your K-State eID credentials to log in.<br />
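For example, from a terminal on your own machine you might connect and copy a file like this (a small sketch; replace ''eid'' with your own K-State eID):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Log in to Beocat<br />
ssh eid@headnode.beocat.ksu.edu<br />
<br />
# Copy a file from your machine to your Beocat home directory<br />
scp myfile.txt eid@headnode.beocat.ksu.edu:~/<br />
&lt;/syntaxhighlight&gt;<br />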
<br />
As mentioned above, we use Slurm for job submission and scheduling. If you've never worked with a batch-queueing system before, submitting a job is different than running on a standalone Linux machine. Please see our [[SlurmBasics]] page for an introduction on how to submit your first job. If you are already familiar with Slurm, we also have an [[AdvancedSlurm]] page where we can adjust the fine-tuning. If you're new to HPC, we highly recommend the [http://www.oscer.ou.edu/education.php Supercomputing in Plain English (SiPE)] series by OU. In particular, the older course's streaming videos are an excellent resource, even if you do not complete the exercises.<br />
<br />
&lt;H4&gt;Get an account at [https://account.beocat.ksu.edu/ https://account.beocat.ksu.edu/]&lt;BR&gt;<br />
Read about [[Installed software]] and languages&lt;BR&gt;<br />
Learn about Slurm at [[SlurmBasics]] and [[AdvancedSlurm]]&lt;BR&gt;<br />
&lt;/H4&gt;<br />
<br />
== Writing and Installing Software on Beocat ==<br />
* If you are writing software for Beocat and it is in an installed scripting language like R, Perl, or Python, please look at our [[Installed software]] page to see what we have available and any usage guidelines we have posted there.<br />
* If you need to write compiled code such as Fortran, C, or C++, we offer both GNU and Intel compilers. See our [[FAQ]] for more details and the short sketch after this list.<br />
* In either case, we suggest you head to our [[Tips and Tricks]] page for helpful hints.<br />
* If you wish to install software in your home directory, we have a [[Training Videos#Installing_files_in_your_Home_Directory|video]] showing how to do this.<br />
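As a small sketch of what compiling on Beocat looks like (the toolchain versions and file names here are only examples; see [[Installed software]] for what is actually available):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# GNU compilers come with the foss/gompi toolchains<br />
module load foss/2017beocatb<br />
gcc -O2 -o hello hello.c<br />
<br />
# Intel compilers come with the iomkl toolchains; purge first so toolchains don't mix<br />
module purge<br />
module load iomkl/2017b<br />
icc -O2 -o hello hello.c<br />
&lt;/syntaxhighlight&gt;<br />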
<br />
== How do I get help? ==<br />
You're in our support Wiki now, and that's a great place to start! We highly suggest that before you send us email, you visit our [[FAQ]]. If you're just getting started our [[Training Videos]] might be useful to you.<br />
<br />
If your answer isn't there, you can email us at [mailto:beocat@cs.ksu.edu beocat@cs.ksu.edu]. ''Please'' send all email to this address and not to any of our staff directly. This will ensure your support request gets entered into our tracker, and will get your questions answered as quickly as possible. Please keep the subject line as descriptive as possible and include any pertinent details of your problem (e.g. job IDs, commands run, working directory, program versions, etc.). If the problem is occurring on a headnode, please be sure to include the name of the headnode. This can be found by running the &lt;tt&gt;hostname&lt;/tt&gt; command.<br />
<br />
We are also available on IRC on the [http://freenode.net/using_the_network.shtml freenode chat servers] in the channel #beocat. This is ''especially'' useful if you have a quick question, as you'd be surprised how often at least one of us is around. If you do have a question, be sure to mention '''m0zes''' and/or '''kylehutson''' in your message, and it should grab our attention. Available from a web browser [[Special:WebChat|here.]]<br />
<br />
For in person help, we offer a weekly open support session as mentioned in our calendar down below. Alternatively, we can often schedule a time to meet with you individually. You just need to send us an e-mail and provide us with the details we asked for above.<br />
<br />
&lt;H4&gt;<br />
Again, when you email us at [mailto:beocat@cs.ksu.edu beocat@cs.ksu.edu] please give us the job ID number, the path and script name for the job, and a full description of the problem. It may also be useful to include the output to 'module list'.<br />
&lt;/H4&gt;<br />
<br />
== Twitter ==<br />
We now have [https://twitter.com/KSUBeocat Twitter]. Follow us to find out the latest from Beocat, or tweet to us to find answers to quick questions. This won't replace the mailing list for major announcements, but will be used for more minor notices.<br />
Here are some recent tweets:<br />
&lt;ShoogleTweet limit=&quot;6&quot;&gt;KSUBeocat&lt;/ShoogleTweet&gt;<br />
<br />
== How do I get priority access ==<br />
We're glad you asked! Contact [mailto:dan@ksu.edu Dr. Dan Andresen] to find out how contributing resources to Beocat can give you priority access.<br />
<br />
== Policies ==<br />
You can find our policies [[Policy|here]]<br />
<br />
== Credits and Accolades ==<br />
See the published credits and other accolades received by Beocat [[Credits|here]]<br />
<br />
== Upcoming Events ==<br />
{{#widget:Google Calendar<br />
|id=hek6gpeu4bg40tdb2eqdrlfiuo@group.calendar.google.com<br />
|color=711616<br />
|view=AGENDA<br />
}}</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Installed_software&diff=445Installed software2019-02-25T22:52:22Z<p>Daveturner: /* Spark */</p>
<hr />
<div>== Drinking from the Firehose ==<br />
For a complete list of all installed modules, see [[ModuleList]]<br />
<br />
== Toolchains ==<br />
A toolchain is a set of compilers, libraries and applications that are needed to build software. Some software functions better when using specific toolchains.<br />
<br />
We provide a good number of toolchains and toolchain versions to make sure your applications will compile and/or run correctly.<br />
<br />
These toolchains include (you can run 'module keyword toolchain'):<br />
; foss: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; gcccuda: GNU Compiler Collection (GCC) based compiler toolchain, along with CUDA toolkit.<br />
; gmvapich2: GNU Compiler Collection (GCC) based compiler toolchain, including MVAPICH2 for MPI support.<br />
; gompi: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.<br />
; gompic: GNU Compiler Collection (GCC) based compiler toolchain along with CUDA toolkit, including OpenMPI for MPI support with CUDA features enabled.<br />
; goolfc: GCC based compiler toolchain __with CUDA support__, and including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; iomkl: Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL &amp; OpenMPI.<br />
<br />
You can run 'module spider $toolchain' to see the versions we have:<br />
$ module spider iomkl<br />
* iomkl/2017a<br />
* iomkl/2017b<br />
* iomkl/2017beocatb<br />
<br />
If you load one of those (module load iomkl/2017b), you can see the other modules and versions of software that it loaded with the 'module list':<br />
$ module list<br />
Currently Loaded Modules:<br />
1) icc/2017.4.196-GCC-6.4.0-2.28<br />
2) binutils/2.28-GCCcore-6.4.0<br />
3) ifort/2017.4.196-GCC-6.4.0-2.28<br />
4) iccifort/2017.4.196-GCC-6.4.0-2.28<br />
5) GCCcore/6.4.0<br />
6) numactl/2.0.11-GCCcore-6.4.0<br />
7) hwloc/1.11.7-GCCcore-6.4.0<br />
8) OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
9) iompi/2017b<br />
10) imkl/2017.3.196-iompi-2017b<br />
11) iomkl/2017b<br />
<br />
As you can see, toolchains can depend on each other. For instance, the iomkl toolchain depends on iompi, which depends on iccifort, which depends on icc and ifort, which in turn depend on GCCcore, which depends on GCC. Hence it is very important that the correct versions of all related software are loaded.<br />
<br />
With software we provide, the toolchain used to compile is always specified in the &quot;version&quot; of the software that you want to load.<br />
<br />
If you mix toolchains, you may end up with inconsistent or broken builds.<br />
== Most Commonly Used Software ==<br />
=== [http://www.open-mpi.org/ OpenMPI] ===<br />
We provide lots of versions. You are most likely better off directly loading a toolchain or application to make sure you get the right version, but you can see the versions we have with 'module spider OpenMPI':<br />
<br />
* OpenMPI/2.0.2-GCC-6.3.0-2.27<br />
* OpenMPI/2.0.2-iccifort-2017.1.132-GCC-6.3.0-2.27<br />
* OpenMPI/2.1.1-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-GCC-7.2.0-2.29<br />
* OpenMPI/2.1.1-gcccuda-2017b<br />
* OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-iccifort-2018.0.128-GCC-7.2.0-2.29<br />
<br />
=== [http://www.r-project.org/ R] ===<br />
We currently provide (module -r spider '^R$'):<br />
* R/3.4.0-foss-2017beocatb-X11-20170314<br />
<br />
==== Packages ====<br />
We provide a small number of R packages installed by default; these are generally packages that are needed by more than one person.<br />
<br />
==== Installing your own R Packages ====<br />
To install your own package, log in to Beocat and start R interactively<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load R<br />
R<br />
&lt;/syntaxhighlight&gt;<br />
Then install the package using<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
install.packages(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as &quot;USA (KS)&quot;.<br />
<br />
After installing you can test before leaving interactive mode by issuing the command<br />
&lt;syntaxhighlight lang=&quot;R&quot;&gt;<br />
library(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
==== Running R Jobs ====<br />
<br />
You cannot submit an R script directly. '&lt;tt&gt;sbatch myscript.R&lt;/tt&gt;' will result in an error. Instead, you need to make a bash [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
<br />
# Now lets do some actual work. This starts R and loads the file myscript.R<br />
module load R<br />
R --no-save -q &lt; myscript.R<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Now, to submit your R job, you would type<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sbatch submit-R.sbatch<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://www.java.com/ Java] ===<br />
We currently provide (module spider Java):<br />
* Java/1.8.0_131<br />
* Java/1.8.0_144<br />
<br />
=== [http://www.python.org/about/ Python] ===<br />
We currently provide (module spider Python)<br />
* Python/2.7.13-foss-2017beocatb<br />
* Python/2.7.13-GCCcore-7.2.0-bare<br />
* Python/2.7.13-iomkl-2017a<br />
* Python/2.7.13-iomkl-2017beocatb<br />
* Python/3.6.3-foss-2017b<br />
* Python/3.6.3-foss-2017beocatb<br />
* Python/3.6.3-iomkl-2017beocatb<br />
<br />
If you need modules that we do not have installed, you should use [https://virtualenv.pypa.io/en/stable/userguide/ virtualenv] to set up a virtual python environment in your home directory. This will let you install python modules as you please.<br />
<br />
==== Setting up your virtual environment ====<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Load Python<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, Python is loaded. Python will not stay loaded after you log off, so you must rerun this command every time you log on.)<br />
* Create a location for your virtual environments (optional, but helps keep things organized)<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that &lt;code&gt;virtualenv --help&lt;/code&gt; has many more useful options.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
virtualenv test<br />
&lt;/syntaxhighlight&gt;<br />
* Let's look at our virtual environments (the virtual environment name should be in the output):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
ls ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Activate one of these<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
source ~/virtualenvs/test/bin/activate<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, your virtual environment is activated. It will not stay active after you log off, so you must rerun this command every time you log on.)<br />
* You can now install the python modules you want. This can be done using &lt;tt&gt;pip&lt;/tt&gt;.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
pip install numpy biopython<br />
&lt;/syntaxhighlight&gt;<br />
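<br />
To confirm the installation worked, you can try importing one of the packages from within the still-activated environment. This is just a quick sanity check using the packages installed above:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# With the virtual environment still active, import an installed package<br />
python -c &quot;import numpy; print(numpy.__version__)&quot;<br />
&lt;/syntaxhighlight&gt;<br />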
<br />
==== Using your virtual environment within a job ====<br />
Here is a simple job script using the virtual environment test<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
source ~/virtualenvs/test/bin/activate<br />
export PYTHONDONTWRITEBYTECODE=1<br />
python ~/path/to/your/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://spark.apache.org/ Spark] ===<br />
<br />
Spark is a framework for large-scale data processing.<br />
It can be used in conjunction with Python, R, Scala, Java, and SQL.<br />
Spark can be run on Beocat interactively or through the Slurm queue.<br />
<br />
To run interactively, you must first request a node or nodes from the Slurm queue.<br />
The line below requests 1 node and 1 core for 24 hours and if available will drop<br />
you into the bash shell on that node.<br />
<br />
 srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash<br />
<br />
We have some sample python based Spark code you can try out that came from the <br />
exercises and homework from the PSC Spark workshop. <br />
<br />
mkdir spark-test<br />
cd spark-test<br />
cp -rp /homes/daveturner/projects/PSC-BigData-Workshop/Shakespeare .<br />
<br />
Then load the Spark and Python modules, fire up pyspark, and run the sample code.<br />
<br />
module load Spark Python<br />
pyspark<br />
&gt;&gt;&gt; exec(open(&quot;shakespeare.py&quot;).read())<br />
<br />
You can work interactively from the pyspark prompt (&gt;&gt;&gt;) in addition to running scripts as above.<br />
<br />
The Shakespeare directory also contains a sample sbatch submit script that will run the <br />
same shakespeare.py code through the Slurm batch queue. <br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=shakespeare<br />
#SBATCH --mem=10G<br />
#SBATCH --time=01:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
# Load Spark and Python (version 3 here)<br />
module load Spark<br />
module load Python<br />
<br />
spark-submit shakespeare.py<br />
<br />
When you run interactively, pyspark initializes your spark context &lt;B&gt;sc&lt;/B&gt;.<br />
You will need to do this manually as in the sample python code when you want<br />
to submit jobs through the Slurm queue.<br />
<br />
 # If there is no Spark Context (not running interactive from pyspark), create it<br />
 try:<br />
     sc<br />
 except NameError:<br />
     from pyspark import SparkConf, SparkContext<br />
     conf = SparkConf().setMaster(&quot;local&quot;).setAppName(&quot;App&quot;)<br />
     sc = SparkContext(conf = conf)<br />
<br />
=== [http://www.perl.org/ Perl] ===<br />
The system-wide version of perl tracks the stable releases of perl. Unfortunately, there are some features that we do not include in the system distribution, namely threads.<br />
<br />
If you need a newer version (or threads), just load one we provide in our modules (module spider Perl):<br />
* Perl/5.26.0-foss-2017beocatb<br />
* Perl/5.26.0-iompi-2017beocatb<br />
<br />
==== Submitting a job with Perl ====<br />
Much like R (above), you cannot simply '&lt;tt&gt;sbatch myProgram.pl&lt;/tt&gt;', but you must create a [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|submit script]] which will call perl. Here is an example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
# Now let's do some actual work.<br />
module load Perl<br />
perl /path/to/myProgram.pl<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== Octave for MatLab codes ===<br />
<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
The 64-bit version of Octave can be loaded using the command above. Octave can then be used<br />
to work with MatLab codes on the head node and, through sbatch submit scripts, on the compute nodes.<br />
Octave is made to run MatLab code, but it does have limitations and does not support<br />
everything that MatLab itself does.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=octave<br />
#SBATCH --output=octave.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
octave &lt; matlab_code.m<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== MatLab compiler ===<br />
<br />
Beocat also has a &lt;B&gt;single-user license&lt;/B&gt; for the MatLab compiler and the most common toolboxes<br />
including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox,<br />
Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox, <br />
Global Optimization Toolbox, and the Bioinformatics Toolbox.<br />
<br />
Since we only have a &lt;B&gt;single-user license&lt;/B&gt;, you will be expected to develop your MatLab code<br />
with Octave or elsewhere, such as on a laptop or departmental server. Once you're ready to do large runs,<br />
move your code to Beocat, compile the MatLab code into an executable, and submit as many jobs as<br />
you want to the scheduler. To use the MatLab compiler, you need to load the MATLAB module to compile code and<br />
load the mcr module to run the resulting MatLab executable.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
If you have addpath() commands in your code, you will need to wrap them in an &quot;if ~isdeployed&quot; block and tell the<br />
compiler to include that path via the -I flag.<br />
<br />
&lt;syntaxhighlight lang=&quot;MATLAB&quot;&gt;<br />
% wrap addpath() calls like so:<br />
if ~isdeployed<br />
addpath('./another/folder/with/code/')<br />
end<br />
&lt;/syntaxhighlight&gt;<br />
<br />
NOTE: The license manager checks out the mcc compiler for a minimum of 30 minutes, so if another user has recently compiled code<br />
you unfortunately may need to wait for up to 30 minutes to compile your own code.<br />
<br />
Compiling with additional paths:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Any directories added with addpath() will need to be added to the list of compile options as -I arguments. You<br />
can have multiple -I arguments in your compile command.<br />
<br />
Here is an example job submission script. Modify time, memory, tasks-per-node, and job name as you see fit:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=matlab<br />
#SBATCH --output=matlab.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load mcr<br />
<br />
./matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
For those who make use of mex files - compiled C and C++ code with matlab bindings - you will need to add these<br />
files to the compiled archive via the -a flag. See the behavior of this flag in the [https://www.mathworks.com/help/compiler/mcc.html compiler documentation]. You can either target specific .mex files or entire directories.<br />
<br />
Because codes often require adding several directories to the Matlab path as well as mex files from several locations,<br />
we recommend writing a script to preserve and help document the steps to compile your Matlab code. Here is an<br />
abbreviated example from a current user:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
<br />
module load MATLAB<br />
<br />
cd matlabPyrTools/MEX/<br />
<br />
# compile mex files<br />
mex upConv.c convolve.c wrap.c edges.c<br />
mex corrDn.c convolve.c wrap.c edges.c<br />
mex histo.c<br />
mex innerProd.c<br />
<br />
cd ../..<br />
<br />
mcc -m mongrel_creation.m \<br />
-I ./matlabPyrTools/MEX/ \<br />
-I ./matlabPyrTools/ \<br />
-I ./FastICA/ \<br />
-a ./matlabPyrTools/MEX/ \<br />
-a ./texturesynth/ \<br />
-o mongrel_creation_binary<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Again, we only have a &lt;B&gt;single-user license&lt;/B&gt; for MatLab, so the model is to develop and debug your MatLab code<br />
elsewhere or with Octave on Beocat, then compile the MatLab code into an executable and run it without<br />
limits on Beocat. <br />
<br />
For more info on the mcc compiler see: https://www.mathworks.com/help/compiler/mcc.html<br />
<br />
=== COMSOL ===<br />
Beocat has no license for COMSOL. If you want to use it, you must provide your own.<br />
<br />
module spider COMSOL<br />
----------------------------------------------------------------------------<br />
COMSOL: COMSOL/5.3<br />
----------------------------------------------------------------------------<br />
Description:<br />
COMSOL Multiphysics software, an interactive environment for modeling<br />
and simulating scientific and engineering problems<br />
<br />
This module can be loaded directly: module load COMSOL/5.3<br />
<br />
Help:<br />
<br />
Description<br />
===========<br />
COMSOL Multiphysics software, an interactive environment for modeling and <br />
simulating scientific and engineering problems<br />
You must provide your own license.<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
*OR*<br />
export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME<br />
e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu<br />
<br />
More information<br />
================<br />
- Homepage: https://www.comsol.com/<br />
==== Graphical COMSOL ====<br />
Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Connect to the cluster with X11 forwarding (ssh -Y or mobaxterm)<br />
# load the comsol module on the headnode<br />
module load COMSOL<br />
# export your comsol license as mentioned above, and tell the scheduler to run the software<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw<br />
&lt;/syntaxhighlight&gt;<br />
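<br />
Instead of running the GUI on the cluster, you can run COMSOL non-interactively through the scheduler. Here is a minimal sketch of a batch submit script; the model file names are placeholders, and you must still provide your own license as described above:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=comsol<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=8G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module load COMSOL<br />
# Point at your own license, as described above<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
# model.mph and out.mph are placeholders for your own model files<br />
comsol batch -inputfile model.mph -outputfile out.mph<br />
&lt;/syntaxhighlight&gt;<br />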
<br />
=== .NET Core ===<br />
==== Load .NET ====<br />
mozes@[eunomia] ~ $ module load dotNET-Core-SDK<br />
==== Create an application ====<br />
Following instructions from [https://docs.microsoft.com/en-us/dotnet/core/tutorials/using-with-xplat-cli here], we'll create a simple 'Hello World' application<br />
mozes@[eunomia] ~ $ mkdir Hello<br />
<br />
mozes@[eunomia] ~ $ cd Hello<br />
<br />
mozes@[eunomia] ~/Hello $ export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet new console<br />
The template &quot;Console Application&quot; was created successfully.<br />
<br />
Processing post-creation actions...<br />
Running 'dotnet restore' on /homes/mozes/Hello/Hello.csproj...<br />
Restoring packages for /homes/mozes/Hello/Hello.csproj...<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.props.<br />
Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.targets.<br />
Restore completed in 358.43 ms for /homes/mozes/Hello/Hello.csproj.<br />
<br />
Restore succeeded.<br />
<br />
==== Edit your program ====<br />
mozes@[eunomia] ~/Hello $ vi Program.cs<br />
==== Run your .NET application ====<br />
mozes@[eunomia] ~/Hello $ dotnet run<br />
Hello World!<br />
==== Build and run the built application ====<br />
mozes@[eunomia] ~/Hello $ dotnet build<br />
Microsoft (R) Build Engine version 15.8.169+g1ccb72aefa for .NET Core<br />
Copyright (C) Microsoft Corporation. All rights reserved.<br />
<br />
Restore completed in 106.12 ms for /homes/mozes/Hello/Hello.csproj.<br />
Hello -&gt; /homes/mozes/Hello/bin/Debug/netcoreapp2.1/Hello.dll<br />
<br />
Build succeeded.<br />
0 Warning(s)<br />
0 Error(s)<br />
<br />
Time Elapsed 00:00:02.86<br />
<br />
mozes@[eunomia] ~/Hello $ dotnet bin/Debug/netcoreapp2.1/Hello.dll<br />
Hello World!<br />
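<br />
The commands above run on the head node; for longer work you would run the built application through the scheduler. Here is a minimal sketch of a submit script, assuming the Hello project built above:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=0-0:15:00<br />
<br />
module load dotNET-Core-SDK<br />
# Run the dll produced by 'dotnet build' above<br />
dotnet ~/Hello/bin/Debug/netcoreapp2.1/Hello.dll<br />
&lt;/syntaxhighlight&gt;<br />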
<br />
== Installing my own software ==<br />
Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.<br />
<br />
In many cases, the software vendor or support site will incorrectly assume that you are installing the software system-wide or that you need 'sudo' access.<br />
<br />
As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].</div>Daveturner
http://support.beocat.ksu.edu/BeocatDocs/index.php?title=AdvancedSlurm&diff=442AdvancedSlurm2019-02-05T23:58:48Z<p>Daveturner: /* Checkpoint/Restart using DMTCP */</p>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
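<br />
As a minimal sketch (the program path is a hypothetical placeholder), an intranode submit script for a threaded program might look like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=1:00:00<br />
<br />
# Tell a threaded (e.g. OpenMP) program how many cores Slurm allocated<br />
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE<br />
<br />
# Hypothetical program path - replace with your own<br />
$HOME/ProgramDir/MyThreadedProgram<br />
&lt;/syntaxhighlight&gt;<br />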
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --ntasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
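<br />
As a hedged sketch (the module name and program path are assumptions; check &lt;tt&gt;module avail&lt;/tt&gt; for the exact OpenMPI module on Beocat), an internode MPI submit script could look like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=4<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=4:00:00<br />
<br />
# Module name is an assumption - use 'module avail OpenMPI' to find the right one<br />
module load OpenMPI<br />
<br />
# mpirun picks up the allocated nodes and task count from Slurm<br />
mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />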
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory in total.<br />
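<br />
For example, the same request expressed as sbatch directives (the numbers are purely illustrative):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --ntasks=20<br />
#SBATCH --mem-per-cpu=20G   # 20 tasks x 20G per core = 400GB for the whole job<br />
&lt;/syntaxhighlight&gt;<br />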
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs, not related to resource requests, is to have Slurm email you when a job changes its status. This may require two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
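<br />
Put together, the two directives might look like this in a submit script (the address is just the example from above):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --mail-type=END,FAIL<br />
#SBATCH --mail-user=someone@somecompany.com<br />
&lt;/syntaxhighlight&gt;<br />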
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
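<br />
For instance, to write the two streams to separate files named after the job (the file names are just examples):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --output=MyJobTitle.o%j<br />
#SBATCH --error=MyJobTitle.e%j<br />
&lt;/syntaxhighlight&gt;<br />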
<br />
=== Running from the Current Directory ===<br />
Slurm jobs start in the directory you submitted them from, which is also available inside the job as &lt;tt&gt;$SLURM_SUBMIT_DIR&lt;/tt&gt;. If you need the job to start in a different directory, you can use the '&lt;tt&gt;--chdir=''directory''&lt;/tt&gt;' sbatch directive to set the working directory.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
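<br />
For example (as a sketch), the constraint can go on the command line or in the submit script:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# On the command line<br />
sbatch --constraint=avx2 MyScript.sh<br />
<br />
# Or as a directive inside the script<br />
#SBATCH --constraint=avx2<br />
&lt;/syntaxhighlight&gt;<br />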
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job; check SLURM_JOB_NODELIST for that. There are lots of useful environment variables here, and I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
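<br />
A small sketch of how these variables might be used inside a job script:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=4<br />
<br />
echo &quot;Job $SLURM_JOB_ID running on $HOSTNAME&quot;<br />
echo &quot;Nodes allocated: $SLURM_JOB_NODELIST&quot;<br />
<br />
# Pass the allocated core count to a threaded program<br />
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE<br />
&lt;/syntaxhighlight&gt;<br />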
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. The default is:<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user=myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not store files redundantly; files are instead striped<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
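<br />
A sketch of using scratch inside a job script (the application and file names are hypothetical), staging files in before the run and moving results back afterwards:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Make a per-job directory on scratch and stage the input there<br />
mkdir -p /scratch/$USER/$SLURM_JOB_ID<br />
cp input.dat /scratch/$USER/$SLURM_JOB_ID/<br />
<br />
# Run the application against the scratch copy<br />
cd /scratch/$USER/$SLURM_JOB_ID<br />
app -input_directory . -output_directory ./out<br />
<br />
# Move anything you want to keep back, since scratch is purged after 30 days<br />
cp -rp out $SLURM_SUBMIT_DIR/<br />
&lt;/syntaxhighlight&gt;<br />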
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to the<br />
local disk at the start of your script, or point your application's output directory to the local disk.<br />
You will then need to copy any files you want to keep off the local disk before the job finishes, since<br />
Slurm removes all files in your job's directory on /tmp when the job completes or aborts. When we get<br />
the scratch file system working with Lustre, it may end up being faster than accessing local disk, so we<br />
will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
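<br />
For example, an array job can name its output files per task using those patterns (the values are just illustrative):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --array=1-10<br />
#SBATCH --output=MyJobTitle-%A_%a.out<br />
&lt;/syntaxhighlight&gt;<br />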
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks (`), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, submitting 5,000 jobs would take 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the checkpoint directory name<br />
like &lt;B&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/B&gt;.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
export DMTCP_CHECKPOINT_DIR=$ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
<br />
You will need to run several tests to verify that DMTCP is working properly with your application.<br />
First, run a short test without DMTCP and another with DMTCP with the checkpoint interval set to 5 minutes<br />
by adding the line &lt;B&gt;export DMTCP_CHECKPOINT_INTERVAL=300&lt;/B&gt; to your script. Then use &lt;B&gt;kstat -d 1&lt;/B&gt; to<br />
check that the memory in both runs is close to the same. Also use this information to calculate the time <br />
that each checkpoint takes. In most cases I've seen times less than a minute for checkpointing that will normally<br />
be done once each hour. If your application is taking more time, let us know. Sometimes this can be sped up<br />
by simply turning off compression by adding the line &lt;B&gt;export DMTCP_GZIP=0&lt;/B&gt;. Make sure to remove the<br />
line where you set the checkpoint interval to 300 seconds so that the default time of once per hour will be used.<br />
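<br />
For those test runs, the relevant lines in the submit script would look something like this (remove the interval line again for production runs):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Checkpoint every 5 minutes for testing only; the default is once per hour<br />
export DMTCP_CHECKPOINT_INTERVAL=300<br />
<br />
# Optionally turn off compression if checkpointing is slow<br />
export DMTCP_GZIP=0<br />
&lt;/syntaxhighlight&gt;<br />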
<br />
After verifying that your code completes using DMTCP and does not take significantly more time or memory, you<br />
will need to start a run then &lt;B&gt;scancel&lt;/B&gt; it after the first checkpoint, then resubmit the same script to make <br />
sure that it restarts and runs to completion. If you are working with an array job script, the last step is to try a few<br />
array tasks at once to make sure there is no conflict between the jobs.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
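<br />
As a sketch, an interactive session requesting specific resources (the numbers are just illustrative) looks like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
srun --nodes=1 --cpus-per-task=4 --mem-per-cpu=1G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />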
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
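<br />
For example (as a sketch), either form marks the job as killable:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# On the command line<br />
sbatch --gres=killable:1 MyScript.sh<br />
<br />
# Or as a directive in the submit script<br />
#SBATCH --gres=killable:1<br />
&lt;/syntaxhighlight&gt;<br />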
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When those users have contributed nodes, the group gets access to a &quot;partition&quot; giving you priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and have need of the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Graphical Applications ==<br />
Some applications are graphical and need to have some graphical input/output. We currently accomplish this with X11 forwarding<br />
=== Connecting with an X11 client ===<br />
==== Windows ====<br />
If you are running Windows, we recommend MobaXTerm as your file/ssh manager because it is one relatively simple tool that does everything. MobaXTerm also automatically connects with X11 forwarding enabled.<br />
==== Linux/OSX ====<br />
Both Linux and OSX can connect in an X11 forwarding mode. Linux will have all of the tools you need installed already, OSX will need [https://www.xquartz.org/ XQuartz] installed.<br />
<br />
Then you will need to change your 'ssh' command slightly:<br />
<br />
ssh -Y eid@headnode.beocat.ksu.edu<br />
<br />
The '''-Y''' argument tells ssh to setup X11 forwarding.<br />
=== Starting a Graphical job ===<br />
All graphical jobs, by design, must be interactive, so we'll use the srun command. On a headnode, we run the following:<br />
# load an X11 enabled application<br />
module load Octave<br />
# start an X11 job, sbatch arguments are accepted for srun as well, 1 node, 1 hour, 1 gb of memory<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 octave --gui<br />
<br />
Because these jobs are interactive, they may not be able to run at all times, depending on how busy the scheduler is at any point in time. '''--pty --x11''' are required arguments setting up the job, and '''octave --gui''' is the command to run inside the job.<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturner
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
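<br />
Putting it together, a minimal filled-in script might look like the following (a sketch only; the job name, resource numbers, and program path are placeholders to replace with your own):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --job-name=MyJobTitle<br />
#SBATCH --mem-per-cpu=2G<br />
#SBATCH --time=10:00:00<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH --output=MyJobTitle.o%j<br />
#SBATCH --mail-type=END,FAIL<br />
<br />
# Run the program; it has 8 cores available on one node<br />
$HOME/ProgramDir/ProgramName ProgramArguments<br />
&lt;/syntaxhighlight&gt;<br />
You would then submit it with 'sbatch MyScript.sh' and no longer need any options on the command line.<br />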
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored, so you don't need to<br />
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software, which provides much faster<br />
file access than /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself (see below). Scratch offers greater speed and no limit on the size or number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories, since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing them<br />
redundantly, so if a hard disk fails, data will be lost. When we get scratch set up to use Lustre,<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
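<br />
Once your directory exists, a typical pattern (a sketch only; 'app', '$input_files', and the 'results' directory are placeholders) is to preposition input files on scratch before the run and move anything worth keeping back out afterward, since scratch is purged after 30 days:<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Preposition the input files on scratch before the run<br />
cp $input_files /scratch/$USER/<br />
<br />
# Run the application against the scratch copies<br />
app -input_directory /scratch/$USER -output_directory /scratch/$USER/results<br />
<br />
# Move results you want to keep back to home or bulk before they are purged<br />
mv /scratch/$USER/results /homes/$USER/<br />
&lt;/syntaxhighlight&gt;<br />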
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job#, where '#' is the job ID number, on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#.<br />
<br />
You may need to copy files to the local disk at the start of your script, or set the output directory<br />
for your application to point to the local disk. You'll then need to copy any files you want to keep off<br />
the local disk before the job finishes, since Slurm will remove all files in your job's directory on /tmp<br />
when the job completes or aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk, so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk back to the current working directory when the run is done<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission for everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users, as 'ls -l' shows.<br />
In my case they have the group daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared<br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers that will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
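<br />
If you want to control these names yourself, the same patterns work in the --output and --error directives. A sketch using a hypothetical job name:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
# One output file per array task, e.g. MyJobTitle_1234_7.out<br />
#SBATCH --output=MyJobTitle_%A_%a.out<br />
#SBATCH --error=MyJobTitle_%A_%a.err<br />
&lt;/syntaxhighlight&gt;<br />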
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses; one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with 'wc -l dataset.txt'; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution (the backticks), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000 jobs, that is 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP (Distributed MultiThreaded CheckPointing) is software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the checkpoint directory name<br />
like &lt;B&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/B&gt;.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
export DMTCP_CHECKPOINT_DIR=$ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
    echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
    dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
    echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
    dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
&lt;/syntaxhighlight&gt;<br />
<br />
You will need to run several tests to verify that DMTCP is working properly with your application.<br />
First, run a short test without DMTCP and another with DMTCP with the checkpoint interval set to 5 minutes<br />
by adding the line &lt;B&gt;export DMTCP_CHECKPOINT_INTERVAL=300&lt;/B&gt; to your script. Then use &lt;B&gt;kstat -d 1&lt;/B&gt; to<br />
check that the memory in both runs is close to the same. Also use this information to calculate the time <br />
that each checkpoint takes. In most cases I've seen times less than a minute for checkpointing that will normally<br />
be done once each hour. If your application is taking more time, let us know. Sometimes this can be sped up<br />
by simply turning off compression by adding the line &lt;B&gt;export DMTCP_GZIP=0&lt;/B&gt;.<br />
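<br />
For the testing phase, those two variables can simply be exported near the top of the submit script above (a sketch; remove the short interval for production runs):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Checkpoint every 5 minutes while testing; production runs would normally checkpoint far less often<br />
export DMTCP_CHECKPOINT_INTERVAL=300<br />
<br />
# Turn off checkpoint compression if writing the checkpoint files is too slow<br />
export DMTCP_GZIP=0<br />
&lt;/syntaxhighlight&gt;<br />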
<br />
After verifying that your code completes using DMTCP and does not take significantly more time or memory, you<br />
will need to start a run then scancel it after the first checkpoint, then resubmit the same script to make sure that<br />
it restarts and runs to completion. If you are working with an array job script, the last step is to try a few<br />
array jobs at once to make sure there is no conflict between runs.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will time out after your allotted time has passed.<br />
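<br />
A successful interactive request might look like the following (a sketch; the resource numbers are placeholders). Once the prompt returns you are on a compute node, and typing 'exit' ends the job:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Ask for 4 cores and 4G per core for 2 hours, then start an interactive shell on the allocated node<br />
srun --cpus-per-task=4 --mem-per-cpu=4G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />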
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
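<br />
For example, you could run 'sbatch --gres=killable:1 MyScript.sh' on the command line, or put the equivalent directive in the script itself (a sketch; combine it with your usual #SBATCH lines and program):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --time=6:00:00<br />
# Allow this job onto owned nodes, accepting that it may be killed and re-queued at any time<br />
#SBATCH --gres=killable:1<br />
<br />
$HOME/ProgramDir/ProgramName ProgramArguments<br />
&lt;/syntaxhighlight&gt;<br />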
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When a group has contributed nodes, it gets access to a &quot;partition&quot; giving its members priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;.<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and have need of the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Graphical Applications ==<br />
Some applications are graphical and need to have some graphical input/output. We currently accomplish this with X11 forwarding.<br />
=== Connecting with an X11 client ===<br />
==== Windows ====<br />
If you are running Windows, we recommend MobaXTerm as your file/ssh manager because it is one relatively simple tool that does everything. MobaXTerm also automatically connects with X11 forwarding enabled.<br />
==== Linux/OSX ====<br />
Both Linux and OSX can connect in an X11 forwarding mode. Linux will have all of the tools you need installed already; OSX will need [https://www.xquartz.org/ XQuartz] installed.<br />
<br />
Then you will need to change your 'ssh' command slightly:<br />
<br />
ssh -Y eid@headnode.beocat.ksu.edu<br />
<br />
The '''-Y''' argument tells ssh to setup X11 forwarding.<br />
=== Starting a Graphical job ===<br />
All graphical jobs, by design, must be interactive, so we'll use the srun command. On a headnode, we run the following:<br />
# load an X11 enabled application<br />
module load Octave<br />
# start an X11 job; sbatch arguments are accepted for srun as well: 1 node, 1 hour, 1 GB of memory<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 octave --gui<br />
<br />
Because these jobs are interactive, they may not be able to run at all times, depending on how busy the scheduler is at any point in time. '''--pty --x11''' are required arguments setting up the job, and '''octave --gui''' is the command to run inside the job.<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
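<br />
The -l output is very wide (see the tables below), so if you only care about a few columns you can restrict them with --format. A sketch using fields that appear in the examples below:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Show only the columns most useful for diagnosing failed jobs<br />
sacct -j 1122334455 --format=JobID,JobName,Partition,AllocCPUS,Elapsed,State,ExitCode,MaxRSS,ReqMem<br />
&lt;/syntaxhighlight&gt;<br />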
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. Looking at the AllocTRES column for the allocated resources, we see that only 1 MB of memory was granted. Combining that with the &quot;MaxRSS&quot; column, we see that the memory granted was less than the memory the job tried to use, so the job was &quot;CANCELLED&quot;.</div>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not redundantly store files by striping them<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to a file on the local disk, then you'll need to copy any files you want off the local disk before<br />
the job finishes since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the range of index numbers that will be<br />
associated with the tasks. Each index number will be exported to its task via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
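<br />
For example, a minimal sketch of an array submit script that names its own per-task output files (the file name and echo line are just placeholders):<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --output=myjob-%A_%a.out<br />
echo &quot;This is task $SLURM_ARRAY_TASK_ID of array job $SLURM_ARRAY_JOB_ID&quot;<br />
&lt;/syntaxhighlight&gt;<br />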
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses; one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
  echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
  sbatch ${scriptnum}.sh<br />
  scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case, let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks, and has the sed command print only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
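<br />
If you do submit very large arrays, Slurm can also throttle how many tasks run at the same time by appending a '%' limit to the range. The sketch below would queue all 5000 tasks but allow at most 100 of them to run simultaneously:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --array=1-5000%100<br />
&lt;/syntaxhighlight&gt;<br />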
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the checkpoint directory name<br />
like &lt;I&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/I&gt;.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
<br />
You will need to run several tests to verify that DMTCP is working properly with your application.<br />
First, run a short test without DMTCP and another with DMTCP with the checkpoint interval set to 5 minutes<br />
by adding the line &lt;B&gt;export DMTCP_CHECKPOINT_INTERVAL=300&lt;/B&gt; to your script. Then use &lt;B&gt;kstat -d 1&lt;/B&gt; to<br />
check that the memory in both runs is close to the same. Also use this information to calculate the time <br />
that each checkpoint takes. In most cases I've seen times less than a minute for checkpointing that will normally<br />
be done once each hour. If your application is taking more time, let us know. Sometimes this can be sped up<br />
by simply turning off compression.<br />
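<br />
As a sketch, the 5-minute test described above only needs one extra line near the top of your script; the compression-related option is shown as a comment because you should confirm the exact flag with &lt;tt&gt;dmtcp_launch --help&lt;/tt&gt; on the installed version first.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Checkpoint every 5 minutes for testing; remove this line to return to the normal hourly checkpoint interval<br />
export DMTCP_CHECKPOINT_INTERVAL=300<br />
<br />
# If checkpoints are slow, compression can be disabled at launch time, for example:<br />
# dmtcp_launch --no-coordinator --no-gzip your_application_command<br />
&lt;/syntaxhighlight&gt;<br />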
<br />
After verifying that your code completes using DMTCP and does not take significantly more time or memory, you<br />
will need to start a run then scancel it after the first checkpoint, then resubmit the same script to make sure that<br />
it restarts and runs to completion. If you are working with an array job script, the last step is to try a few<br />
array jobs at once to make sure there is no conflict between runs.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will time out after your allotted time has passed.<br />
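<br />
For example, a one-node interactive session with a couple of cores, 4 GB of memory, and a two-hour limit might be requested like this (adjust the numbers to your needs):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
srun --nodes=1 --cpus-per-task=2 --mem=4G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />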
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running, which<br />
can be very useful for looking at files in /tmp/job# or for running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level of your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
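<br />
As a concrete sketch (the job ID 123456 here is made up), you might find the job ID with squeue and then open a shell inside that job's allocation to watch it with htop:<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# List your own jobs to find the job ID<br />
squeue -u $USER<br />
<br />
# Open a shell on the node(s) allocated to job 123456, then inspect your processes<br />
srun --jobid=123456 --pty bash<br />
htop -u $USER<br />
&lt;/syntaxhighlight&gt;<br />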
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
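<br />
As an unsupported illustration only, &lt;tt&gt;scontrol update&lt;/tt&gt; can change some parameters of a job you own; for example, reducing a pending job's time limit might look like the sketch below (the job ID is made up, and normal users can generally only lower limits, not raise them):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
scontrol update JobId=123456 TimeLimit=12:00:00<br />
&lt;/syntaxhighlight&gt;<br />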
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
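<br />
For instance, inside a submit script it could look like this sketch (the other resource requests and the program name are just placeholders):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash<br />
#SBATCH --time=6:00:00<br />
#SBATCH --mem-per-cpu=2G<br />
#SBATCH --gres=killable:1<br />
myprogram<br />
&lt;/syntaxhighlight&gt;<br />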
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When a group has contributed nodes, its members get access to a &quot;partition&quot; giving them priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;.<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and have need of the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Graphical Applications ==<br />
Some applications are graphical and need to have some graphical input/output. We currently accomplish this with X11 forwarding.<br />
=== Connecting with an X11 client ===<br />
==== Windows ====<br />
If you are running Windows, we recommend MobaXTerm as your file/ssh manager, since it is one relatively simple tool that does everything. MobaXTerm also automatically connects with X11 forwarding enabled.<br />
==== Linux/OSX ====<br />
Both Linux and OSX can connect in an X11 forwarding mode. Linux will have all of the tools you need installed already; OSX will need [https://www.xquartz.org/ XQuartz] installed.<br />
<br />
Then you will need to change your 'ssh' command slightly:<br />
<br />
ssh -Y eid@headnode.beocat.ksu.edu<br />
<br />
The '''-Y''' argument tells ssh to set up X11 forwarding.<br />
=== Starting a Graphical job ===<br />
All graphical jobs, by design, must be interactive, so we'll use the srun command. On a headnode, we run the following:<br />
# load an X11 enabled application<br />
module load Octave<br />
# start an X11 job, sbatch arguments are accepted for srun as well, 1 node, 1 hour, 1 gb of memory<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 octave --gui<br />
<br />
Because these jobs are interactive, they may not be able to run at all times, depending on how busy the scheduler is at any point in time. '''--pty --x11''' are required arguments setting up the job, and '''octave --gui''' is the command to run inside the job.<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
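<br />
The full &lt;tt&gt;-l&lt;/tt&gt; output is very wide, as the tables below show, so it is often easier to pick just the columns you care about with &lt;tt&gt;--format&lt;/tt&gt;. A sketch using standard sacct field names:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sacct -j 1122334455 --format=JobID,JobName,Partition,AllocCPUS,Elapsed,State,ExitCode,MaxRSS,ReqMem<br />
&lt;/syntaxhighlight&gt;<br />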
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. We then look at the AllocTRES column to see our allocated resources, and see that 1 MB of memory was granted. Combine that with the &quot;MaxRSS&quot; column and we see that the memory granted was less than the memory the job tried to use, thus the job was &quot;CANCELLED&quot;.</div>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not redundantly store files by striping them<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to a file on the local disk, then you'll need to copy any files you want off the local disk before<br />
the job finishes since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --arrat specifies the number of array job tasks and the index number which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n, and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in a array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1 I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained like so: wc -l dataset.txt in this case lets call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1:5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses a subshell via `, and has the sed command print out only the line number $SLURM_ARRAY_TASK_ID out of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about time saved: submitting 1 job takes 1-2 seconds. by extension if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the ckeckpoint directory name<br />
like &lt;I&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/I&gt;.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support users to modify job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an excercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When those users have contributed nodes, the group gets access to a &quot;partition&quot; giving you priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to those any of the non-automatic partitions, and have need of the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Graphical Applications ==<br />
Some applications are graphical and need to have some graphical input/output. We currently accomplish this with X11 forwarding<br />
=== Connecting with an X11 client ===<br />
==== Windows ====<br />
If you are running Windows, we recommend MobaXTerm as your file/ssh manager, this is because it is one relatively simple tool to do everything. MobaXTerm also automatically connects with X11 forwarding enabled.<br />
==== Linux/OSX ====<br />
Both Linux and OSX can connect in an X11 forwarding mode. Linux will have all of the tools you need installed already, OSX will need [https://www.xquartz.org/ XQuartz] installed.<br />
<br />
Then you will need to change your 'ssh' command slightly:<br />
<br />
ssh -Y eid@headnode.beocat.ksu.edu<br />
<br />
The '''-Y''' argument tells ssh to setup X11 forwarding.<br />
=== Starting a Graphical job ===<br />
All graphical jobs, by design, must be interactive, so we'll use the srun command. On a headnode, we run the following:<br />
# load an X11 enabled application<br />
module load Octave<br />
# start an X11 job, sbatch arguments are accepted for srun as well, 1 node, 1 hour, 1 gb of memory<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 octave --gui<br />
<br />
Because these jobs are interactive, they may not be able to run at all times, depending on how busy the scheduler is at any point in time. '''--pty --x11''' are required arguments setting up the job, and '''octave --gui''' is the command to run inside the job.<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
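<br />
The full &lt;tt&gt;-l&lt;/tt&gt; output is very wide. If you only care about a few columns, &lt;tt&gt;--format&lt;/tt&gt; lets you pick them; the fields below are simply a useful starting set for the failures described next.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Show only the columns most relevant to diagnosing the common failures below<br />
sacct -j 1122334455 --format=JobID,JobName,Elapsed,State,ExitCode,ReqMem,MaxRSS,AllocCPUS<br />
&lt;/syntaxhighlight&gt;<br />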
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
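<br />
One quick, generic sanity check (nothing Beocat-specific) is to ask bash to parse the script without running it, which catches many simple syntax errors; the script name here is a placeholder.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Parse the submit script without executing it<br />
bash -n myscript.sh<br />
&lt;/syntaxhighlight&gt;<br />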
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the State column, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED). The fix is to resubmit with a larger &lt;tt&gt;--time&lt;/tt&gt; request.<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the State column, we see the job was &quot;CANCELLED by 0&quot;. The AllocTRES column shows the allocated resources: only 1MB of memory was granted. Combine that with the &quot;MaxRSS&quot; column and we see that the memory granted was less than the memory the job tried to use, so the job was &quot;CANCELLED&quot;. The fix is to resubmit with a larger memory request.</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=AdvancedSlurm&diff=438AdvancedSlurm2019-02-05T23:30:54Z<p>Daveturner: /* Checkpoint/Restart using DMTCP */</p>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel: ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misconception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
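<br />
As a minimal sketch (the program name is hypothetical), an intranode job script can pin the thread count of an OpenMP-style program to the cores it was actually allocated:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1 --cpus-per-task=8<br />
#SBATCH --time=1:00:00 --mem-per-cpu=1G<br />
<br />
# Use exactly the cores Slurm allocated to us<br />
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE<br />
./my_threaded_program<br />
&lt;/syntaxhighlight&gt;<br />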
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --ntasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
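<br />
Putting these together, a minimal MPI submit script might look like the sketch below; the executable name is a placeholder, and the module name assumes an OpenMPI-based build of your program.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=2 --ntasks-per-node=4<br />
#SBATCH --time=2:00:00 --mem-per-cpu=2G<br />
<br />
module load OpenMPI<br />
<br />
# mpirun picks up the node list and task count from the Slurm allocation<br />
mpirun ./my_mpi_program<br />
&lt;/syntaxhighlight&gt;<br />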
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core''' with &lt;tt&gt;--mem-per-cpu&lt;/tt&gt;. For instance, if you specified the following: '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have Slurm email you when a job changes its status. This may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
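<br />
For example, the following directives (with a placeholder address) would send mail when the job starts, ends, or fails:<br />
#SBATCH --mail-type=BEGIN,END,FAIL<br />
#SBATCH --mail-user=someone@somecompany.com<br />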
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
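<br />
For instance (the names here are just illustrations), the following sends errors to their own file and names the job so it is easy to spot in the queue:<br />
#SBATCH -J MyAnalysis<br />
#SBATCH --output=MyAnalysis.o%j<br />
#SBATCH --error=MyAnalysis.e%j<br />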
<br />
=== Running from the Current Directory ===<br />
By default, Slurm runs your job from the directory you submitted it from (your &quot;current working directory&quot; at submit time), which is what most programs expect. If you need the job to start in a different directory, use the '&lt;tt&gt;--chdir=''directory''&lt;/tt&gt;' directive.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages, while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
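<br />
For example (the script name is a placeholder):<br />
sbatch --constraint=avx2 myjob.sh<br />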
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know which hosts you have access to during a job; check SLURM_JOB_NODELIST for that. There are lots of useful environment variables here, and I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
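<br />
As a small sketch of how these get used inside a job script (nothing here is specific to any particular program):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1 --cpus-per-task=4 --time=0:10:00<br />
<br />
# Record where and how the job is running<br />
echo &quot;Job $SLURM_JOB_ID running on $HOSTNAME with $SLURM_CPUS_ON_NODE cores&quot;<br />
echo &quot;Nodes allocated: $SLURM_JOB_NODELIST&quot;<br />
<br />
# Work relative to the directory the job was submitted from<br />
cd &quot;$SLURM_SUBMIT_DIR&quot;<br />
&lt;/syntaxhighlight&gt;<br />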
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --ntasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user=myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software, which is much faster than the<br />
file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed and no limit on the size or number of files<br />
in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories, since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing<br />
them redundantly, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to local disk at the start of your script, or set the output directory<br />
for your application to point to the local disk. You will then need to copy any files you want to keep<br />
off the local disk before the job finishes, since Slurm will remove all files in your job's directory<br />
on /tmp when the job completes or aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk, so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
rm -rf /dev/shm/*<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n, and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks (`), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the checkpoint directory name<br />
like &lt;tt&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/tt&gt;.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
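<br />
For example, an interactive shell with 4 cores, 4 GB of memory, and a 2 hour limit could be requested like this:<br />
srun --cpus-per-task=4 --mem=4G --time=2:00:00 --pty bash<br />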
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running, which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
</div>Daveturner
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory in total.<br />
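<br />
For example, the same request written as submit-script directives would be (a short sketch, not a complete script):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --ntasks=20<br />
#SBATCH --mem-per-cpu=20G<br />
# 20 tasks x 20 GB per core = 400 GB of memory in total<br />
&lt;/syntaxhighlight&gt;<br />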
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used sbatch options not related to resource requests is having Slurm email you when a job changes its status. This may require two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following:<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying once for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
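<br />
For example, to be emailed when a job ends or fails, or when it has used 80% of its allotted time, you could add lines like these to your submit script (the address below is just a placeholder):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --mail-type=END,FAIL,TIME_LIMIT_80<br />
#SBATCH --mail-user=someone@somecompany.com<br />
&lt;/syntaxhighlight&gt;<br />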
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
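<br />
For example, for job 206 the following directives would produce &lt;tt&gt;MyJob-206.out&lt;/tt&gt; and &lt;tt&gt;MyJob-206.err&lt;/tt&gt; (the name is just an illustration):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --output=MyJob-%j.out<br />
#SBATCH --error=MyJob-%j.err<br />
&lt;/syntaxhighlight&gt;<br />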
<br />
=== Running from the Current Directory ===<br />
Slurm starts your job from the directory you were in when you submitted it (your &quot;current working directory&quot; at submission time), which is what most programs expect. If your job needs to start somewhere else, you can use the '&lt;tt&gt;--chdir=''directory''&lt;/tt&gt;' sbatch directive to change the working directory for the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command line.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages, while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
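<br />
For example, either of these forms would restrict a job to AVX2-capable nodes:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# On the command line:<br />
sbatch --constraint=avx2 MyScript.sh<br />
<br />
# or inside the submit script itself:<br />
#SBATCH --constraint=avx2<br />
&lt;/syntaxhighlight&gt;<br />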
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, you sometimes need to know specific things about the running environment to set up your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course, the values of these variables will differ based on many factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know which hosts you have access to during a job; SLURM_JOB_NODELIST tells you that. There are lots of other useful environment variables in the list above, and I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
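<br />
As one illustration, a submit script can use these variables to label its output and size itself to the allocation; &lt;tt&gt;my_app&lt;/tt&gt; below is only a placeholder for your own program:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --cpus-per-task=4<br />
#SBATCH --time=1:00:00<br />
<br />
echo &quot;Job $SLURM_JOB_ID running on $HOSTNAME with $SLURM_CPUS_ON_NODE cores&quot;<br />
echo &quot;Nodes allocated: $SLURM_JOB_NODELIST&quot;<br />
<br />
# Pass the allocated core count to the (placeholder) application and tag its output with the job id<br />
./my_app --threads $SLURM_CPUS_ON_NODE &gt; results.$SLURM_JOB_ID.txt<br />
&lt;/syntaxhighlight&gt;<br />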
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. The default is:<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user=myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored, so you don't need to<br />
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software, which is much faster than file access on /homes or /bulk.<br />
In order to use scratch, you first need to make a directory for yourself (see below). Scratch offers greater speed and has no limit<br />
on either the size of files or the number of files in each subdirectory. It is meant as temporary space for prepositioning files<br />
and accessing them during runs. Once runs are completed, any files that need to be kept should be moved to your home or bulk<br />
directories, since files on the scratch file system get purged after 30 days. Lustre is faster than the home and bulk file systems<br />
in part because it stripes files across multiple disks without storing them redundantly, so if a hard disk fails data will be lost.<br />
When we get scratch set up to use Lustre we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
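<br />
Once your scratch directory exists, a typical pattern is to stage files there at the start of a run and move anything worth keeping back before the 30-day purge. This is only a sketch; &lt;tt&gt;app&lt;/tt&gt; and the file names are placeholders, as in the local-disk example below:<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Stage input files into a per-job subdirectory on scratch<br />
mkdir -p /scratch/$USER/$SLURM_JOB_ID<br />
cp $input_files /scratch/$USER/$SLURM_JOB_ID<br />
<br />
# Run the (placeholder) application against the scratch copy<br />
app -input_directory /scratch/$USER/$SLURM_JOB_ID -output_directory /scratch/$USER/$SLURM_JOB_ID<br />
<br />
# Move anything you need to keep back to home or bulk before it is purged<br />
cp -rp /scratch/$USER/$SLURM_JOB_ID /homes/$USER/my_results<br />
&lt;/syntaxhighlight&gt;<br />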
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job gets a subdirectory /tmp/job# on the local disk of each node it uses, where '#' is the job ID number.<br />
Within the job you can access it simply by writing to /tmp rather than needing to spell out /tmp/job#.<br />
<br />
You may need to copy files to the local disk at the start of your script, or point your application's<br />
output directory at the local disk. Be sure to copy any files you want to keep off the local disk before<br />
the job finishes, since Slurm will remove all files in your job's directory on /tmp when the job completes or aborts.<br />
When we get the scratch file system working with Lustre, it may end up being faster than the local disk, so we will<br />
post the access rates for each. Use 'kstat -l -h' to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk back to the current working directory<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permissions for everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n, and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
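<br />
As a minimal illustration of that example range, the following sketch would run five tasks, each printing its own index:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=2-10:2<br />
echo &quot;I am array task $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID&quot;<br />
&lt;/syntaxhighlight&gt;<br />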
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case, let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks (&lt;tt&gt;`&lt;/tt&gt;), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000 jobs, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used to start from a checkpoint if the job has failed and been rescheduled.<br />
If you are putting this in an array script, then add the Slurm array task ID to the end of the checkpoint directory name,<br />
like &lt;tt&gt;ckptdir=ckpt-$SLURM_ARRAY_TASK_ID&lt;/tt&gt;.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
ckptdir=ckpt<br />
mkdir -p $ckptdir<br />
<br />
if ! ls -1 $ckptdir | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch --no-coordinator mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart from $ckptdir to continue from a checkpoint&quot;<br />
dmtcp_restart $ckptdir/*.dmtcp<br />
fi<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
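<br />
For example, to get an interactive shell with 4 cores and 4 GB of memory for two hours, you could run something like:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
srun --nodes=1 --cpus-per-task=4 --mem=4G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />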
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running, which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When a group has contributed nodes, that group gets access to a &quot;partition&quot; giving its members priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and need the resources in that partition, you can alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include the new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Graphical Applications ==<br />
Some applications are graphical and need to have some graphical input/output. We currently accomplish this with X11 forwarding.<br />
=== Connecting with an X11 client ===<br />
==== Windows ====<br />
If you are running Windows, we recommend MobaXterm as your file/SSH manager because it is one relatively simple tool that does everything. MobaXterm also automatically connects with X11 forwarding enabled.<br />
==== Linux/OSX ====<br />
Both Linux and OSX can connect in an X11 forwarding mode. Linux will have all of the tools you need installed already; OSX will need [https://www.xquartz.org/ XQuartz] installed.<br />
<br />
Then you will need to change your 'ssh' command slightly:<br />
<br />
ssh -Y eid@headnode.beocat.ksu.edu<br />
<br />
The '''-Y''' argument tells ssh to setup X11 forwarding.<br />
=== Starting a Graphical job ===<br />
All graphical jobs, by design, must be interactive, so we'll use the srun command. On a headnode, we run the following:<br />
# load an X11 enabled application<br />
module load Octave<br />
# start an X11 job, sbatch arguments are accepted for srun as well, 1 node, 1 hour, 1 gb of memory<br />
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 octave --gui<br />
<br />
Because these jobs are interactive, they may not be able to run at all times, depending on how busy the scheduler is at any point in time. '''--pty --x11''' are required arguments setting up the job, and '''octave --gui''' is the command to run inside the job.<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. Looking at the AllocTRES column for our allocated resources, we see that 1MB of memory was granted. Combine that with the &quot;MaxRSS&quot; column and we see that the memory granted was less than the memory we tried to use, so the job was &quot;CANCELLED&quot;.</div>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself (shown below). Scratch offers greater speed and no limit on the size or number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories, since files on the scratch file system are purged after 30 days. Lustre gains part of its<br />
speed by striping files across multiple disks without storing them redundantly, so if a hard disk fails<br />
the data on it will be lost. When we get scratch set up to use Lustre we will post the difference in<br />
file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
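<br />
As a sketch of how scratch might be used inside a job script, you could stage input files there before the run and copy the results back afterwards (the file and application names below are placeholders):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Stage input files onto the scratch file system before the run<br />
cp $input_files /scratch/$USER/<br />
<br />
# Run the application against the scratch copies<br />
app -input_directory /scratch/$USER -output_directory /scratch/$USER<br />
<br />
# Copy anything worth keeping back to home or bulk, since scratch is purged after 30 days<br />
cp /scratch/$USER/results* /homes/$USER/<br />
&lt;/syntaxhighlight&gt;<br />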
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# on the local disk of each node the job uses,<br />
where '#' is the job ID number. This directory can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#.<br />
<br />
You may need to copy files to the local disk at the start of your script, or point your application's<br />
output directory at the local disk. You will then need to copy any files you want to keep off the local<br />
disk before the job finishes, since Slurm removes all files in your job's directory on /tmp when the job<br />
completes or aborts. When we get the scratch file system working with Lustre, it may end up being<br />
faster than accessing local disk, so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk back to the current working directory<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission for everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
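<br />
To confirm the change, you can list the directory itself and check the permission string (a quick sanity check, not required):<br />
<br />
ls -ld /homes/your_user_name<br />
<br />
The first field should read drwx------ once group and other access have been removed.<br />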
<br />
===Sharing files within your group===<br />
<br />
By default, all your files and directories have a 'group' that is your user name followed by _users, as 'ls -l' shows.<br />
In my case, they have the group daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the command below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared<br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the range of index numbers that will be<br />
associated with the tasks. Each index number is exported to its task via the environment variable SLURM_ARRAY_TASK_ID. The<br />
values n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
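<br />
For example, to name the output files of each array task after the job, you could combine this pattern with the --output option described earlier (the job name here is only illustrative):<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH -J MyArrayJob<br />
## %A becomes the array job id and %a the task index<br />
#SBATCH --output=MyArrayJob_%A_%a.out<br />
&lt;/syntaxhighlight&gt;<br />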
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses; one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo '#!/bin/bash' &gt; ${scriptnum}.sh<br />
echo &quot;app2 $LINE&quot; &gt;&gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case, let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks (`), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds, so submitting 5,000 jobs takes 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP (Distributed MultiThreaded CheckPointing) is software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used<br />
to start from a checkpoint if the job has failed and been rescheduled.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
if ! ls -1 ckpt | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
    echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
    dmtcp_launch mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
    echo &quot;Using dmtcp_restart to continue from a checkpoint&quot;<br />
    dmtcp_restart ckpt/*.dmtcp<br />
fi<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
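<br />
As a concrete example, you might ask for an interactive shell with a specific amount of memory, time, and cores (the values here are only placeholders):<br />
<br />
srun --mem-per-cpu=4G --time=2:00:00 --cpus-per-task=4 --pty bash<br />
<br />
When a matching node is free you will be dropped into a bash shell on that node; type 'exit' to end the interactive job.<br />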
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running, which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash<br />
<br />
where '#' is the job ID number.<br />
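<br />
For example, to attach to job 123456 (a made-up job ID) and watch its processes:<br />
<br />
srun --jobid=123456 --pty bash<br />
htop -u $USER<br />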
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start within a reasonable amount of time after modifying such parameters, delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
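<br />
For example, a submission could be marked killable either on the command line or with a directive inside the script (the script name is a placeholder):<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# On the command line<br />
sbatch --gres=killable:1 MyScript.sh<br />
<br />
# Or as a directive in the submit script itself<br />
#SBATCH --gres=killable:1<br />
&lt;/syntaxhighlight&gt;<br />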
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When a group has contributed nodes, it gets access to a &quot;partition&quot; giving its members priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-bioinfo.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;.<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and need the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include the partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
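<br />
If the full &lt;tt&gt;-l&lt;/tt&gt; listing is too wide to read comfortably, you can also ask sacct for just the columns you care about; a minimal sketch using standard sacct format fields (the job id is a placeholder):<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Show only the columns that are most useful for diagnosing a failed job<br />
sacct -j 1122334455 --format=JobID,JobName,Partition,State,ExitCode,Elapsed,MaxRSS,ReqMem<br />
&lt;/syntaxhighlight&gt;<br />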
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, you can see some pointers to the issue: the job ran out of time (TIMEOUT) and was then killed (CANCELLED). Another common case, shown below, is a job that was cancelled for using more memory than it requested:<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. Looking at the AllocTRES column for the allocated resources, we see that only 1MB of memory was granted. Combine that with the MaxRSS column and we see that the memory granted was less than the memory the job tried to use, so the job was &quot;CANCELLED&quot;.</div>Daveturner
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not redundantly store files by striping them<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to a file on the local disk, then you'll need to copy any files you want off the local disk before<br />
the job finishes since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --arrat specifies the number of array job tasks and the index number which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n, and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in a array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1 I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained like so: wc -l dataset.txt in this case lets call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1:5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses a subshell via `, and has the sed command print out only the line number $SLURM_ARRAY_TASK_ID out of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about time saved: submitting 1 job takes 1-2 seconds. by extension if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used<br />
to start from a checkpoint if the job has failed and been rescheduled.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
if ! ls -1 ckpt | grep -c dmtcp_restart_script &gt; /dev/null<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart to continue from a checkpoint&quot;<br />
dmtcp_restart ckpt/*.dmtcp<br />
fi<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
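<br />
As a purely illustrative starting point (the job ID 123456 is a placeholder, and normal users can generally only lower limits, not raise them), something like the following can shrink the time limit of one of your own pending jobs:<br />
scontrol update JobId=123456 TimeLimit=12:00:00<br />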
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;--gres=killable:1&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
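<br />
For example (MyScript.sh is just a placeholder for your own submit script), you could mark the job killable on the command line:<br />
sbatch --gres=killable:1 MyScript.sh<br />
or add the equivalent directive inside the script itself:<br />
#SBATCH --gres=killable:1<br />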
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When those users have contributed nodes, the group gets access to a &quot;partition&quot; that gives its members priority on those nodes.<br />
<br />
In most situations, the scheduler will automatically add those priority partitions to the jobs as submitted. You should not need to include a partition list in your job submission.<br />
<br />
There are currently just a few exceptions that we will not automatically add:<br />
* ksu-chem-mri.q<br />
* ksu-gen-bioinfo.q<br />
* ksu-gen-gpu.q<br />
* ksu-gen-highmem.q<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
ksu-gen-highmem.q<br />
<br />
If you have access to any of the non-automatic partitions, and have need of the resources in that partition, you can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=ksu-gen-highmem.q<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
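<br />
The &lt;tt&gt;-l&lt;/tt&gt; output is very wide. If you only care about a few columns, sacct's &lt;tt&gt;--format&lt;/tt&gt; option lets you pick them by name (the field names match the column headers in the tables below); the job id here is the same placeholder as above:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sacct -j 1122334455 --format=JobID,JobName,Partition,Elapsed,State,ExitCode,MaxRSS,ReqMem<br />
&lt;/syntaxhighlight&gt;<br />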
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=AdvancedSlurm&diff=403AdvancedSlurm2018-09-14T03:59:55Z<p>Daveturner: /* Checkpoint/Restart using DMTCP */</p>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
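<br />
A minimal sketch of an intranode submit script (my_threaded_program is a placeholder for your own OpenMP or otherwise threaded application):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1 --cpus-per-task=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=1:00:00<br />
<br />
# Tell the OpenMP runtime to use exactly the cores Slurm allocated<br />
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE<br />
./my_threaded_program<br />
&lt;/syntaxhighlight&gt;<br />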
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
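<br />
A minimal sketch of an MPI submit script using the first example above (my_mpi_program is a placeholder, and the exact OpenMPI module name on Beocat may differ):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=6 --ntasks-per-node=4<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=1:00:00<br />
<br />
module load OpenMPI<br />
# OpenMPI built with Slurm support normally reads the allocation from Slurm,<br />
# so an explicit -np 24 is usually not needed<br />
mpirun ./my_mpi_program<br />
&lt;/syntaxhighlight&gt;<br />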
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs, aside from resource requests, is to have Slurm email you when a job changes its status. This may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
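<br />
Putting the two directives together in a submit script, to get mail only when the job finishes or fails:<br />
#SBATCH --mail-type=END,FAIL<br />
#SBATCH --mail-user=someone@somecompany.com<br />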
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
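<br />
For example, to send normal output and errors to separate, job-numbered files:<br />
#SBATCH --output=MyJobTitle.o%j<br />
#SBATCH --error=MyJobTitle.e%j<br />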
<br />
=== Running from the Current Directory ===<br />
By default, Slurm jobs run in the directory you submitted them from, so most programs that expect their files to be in the &quot;current working directory&quot; will work without changes. If you need the job to run somewhere else, you can use the '&lt;tt&gt;--chdir&lt;/tt&gt;' sbatch directive to set the working directory explicitly.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know which hosts you have access to during a job; check SLURM_JOB_NODELIST for that. There are lots of other useful environment variables here; I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
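<br />
SLURM_JOB_NODELIST is stored in Slurm's compact host-list form (for example &lt;tt&gt;dwarf[37-40]&lt;/tt&gt;); if your script needs one hostname per line, scontrol can expand it:<br />
scontrol show hostnames $SLURM_JOB_NODELIST<br />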
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first '#' if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing them<br />
redundantly, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to a file on the local disk, then you'll need to copy any files you want off the local disk before<br />
the job finishes since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
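<br />
For example, to keep one output file per array task but name them after your job, you could put something like this in your submit script (MyJobTitle is a placeholder):<br />
#SBATCH --output=MyJobTitle-%A_%a.out<br />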
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. That number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks to have the sed command print only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds, so submitting 5,000 individual jobs would take 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if for example the node you are running on fails. <br />
This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used<br />
to start from a checkpoint if the job has failed and been rescheduled.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
if [ ! $SLURM_RESTART_COUNT ]<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart to continue from a checkpoint&quot;<br />
dmtcp_restart ckpt/*.dmtcp<br />
fi<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;qdel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When those users have contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use those nodes.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like; reasons for doing so are available [[AdvancedSlurm#Killable_jobs|here]]<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=AdvancedSlurm&diff=402AdvancedSlurm2018-09-14T03:58:12Z<p>Daveturner: /* Checkpoint/Restart using DMTCP */</p>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
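Putting these pieces together, a minimal sketch of an MPI submit script might look like the following (''MyMpiProgram'' is a hypothetical placeholder and the resource numbers are just illustrative):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=4 --ntasks-per-node=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=12:00:00<br />
<br />
# mpirun picks up the allocated nodes and task count from Slurm<br />
mpirun $HOME/path/MyMpiProgram<br />
&lt;/syntaxhighlight&gt;<br />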
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified '&lt;tt&gt;--tasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory in total.<br />
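For example, the same request written as sbatch directives inside a submit script (the numbers are purely illustrative):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#SBATCH --tasks=20<br />
#SBATCH --mem-per-cpu=20G<br />
# 20 tasks at 20G per core gives the job 400GB in total<br />
&lt;/syntaxhighlight&gt;<br />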
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs, aside from resource requests, is to have Slurm email you when a job changes its status. This may require two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma-separated and include the following:<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
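For example, the following directives would email you (at the placeholder address shown) when the job starts, finishes, or fails:<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#SBATCH --mail-type=BEGIN,END,FAIL<br />
#SBATCH --mail-user=someone@somecompany.com<br />
&lt;/syntaxhighlight&gt;<br />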
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
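For example, to write the two streams to separate files named after the job id (the base name is just an illustration):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#SBATCH --output=MyJob-%j.out<br />
#SBATCH --error=MyJob-%j.err<br />
&lt;/syntaxhighlight&gt;<br />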
<br />
=== Running from the Current Directory ===<br />
Slurm jobs start in the directory you submitted them from (available inside the job as &lt;tt&gt;$SLURM_SUBMIT_DIR&lt;/tt&gt;), so most programs that assume they are running from the current directory will work as expected. If you need the job to start in a different directory, you can use the '&lt;tt&gt;-D ''directory''&lt;/tt&gt;' sbatch option to set the working directory.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogeneous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages, while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
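For example (MyScript.sh is a placeholder for your own submit script):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
sbatch --constraint=avx MyScript.sh<br />
&lt;/syntaxhighlight&gt;<br />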
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to set up your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course, the values of these variables will differ based on many factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job; check SLURM_JOB_NODELIST for that. There are lots of useful environment variables here, and I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly used variables are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
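As a small sketch of how these variables might be used inside a job script ('my_app' and its '--threads' option are hypothetical placeholders):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash<br />
#SBATCH --cpus-per-task=4<br />
<br />
# Record where and what we are running, which makes debugging easier later<br />
echo &quot;Job $SLURM_JOB_ID running on $SLURM_JOB_NODELIST from $SLURM_SUBMIT_DIR&quot;<br />
<br />
# Hand the allocated core count to a threaded application<br />
my_app --threads $SLURM_CPUS_ON_NODE<br />
&lt;/syntaxhighlight&gt;<br />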
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. The default is 1 core:<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user=myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories, since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing them<br />
redundantly, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
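Once your directory exists, a typical pattern (sketched below with hypothetical file and program names) is to stage files into scratch, run against them, and copy the results back before the 30-day purge:<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Stage input files into your scratch directory<br />
cp big_input.dat /scratch/$USER/<br />
<br />
# Run against the scratch copy<br />
app -input_directory /scratch/$USER -output_directory /scratch/$USER<br />
<br />
# Copy anything worth keeping back to your home or bulk directory<br />
cp /scratch/$USER/results.out /homes/$USER/<br />
&lt;/syntaxhighlight&gt;<br />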
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to the local disk at the start of your script, or set the output directory for<br />
your application to point to the local disk. You will then need to copy any files you want to keep off the<br />
local disk before the job finishes, since Slurm will remove all files in your job's directory on /tmp when<br />
the job completes or aborts. When we get the scratch file system working with Lustre, it may end up being<br />
faster than accessing local disk, so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk back to the current working directory<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
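For example, to name the array task output files yourself (the base name here is just an illustration):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#SBATCH --output=MyArrayJob-%A_%a.out<br />
&lt;/syntaxhighlight&gt;<br />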
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with '&lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;'; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via the backticks, and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds, so by extension submitting 5,000 jobs takes 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
<br />
DMTCP is Distributed Multi-Threaded CheckPoint software that will checkpoint your application without modification, and<br />
can be set up to automatically restart your job from the last checkpoint if your job fails. This has been tested successfully<br />
on Beocat for some scalar and OpenMP codes, but has failed on all MPI tests so far. We would like to encourage users to<br />
try DMTCP out if their non-MPI jobs run longer than 24 hours. If you want to try this, please contact us first since we are still<br />
experimenting with DMTCP.<br />
<br />
The sample job submission script below shows how dmtcp_launch is used to start the application, then dmtcp_restart is used<br />
to start from a checkpoint if the job has failed and been rescheduled.<br />
<br />
#!/bin/bash -l<br />
#SBATCH --job-name=gromacs<br />
#SBATCH --mem=50G<br />
#SBATCH --time=24:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=4<br />
<br />
module purge<br />
module load GROMACS/2016.4-foss-2017beocatb-hybrid<br />
module load DMTCP<br />
module list<br />
<br />
if [ ! $SLURM_RESTART_COUNT ]<br />
then<br />
echo &quot;Using dmtcp_launch to start the app the first time&quot;<br />
dmtcp_launch mpirun -np 1 -x OMP_NUM_THREADS=4 gmx_mpi mdrun -nsteps 50000 -ntomp 4 -v -deffnm 1ns -c 1ns.pdb -nice 0<br />
else<br />
echo &quot;Using dmtcp_restart to continue from a checkpoint&quot;<br />
dmtcp_restart ckpt/*.dmtcp<br />
fi<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
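For example, to ask for an interactive session with 4 cores and 8GB of memory (4 cores at 2G each) for two hours, you could run something like the following and adjust the numbers to your own needs:<br />
&lt;syntaxhighlight lang=bash&gt;<br />
srun --cpus-per-task=4 --mem-per-cpu=2G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />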
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
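For example (MyScript.sh is a placeholder for your own submit script):<br />
&lt;syntaxhighlight lang=bash&gt;<br />
sbatch -p killable.q MyScript.sh<br />
&lt;/syntaxhighlight&gt;<br />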
<br />
''Note: This is a submit-time only request; it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed nodes to Beocat. If your project has contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use them.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like; the reasons for doing so are described [[AdvancedSlurm#Killable_jobs|here]].<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. Looking at the AllocTRES column for our allocated resources, we see that 1MB of memory was granted. Combine that with the &quot;MaxRSS&quot; column and we see that the memory granted was less than the memory we tried to use, so the job was &quot;CANCELLED&quot;.</div>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=gpu:nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=gpu:nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not redundantly store files by striping them<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
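<br />
A typical pattern, sketched below with placeholder directory names, is to stage input files onto scratch before a run and to copy any results worth keeping back to your home or bulk directory afterwards.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# One-time setup: create your own scratch directory (-p avoids an error if it already exists)<br />
mkdir -p /scratch/$USER<br />
<br />
# Stage input files onto the fast scratch file system before the run<br />
cp /bulk/$USER/inputs/* /scratch/$USER/<br />
<br />
# Run the application against the staged files<br />
app -input_directory /scratch/$USER -output_directory /scratch/$USER/results<br />
<br />
# Copy anything worth keeping back before the 30-day purge removes it<br />
cp -rp /scratch/$USER/results /bulk/$USER/<br />
&lt;/syntaxhighlight&gt;<br />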
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job#, where '#' is the job ID number, on the local disk of each node<br />
the job uses. You can access it simply by writing to /tmp rather than needing to use the full /tmp/job# path.<br />
<br />
You may need to copy files to the local disk at the start of your script, or set the output directory for your application<br />
to point to the local disk. You will then need to copy any files you want to keep off the local disk before the job finishes,<br />
since Slurm removes all files in your job's directory on /tmp when the job completes or aborts. When we get the scratch file<br />
system working with Lustre, it may end up being faster than the local disk, so we will post the access rates for each.<br />
Use 'kstat -l -h' to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk back to the current working directory<br />
cp /dev/shm/* .<br />
<br />
# Clean up the RAM disk so the memory is freed again<br />
rm -rf /dev/shm/*<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permissions for everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
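<br />
You can check the result with 'ls -ld', and later re-open read access if you change your mind (the last line below is only an approximation of the default described above).<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Check the current permissions on your home directory<br />
ls -ld /homes/$USER<br />
<br />
# Give others read access to your home directory again<br />
chmod go+rx /homes/$USER<br />
&lt;/syntaxhighlight&gt;<br />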
<br />
===Sharing files within your group===<br />
<br />
By default, all your files and directories have a 'group' that is your user name followed by '_users', as 'ls -l' shows.<br />
In my case they have the group daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
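<br />
One optional refinement (standard Linux behavior, not anything Beocat-specific): setting the setgid bit on the shared directory makes new files created inside it inherit the directory's group automatically, so collaborators don't have to change the group of each file by hand.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# New files created in 'share' will now inherit the ksu-cis-hpc group automatically<br />
chmod g+s share<br />
&lt;/syntaxhighlight&gt;<br />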
<br />
If you want to know what groups you belong to, use the command below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared<br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
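<br />
The web server can only show files it is able to read, so if something you put in public_html does not appear, the usual fix is to make the directory and its contents world-readable. This is a generic sketch using standard permission commands.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Let the web server traverse public_html and read everything in it<br />
chmod -R a+rX $HOME/public_html<br />
&lt;/syntaxhighlight&gt;<br />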
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the task index range and, optionally, a step size, which<br />
determine the index numbers associated with the tasks. The index numbers are exported to the job tasks via the environment variable<br />
SLURM_ARRAY_TASK_ID. The option arguments n and m are available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
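<br />
As a minimal illustration of the step-size form, the sketch below just prints the index each task receives; a real job would do its work where the echo is.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=2-10:2<br />
<br />
# Each of the 5 tasks (indexes 2, 4, 6, 8, and 10) sees its own value here<br />
echo &quot;This is array task $SLURM_ARRAY_TASK_ID of job $SLURM_ARRAY_JOB_ID&quot;<br />
&lt;/syntaxhighlight&gt;<br />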
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses a subshell via `, and has the sed command print out only the line number $SLURM_ARRAY_TASK_ID out of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds, so submitting 5,000 jobs would take 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
<br />
== Checkpoint/Restart using DMTCP ==<br />
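''This section has not been written yet. As a rough, generic sketch only: DMTCP itself provides the dmtcp_launch and dmtcp_restart commands, so a periodically checkpointed job might look something like the following. Any Beocat-specific setup, such as a module that may need to be loaded first, is not covered here.''<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --time=24:00:00<br />
<br />
# Start the application under DMTCP, writing a checkpoint every hour (-i takes seconds)<br />
dmtcp_launch -i 3600 $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
# In a later job, resume from the newest checkpoint using the restart script DMTCP generates:<br />
# ./dmtcp_restart_script.sh<br />
&lt;/syntaxhighlight&gt;<br />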
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
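<br />
For example, to get an interactive shell with more than the default resources, put the same sbatch-style options before &lt;tt&gt;--pty bash&lt;/tt&gt; (the numbers below are only an illustration).<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Ask for 4 cores and 1 GB of memory per core for up to 2 hours, then drop into a shell on the allocated node<br />
srun --cpus-per-task=4 --mem-per-cpu=1G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />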
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash<br />
<br />
where '#' is the job ID number.<br />
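<br />
For example, if your job is running as job 123456 (a made-up number), you could attach to it and look around like this.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Open a shell on the node where job 123456 is running<br />
srun --jobid=123456 --pty bash<br />
<br />
# Once attached, look at the job's local files or watch its processes<br />
ls /tmp/job123456<br />
htop<br />
&lt;/syntaxhighlight&gt;<br />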
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches and the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start a reasonable amount of time after modifying such parameters, delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
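<br />
For illustration only, and with the caveat above that this is unsupported: the general form of an &lt;tt&gt;scontrol&lt;/tt&gt; modification looks like the sketch below. Normal users can typically only lower limits such as the time limit, not raise them.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Lower the time limit of pending job 123456 (a made-up job id) to 2 hours<br />
scontrol update JobId=123456 TimeLimit=2:00:00<br />
&lt;/syntaxhighlight&gt;<br />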
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time-only request; it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after they have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed nodes to Beocat. If your project has contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use those nodes.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like; reasons for doing so are described [[AdvancedSlurm#Killable_jobs|here]].<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
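<br />
The full &lt;tt&gt;-l&lt;/tt&gt; output is very wide; if you only care about a few columns, sacct's --format option lets you pick them (the fields below are standard sacct field names).<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Show just the columns that are usually interesting when a job misbehaves<br />
sacct -j 1122334455 --format=JobID,JobName,Partition,Elapsed,State,ExitCode,MaxRSS,ReqMem<br />
&lt;/syntaxhighlight&gt;<br />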
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;. Looking at the AllocTRES column for the allocated resources, we see that only 1 MB of memory was granted. Combine that with the &quot;MaxRSS&quot; column and we see that the memory granted was less than the memory the job tried to use, so the job was &quot;CANCELLED&quot;.</div>Daveturner
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=nvidia_geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the &lt;tt&gt;--partition=ksu-gen-gpu.q&lt;/tt&gt; group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not redundantly store files by striping them<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to a file on the local disk, then you'll need to copy any files you want off the local disk before<br />
the job finishes since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --arrat specifies the number of array job tasks and the index number which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n, and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in a array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1 I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained like so: wc -l dataset.txt in this case lets call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1:5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses a subshell via `, and has the sed command print out only the line number $SLURM_ARRAY_TASK_ID out of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about time saved: submitting 1 job takes 1-2 seconds. by extension if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support users to modify job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an excercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request, it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;qdel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When those users have contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use those nodes.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like, reasons for doing so are available [[AdvancedSlurm#Killable_jobs|here]]<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, you can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=AdvancedSlurm&diff=398AdvancedSlurm2018-09-10T22:26:39Z<p>Daveturner: /* CUDA */</p>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. 'kstat -g' will show you the GPU nodes and the jobs running on them. To request a GPU node, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; for example to request 1 GPU for your job. You can also request a given type of GPU (kstat -g -l to show types) by using &lt;tt&gt;--gres=nvidia-geforce_gtx_1080_ti:1&lt;/tt&gt; for a 1080Ti GPU on the Wizards or Dwarves, &lt;tt&gt;--gres=nvidia_quadro_gp100:1&lt;/tt&gt; for the P100 GPUs on Wizard20-21 that are best for 64-bit codes like Vasp, or &lt;tt&gt;--gres=nvidia_geforce_gtx_980_ti:1&lt;/tt&gt; for the older 980Ti GPUs on Dwarf38-39. Most of these GPU nodes are owned by various groups. If you want access to GPU nodes and your group does not own any, we can add you to the ksu-gen-gpu.q group that has priority on Dwarf38-39.<br />
<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel: ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misconception.<br />
=== Intranode jobs ===<br />
''Intra''node jobs run on many cores in the same node. These jobs can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or any programming language that has the concept of ''threads''. Often, your program will need to know how many cores you want it to use, and many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--nodes=1 --cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
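As a minimal sketch (the program name and its &lt;tt&gt;--threads&lt;/tt&gt; option are placeholders for whatever your own application uses), a single-node 8-core submit script might look like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=1 --cpus-per-task=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=1:00:00<br />
<br />
# Tell the application how many cores Slurm actually allocated<br />
my_threaded_app --threads=$SLURM_CPUS_ON_NODE<br />
&lt;/syntaxhighlight&gt;<br />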
<br />
=== Internode (MPI) jobs ===<br />
''Inter''node jobs can utilize many cores on one or more nodes. Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--nodes=''n'' --ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--nodes=10 --ntasks=100&lt;/tt&gt; will give you a total of 100 cores across 10 nodes.<br />
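Putting this together, a minimal MPI submit script might look like the sketch below; &lt;tt&gt;my_mpi_app&lt;/tt&gt; is a placeholder for your own MPI-enabled program, and you may want to load a specific OpenMPI or toolchain module instead of the default:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=4 --ntasks-per-node=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=4:00:00<br />
<br />
module load OpenMPI<br />
# mpirun starts one copy of the program for each of the 32 allocated tasks<br />
mpirun $HOME/path/to/my_mpi_app<br />
&lt;/syntaxhighlight&gt;<br />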
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
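For example, a short sketch of how the total adds up:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --ntasks=20<br />
#SBATCH --mem-per-cpu=20G<br />
# Total memory available to the job: 20 tasks x 20 GB per core = 400 GB<br />
&lt;/syntaxhighlight&gt;<br />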
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs, aside from resource requests, is having Slurm email you when a job changes its status. This may require two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
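For example, a small sketch (the address is a placeholder) that sends mail when the job starts, ends, or fails:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --mail-type=BEGIN,END,FAIL<br />
#SBATCH --mail-user=someone@somecompany.com<br />
&lt;/syntaxhighlight&gt;<br />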
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
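For example, a short sketch that names the job and splits its output streams into files tagged with the job id:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH -J MyJobTitle<br />
#SBATCH --output=MyJobTitle.o%j<br />
#SBATCH --error=MyJobTitle.e%j<br />
&lt;/syntaxhighlight&gt;<br />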
<br />
=== Running from the Current Directory ===<br />
Slurm jobs run from the directory you were in when you submitted the job, so programs that assume they are running from the &quot;current working directory&quot; will normally behave as expected. If you need the job to start in a different directory, you can use the '&lt;tt&gt;-D ''directory''&lt;/tt&gt;' (&lt;tt&gt;--chdir&lt;/tt&gt;) sbatch directive.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=avx&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--constraint=avx2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
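For example (either form also works on the sbatch or srun command line):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Require nodes with the AVX2 instruction set<br />
#SBATCH --constraint=avx2<br />
# Or restrict the job to a particular class of machine, such as the Dwarves<br />
#SBATCH --constraint=dwarves<br />
&lt;/syntaxhighlight&gt;<br />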
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know which hosts you have access to during a job; check SLURM_JOB_NODELIST for that. There are many other useful environment variables here, and we will leave it to you to identify the ones you need.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
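As a small sketch of how a few of these might be used inside a job script:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --cpus-per-task=4<br />
<br />
# Record where and under which job id this run happened<br />
echo &quot;Job $SLURM_JOB_ID running on $HOSTNAME with $SLURM_CPUS_ON_NODE cores&quot;<br />
echo &quot;Nodes allocated to this job: $SLURM_JOB_NODELIST&quot;<br />
<br />
# Work from the directory the job was submitted from<br />
cd $SLURM_SUBMIT_DIR<br />
&lt;/syntaxhighlight&gt;<br />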
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of a line means the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first '#' if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## There is one strict rule for guaranteeing Slurm reads all of your options:<br />
## Do not put *any* lines above your resource requests that aren't either:<br />
## 1) blank. (no other characters)<br />
## 2) comments (lines must begin with '#')<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing them<br />
redundantly, so if a hard disk fails, data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job#, where '#' is the job ID number, on the<br />
local disk of each node the job uses. You can access it simply by writing to /tmp rather than<br />
needing to use /tmp/job#.<br />
<br />
You may need to copy files to the<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to the local disk. You will then need to copy any files you want to keep off the local disk before<br />
the job finishes, since Slurm will remove all files in your job's directory on /tmp when the job<br />
completes or aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share<br />
chgrp ksu-cis-hpc share<br />
chmod g+rx share<br />
chmod o-rwx share<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the line below.<br />
<br />
groups<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
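For example, a small sketch that gives each of ten array tasks its own output file:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --array=1-10<br />
#SBATCH --output=MyArrayJob_%A_%a.out<br />
&lt;/syntaxhighlight&gt;<br />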
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses; one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with &lt;tt&gt;wc -l dataset.txt&lt;/tt&gt;; in this case, let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks (`), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will time out after your allotted time has passed.<br />
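For example, a sketch of requesting an interactive session with specific resources (adjust the numbers to your needs):<br />
srun --time=1:00:00 --mem-per-cpu=4G --cpus-per-task=4 --pty bash<br />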
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job (using '''scancel ''jobid''''') and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time-only request; it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed to Beocat. When those users have contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use those nodes.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like; the reasons for doing so are described [[AdvancedSlurm#Killable_jobs|here]].<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, you can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Training_Videos&diff=397Training Videos2018-09-10T22:11:02Z<p>Daveturner: /* Beocat Class 2013/05/28 */</p>
<hr />
<div>== Video Demonstrations ==<br />
These demonstrations are freely available to help anyone who needs assistance with Beocat. More will be added as the need/topics arise. Most of these videos are best viewed at 720p or greater resolution.<br />
=== What is Beocat/Gaining access to Beocat ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=-hgfes7KGxU<br />
}}<br />
=== Logging in and Transferring Files ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=pNlE5GkYaZY<br />
}}<br />
=== Globus Connect ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=SFH23DZiE-U<br />
}}<br />
=== Installing files in your Home Directory ===<br />
&lt;HTML5video width=&quot;800&quot; height=&quot;600&quot; autoplay=&quot;false&quot;&gt;beocat_homedir_install&lt;/HTML5video&gt;<br />
== Beocat Class 2018/02/5-8 ==<br />
This is the first training session utilizing our new scheduler and node environment (Slurm and CentOS)<br />
=== Day One -- Beginners ===<br />
{{#evt:service=youtube|id=https://www.youtube.com/watch?v=rM6Yi1zrhvk}}<br />
=== Day Two -- Advanced ===<br />
{{#evt:service=youtube|id=https://www.youtube.com/watch?v=u5m730yMGXk}}<br />
== All Beocat Videos ==<br />
These are available on our [http://www.youtube.com/channel/UCfRY7ZCiAf-EzEqJXEXcPrw YouTube Channel]</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Training_Videos&diff=396Training Videos2018-09-10T22:10:36Z<p>Daveturner: /* Beocat Class 2014/09/30 */</p>
<hr />
<div>== Video Demonstrations ==<br />
These demonstrations are freely available to help anyone who needs assistance with Beocat. More will be added as the need/topics arise. Most of these videos are best viewed at 720p or greater resolution.<br />
=== What is Beocat/Gaining access to Beocat ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=-hgfes7KGxU<br />
}}<br />
=== Logging in and Transferring Files ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=pNlE5GkYaZY<br />
}}<br />
=== Globus Connect ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=SFH23DZiE-U<br />
}}<br />
=== Installing files in your Home Directory ===<br />
&lt;HTML5video width=&quot;800&quot; height=&quot;600&quot; autoplay=&quot;false&quot;&gt;beocat_homedir_install&lt;/HTML5video&gt;<br />
== Beocat Class 2018/02/5-8 ==<br />
This is the first training session utilizing our new scheduler and node environment (Slurm and CentOS)<br />
=== Day One -- Beginners ===<br />
{{#evt:service=youtube|id=https://www.youtube.com/watch?v=rM6Yi1zrhvk}}<br />
=== Day Two -- Advanced ===<br />
{{#evt:service=youtube|id=https://www.youtube.com/watch?v=u5m730yMGXk}}<br />
== Beocat Class 2013/05/28 ==<br />
We switched our scheduler and linux versions from SGE to Slurm and Gentoo to CentOS at the end of 2017. These are out of date.<br />
=== Overview of Linux and HPC ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=xcTy5jPza0M<br />
}}<br />
=== HPC/Parallel Programming ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=V62tHhGlx1g<br />
}}<br />
=== Beocat-specific tools ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=CsuvnyIgT4A<br />
}}<br />
=== Advanced tools (a.k.a. Getting the most out of Beocat) and Hadoop ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=svCJKx2LE38<br />
}}<br />
== All Beocat Videos ==<br />
These are available on our [http://www.youtube.com/channel/UCfRY7ZCiAf-EzEqJXEXcPrw YouTube Channel]</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Training_Videos&diff=395Training Videos2018-09-10T22:10:00Z<p>Daveturner: /* Beocat Class 2016/02/22-24 */</p>
<hr />
<div>== Video Demonstrations ==<br />
These demonstrations are freely available to help anyone who needs assistance with Beocat. More will be added as the need/topics arise. Most of these videos are best viewed at 720p or greater resolution.<br />
=== What is Beocat/Gaining access to Beocat ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=-hgfes7KGxU<br />
}}<br />
=== Logging in and Transferring Files ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=pNlE5GkYaZY<br />
}}<br />
=== Globus Connect ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=SFH23DZiE-U<br />
}}<br />
=== Installing files in your Home Directory ===<br />
&lt;HTML5video width=&quot;800&quot; height=&quot;600&quot; autoplay=&quot;false&quot;&gt;beocat_homedir_install&lt;/HTML5video&gt;<br />
== Beocat Class 2018/02/5-8 ==<br />
This is the first training session utilizing our new scheduler and node environment (Slurm and CentOS)<br />
=== Day One -- Beginners ===<br />
{{#evt:service=youtube|id=https://www.youtube.com/watch?v=rM6Yi1zrhvk}}<br />
=== Day Two -- Advanced ===<br />
{{#evt:service=youtube|id=https://www.youtube.com/watch?v=u5m730yMGXk}}<br />
== Beocat Class 2014/09/30 ==<br />
We switched our scheduler and linux versions from SGE to Slurm and Gentoo to CentOS at the end of 2017. These are out of date.<br />
=== Overview of Linux and HPC ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=K5Ui3fZu4EQ<br />
}}<br />
=== HPC/Parallel Programming ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=cTl65grbU1w<br />
}}<br />
=== Beocat-specific tools ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=kkfZEyO7RE4<br />
}}<br />
== Beocat Class 2013/05/28 ==<br />
We switched our scheduler and linux versions from SGE to Slurm and Gentoo to CentOS at the end of 2017. These are out of date.<br />
=== Overview of Linux and HPC ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=xcTy5jPza0M<br />
}}<br />
=== HPC/Parallel Programming ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=V62tHhGlx1g<br />
}}<br />
=== Beocat-specific tools ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=CsuvnyIgT4A<br />
}}<br />
=== Advanced tools (a.k.a. Getting the most out of Beocat) and Hadoop ===<br />
{{#evt:<br />
service=youtube<br />
|id=https://www.youtube.com/watch?v=svCJKx2LE38<br />
}}<br />
== All Beocat Videos ==<br />
These are available on our [http://www.youtube.com/channel/UCfRY7ZCiAf-EzEqJXEXcPrw YouTube Channel]</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=Installed_software&diff=394Installed software2018-09-10T21:41:28Z<p>Daveturner: /* MatLab compiler */</p>
<hr />
<div>== Drinking from the Firehose ==<br />
For a complete list of all installed modules, see [[ModuleList]]<br />
<br />
== Toolchains ==<br />
A toolchain is a set of compilers, libraries and applications that are needed to build software. Some software functions better when using specific toolchains.<br />
<br />
We provide a good number of toolchains and versions of toolchains to make sure your applications will compile and/or run correctly.<br />
<br />
These toolchains include (you can run 'module keyword toolchain'):<br />
; foss: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; gcccuda: GNU Compiler Collection (GCC) based compiler toolchain, along with CUDA toolkit.<br />
; gmvapich2: GNU Compiler Collection (GCC) based compiler toolchain, including MVAPICH2 for MPI support.<br />
; gompi: GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.<br />
; gompic: GNU Compiler Collection (GCC) based compiler toolchain along with CUDA toolkit, including OpenMPI for MPI support with CUDA features enabled.<br />
; goolfc: GCC based compiler toolchain __with CUDA support__, and including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.<br />
; iomkl: Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL &amp; OpenMPI.<br />
<br />
You can run 'module spider $toolchain' to see the versions we have:<br />
$ module spider iomkl<br />
* iomkl/2017a<br />
* iomkl/2017b<br />
* iomkl/2017beocatb<br />
<br />
If you load one of those (module load iomkl/2017b), you can see the other modules and versions of software that it loaded with the 'module list' command:<br />
$ module list<br />
Currently Loaded Modules:<br />
1) icc/2017.4.196-GCC-6.4.0-2.28<br />
2) binutils/2.28-GCCcore-6.4.0<br />
3) ifort/2017.4.196-GCC-6.4.0-2.28<br />
4) iccifort/2017.4.196-GCC-6.4.0-2.28<br />
5) GCCcore/6.4.0<br />
6) numactl/2.0.11-GCCcore-6.4.0<br />
7) hwloc/1.11.7-GCCcore-6.4.0<br />
8) OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
9) iompi/2017b<br />
10) imkl/2017.3.196-iompi-2017b<br />
11) iomkl/2017b<br />
<br />
As you can see, toolchains can depend on each other. For instance, the iomkl toolchain depends on iompi, which depends on iccifort, which depends on icc and ifort, which depend on GCCcore, which depends on GCC. Hence it is very important that the correct versions of all related software are loaded.<br />
<br />
With software we provide, the toolchain used to compile is always specified in the &quot;version&quot; of the software that you want to load.<br />
<br />
If you mix toolchains, inconsistent things may happen.<br />
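As a sketch of putting a toolchain to use (using the iomkl/2017b toolchain listed above; the source and program names are placeholders), you could build an MPI program like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module purge<br />
module load iomkl/2017b<br />
# mpicc comes from the OpenMPI module pulled in by the toolchain<br />
mpicc -O2 -o my_mpi_app my_mpi_app.c<br />
&lt;/syntaxhighlight&gt;<br />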
== Most Commonly Used Software ==<br />
=== [http://www.open-mpi.org/ OpenMPI] ===<br />
We provide lots of versions; you are most likely better off directly loading a toolchain or application to make sure you get the right version, but you can see the versions we have with 'module spider OpenMPI':<br />
<br />
* OpenMPI/2.0.2-GCC-6.3.0-2.27<br />
* OpenMPI/2.0.2-iccifort-2017.1.132-GCC-6.3.0-2.27<br />
* OpenMPI/2.1.1-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-GCC-7.2.0-2.29<br />
* OpenMPI/2.1.1-gcccuda-2017b<br />
* OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28<br />
* OpenMPI/2.1.1-iccifort-2018.0.128-GCC-7.2.0-2.29<br />
<br />
=== [http://www.r-project.org/ R] ===<br />
We currently provide (module -r spider '^R$'):<br />
* R/3.4.0-foss-2017beocatb-X11-20170314<br />
<br />
==== Packages ====<br />
We provide a small number of R packages installed by default; these are generally packages that are needed by more than one person.<br />
<br />
==== Installing your own R Packages ====<br />
To install your own package, log in to Beocat and start R interactively<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load R<br />
R<br />
&lt;/syntaxhighlight&gt;<br />
Then install the package using<br />
&lt;syntaxhighlight lang=&quot;rsplus&quot;&gt;<br />
install.packages(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as &quot;USA (KS)&quot;.<br />
<br />
After installing you can test before leaving interactive mode by issuing the command<br />
&lt;syntaxhighlight lang=&quot;rsplus&quot;&gt;<br />
library(&quot;PACKAGENAME&quot;)<br />
&lt;/syntaxhighlight&gt;<br />
==== Running R Jobs ====<br />
<br />
You cannot submit an R script directly. '&lt;tt&gt;sbatch myscript.R&lt;/tt&gt;' will result in an error. Instead, you need to make a bash [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
<br />
# Now lets do some actual work. This starts R and loads the file myscript.R<br />
module load R<br />
R --no-save -q &lt; myscript.R<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Now, to submit your R job, you would type<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sbatch submit-R.sbatch<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://www.java.com/ Java] ===<br />
We currently provide (module spider Java):<br />
* Java/1.8.0_131<br />
* Java/1.8.0_144<br />
<br />
=== [http://www.python.org/about/ Python] ===<br />
We currently provide (module spider Python)<br />
* Python/2.7.13-foss-2017beocatb<br />
* Python/2.7.13-GCCcore-7.2.0-bare<br />
* Python/2.7.13-iomkl-2017a<br />
* Python/2.7.13-iomkl-2017beocatb<br />
* Python/3.6.3-foss-2017b<br />
* Python/3.6.3-foss-2017beocatb<br />
* Python/3.6.3-iomkl-2017beocatb<br />
<br />
If you need Python modules that we do not have installed, you should use [https://virtualenv.pypa.io/en/stable/userguide/ virtualenv] to set up a virtual Python environment in your home directory. This will let you install Python modules as you please.<br />
<br />
==== Setting up your virtual environment ====<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Load Python<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
&lt;/syntaxhighlight&gt;<br />
(After running this command, Python is loaded. The module is not loaded automatically in new sessions, so you must rerun this command every time you log on.)<br />
* Create a location for your virtual environments (optional, but helps keep things organized)<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
mkdir ~/virtualenvs<br />
cd ~/virtualenvs<br />
&lt;/syntaxhighlight&gt;<br />
* Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that &lt;code&gt;virtualenv --help&lt;/code&gt; has many more useful options.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
virtualenv test<br />
&lt;/syntaxhighlight&gt;<br />
* Let's look at our virtual environments<br />
&lt;pre&gt;<br />
% ls ~/virtualenvs<br />
test<br />
&lt;/pre&gt;<br />
* Activate one of these<br />
&lt;pre&gt;<br />
% source ~/virtualenvs/test/bin/activate<br />
&lt;/pre&gt;<br />
(After running this command, your virtual environment is activated. It is not activated automatically in new sessions, so you must rerun this command every time you log on.)<br />
* You can now install the python modules you want. This can be done using &lt;tt&gt;pip&lt;/tt&gt;.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
pip install numpy biopython<br />
&lt;/syntaxhighlight&gt;<br />
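<br />
If you want to be able to recreate the same environment later, pip can record and reinstall your package list. A small optional sketch (the file name requirements.txt is just a convention):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Record the packages installed in the active virtual environment ...<br />
pip freeze &gt; requirements.txt<br />
# ... and reinstall them later, for example in a new virtual environment<br />
pip install -r requirements.txt<br />
&lt;/syntaxhighlight&gt;<br />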
<br />
==== Using your virtual environment within a job ====<br />
Here is a simple job script using the virtual environment 'test' created above<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
module load Python/3.6.3-iomkl-2017beocatb<br />
source ~/virtualenvs/test/bin/activate<br />
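# Optional: don't write .pyc bytecode files (keeps shared directories tidier)<br />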
export PYTHONDONTWRITEBYTECODE=1<br />
python ~/path/to/your/python/script.py<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== [http://www.perl.org/ Perl] ===<br />
The system-wide version of perl is tracking the stable releases of perl. Unfortunately there are some features that we do not include in the system distribution of perl, namely threads.<br />
<br />
If you need a newer version (or threads), just load one we provide in our modules (module spider Perl):<br />
* Perl/5.26.0-foss-2017beocatb<br />
* Perl/5.26.0-iompi-2017beocatb<br />
<br />
==== Submitting a job with Perl ====<br />
Much like R (above), you cannot simply '&lt;tt&gt;sbatch myProgram.pl&lt;/tt&gt;', but you must create a [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|submit script]] which will call perl. Here is an example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --mem-per-cpu=1G<br />
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-H:MM:SS)<br />
#SBATCH --time=0-0:15:00<br />
# Now let's do some actual work. <br />
module load Perl<br />
perl /path/to/myProgram.pl<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== Octave for MatLab codes ===<br />
<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
The 64-bit version of Octave can be loaded using the command above. Octave can then be used<br />
to work with MatLab codes on the head node and to submit jobs to the compute nodes through the<br />
sbatch scheduler. Octave is made to run MatLab code, but it does have limitations and does not support<br />
everything that MatLab itself does.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=octave<br />
#SBATCH --output=octave.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load Octave/4.2.1-foss-2017beocatb-enable64<br />
<br />
octave &lt; matlab_code.m<br />
&lt;/syntaxhighlight&gt;<br />
<br />
=== MatLab compiler ===<br />
<br />
Beocat also has a &lt;B&gt;single-user license&lt;/B&gt; for the MatLab compiler and the most common toolboxes<br />
including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox,<br />
Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox, <br />
Global Optimization Toolbox, and the Bioinformatics Toolbox.<br />
<br />
Since we only have a &lt;B&gt;single-user license&lt;/B&gt;, you will be expected to develop your MatLab code<br />
with Octave or elsewhere on a laptop or departmental server. Once you're ready to do large runs, you<br />
move your code to Beocat, compile the MatLab code into an executable, and you can submit as many jobs as<br />
you want to the scheduler. To use the MatLab compiler, you need to load the MATLAB module to compile code and<br />
load the mcr module to run the resulting MatLab executable.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
If you have addpath() commands in your code, you will need to wrap them in an &quot;if ~isdeployed&quot; block and tell the<br />
compiler to include that path via the -I flag.<br />
<br />
&lt;syntaxhighlight lang=&quot;matlab&quot;&gt;<br />
% wrap addpath() calls like so:<br />
if ~isdeployed<br />
    addpath('./another/folder/with/code/')<br />
end<br />
&lt;/syntaxhighlight&gt;<br />
<br />
NOTE: The license manager checks the mcc compiler out for a minimum of 30 minutes, so if another user compiles a code<br />
you unfortunately may need to wait for up to 30 minutes to compile your own code.<br />
<br />
Compiling with additional paths:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
module load MATLAB<br />
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Any directories added with addpath() will need to be added to the list of compile options as -I arguments. You<br />
can have multiple -I arguments in your compile command.<br />
<br />
Here is an example job submission script. Modify time, memory, tasks-per-node, and job name as you see fit:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
#SBATCH --job-name=matlab<br />
#SBATCH --output=matlab.o%j<br />
#SBATCH --time=1:00:00<br />
#SBATCH --mem=4G<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=1<br />
<br />
module purge<br />
module load mcr<br />
<br />
./matlab_executable_name<br />
&lt;/syntaxhighlight&gt;<br />
<br />
For those who make use of mex files - compiled C and C++ code with matlab bindings - you will need to add these<br />
files to the compiled archive via the -a flag. See the behavior of this flag in the [https://www.mathworks.com/help/compiler/mcc.html compiler documentation]. You can either target specific .mex files or entire directories.<br />
<br />
Because codes often require adding several directories to the Matlab path as well as mex files from several locations,<br />
we recommend writing a script to preserve and help document the steps to compile your Matlab code. Here is an<br />
abbreviated example from a current user:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash -l<br />
<br />
module load MATLAB<br />
<br />
cd matlabPyrTools/MEX/<br />
<br />
# compile mex files<br />
mex upConv.c convolve.c wrap.c edges.c<br />
mex corrDn.c convolve.c wrap.c edges.c<br />
mex histo.c<br />
mex innerProd.c<br />
<br />
cd ../..<br />
<br />
mcc -m mongrel_creation.m \<br />
-I ./matlabPyrTools/MEX/ \<br />
-I ./matlabPyrTools/ \<br />
-I ./FastICA/ \<br />
-a ./matlabPyrTools/MEX/ \<br />
-a ./texturesynth/ \<br />
-o mongrel_creation_binary<br />
&lt;/syntaxhighlight&gt;<br />
<br />
Again, we only have a &lt;B&gt;single-user license&lt;/B&gt; for MatLab so the model is to develop and debug your MatLab code<br />
elsewhere or using Octave on Beocat, then you can compile the MatLab code into an executable and run it without<br />
limits on Beocat. <br />
<br />
For more info on the mcc compiler see: https://www.mathworks.com/help/compiler/mcc.html<br />
<br />
=== COMSOL ===<br />
Beocat has no license for COMSOL. If you want to use it, you must provide your own.<br />
<br />
&lt;pre&gt;<br />
module spider COMSOL<br />
----------------------------------------------------------------------------<br />
COMSOL: COMSOL/5.3<br />
----------------------------------------------------------------------------<br />
Description:<br />
COMSOL Multiphysics software, an interactive environment for modeling<br />
and simulating scientific and engineering problems<br />
<br />
This module can be loaded directly: module load COMSOL/5.3<br />
<br />
Help:<br />
<br />
Description<br />
===========<br />
COMSOL Multiphysics software, an interactive environment for modeling and <br />
simulating scientific and engineering problems<br />
You must provide your own license.<br />
export LM_LICENSE_FILE=/the/path/to/your/license/file<br />
*OR*<br />
export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME<br />
e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu<br />
<br />
More information<br />
================<br />
- Homepage: https://www.comsol.com/<br />
&lt;/pre&gt;<br />
==== Graphical COMSOL ====<br />
Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following (in two sessions):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# session one<br />
ssh headnode.beocat.ksu.edu<br />
srun --time=24:00:00 --mem-per-cpu=1G /bin/sleep 86400<br />
<br />
# session two<br />
ssh -Y headnode.beocat.ksu.edu<br />
ssh -Y ${host where job is running}<br />
module load COMSOL<br />
# export your comsol license as mentioned above<br />
comsol -3drend sw<br />
&lt;/syntaxhighlight&gt;<br />
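<br />
To find the node where your placeholder job is running (needed for the second session above), you can check the queue. A small sketch using standard Slurm tools:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Show your jobs along with the node(s) they are running on<br />
squeue -u $USER -o &quot;%.10i %.9P %.15j %.8T %N&quot;<br />
&lt;/syntaxhighlight&gt;<br />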
== Installing my own software ==<br />
Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.<br />
<br />
In many cases, the software vendor or support site will incorrectly assume that you are installing the software system-wide or that you need 'sudo' access.<br />
<br />
As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].</div>Daveturnerhttp://support.beocat.ksu.edu/BeocatDocs/index.php?title=AdvancedSlurm&diff=368AdvancedSlurm2018-03-01T19:29:59Z<p>Daveturner: /* Openly sharing files on the web */</p>
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. We have a very small number of nodes which have GPUs installed. To request one of these GPUs on one of these nodes, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; to your sbatch command-line.<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
Intranode jobs which run on many cores in the same node are easier to code and can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or Java's threads. Many times, your program will need to know how many cores you want it to use. Many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
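<br />
As a minimal sketch (my_openmp_app is a placeholder for your own threaded program), an intranode submit script might look like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=1:00:00<br />
<br />
# Tell the OpenMP runtime to use exactly the cores we were allocated<br />
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE<br />
$HOME/my_openmp_app<br />
&lt;/syntaxhighlight&gt;<br />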
<br />
=== Internode (MPI) jobs ===<br />
Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --ntasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--ntasks=100&lt;/tt&gt; will give you 100 cores on any number of nodes.<br />
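<br />
Putting it together, here is a minimal MPI submit-script sketch (my_mpi_app is a placeholder, and the OpenMPI module shown is just one of the builds provided on Beocat; see 'module spider OpenMPI'):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=4<br />
#SBATCH --mem-per-cpu=1G<br />
#SBATCH --time=1:00:00<br />
<br />
module load OpenMPI/2.1.1-GCC-6.4.0-2.28<br />
# mpirun picks up the 8 allocated tasks (2 nodes x 4 tasks per node) from Slurm<br />
mpirun $HOME/my_mpi_app<br />
&lt;/syntaxhighlight&gt;<br />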
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--ntasks=20 --mem-per-cpu=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs that is not related to resource requests is to have Slurm email you when a job changes its status. This may require two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, Slurm jobs start in the directory you submitted from (your &quot;current working directory&quot; at submission time), so programs that assume they are running from that directory will work. If you need the job to start in a different directory, use sbatch's '&lt;tt&gt;-D /path/to/directory&lt;/tt&gt;' (working directory) option.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogeneous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=AVX&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--constraint=AVX2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to set up your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course, the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would check SLURM_JOB_NODELIST for that. There are lots of useful environment variables here; I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
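<br />
For example, inside a multi-node job you can expand the compact node list into individual hostnames. A small sketch using a standard Slurm command:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Print one allocated hostname per line<br />
scontrol show hostnames &quot;$SLURM_JOB_NODELIST&quot;<br />
&lt;/syntaxhighlight&gt;<br />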
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of a line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --ntasks-per-node=1<br />
##SBATCH --ntasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself (see the command below). Scratch offers greater speed and no limit on the size or number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing them<br />
redundantly, so if a hard disk fails, data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
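<br />
A typical workflow, as a rough sketch ('app' and the directory names are placeholders), is to stage files into scratch, run against them, and move anything you need to keep back before the 30-day purge:<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Stage input files into your scratch directory<br />
mkdir -p /scratch/$USER/myrun<br />
cp $input_files /scratch/$USER/myrun/<br />
<br />
# Run against scratch<br />
app -input_directory /scratch/$USER/myrun -output_directory /scratch/$USER/myrun/out<br />
<br />
# Move results you want to keep back to your home or bulk directory<br />
mv /scratch/$USER/myrun/out $HOME/<br />
&lt;/syntaxhighlight&gt;<br />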
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to the local disk. You will then need to copy any files you want to keep off the local disk before<br />
the job finishes, since Slurm will remove all files in your job's directory on /tmp when the job<br />
completes or aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share&lt;BR&gt;<br />
chgrp ksu-cis-hpc share&lt;BR&gt;<br />
chmod g+rx share&lt;BR&gt;<br />
chmod o-rwx share&lt;BR&gt;<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to use the line below replacing your user name for mine.<br />
<br />
getent group | grep -v _users | grep -v _owners | grep daveturner | cut -d ':' -f 1<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory on your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
cd&lt;BR&gt;<br />
mkdir public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~your_user_name<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with 'wc -l dataset.txt'; in this case, let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks (`), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea about the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
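<br />
For example, here is a minimal sketch of requesting an interactive session with a few cores and a couple of hours of runtime:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# 4 cores, 1 GB per core, 2 hours, then an interactive shell on the allocated node<br />
srun --cpus-per-task=4 --mem-per-cpu=1G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />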
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request; it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed nodes to Beocat. If your project has contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use those nodes.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like, reasons for doing so are available [[AdvancedSlurm#Killable_jobs|here]]<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
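<br />
The full '-l' output is very wide; as a sketch, a trimmed format string makes the columns used in the examples below easier to read (adjust the field list to taste):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sacct -j 1122334455 --format=JobID,JobName,Partition,AllocCPUS,Elapsed,State,ExitCode,ReqMem,MaxRSS<br />
&lt;/syntaxhighlight&gt;<br />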
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturner
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. We have a very small number of nodes which have GPUs installed. To request one of these gpus on of of these nodes, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; to your sbatch command-line.<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
Intranode jobs which run on many cores in the same node are easier to code and can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or Java's threads. Many times, your program will need to know how many cores you want it to use. Many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--ntasks=100&lt;/tt&gt; will give you 100 cores on any number of nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
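For example, to split the two streams into separate files named after the job id (the &lt;tt&gt;MyJob&lt;/tt&gt; prefix is just an illustration):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --output=MyJob-%j.out<br />
#SBATCH --error=MyJob-%j.err<br />
&lt;/syntaxhighlight&gt;<br />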
<br />
=== Running from the Current Directory ===<br />
Under Slurm, batch jobs start in the directory from which you submitted them, and that path is available inside the job as the environment variable $SLURM_SUBMIT_DIR. If your program assumes a particular working directory, you can change to it at the top of your script (see the sketch below), or set a different starting directory with sbatch's &lt;tt&gt;-D&lt;/tt&gt; option.<br />
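For example, a script can make the working directory explicit like this:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Change to the directory the job was submitted from<br />
cd $SLURM_SUBMIT_DIR<br />
&lt;/syntaxhighlight&gt;<br />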
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogeneous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--constraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on your ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=AVX&lt;/tt&gt; will prohibit your job from running on the Mages, while &lt;tt&gt;--constraint=AVX2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to set up your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. Check SLURM_JOB_NODELIST for that. There are lots of useful environment variables here; I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
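As a small sketch, a job script could record some of these details in its output, for example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# Record where and how the job is running<br />
echo Job $SLURM_JOB_ID is running on $HOSTNAME with $SLURM_CPUS_ON_NODE cores<br />
<br />
# Expand the compact node list into one hostname per line<br />
scontrol show hostnames $SLURM_JOB_NODELIST<br />
&lt;/syntaxhighlight&gt;<br />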
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#' at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#'. Just delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If you don't know what this is, you probably don't need it<br />
##SBATCH --gres=gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a separate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user=myemail@ksu.edu<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software, which is much faster than the<br />
file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself, as shown below. Scratch offers greater speed and places no limit on the size of files or the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories, since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it stripes files across multiple disks without storing<br />
them redundantly, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job#, where '#' is the job ID number, on the<br />
local disk of each node the job uses. Within the job, this directory can be accessed simply by writing<br />
to /tmp rather than needing to use /tmp/job#.<br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or point your application's output directory at the<br />
local disk. You will then need to copy any files you want to keep off the local disk before<br />
the job finishes, since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk, so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk back to the current working directory<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission from everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default all your files and directories have a 'group' that is your user name followed by _users as 'ls -l' shows.<br />
In my case they have the group of daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing the group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share&lt;BR&gt;<br />
chgrp ksu-cis-hpc share&lt;BR&gt;<br />
chmod g+rx share&lt;BR&gt;<br />
chmod o-rwx share&lt;BR&gt;<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the line below, replacing my user name with yours.<br />
<br />
getent group | grep -v _users | grep -v _owners | grep daveturner | cut -d ':' -f 1<br />
<br />
If your group does not own any nodes, you can still request a group name and manage the participants yourself.<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
mkdir /homes/daveturner/public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~daveturner<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
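For example, the following (hypothetical) directives run a 10-task array and name each task's output file using that convention:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#SBATCH --array=1-10<br />
#SBATCH --output=slurm-%A_%a.out<br />
&lt;/syntaxhighlight&gt;<br />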
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable and submit the script again. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo &quot;app2 $LINE&quot; &gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with 'wc -l dataset.txt'; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks, and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5000, that is 5,000-10,000 seconds, or 1.5-3 hours.<br />
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
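For example, to ask for an interactive session with 4 cores and 2GB of memory per core for one hour (adjust the numbers to your own needs):<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
srun --cpus-per-task=4 --mem-per-cpu=2G --time=1:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />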
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to in the old cluster. This is essentially like using ssh to get into the node where your job is running which<br />
can be very useful in allowing you to look at files in /tmp/job# or in running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level for your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''<br />
<br />
As it is unsupported, this is an exercise left to the reader. A starting point is &lt;tt&gt;man scontrol&lt;/tt&gt;.<br />
== Killable jobs ==<br />
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as &quot;killable.&quot; If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.<br />
<br />
The way you would designate your job as killable is to add &lt;tt&gt;-p killable.q&lt;/tt&gt; to the '''&lt;tt&gt;sbatch&lt;/tt&gt; or &lt;tt&gt;srun&lt;/tt&gt;''' arguments. This could be either on the command-line or in your script file.<br />
<br />
''Note: This is a submit-time only request; it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after the jobs have been submitted (and it is too much work to &lt;tt&gt;scancel&lt;/tt&gt; the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.<br />
<br />
== Scheduling Priority ==<br />
Some users are members of projects that have contributed nodes to Beocat. If your project has contributed nodes, you will need to include your project's &quot;partition&quot; in your job submission to be able to use those nodes.<br />
<br />
To determine the partitions you have access to, run &lt;tt&gt;sinfo -hso '%P'&lt;/tt&gt;<br />
That will return a list that looks something like this:<br />
killable.q<br />
batch.q<br />
some-other-partition.q<br />
<br />
You can then alter your &lt;tt&gt;#SBATCH&lt;/tt&gt; lines to include your new partition:<br />
#SBATCH --partition=some-other-partition.q,batch.q<br />
or<br />
#SBATCH --partition=some-other-partition.q,batch.q,killable.q<br />
You can include 'killable.q' if you would like; reasons for doing so are available [[AdvancedSlurm#Killable_jobs|here]].<br />
<br />
== Job Accounting ==<br />
Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.<br />
=== sacct ===<br />
This data can usually be used to diagnose two very common job failures.<br />
==== Job debugging ====<br />
It is simplest if you know the job number of the job you are trying to get information on.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
# if you know the jobid, put it here:<br />
sacct -j 1122334455 -l<br />
# if you don't know the job id, you can look at your jobs started since some day:<br />
sacct -S 2017-01-01<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===== My job didn't do anything when it ran! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|218||218||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||12||00:00:00||FAILED||2:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=12,mem=1G,node=1||cpu=12,mem=1G,node=1<br />
|-<br />
|218.batch||218.batch||batch||||137940K||dwarf37||0||137940K||1576K||dwarf37||0||1576K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||1.36G||0||0||0||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
|-<br />
|218.0||218.0||qqqqstat||||204212K||dwarf37||0||204212K||1420K||dwarf37||0||1420K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||12||00:00:00||FAILED||2:0||196.52M||Unknown||Unknown||Unknown||1Gn||0||0||dwarf37||65534||0||0.00M||dwarf37||0||0.00M||||||||cpu=12,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.<br />
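If the full &lt;tt&gt;-l&lt;/tt&gt; output is hard to read, you can ask sacct for just the columns discussed here (these are standard sacct field names), for example:<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
sacct -j 218 --format=JobID,JobName,Elapsed,State,ExitCode,ReqMem,MaxRSS<br />
&lt;/syntaxhighlight&gt;<br />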
<br />
===== My job ran but didn't finish! =====<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|220||220||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:01:27||TIMEOUT||0:0||||Unknown||Unknown||Unknown||1Gn||||||||||||||||||||||||cpu=1,mem=1G,node=1||cpu=1,mem=1G,node=1<br />
|-<br />
|220.batch||220.batch||batch||||370716K||dwarf37||0||370716K||7060K||dwarf37||0||7060K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:28||CANCELLED||0:15||1.23G||0||0||0||1Gn||0||0.16M||dwarf37||0||0.16M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
|-<br />
|220.0||220.0||sleep||||204212K||dwarf37||0||107916K||1000K||dwarf37||0||620K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:01:27||CANCELLED||0:15||1.54G||Unknown||Unknown||Unknown||1Gn||0||0.05M||dwarf37||0||0.05M||0.00M||dwarf37||0||0.00M||||||||cpu=1,mem=1G,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).<br />
{{Scrolling table/top}}<br />
{{Scrolling table/mid}}<br />
!JobID!!JobIDRaw!!JobName!!Partition!!MaxVMSize!!MaxVMSizeNode!!MaxVMSizeTask!!AveVMSize!!MaxRSS!!MaxRSSNode!!MaxRSSTask!!AveRSS!!MaxPages!!MaxPagesNode!!MaxPagesTask!!AvePages!!MinCPU!!MinCPUNode!!MinCPUTask!!AveCPU!!NTasks!!AllocCPUS!!Elapsed!!State!!ExitCode!!AveCPUFreq!!ReqCPUFreqMin!!ReqCPUFreqMax!!ReqCPUFreqGov!!ReqMem!!ConsumedEnergy!!MaxDiskRead!!MaxDiskReadNode!!MaxDiskReadTask!!AveDiskRead!!MaxDiskWrite!!MaxDiskWriteNode!!MaxDiskWriteTask!!AveDiskWrite!!AllocGRES!!ReqGRES!!ReqTRES!!AllocTRES<br />
|-<br />
|221||221||slurm_simple.sh||batch.q||||||||||||||||||||||||||||||||||||1||00:00:00||CANCELLED by 0||0:0||||Unknown||Unknown||Unknown||1Mn||||||||||||||||||||||||cpu=1,mem=1M,node=1||cpu=1,mem=1M,node=1<br />
|-<br />
|221.batch||221.batch||batch||||137940K||dwarf37||0||137940K||1144K||dwarf37||0||1144K||0||dwarf37||0||0||00:00:00||dwarf37||0||00:00:00||1||1||00:00:01||CANCELLED||0:15||2.62G||0||0||0||1Mn||0||0||dwarf37||65534||0||0||dwarf37||65534||0||||||||cpu=1,mem=1M,node=1<br />
{{Scrolling table/end}}<br />
If you look at the column showing State, we see it was &quot;CANCELLED by 0&quot;, then we look at the AllocTRES column to see our allocated resources, and see that 1MB of memory was granted. Combine that with the column &quot;MaxRSS&quot; and we see that the memory granted was less than the memory we tried to use, thus the job was &quot;CANCELLED&quot;.</div>Daveturner
<hr />
<div>== Resource Requests ==<br />
Aside from the time, RAM, and CPU requirements listed on the [[SlurmBasics]] page, we have a couple other requestable resources:<br />
Valid gres options are:<br />
gpu[[:type]:count]<br />
fabric[[:type]:count]<br />
Generally, if you don't know if you need a particular resource, you should use the default. These can be generated with the command<br />
&lt;tt&gt;srun --gres=help&lt;/tt&gt;<br />
=== Fabric ===<br />
We currently offer 3 &quot;fabrics&quot; as request-able resources in Slurm. The &quot;count&quot; specified is the line-rate (in Gigabits-per-second) of the connection on the node.<br />
==== Infiniband ====<br />
First of all, let me state that just because it sounds &quot;cool&quot; doesn't mean you need it or even want it. InfiniBand does absolutely no good if running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is (most-often) used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add &lt;tt&gt;--gres=fabric:ib:1&lt;/tt&gt; to your sbatch command-line.<br />
==== ROCE ====<br />
ROCE, like InfiniBand is a high-speed host-to-host communication layer. Again, used most often with MPI. Most of our nodes are ROCE enabled, but this will let you guarantee the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add &lt;tt&gt;--gres=fabric:roce:1&lt;/tt&gt; to your sbatch command-line.<br />
<br />
==== Ethernet ====<br />
Ethernet is another communication fabric. All of our nodes are connected by ethernet, this is simply here to allow you to specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for ethernet are: &lt;tt&gt;1, 10, 40, and 100&lt;/tt&gt;. To select nodes with 40Gbps and above, you could specify &lt;tt&gt;--gres=fabric:eth:40&lt;/tt&gt; on your sbatch command-line. Since ethernet is used to connect to the file server, this can be used to select nodes that have fast access for applications doing heavy IO. The Dwarves and Heroes have 40 Gbps ethernet and we measure single stream performance as high as 20 Gbps, but if your application<br />
requires heavy IO then you'd want to avoid the Moles which are connected to the file server with only 1 Gbps ethernet.<br />
<br />
=== CUDA ===<br />
[[CUDA]] is the resource required for GPU computing. We have a very small number of nodes which have GPUs installed. To request one of these gpus on of of these nodes, add &lt;tt&gt;--gres=gpu:1&lt;/tt&gt; to your sbatch command-line.<br />
== Parallel Jobs ==<br />
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.<br />
=== Intranode jobs ===<br />
Intranode jobs which run on many cores in the same node are easier to code and can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or Java's threads. Many times, your program will need to know how many cores you want it to use. Many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directives '&lt;tt&gt;--cpus-per-task=n&lt;/tt&gt;' or '&lt;tt&gt;--nodes=1 --ntasks-per-node=n&lt;/tt&gt;', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell how many cores you've been allocated.<br />
<br />
=== Internode (MPI) jobs ===<br />
Communicating between nodes is trickier than talking between cores on the same node. The specification for doing so is called &quot;[[wikipedia:Message_Passing_Interface|Message Passing Interface]]&quot;, or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI, but MPI also allows an application to run on multiple cores within a node. You can tell if you have an MPI-enabled program because its directions will tell you to run '&lt;tt&gt;mpirun ''program''&lt;/tt&gt;'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '&lt;tt&gt;--cpus-per-task=''n''&lt;/tt&gt;', you would use '&lt;tt&gt;--nodes=''n'' --tasks-per-node=''m''&lt;/tt&gt;' ''or'' '&lt;tt&gt;--ntasks=''o''&lt;/tt&gt;' for your sbatch request, where ''n'' is the number of nodes you want, ''m'' is the number of cores per node you need, and ''o'' is the total number of cores you need.<br />
<br />
Some quick examples:<br />
<br />
&lt;tt&gt;--nodes=6 --ntasks-per-node=4&lt;/tt&gt; will give you 4 cores on each of 6 nodes for a total of 24 cores.<br />
<br />
&lt;tt&gt;--ntasks=40&lt;/tt&gt; will give you 40 cores spread across any number of nodes.<br />
<br />
&lt;tt&gt;--ntasks=100&lt;/tt&gt; will give you 100 cores on any number of nodes.<br />
<br />
== Requesting memory for multi-core jobs ==<br />
Memory requests are easiest when they are specified '''per core'''. For instance, if you specified the following: '&lt;tt&gt;--tasks=20 --mem-per-core=20G&lt;/tt&gt;', your job would have access to 400GB of memory total.<br />
== Other Handy Slurm Features ==<br />
=== Email status changes ===<br />
One of the most commonly used options when submitting jobs not related to resource requests is to have have Slurm email you when a job changes its status. This takes may need two directives to sbatch: &lt;tt&gt;--mail-user&lt;/tt&gt; and &lt;tt&gt;--mail-type&lt;/tt&gt;.<br />
==== --mail-type ====<br />
&lt;tt&gt;--mail-type&lt;/tt&gt; is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following<br />
{| class=&quot;wikitable&quot;<br />
!Option!!Explanation<br />
|-<br />
| NONE || This disables event-based mail<br />
|-<br />
| BEGIN || Sends a notification when the job begins<br />
|-<br />
| END || Sends a notification when the job ends<br />
|-<br />
| FAIL || Sends a notification when the job fails.<br />
|-<br />
| REQUEUE || Sends a notification if the job is put back into the queue from a running state<br />
|-<br />
| STAGE_OUT || Burst buffer stage out and teardown completed<br />
|-<br />
| ALL || Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT<br />
|-<br />
| TIME_LIMIT || Notifies if the job ran out of time<br />
|-<br />
| TIME_LIMIT_90 || Notifies when the job has used 90% of its allocated time<br />
|-<br />
| TIME_LIMIT_80 || Notifies when the job has used 80% of its allocated time<br />
|-<br />
| TIME_LIMIT_50 || Notifies when the job has used 50% of its allocated time<br />
|-<br />
| ARRAY_TASKS || Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job<br />
|}<br />
<br />
==== --mail-user ====<br />
&lt;tt&gt;--mail-user&lt;/tt&gt; is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the [https://acount.beocat.ksu.edu/user Account Request Page]. It is specified with the following arguments to sbatch: &lt;tt&gt;--mail-user=someone@somecompany.com&lt;/tt&gt;<br />
<br />
=== Job Naming ===<br />
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '&lt;tt&gt;-J ''JobName''&lt;/tt&gt;' sbatch directive.<br />
<br />
=== Separating Output Streams ===<br />
Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '&lt;tt&gt;--output&lt;/tt&gt;' and '&lt;tt&gt;--error&lt;/tt&gt;'.<br />
<br />
{| class=&quot;wikitable&quot;<br />
! option !! default !! example<br />
|-<br />
| --output || slurm-%j.out || slurm-206.out<br />
|-<br />
| --error || slurm-%j.out || slurm-206.out<br />
|}<br />
&lt;tt&gt;%j&lt;/tt&gt; above indicates that it should be replaced with the job id.<br />
<br />
=== Running from the Current Directory ===<br />
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '&lt;tt&gt;-cwd&lt;/tt&gt;' directive to change to the &quot;current working directory&quot; you used when submitting the job.<br />
=== Running in a specific class of machine ===<br />
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag &quot;--constraint=dwarves&quot; to select any of those machines.<br />
<br />
=== Processor Constraints ===<br />
Because Beocat is a heterogenous cluster (we have machines from many years in the cluster), not all of our processors support every new and fancy feature. You might have some applications that require some newer processor features, so we provide a mechanism to request those.<br />
<br />
&lt;tt&gt;--contraint&lt;/tt&gt; tells the cluster to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have &quot;AVX&quot; processor extensions. To do that, you would specify &lt;tt&gt;--constraint=avx&lt;/tt&gt; on you ''&lt;tt&gt;sbatch&lt;/tt&gt;'' '''or''' ''&lt;tt&gt;srun&lt;/tt&gt;'' command lines.<br />
Using &lt;tt&gt;--constraint=AVX&lt;/tt&gt; will prohibit your job from running on the Mages while &lt;tt&gt;--contraint=AVX2&lt;/tt&gt; will eliminate the Elves as well as the Mages.<br />
<br />
=== Slurm Environment Variables ===<br />
Within an actual job, sometimes you need to know specific things about the running environment to setup your scripts correctly. Here is a listing of environment variables that Slurm makes available to you. Of course the value of these variables will be different based on many different factors.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
CUDA_VISIBLE_DEVICES=NoDevFiles<br />
ENVIRONMENT=BATCH<br />
GPU_DEVICE_ORDINAL=NoDevFiles<br />
HOSTNAME=dwarf37<br />
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint<br />
SLURM_CLUSTER_NAME=beocat<br />
SLURM_CPUS_ON_NODE=1<br />
SLURM_DISTRIBUTION=cyclic<br />
SLURMD_NODENAME=dwarf37<br />
SLURM_GTIDS=0<br />
SLURM_JOB_CPUS_PER_NODE=1<br />
SLURM_JOB_GID=163587<br />
SLURM_JOB_ID=202<br />
SLURM_JOBID=202<br />
SLURM_JOB_NAME=slurm_simple.sh<br />
SLURM_JOB_NODELIST=dwarf37<br />
SLURM_JOB_NUM_NODES=1<br />
SLURM_JOB_PARTITION=batch.q,killable.q<br />
SLURM_JOB_QOS=normal<br />
SLURM_JOB_UID=163587<br />
SLURM_JOB_USER=mozes<br />
SLURM_LAUNCH_NODE_IPADDR=10.5.16.37<br />
SLURM_LOCALID=0<br />
SLURM_MEM_PER_NODE=1024<br />
SLURM_NNODES=1<br />
SLURM_NODEID=0<br />
SLURM_NODELIST=dwarf37<br />
SLURM_NPROCS=1<br />
SLURM_NTASKS=1<br />
SLURM_PRIO_PROCESS=0<br />
SLURM_PROCID=0<br />
SLURM_SRUN_COMM_HOST=10.5.16.37<br />
SLURM_SRUN_COMM_PORT=37975<br />
SLURM_STEP_ID=0<br />
SLURM_STEPID=0<br />
SLURM_STEP_LAUNCHER_PORT=37975<br />
SLURM_STEP_NODELIST=dwarf37<br />
SLURM_STEP_NUM_NODES=1<br />
SLURM_STEP_NUM_TASKS=1<br />
SLURM_STEP_TASKS_PER_NODE=1<br />
SLURM_SUBMIT_DIR=/homes/mozes<br />
SLURM_SUBMIT_HOST=dwarf37<br />
SLURM_TASK_PID=23408<br />
SLURM_TASKS_PER_NODE=1<br />
SLURM_TOPOLOGY_ADDR=due1121-prod-core-40g-a1,due1121-prod-core-40g-c1.due1121-prod-sw-100g-a9.dwarf37<br />
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node<br />
SLURM_UMASK=0022<br />
SRUN_DEBUG=3<br />
TERM=screen-256color<br />
TMPDIR=/tmp<br />
USER=mozes<br />
&lt;/syntaxhighlight&gt;<br />
Sometimes it is nice to know what hosts you have access to during a job. You would checkout the SLURM_JOB_NODELIST to know that. There are lots of useful Environment Variables there, I will leave it to you to identify the ones you want.<br />
<br />
Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.<br />
<br />
== Running from a sbatch Submit Script ==<br />
No doubt after you've run a few jobs you get tired of typing something like 'sbatch -l mem=2G,h_rt=10:00 -pe single 8 -n MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
<br />
## A Sample sbatch script created by Kyle Hutson<br />
##<br />
## Note: Usually a '#&quot; at the beginning of the line is ignored. However, in<br />
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch<br />
## itself, so I have taken the convention here of starting *every* line with a<br />
## '#', just Delete the first one if you want to use that line, and then modify<br />
## it to your own purposes. The only exception here is the first line, which<br />
## *must* be #!/bin/bash (or another valid shell).<br />
<br />
## Specify the amount of RAM needed _per_core_. Default is 1G<br />
##SBATCH --mem-per-cpu=1G<br />
<br />
## Specify the maximum runtime in DD-HH:MM:SS form. Default is 1 hour (1:00:00)<br />
##SBATCH --time=1:00:00<br />
<br />
## Require the use of infiniband. If you don't know what this is, you probably<br />
## don't need it.<br />
##SBATCH --gres=fabric:ib:1<br />
<br />
## GPU directive. If You don't know what this is, you probably don't need it<br />
##SBATCH --gres:gpu:1<br />
<br />
## number of cores/nodes:<br />
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled<br />
## fairly quickly. If you need a job that requires more than that, you might<br />
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in<br />
## getting your job scheduled in a reasonable amount of time. Default is<br />
##SBATCH --cpus-per-task=1<br />
##SBATCH --cpus-per-task=12<br />
##SBATCH --nodes=2 --tasks-per-node=1<br />
##SBATCH --tasks=20<br />
<br />
## Constraints for this job. Maybe you need to run on the elves<br />
##SBATCH --constraint=elves<br />
## or perhaps you just need avx processor extensions<br />
##SBATCH --constraint=avx<br />
<br />
## Output file name. Default is slurm-%j.out where %j is the job id.<br />
##SBATCH --output=MyJobTitle.o%j<br />
<br />
## Split the errors into a seperate file. Default is the same as output<br />
##SBATCH --error=MyJobTitle.e%j<br />
<br />
## Name my job, to make it easier to find in the queue<br />
##SBATCH -J MyJobTitle<br />
<br />
## And finally, we run the job we came here to do.<br />
## $HOME/ProgramDir/ProgramName ProgramArguments<br />
<br />
## OR, for the case of MPI-capable jobs<br />
## mpirun $HOME/path/MpiJobName<br />
<br />
## Send email when certain criteria are met.<br />
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to<br />
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage<br />
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent<br />
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),<br />
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send<br />
## emails for each array task). Multiple type values may be specified in a<br />
## comma separated list. Unless the ARRAY_TASKS option is specified, mail<br />
## notifications on job BEGIN, END and FAIL apply to a job array as a whole<br />
## rather than generating individual email messages for each task in the job<br />
## array.<br />
##SBATCH --mail-type=ALL<br />
<br />
## Email address to send the email to based on the above line.<br />
## Default is to send the mail to the e-mail address entered on the account<br />
## request form.<br />
##SBATCH --mail-user myemail@ksu.edu<br />
&lt;/syntaxhighlight&gt;<br />
<br />
== File Access ==<br />
Beocat has a variety of options for storing and accessing your files. <br />
Every user has a home directory for general use which is limited in size, has decent file access performance,<br />
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance<br />
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast<br />
temporary file access. When fast IO is critical to the application performance, access to the local disk on each node or to a<br />
RAM disk are the best options.<br />
<br />
===Home directory===<br />
<br />
Every user has a &lt;tt&gt;/homes/''username''&lt;/tt&gt; directory that they drop into when they log into Beocat. <br />
The home directory is for general use and provides decent performance for most file IO. <br />
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk<br />
directory, and there is a limit of 100,000 files in each subdirectory in your account.<br />
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.<br />
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally <br />
delete something it can be recovered.<br />
<br />
===Bulk directory===<br />
<br />
Each user also has a &lt;tt&gt;/bulk/''username''&lt;/tt&gt; directory where large files should be stored.<br />
File access is the same speed as for the home directories, and the same limit of 100,000 files<br />
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,<br />
but the files there will not be backed up. They are still redundantly stored so you don't need to<br />
worry about losing data to hardware failures, just don't delete something by accident. Unused files will be automatically removed after two years.<br />
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about<br />
purchasing some hard disks for archival storage.<br />
<br />
===Scratch file system===<br />
<br />
The /scratch file system will soon be using the Lustre software which is much faster than the<br />
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a<br />
directory for yourself. Scratch offers greater speed, no limit to the size of files nor the number<br />
of files in each subdirectory. It is meant as temporary space for prepositioning files and accessing them<br />
during runs. Once runs are completed, any files that need to be kept should be moved to your home<br />
or bulk directories since files on the scratch file system get purged after 30 days. Lustre is faster than<br />
the home and bulk file systems in part because it does not redundantly store files by striping them<br />
across multiple disks, so if a hard disk fails data will be lost. When we get scratch set up to use Lustre<br />
we will post the difference in file access rates.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
mkdir /scratch/$USER<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===Local disk===<br />
<br />
If you are running on a single node, it may also be faster to access your files from the local disk<br />
on that node. Each job creates a subdirectory /tmp/job# where '#' is the job ID number on the<br />
local disk of each node the job uses. This can be accessed simply by writing to /tmp rather than<br />
needing to use /tmp/job#. <br />
<br />
You may need to copy files to<br />
local disk at the start of your script, or set the output directory for your application to point<br />
to a file on the local disk, then you'll need to copy any files you want off the local disk before<br />
the job finishes since Slurm will remove all files in your job's directory on /tmp on completion<br />
of the job or when it aborts. When we get the scratch file system working with Lustre, it may<br />
end up being faster than accessing local disk so we will post the access rates for each. Use 'kstat -l -h'<br />
to see how much /tmp space is available on each node.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files to the tmp directory if needed<br />
cp $input_files /tmp<br />
<br />
# Make an 'out' directory to pass to the app if needed<br />
mkdir /tmp/out<br />
<br />
# Example of running an app and passing the tmp directory in/out<br />
app -input_directory /tmp -output_directory /tmp/out<br />
<br />
# Copy the 'out' directory back to the current working directory after the run<br />
cp -rp /tmp/out .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===RAM disk===<br />
<br />
If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the <br />
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request <br />
memory for your job. Below is an example of how to use the RAM disk.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Copy input files over if necessary<br />
cp $any_input_files /dev/shm/<br />
<br />
# Run the application, possibly giving it the path to the RAM disk to use for output files<br />
app -output_directory /dev/shm/<br />
<br />
# Copy files from the RAM disk to the current working directory and clean it up<br />
cp /dev/shm/* .<br />
&lt;/syntaxhighlight&gt;<br />
<br />
===When you leave KSU===<br />
<br />
If you are done with your account and leaving KSU, please clean up your directory, move any files<br />
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your<br />
account. The easiest way to move your files to your supervisor's account is for them to set up<br />
a subdirectory for you with the appropriate write permissions. The example below shows moving <br />
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will <br />
continue even if the window you are doing the move from gets disconnected.<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Supervisor:<br />
mkdir /bulk/$USER/$STUDENT_USERNAME<br />
chmod ugo+w /bulk/$USER/$STUDENT_USERNAME<br />
<br />
# Student:<br />
nohup mv /homes/$USER/data /bulk/$SUPERVISOR_USERNAME/$USER &amp;<br />
&lt;/syntaxhighlight&gt;<br />
<br />
==File Sharing==<br />
<br />
This section will cover methods of sharing files with other users within Beocat and on remote systems.<br />
<br />
===Securing your home directory===<br />
<br />
By default your home directory is accessible to other users on Beocat for reading but not writing. If you do not want others to have any<br />
access to files in your home directory, you can set the permissions to restrict access to just yourself.<br />
<br />
chmod go-rwx /homes/your_user_name<br />
<br />
This removes read, write, and execute permission to everyone but yourself. Be aware that it may make it more difficult for us to help you out when<br />
you run into problems.<br />
<br />
===Sharing files within your group===<br />
<br />
By default, all your files and directories have a 'group' that is your user name followed by _users, as 'ls -l' shows.<br />
In my case they have the group daveturner_users.<br />
If your working group owns any nodes on Beocat, then you have a group name that can be used to securely share<br />
files with others within your group. Below is an example of creating a directory called 'share', changing its group<br />
to ksu-cis-hpc (my group is ksu-cis-hpc, so I submit jobs to --partition=ksu-cis-hpc.q), then changing the permissions to restrict access to <br />
just that group.<br />
<br />
mkdir share&lt;BR&gt;<br />
chgrp ksu-cis-hpc share&lt;BR&gt;<br />
chmod g+rx share&lt;BR&gt;<br />
chmod o-rwx share&lt;BR&gt;<br />
<br />
This will give people in your group the ability to read files in the 'share' directory. If you also want<br />
them to be able to write or modify files in that directory then use 'chmod g+rwx' instead.<br />
<br />
If you want to know what groups you belong to, use the line below, replacing my user name with yours.<br />
<br />
getent group | grep -v _users | grep -v _owners | grep daveturner | cut -d ':' -f 1<br />
<br />
===Openly sharing files on the web===<br />
<br />
If you create a 'public_html' directory in your home directory, then any files put there will be shared <br />
openly on the web. There is no way to restrict who has access to those files.<br />
<br />
mkdir /homes/daveturner/public_html<br />
<br />
Then access the data from a web browser using the URL:<br />
<br />
http://people.beocat.ksu.edu/~daveturner<br />
<br />
This will show a list of the files you have in your public_html subdirectory.<br />
<br />
===Globus===<br />
<br />
Kyle will put some Globus stuff here<br />
<br />
== Array Jobs ==<br />
One of Slurm's useful options is the ability to run &quot;Array Jobs&quot;.<br />
<br />
It can be used with the following option to sbatch.<br />
<br />
<br />
--array=n[-m[:s]]<br />
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm<br />
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be<br />
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option<br />
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.<br />
<br />
The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.<br />
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each<br />
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.<br />
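<br />
For example, a minimal job script using that stepped range might look like the following (the echo line is just a stand-in for a real application):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash<br />
#SBATCH --array=2-10:2<br />
<br />
# Each of the 5 tasks sees one index: 2, 4, 6, 8, or 10<br />
echo &quot;Task $SLURM_ARRAY_TASK_ID of range $SLURM_ARRAY_TASK_MIN-$SLURM_ARRAY_TASK_MAX&quot;<br />
&lt;/syntaxhighlight&gt;<br />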
<br />
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The<br />
number of tasks in an array job is unlimited.<br />
<br />
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above).<br />
<br />
slurm-%A_%a.out<br />
<br />
%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID<br />
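<br />
If you prefer your own file names over the default, the same placeholders can be used with the --output and --error options (a small sketch; 'myjob' is just an example name):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --output=myjob_%A_%a.out<br />
#SBATCH --error=myjob_%A_%a.err<br />
<br />
echo &quot;Output for this task goes to myjob_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.out&quot;<br />
&lt;/syntaxhighlight&gt;<br />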
<br />
=== Examples ===<br />
==== Change the Size of the Run ====<br />
Array Jobs have a variety of uses, one of the easiest to comprehend is the following:<br />
<br />
I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.<br />
<br />
My original script looks like this:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
RUNSIZE=50<br />
#RUNSIZE=100<br />
#RUNSIZE=150<br />
#RUNSIZE=200<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.<br />
<br />
With Array Jobs the script can be written like so:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=50-200:50<br />
RUNSIZE=$SLURM_ARRAY_TASK_ID<br />
app1 $RUNSIZE dataset.txt<br />
&lt;/syntaxhighlight&gt;<br />
I then submit that job, and Slurm understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.<br />
<br />
==== Choosing a Dataset ====<br />
A slightly more complex use of Array Jobs is the following:<br />
<br />
I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.<br />
<br />
Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
DATASET=dataset.txt<br />
scriptnum=0<br />
while read LINE<br />
do<br />
echo '#!/bin/bash' &gt; ${scriptnum}.sh<br />
echo &quot;app2 $LINE&quot; &gt;&gt; ${scriptnum}.sh<br />
sbatch ${scriptnum}.sh<br />
scriptnum=$(( $scriptnum + 1 ))<br />
done &lt; $DATASET<br />
&lt;/syntaxhighlight&gt;<br />
Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with 'wc -l dataset.txt'; in this case let's call it 5000.<br />
<br />
&lt;syntaxhighlight lang=&quot;bash&quot;&gt;<br />
#!/bin/bash<br />
#SBATCH --array=1-5000<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />
This uses command substitution via backticks, and has the sed command print only line number $SLURM_ARRAY_TASK_ID from the file dataset.txt.<br />
<br />
Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.<br />
<br />
To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, if you are submitting 5,000 jobs, that is 5,000-10,000 seconds, or roughly 1.5-3 hours.<br />
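<br />
If you are concerned about all 5,000 tasks being eligible to run at once, Slurm also lets you cap the number of array tasks that run simultaneously by appending a '%' limit to the range. The sketch below is the same job throttled to 50 tasks at a time (the limit of 50 is just an example):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
#!/bin/bash<br />
# Run the 5000 tasks, but no more than 50 at the same time<br />
#SBATCH --array=1-5000%50<br />
app2 `sed -n &quot;${SLURM_ARRAY_TASK_ID}p&quot; dataset.txt`<br />
&lt;/syntaxhighlight&gt;<br />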
<br />
== Running jobs interactively ==<br />
Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: &lt;tt&gt;--pty bash&lt;/tt&gt;. If no node is available with your resource requirements, srun will tell you something like the following:<br />
srun --pty bash<br />
srun: Force Terminated job 217<br />
srun: error: CPU count per node can not be satisfied<br />
srun: error: Unable to allocate resources: Requested node configuration is not available<br />
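When the resources are available, a typical interactive request might look like the following (a sketch using standard Slurm options; adjust the core, memory, and time values to your needs):<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Request an interactive shell with 4 cores, 8 GB of RAM, and a 2 hour limit<br />
srun --cpus-per-task=4 --mem=8G --time=2:00:00 --pty bash<br />
&lt;/syntaxhighlight&gt;<br />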
Note that, like sbatch, your interactive job will timeout after your allotted time has passed.<br />
<br />
== Connecting to an existing job ==<br />
You can connect to an existing job using &lt;B&gt;srun&lt;/B&gt; in the same way that the &lt;B&gt;MonitorNode&lt;/B&gt; command<br />
allowed us to on the old cluster. This is essentially like using ssh to get into the node where your job is running, which<br />
can be very useful for looking at files in /tmp/job# or for running &lt;B&gt;htop&lt;/B&gt; to view the <br />
activity level of your job.<br />
<br />
srun --jobid=# --pty bash where '#' is the job ID number<br />
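<br />
As a short sketch (the job ID 123456 is only a placeholder), connecting to a running job and checking its activity would look like this:<br />
<br />
&lt;syntaxhighlight lang=bash&gt;<br />
# Open a shell on the node where job 123456 is running<br />
srun --jobid=123456 --pty bash<br />
<br />
# Once on the node, view the activity of your own processes<br />
htop -u $USER<br />
&lt;/syntaxhighlight&gt;<br />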
<br />
== Altering Job Requests ==<br />
We generally do not support modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and all of the variations can be problematic; it is normally easier to simply delete the job and resubmit it with the correct parameters. '''If your job do