What is fairshare scheduling?

Fairshare scheduling is our way of making sure that every user in an LSF cluster gets a fair share of the CPU resources over time. If we used “first-come, first-served” scheduling instead, a user could submit 500 jobs at once, and everybody who submitted jobs after that would have to wait until those 500 jobs finished before their own jobs could run.

Each LSF user has a dynamic priority that is based on the number of shares assigned to that user (usually 1), the dynamic priority formula, and the amount of CPU time and run time used recently by all of that user’s jobs. As you use more resources, your dynamic priority decreases. As your jobs finish, your dynamic priority increases. Note that CPU time used recently is weighted more heavily than CPU time used in the past.

The higher your dynamic priority, the more likely it is that your pending job will be the next one to be run by LSF.
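The general shape of this calculation can be sketched in a few lines of Python. This is a simplified illustration only: the factor weights below are assumptions for demonstration, not the values configured on any particular cluster (real LSF clusters take these from their scheduler configuration).

```python
# Simplified sketch of an LSF-style fairshare dynamic priority.
# The weight values below are hypothetical, chosen only to
# illustrate the behavior described in the text.

def dynamic_priority(shares, cpu_time, run_time,
                     cpu_time_factor=0.7, run_time_factor=0.7):
    """Higher return value => the user's pending job is more
    likely to be dispatched next.

    cpu_time: recently used (decayed) CPU seconds for the user.
    run_time: run time (seconds) of the user's running jobs.
    """
    # Usage sits in the denominator, so consuming more CPU or
    # run time lowers priority, while shares raise it.  The +1
    # avoids division by zero for a completely idle user.
    return shares / (cpu_time * cpu_time_factor
                     + run_time * run_time_factor
                     + 1.0)

# An idle user outranks a user who has been consuming resources:
idle = dynamic_priority(shares=1, cpu_time=0, run_time=0)
busy = dynamic_priority(shares=1, cpu_time=3600, run_time=3600)
assert idle > busy
```

As the busy user's jobs finish and their recent CPU time decays toward zero, their computed priority climbs back toward the idle value, which matches the behavior described above.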

How do I tell LSF to put my job output in a file instead of sending it to me by email?

You can redirect your job’s standard output to a file using the “-o” option on the “bsub” command. When you do this, LSF appends to this file the job summary information you would normally receive in email after the completion of the job.

Tip: You can make the name of the output file unique by including %J somewhere in the name. LSF substitutes the JobID for %J when the job runs. For example, if the JobID is 10000, then %J in the output file name is replaced by 10000, and the output file out.10000 is created in the working directory.

The following LSF job submission command is an example of using the “-o” option with %J:

bsub -q queue -R resource -o out.%J executable

How can I use the LSF job output to check the amount of memory my job is using?

Look at the “Max Memory” value listed under the “Resource usage summary” section of your LSF job output. This number indicates the maximum amount of memory used by your job. Note that for MPI jobs this number is summed across all processes, so you will have to divide it by the number of MPI processes (i.e. the value specified by the “-n” flag in your submission) to get a per-process figure.
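As a quick sketch of the per-process arithmetic, suppose a hypothetical MPI job reports a summed “Max Memory” of 9600 MB and was submitted with “-n 8” (both numbers are made up for illustration):

```python
# Hypothetical values: the job output reports a summed
# "Max Memory" of 9600 MB, and the job used 8 MPI processes.
max_memory_mb = 9600   # total across all MPI processes
n_processes = 8        # the value given to bsub's -n flag

# Divide the summed figure by the process count to estimate
# the memory footprint of a single MPI process.
per_process_mb = max_memory_mb / n_processes
print(per_process_mb)  # 1200.0 MB per process
```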

Is there a way to determine roughly how much memory a matrix in my program is using?

Suppose you are working with a 27000 by 30000 matrix and that you are using double precision (i.e. 8 bytes per element). Then your matrix would use

(8 bytes) * (27000*30000) / (2^30 bytes/GB) ≈ 6.0 GB

Note. A gigabyte (GB) is 2^30 = 1,073,741,824 bytes (sometimes 10^9 bytes is used instead). This is just a conversion factor for converting the size from bytes to gigabytes.
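A short script makes this kind of estimate easy to repeat for other matrix sizes; for example, for a 27000 by 30000 double-precision matrix:

```python
# Memory used by a dense 27000 x 30000 matrix of doubles.
rows, cols = 27000, 30000
bytes_per_element = 8          # double precision

total_bytes = rows * cols * bytes_per_element
total_gb = total_bytes / 2**30  # convert bytes to gigabytes

print(round(total_gb, 1))  # 6.0
```

Swap in your own dimensions (and 4 bytes per element for single precision) to estimate other arrays before choosing a memory limit.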

What is the purpose of the new LSF memory limit for the KillDevil cluster and how will it affect me?

The KillDevil compute cluster is a large shared resource used by many researchers across campus. Increasingly we are seeing large memory jobs exhaust the memory on a node, disrupting the operation of the cluster and adversely impacting the jobs of other users. One of the goals of the KillDevil cluster is to enable the running of large memory jobs; however, to do this effectively, without jobs disrupting one another, we have to implement memory limits.

Default memory limit:

The default memory limit, if you don’t specify one, is 4 GB. This means that if any one process of your job exceeds 4 GB in its resident set size, the job will be terminated by LSF, and you can expect to see a message such as the following in your job output:

TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.

Note. Most users’ jobs do not exceed the 4 GB threshold, so most users will not have to change anything in their job submission commands.

The following is an important exception to the 4 GB default memory limit:

If you indicate exclusive use of a node, this limit is raised to 46 GB.

Changing the default memory limit:

To change the default memory limit (of 4 GB), submit your job using the “-M m” flag in your LSF job submission command (i.e. the bsub command). Here m is an integer number of gigabytes specifying your memory requirement. For example, the following command

bsub -M 10 -q day …

reserves 10 GB of memory for you (and therefore will only start on a node that has at least 10 GB free). You can specify bsub options in any order. Please note that you cannot use decimal numbers or units with the “-M m” flag; the value must be a whole number of gigabytes.
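Since the flag accepts only whole gigabytes, round any fractional estimate up before submitting. A minimal sketch (the estimate below is a made-up number):

```python
import math

# bsub's -M flag accepts only integer gigabytes, so round a
# fractional memory estimate up to the next whole number.
estimated_gb = 6.1  # hypothetical estimate of the job's memory need
m_flag_value = math.ceil(estimated_gb)

print(f"bsub -M {m_flag_value} ...")  # -> bsub -M 7 ...
```

Rounding up (rather than down) ensures the limit comfortably covers the estimate, while keeping it close to the real requirement so the job does not pend or reserve memory unnecessarily.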

Note. Do not specify memory requirements for your job in other ways. In particular, a resource string such as “rusage[mem=xx]” will be silently ignored. Always use the -M flag to specify the memory limit in your LSF job submission command if you do not want the default memory limit.

Important tip: Do not unnecessarily set -M to a very large value. There are two reasons for this. First, the memory you specify is also used internally to schedule the job and reserve memory: if you ask for X GB, your job will pend until a node is found with X GB free, so setting this artificially high makes your job pend unnecessarily. Second, X GB of memory will be reserved for your job, so if you do not need that much, those resources will be unavailable for other jobs to use, which reduces throughput for everyone.

Determining how much memory your job is using:

You can find out the maximum memory used by your job by looking at the “Resource usage summary” section of the LSF output, where it reports “Max Memory”. For an example of this, see above.

Note that for MPI jobs this number is summed across all processes, so you will have to divide it by the number of MPI processes (i.e. the value specified by the “-n” flag in your submission). We have noticed that the “Max Memory” value may not be reported for small-memory or short-running jobs, and it appears to be subject to some sampling error as well.