Why do I get a warning about feupdateenv not being implemented?

You may see this warning when using the Intel compilers on Guillimin, particularly when compiling MPI code with MPI wrappers such as mpicc or mpif90:

/software/compilers/Intel/2013-5-13.1.3/lib/intel64/libimf.so: warning: warning: feupdateenv is not implemented and will always fail

The Intel compilers link MPI programs against Intel's math library, libimf. This library does not implement feupdateenv, a function from the C math library, so the linker prints this warning to inform users. The warning is mostly harmless and can usually be ignored. If you want to be certain that there will be no problems, you can link your programs against both the C math library and the Intel math library (this may not suppress the warning):

mpicc -lm -limf ...

Why is my job blocked?

More information about a blocked job can be found with the 'checkjob -v jobID' command. Often, a line near the bottom of the output labelled 'BLOCK MSG' gives the reason the job is blocked.

The most common cause of blocked jobs is a violation of our MAXPS limit, indicating that your group has scheduled too many outstanding processor seconds at the same time. Descriptions of this and other scheduler policies (MAXIJOB, MAXPROC, etc.) are available on our Moab scheduling policies documentation page.
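When the checkjob -v output is long, it can help to filter it for the block reason. The helper below is a minimal sketch: the function name block_reason is hypothetical (it is not a Moab command), and it assumes the reason appears on a line containing the text BLOCK MSG, as described above.

```shell
#!/bin/sh
# block_reason JOBID
# Hypothetical helper: print only the "BLOCK MSG" line(s) from the
# verbose checkjob output for JOBID.
block_reason() {
    if [ -z "$1" ]; then
        echo "usage: block_reason JOBID" >&2
        return 2
    fi
    # grep exits with status 1 if no BLOCK MSG line is found
    checkjob -v "$1" | grep 'BLOCK MSG'
}
```

Run it as block_reason followed by your job ID. If it prints nothing, the job may instead be waiting on a dependency or be in a held state, as described below.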

Jobs may also be blocked because of user-defined dependencies. In this case, there is no BLOCK MSG, but the checkjob -v output contains a note with more information.

If your job is in a BatchHold state, there is a problem with the job submission that is causing the scheduler to repeatedly fail to schedule the job. Often, this is because the user has requested more processors per node than are available on the nodes. Please double check your submission parameters. If you think the job should run as it is, try using the 'releasehold jobID' command.

Please contact us if you can't find an explanation for your blocked job in the checkjob -v output.

How to submit a job from a running job?

From a worker node, just use the qsub command directly; you should no longer connect to gm-schrmat to do this:

qsub -A <RAPid> -q <queue_name> /path/to/script/<script_name>

Specifying a RAPid is always compulsory when using qsub on a worker node.
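If you chain jobs this way, a small wrapper can check that the next script exists before submitting it. The sketch below assumes only the qsub call shown above; the function name submit_next is hypothetical, and it relies on qsub printing the new job's ID on standard output.

```shell
#!/bin/sh
# submit_next RAPID QUEUE SCRIPT
# Hypothetical helper to call from inside a running job script: submits
# SCRIPT with qsub and prints whatever qsub writes to stdout (the job ID).
submit_next() {
    if [ "$#" -ne 3 ]; then
        echo "usage: submit_next RAPID QUEUE SCRIPT" >&2
        return 2
    fi
    if [ ! -f "$3" ]; then
        echo "submit_next: script not found: $3" >&2
        return 1
    fi
    # The RAPid (-A) is compulsory when calling qsub on a worker node.
    qsub -A "$1" -q "$2" "$3"
}
```

Inside a job script you might write next_id=$(submit_next "$MY_RAPID" "$MY_QUEUE" /path/to/script/next_step.sh), where MY_RAPID, MY_QUEUE and next_step.sh are placeholders for your own values.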

How to activate email notifications?

In your PBS script:

#PBS -M your_email_address
#PBS -m abe

You can enable any combination of messages you need (a, b, e, ab, ae, be, abe):

a: abort - when a job fails

b: begin - when a job starts

e: end - when a job ends
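For context, a complete job script header with notifications enabled might look like the sketch below. The walltime and the echo line are placeholders for your own settings and commands, and <RAPid> and your_email_address must be replaced with your own values.

```shell
#!/bin/bash
#PBS -A <RAPid>
#PBS -l walltime=01:00:00
#PBS -M your_email_address
#PBS -m ae
# "-m ae": send mail if the job aborts (a) and when it ends (e),
# but not when it begins (b).

# Start in the directory the job was submitted from (current dir if unset).
cd "${PBS_O_WORKDIR:-.}"
# Placeholder for your own commands.
echo "Job ${PBS_JOBID:-unknown} running on $(uname -n)"
```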

What does Exit_status=### mean?

Every job on our system returns an Exit_status code upon completion. This code is printed in the PBS Epilogue information in your job's output file, and it can help identify problems that may have occurred. Exit_status=0 usually indicates a successful job. Here is a list of some of the most common exit codes and what they mean:

Negative error codes usually point to a failure of the scheduler or the nodes. For these errors, please contact us with the jobID. Examples:

Exit_status  Description
-11          JOB_EXEC_RERUN: Job was rerun
-10          JOB_EXEC_FAILUID: Invalid UID/GID for job
-4           JOB_EXEC_INITABT: Job aborted on MOM initialization
-3           JOB_EXEC_RETRY: Job execution failed, do retry
-2           JOB_EXEC_FAIL2: Job exec failed, after files, no retry
-1           JOB_EXEC_FAIL1: Job exec failed, before files, no retry

Exit codes between 0 and 127 indicate the exit code given by the last command in the job script. Examples:

Exit_status  Description
0            Job Success!
1            General error

Exit codes between 128 and 173 indicate that the process ended due to receiving a signal. Examples:

Exit_status  Description
128          Invalid argument to exit()
131          SIGQUIT: Ctrl-\, core dumped
132          SIGILL: Malformed, unknown, or privileged instruction
133          SIGTRAP: Debugger breakpoint
134          SIGABRT: Process itself called abort()
135          SIGBUS: Bus error (on Guillimin: often a file system issue)
136          SIGFPE: Bad arithmetic operation (e.g. division by zero)
137          SIGKILL (e.g. kill -9 command)
139          SIGSEGV: Segmentation fault
143          SIGTERM (probably not canceljob or oom)
151          SIGURG: Urgent condition on socket

Exit codes between 174 and 253 indicate a "Fatal error signal".

Exit codes larger than 253:

Exit_status  Description
254          Command invoked cannot execute
255          Command not found, possible path problem
265          SIGKILL (e.g. kill -9 command), possible out-of-memory error
271          SIGTERM (e.g. canceljob or oom), possible memory error
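To interpret a code quickly, the shell functions below are a minimal sketch that follows the ranges listed above. The function names are hypothetical; extracting the value with grep assumes the output file contains the text Exit_status=N as in the question title, and decoding codes above 256 as 256 plus a signal number is an assumption inferred from the 265 (SIGKILL) and 271 (SIGTERM) entries.

```shell
#!/bin/sh
# exit_status_from_file FILE
# Print the last Exit_status=N value found in a job output file.
exit_status_from_file() {
    grep -oE 'Exit_status=-?[0-9]+' "$1" | tail -n 1 | cut -d= -f2
}

# describe_exit_status CODE
# Print a one-line interpretation of CODE using the ranges in the FAQ.
# Signal names come from the shell's built-in "kill -l NUMBER".
describe_exit_status() {
    local code=$1 sig
    if [ "$code" -lt 0 ]; then
        echo "scheduler or node failure: contact us with the jobID"
    elif [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -le 127 ]; then
        echo "last command in the job script exited with $code"
    elif [ "$code" -eq 128 ]; then
        echo "invalid argument to exit()"
    elif [ "$code" -le 173 ]; then
        sig=$((code - 128))      # 128 + signal number
        echo "terminated by SIG$(kill -l "$sig") (signal $sig)"
    elif [ "$code" -le 253 ]; then
        echo "fatal error signal"
    elif [ "$code" -ge 257 ] && [ "$code" -le 320 ]; then
        sig=$((code - 256))      # assumed: 256 + signal number
        echo "terminated by SIG$(kill -l "$sig") (signal $sig)"
    else
        echo "see the Exit_status table for code $code"
    fi
}
```

For example, describe_exit_status "$(exit_status_from_file path/to/your/job/output)" reports the signal for codes such as 139 or 265; codes like 254 and 255 are left to the table.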

What is the best way to share files with my colleagues?

It is best to keep any shared files in your group's project space and to use your home directory only for personal files. Anyone who needs access to shared files in a group's project space should consider becoming a member of that group by applying for a new role in the CCDB portal. Once your new role is approved by the group's principal investigator, you will have access to the group's project space and other resource allocations. It is ultimately the responsibility of each user and/or group to set the permissions on their folders and files correctly. However, we are happy to help if you have any questions.
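As a rough sketch of what setting group permissions in a project space can involve, the helper below gives the directory's group read access to a whole tree and sets the setgid bit on its directories so that files created later inherit the directory's group. The function name and the choice of read-only group access are assumptions; adapt them to your group's needs.

```shell
#!/bin/sh
# share_with_group DIR
# Hypothetical helper: make DIR and everything under it readable by the
# directory's group. Capital X adds execute (traverse) permission only to
# directories and to files that are already executable by someone.
share_with_group() {
    if [ ! -d "$1" ]; then
        echo "share_with_group: not a directory: $1" >&2
        return 1
    fi
    chmod -R g+rX "$1"
    # setgid on directories: new files created inside inherit the group
    find "$1" -type d -exec chmod g+s {} +
}
```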

We recommend using your home directory for personal, non-shared files. However, you may wish to keep some shared files in your home directory so that they benefit from our backup policy for home directories (by default, we do not back up project spaces). In this case, you can set up access control lists (ACLs) on your home directory or change its permissions. We advise against making the permissions on your home directory too open, as you may unintentionally expose private information to other users.
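A minimal ACL-based sketch is shown below, assuming the file system supports POSIX ACLs and that the shared directory sits directly inside your home directory. The function name is hypothetical and the colleague's user name is whatever you pass as the first argument.

```shell
#!/bin/sh
# share_home_dir_with USER DIR
# Hypothetical helper: grant USER read-only access to DIR (directly inside
# your home directory) with POSIX ACLs, without opening up the rest of $HOME.
share_home_dir_with() {
    if [ "$#" -ne 2 ] || [ ! -d "$2" ]; then
        echo "usage: share_home_dir_with USER DIR" >&2
        return 2
    fi
    # Traverse-only (x) on $HOME: USER can reach DIR but cannot list $HOME.
    setfacl -m "u:$1:x" "$HOME"
    # Read access to the existing tree (X = execute on directories only).
    setfacl -R -m "u:$1:rX" "$2"
}
```

This only covers files that exist when you run it; you can inspect the result with getfacl DIR, and remove an entry later with setfacl -x.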

How can I contact McGill HPC?

Our main email address:

To contact a specific member of our staff, please visit our staff page.

What information should I include when requesting support?

The following pieces of information are useful for diagnosing problems:

JobID number of any jobs that experienced a problem

Full path to:
    Submission script for the job
    Output and error files

Error and warning messages observed, if any

If the problem occurred while working on a login node:
    Login node host name (e.g. lg-1r17-n03)

Time and date of the problem

Detailed description of the problem

Expected behaviour of your job or task

Actual behaviour of your job or task
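Some of these details can be collected with a short shell function. The sketch below is hypothetical (support_info is not an existing command); it prints the host name and the current time, plus a job ID if you pass one or if it runs inside a job where PBS_JOBID is set.

```shell
#!/bin/sh
# support_info [JOBID]
# Hypothetical helper: print details to paste into a support request.
support_info() {
    local jobid
    echo "Host: $(uname -n)"
    echo "Date: $(date '+%Y-%m-%d %H:%M:%S %Z')"
    # Use the argument if given, otherwise PBS_JOBID when run inside a job.
    jobid=${1:-${PBS_JOBID:-}}
    if [ -n "$jobid" ]; then
        echo "JobID: $jobid"
    fi
}
```

Running it on the node where the problem occurred, soon after it happened, gives the host name and an approximate time to include alongside the description of the problem.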
