Della and Tigress

The DELLA processing and TIGRESS data storage servers of Princeton's High Performance Computing center are our analytical powerhouses, and we have specific locations on each server for specific jobs. The hardware is stored in a lovely server closet, so the way to access it is through a secure shell (ssh). Your username and password are obtained through the IT staff. Once you have logged on, there are a series of commands and "server etiquette" you will need to follow. The PU website has more information on basic usage and tutorials if you are interested.

You should familiarize yourself with some basic Unix commands by doing a few tutorials. Here is also a nice website with a large number of Linux commands.

Login

ssh netid@della.princeton.edu --- to log in securely (replace netid with your own NetID)

If you are on wifi, you need to connect through the VPN for secure access! This makes it possible to ssh remotely from Small World Coffee!

slogin netid@della.princeton.edu --- an older alternative to ssh for logging in securely (it may not be installed on newer systems)

uname -a --- to learn about the server

passwd --- to change the default password you are given

logout (or control+D) --- to log out

Rules

Della is only to be used to execute code via a formal job submission program (qsub command)

You only have 1GB of space on your Della home directory

You also have 500GB of SCRATCH space on Della

Tigress is for storage of all data! Write all output to this server as well.

Use qsub to_run.sh to run jobs on Della. Make sure the active part of to_run.sh (the line that calls ./batchAwk.sh) points to the right directories on Tigress, which contain either more scripts or the data.

Usage

qsub --- to submit a script (e.g. jobs_to_run.sh) on Della, which can point to perl/python/R/shell scripts on Tigress that do the actual work
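As a rough illustration of this submission pattern, the sketch below writes a small job script and checks its syntax before you submit it. The #PBS lines are standard PBS/Torque directives, but the job name, the resource request, and the /tigress/netid/myproject path are all placeholders (assumptions), and batchAwk.sh simply stands in for whatever script does the real work.

```shell
#!/bin/bash
# Write a minimal job script. Every value below is a placeholder to adapt.
cat > to_run.sh <<'EOF'
#!/bin/bash
#PBS -N my_job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -j oe

# Hypothetical project directory on Tigress holding scripts and data.
PROJECT_DIR=/tigress/netid/myproject

# Run from the Tigress directory so output is written there, not on Della.
cd "$PROJECT_DIR" || exit 1
./batchAwk.sh
EOF

# Check the job script for shell syntax errors without running it.
bash -n to_run.sh && echo "to_run.sh syntax OK"

# On Della, submit it with:
#   qsub to_run.sh
```

Checking the syntax first is optional, but it catches typos before the job waits in the queue only to fail in its first second.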

Job length: Initially estimate 2x the amount of time you think your job will take to complete. You can refine this value over time.
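The 2x rule of thumb is easy to turn into a walltime string. The helper below is only a sketch: walltime_request is a made-up function name, not a cluster command, and it assumes your estimate is given in whole minutes.

```shell
#!/bin/bash
# Convert an estimated runtime in minutes into a padded HH:MM:SS request.
# walltime_request is a hypothetical helper, not part of any scheduler.
walltime_request() {
    local estimate_min=$1
    local padded_min=$(( estimate_min * 2 ))   # apply the 2x safety margin
    printf '%02d:%02d:00\n' $(( padded_min / 60 )) $(( padded_min % 60 ))
}

walltime_request 45    # prints 01:30:00
walltime_request 600   # prints 20:00:00
```

The printed value can be pasted into the time request of your job script, then tightened once you have seen how long real runs take.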

Test queue

1 hour limit

2 jobs maximum per user; NOT to be used for production runs

Short queue

24 hour limit

40 job maximum

Medium queue

72 hour limit

16 jobs maximum per user

432 total cores
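The time limits above can be summarized as a small decision rule. This is a sketch only: pick_queue is a hypothetical helper, the names it prints mirror the headings above and may not match the real queue names on Della, and the test queue is left out on purpose because it is not meant for production jobs.

```shell
#!/bin/bash
# Suggest a queue for a job, given its requested length in whole hours.
# pick_queue is a hypothetical helper; queue names are taken from the
# headings above and are assumptions about the actual configuration.
pick_queue() {
    local hours=$1
    if [ "$hours" -le 24 ]; then
        echo short
    elif [ "$hours" -le 72 ]; then
        echo medium
    else
        echo "no listed queue allows ${hours} hours" >&2
        return 1
    fi
}

pick_queue 20   # prints short
pick_queue 48   # prints medium
```

Anything over 72 hours does not fit a queue listed here, so the function reports an error instead of guessing.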

qstat --- to check the job progress on Della

Once you have the node ID from your qsub job, you can ssh into that node and check on the job using traditional commands:

htop --- use to view real-time CPU usage

top --- displays the top CPU processes/jobs and provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system, and can provide an interactive interface for manipulating processes. It can sort the tasks by CPU usage, memory usage and runtime.

File permissions
File permissions may be the nagging factor keeping you from writing to an outfile or changing the path of an executable. If you need additional help, please see this tutorial on permissions and talk to Dr. vonHoldt about any issues that come up with directory/file access. We want to prevent any raw data from being overwritten without thwarting your analysis.
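To make the idea concrete, here is a small sketch that uses chmod to make a file read-only and to make a script executable. It works in a throwaway directory, and the file names (raw_reads.txt, batchAwk.sh) are made-up stand-ins; on the servers, check with the lab before changing permissions on shared data.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical stand-in for a raw data file.
raw="$workdir/raw_reads.txt"
echo "ACGT" > "$raw"

# Remove write permission for everyone, leaving the file read-only.
chmod 444 "$raw"
ls -l "$raw"        # the mode column starts with -r--r--r--

# Give yourself (the owner) execute permission on a script.
script="$workdir/batchAwk.sh"
printf '#!/bin/bash\necho "running"\n' > "$script"
chmod u+x "$script"
ls -l "$script"     # the owner part of the mode column now includes x
```

If ls -l shows no w for your user on an output directory, or no x on a script, that is usually why a write or a launch is being refused.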

R basics

Additionally, with high-throughput genome sequence data, we often need modules that are implemented in R's Bioconductor. Here is a great website and course material from a short course on using R and Bioconductor.

The -n flag to xargs specifies how many arguments at a time to supply to the given command. -n 1 tells xargs to supply 1 argument to the command. The command will be invoked repeatedly until all input is exhausted.
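A quick way to see this behavior is to feed xargs a few items and use echo in place of a real command. The sample names below are made up for illustration.

```shell
#!/bin/bash
# Three input items, one per line; echo stands in for a real command.
printf '%s\n' sampleA sampleB sampleC | xargs -n 1 echo "processing"
# prints:
# processing sampleA
# processing sampleB
# processing sampleC
```

Because of -n 1, echo is invoked three separate times, once per input item.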

This means you can also use xargs for a command that needs two or more arguments.

For instance, you could use this to supply read group information to the Picard AddOrReplaceReadGroups command.
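The sketch below shows the two-argument pattern as a dry run: it only prints the commands it would run. It assumes the Picard tool in question is AddOrReplaceReadGroups with its KEY=value argument style; the sample and library names, the .bam file names, the picard.jar location, and the RGPL/RGPU values are all made up, and add_read_groups_dry_run is a hypothetical helper name.

```shell
#!/bin/bash
# Read pairs of (sample, library) from stdin and print one Picard command
# per pair. The leading "echo" makes this a dry run: nothing is executed.
add_read_groups_dry_run() {
    xargs -n 2 sh -c 'echo java -jar picard.jar AddOrReplaceReadGroups \
        I="$1.bam" O="$1.rg.bam" \
        RGID="$1" RGLB="$2" RGPL=ILLUMINA RGPU=unit1 RGSM="$1"' _
}

# Each input line holds two fields: a sample name and a library name.
printf '%s\n' 'sampleA lib1' 'sampleB lib2' | add_read_groups_dry_run
```

With -n 2, xargs hands the inner shell two arguments at a time, available as $1 and $2 (the trailing _ fills $0). Once the printed commands look right, removing the echo would run them for real.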