File storage and I/O

There are 5 main user file spaces available on Genepool/Phoebe (soon to be 4 once House is retired on December 20, 2013).

$HOME

Your home directory on Genepool is mounted across all NERSC systems. Refer to it as $HOME wherever possible. If you had an old home directory at JGI, on /house or on the netapps, we recommend you refer to it as $OLD_HOME. Do not change the environment variable $HOME. For more details on quotas, dotfile initialization and backups, please see our global homes page.
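As an illustration of referring to home directories through $HOME and $OLD_HOME rather than hard-coded paths, the sketch below looks for a file under $HOME first and falls back to $OLD_HOME. The function name find_user_file and the example relative path are hypothetical and not part of any NERSC tooling.

```shell
# Hypothetical helper: print the path of a file given relative to a home
# directory, preferring $HOME and falling back to $OLD_HOME.
# Usage: find_user_file RELATIVE_PATH
find_user_file() {
    rel=$1
    for base in "${HOME:-}" "${OLD_HOME:-}"; do
        # Skip unset or empty variables, then test for the file.
        if [ -n "$base" ] && [ -e "$base/$rel" ]; then
            printf '%s\n' "$base/$rel"
            return 0
        fi
    done
    # Not found in either location.
    return 1
}
```

For example, `find_user_file .bashrc.ext` would print the copy under $HOME if one exists there, and otherwise the copy under $OLD_HOME.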

Projectb

projectb is a 2.7PB GPFS-based file system for the JGI's active projects. There are two distinct user spaces in the projectb filesystem: projectb/sandbox and projectb/scratch. The projectb filesystem is mounted only on NERSC computer systems, where it is available on genepool, hopper and carver.

|              | projectb Scratch | projectb Sandbox |
|--------------|------------------|------------------|
| Location     | /global/projectb/scratch/<username> | /global/projectb/sandbox/<program> |
| Quota        | 20TB, 5M inodes by default; 40TB upon request | Defined by agreement with the JGI Management |
| Backups      | Not backed up | Not backed up |
| File Purging | Files not accessed for 90 days are automatically deleted | Files are not automatically purged |

projectb "Scratch" and "Sandbox" space is intended for staging and running JGI calculations on the NERSC systems, including genepool, hopper, and carver. On genepool, the projectb scratch space is the recommended filesystem for file I/O during all your calculations. If you have access to the genepool resource, you should have space on projectb scratch. If you don't, please file a ticket at http://help.nersc.gov. The Sandbox areas are allocated by program. If you have questions about your program's space, please see your group lead.
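Because scratch files that have not been accessed for 90 days are deleted automatically, it can help to list files that are approaching that age. The following is a minimal sketch using the standard find utility. The function name purge_candidates is hypothetical, and the purge system's own notion of "last accessed" may not match the access time that find reports.

```shell
# Hypothetical helper: list regular files under DIR whose last access time
# is more than DAYS whole 24-hour periods in the past (default: 90).
# Usage: purge_candidates DIR [DAYS]
purge_candidates() {
    dir=$1
    days=${2:-90}
    # -atime +N matches files last accessed more than N full days ago.
    find "$dir" -type f -atime +"$days" -print
}

# Example (assumes $BSCRATCH is set, as on genepool):
#   purge_candidates "$BSCRATCH" 75
```

Choosing a threshold below 90, such as 75, gives some lead time to copy files elsewhere before they become eligible for purging.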

DnA (Data n' Archive)

DnA is a 1PB GPFS-based file system for the JGI's archive, shared databases and project directories.

|              | DnA Projects | DnA Shared | DnA DM Archive |
|--------------|--------------|------------|----------------|
| Location     | /global/dna/projectdirs/ | /global/dna/shared | /global/dna/dm_archive |
| Quota        | 5TB default | Defined by agreement with the JGI Management | Defined by agreement with the JGI Management |
| Backups      | Daily, only for projectdirs with quota <= 5TB | Backed up by JAMO | Backed up by JAMO |
| File Purging | Files are not automatically purged | Purge policy set by users of the JAMO system | Files are not automatically purged |

The DnA "Project" and "Shared" spaces are intended for data that multiple people collaborating on a project need, so that the data can be accessed and shared easily. The "Project" space is owned and managed by the JGI. The "Shared" space is a collaborative effort between the JGI and NERSC.

The "DM Archive" is a data repository maintained by the JAMO system. Files are stored here when migrated using the JAMO system. The files can remain in this space for as long as the user specifies. Any file that is in the "DM Archive" has also been placed in the HPSS tape archive. This section of the file system is owned by the JGI data management team.

$SCRATCH

Each user has a "scratch" directory. Scratch directories are NOT backed up, and files are purged if they have not been accessed for 90 days. Access your scratch directory through the environment variable $SCRATCH, for example:

cd $SCRATCH

Scratch environment variables:

| Environment Variable | Value | NERSC Systems |
|----------------------|-------|---------------|
| $SCRATCH  | Best-connected file system | All NERSC computational systems |
| $BSCRATCH | /global/projectb/scratch/<username> | genepool, hopper, carver |
| $GSCRATCH | /global/scratch/sd/<username> | All NERSC computational systems |

$GSCRATCH points to your Global scratch space, and $BSCRATCH points to your projectb scratch space if you have a BSCRATCH allocation. $SCRATCH will always point to the best-connected scratch space available for the NERSC machine you are accessing. For example, on genepool $SCRATCH will point to $BSCRATCH, whereas on carver $SCRATCH will point to $GSCRATCH.
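To see which scratch space $SCRATCH resolves to on the machine you are logged in to, you can compare it against $BSCRATCH and $GSCRATCH. This is a small sketch and the function name describe_scratch is hypothetical.

```shell
# Hypothetical helper: report which scratch space $SCRATCH points to.
describe_scratch() {
    if [ -z "${SCRATCH:-}" ]; then
        echo "SCRATCH is not set"
    elif [ "$SCRATCH" = "${BSCRATCH:-}" ]; then
        echo "SCRATCH -> BSCRATCH ($SCRATCH)"
    elif [ "$SCRATCH" = "${GSCRATCH:-}" ]; then
        echo "SCRATCH -> GSCRATCH ($SCRATCH)"
    else
        # Set, but matching neither of the two known scratch variables.
        echo "SCRATCH -> $SCRATCH"
    fi
}
```

Based on the description above, calling describe_scratch on genepool would be expected to report BSCRATCH, and on carver GSCRATCH.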

Scratch space is intended for staging, running, and completing your calculations on NERSC systems, so these filesystems are designed to allow wide-scale file reading and writing from many compute nodes. They are not intended for long-term file storage or archival: data is not backed up, and files not accessed for 90 days are automatically purged.
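The stage, run, and copy-back pattern described above might look like the following sketch. The function name run_in_scratch and the job.XXXXXX directory naming are hypothetical. The fallback to $TMPDIR or /tmp when $SCRATCH is unset is an assumption made only so the sketch can run outside NERSC.

```shell
# Hypothetical helper: stage INPUT into a fresh working directory on
# scratch, run COMMAND there, then copy the directory contents to DEST.
# Usage: run_in_scratch INPUT DEST COMMAND [ARGS...]
run_in_scratch() {
    input=$1
    dest=$2
    shift 2
    # Assumption: fall back to a temporary directory if $SCRATCH is unset.
    base=${SCRATCH:-${TMPDIR:-/tmp}}
    workdir=$(mktemp -d "$base/job.XXXXXX") || return 1
    # Stage the input file into the working directory.
    cp "$input" "$workdir/" || return 1
    # Run the command from inside the working directory.
    ( cd "$workdir" && "$@" ) || return 1
    # Copy everything (staged input and results) to the destination,
    # which should be a location that is not subject to purging.
    mkdir -p "$dest" && cp -R "$workdir"/. "$dest"/
}
```

The working directory is left in place after the run, so intermediate files remain available on scratch until they are removed by hand or purged.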

Other file systems

Other file systems are also mounted on Genepool:

SeqFS - file system used exclusively by the Illumina sequencers, SDM and Instrumentation groups at the JGI.

/usr/common (/global/common/genepool) - file system where NERSC staff build software for user applications. This is the principal site for the modular software installations.

/global/scratch - a GPFS-based file system, used by all other NERSC users, that is accessible on almost all of NERSC's other compute systems. JGI users should favor the scratch/sandbox portions of projectb over /global/scratch.

/global/project - a GPFS-based file system, used by all other NERSC users, that is accessible on almost all of NERSC's other compute systems. JGI users should favor the projectdir portion of projectb over /global/project.