Data Collections

A community dataset space allows Bridges users from different grants to share data in a common space. Bridges hosts both public and private datasets, providing rapid access for individuals, collaborations and communities with appropriate protections.

Data collections are stored on pylon5, Bridges' persistent file system. The space they use counts toward the Bridges storage allocation for the grant hosting them.

If you would like to host a data collection on Bridges, let us know what you need by completing the Community Dataset Request form. If your data collection has security or compliance requirements, please contact compliance@psc.edu.

Publicly available datasets

Some data collections are available to anyone with a Bridges account. They include:

Natural Language Toolkit (NLTK) Data

NLTK comes with many corpora, toy grammars, trained models, etc. A complete list of the available data is posted at: http://nltk.org/nltk_data/

Available on Bridges at /pylon5/datasets/community/nltk
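As a minimal sketch of how a job might point NLTK at this shared copy: NLTK consults the NLTK_DATA environment variable when it is imported, so one option is to prepend the community directory before the import. The helper name with_nltk_data is hypothetical, and the sketch assumes the directory follows the standard nltk_data layout.

```python
import os

# Community path quoted in the text above.
NLTK_DATA_DIR = "/pylon5/datasets/community/nltk"


def with_nltk_data(env, data_dir=NLTK_DATA_DIR):
    """Return a copy of env with data_dir placed first in NLTK_DATA.

    Hypothetical helper: existing NLTK_DATA entries are kept after
    data_dir, and a duplicate of data_dir is not added twice.
    """
    new_env = dict(env)
    existing = new_env.get("NLTK_DATA", "")
    kept = [p for p in existing.split(os.pathsep) if p and p != data_dir]
    new_env["NLTK_DATA"] = os.pathsep.join([data_dir] + kept)
    return new_env
```

Calling os.environ.update(with_nltk_data(os.environ)) before importing nltk should make the shared corpora visible to the search path. Inside an already running session, appending the directory to the nltk.data.path list is an alternative.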

MNIST

MNIST is a dataset of handwritten digits commonly used to train image processing systems.

Available on Bridges at /pylon5/datasets/community/mnist
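The sketch below shows one way to read MNIST data, assuming the files in this directory use the IDX binary format of the original MNIST distribution. This page does not state the format or the file names, so both are assumptions to check against the directory listing. The function name parse_idx is hypothetical.

```python
import struct


def parse_idx(buf):
    """Parse an unsigned-byte IDX buffer into (dims, data).

    Assumed layout (original MNIST distribution): two zero bytes, one
    type byte (0x08 = unsigned byte), one byte giving the number of
    dimensions, then each dimension as a big-endian 32-bit integer,
    followed by the raw data bytes.
    """
    zero, dtype, ndim = struct.unpack_from(">HBB", buf, 0)
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    dims = struct.unpack_from(">" + "I" * ndim, buf, 4)
    offset = 4 + 4 * ndim
    count = 1
    for d in dims:
        count *= d
    data = bytes(buf[offset:offset + count])
    if len(data) != count:
        raise ValueError("truncated IDX data")
    return dims, data
```

For an image file, dims would be (number of images, rows, columns), and each image occupies rows * columns consecutive bytes of data. If the files are compressed, they would need to be decompressed first, for example with Python's gzip module.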

Genomics Data

Several genomics datasets are publicly available.

BLAST

The BLAST databases can be accessed through the environment variable BLASTDB after loading the BLAST module.
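As a rough illustration, a script run after the BLAST module is loaded could locate the databases by reading BLASTDB from its environment. The function name blastdb_dirs is hypothetical, and the sketch assumes BLASTDB may hold more than one directory separated by the platform path separator.

```python
import os


def blastdb_dirs(env=None):
    """Return the directories listed in BLASTDB.

    Hypothetical helper: raises RuntimeError when BLASTDB is unset or
    empty, which usually means the BLAST module has not been loaded.
    """
    env = os.environ if env is None else env
    value = env.get("BLASTDB", "")
    if not value:
        raise RuntimeError("BLASTDB is not set; load the BLAST module first")
    return [p for p in value.split(os.pathsep) if p]
```

Calling blastdb_dirs() with no argument reads the current process environment, so it reflects whatever the loaded module exported.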

Repbase

Repbase is the most commonly used database of repetitive DNA elements. To use the Repbase database, you must register with Repbase at http://www.girinst.org and send proof of registration to genomics@psc.edu.

Other genomics datasets

Other available datasets are typically used with a particular genomics package. These include:

Pittsburgh Supercomputing Center

PSC is a joint effort of Carnegie Mellon University and the University of Pittsburgh. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry, and is a leading partner in XSEDE (Extreme Science and Engineering Discovery Environment), the National Science Foundation cyberinfrastructure program.