Amazon Web Services offers enormous potential for people who need to process, store, and share large amounts of data.

And it’s a huge boon for bioinformatics. It’s cost-effective and it’s fasta. Hah. Get it? It’s “>fasta”. Archiving and sharing data has never been easier.

Here’s a quick tutorial on creating an Elastic Block Store volume that you can share with your colleagues.

1. Create a volume

From the AWS Management Console, click on the EC2 tab, then on “Elastic Block Store > Volumes”.

Click on “Create Volume”.

Pick an appropriate size for your volume. For EBS volumes that I am going to use to store and archive data, I create a volume 1.5 times the size of the data. This lets me store an unpacked version and a packed version simultaneously, making it easy to update the data later.

Add some informative tags.
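If you prefer the command line, the same step can be sketched with the AWS CLI. The size, availability zone, volume ID, and tag value below are all placeholders; the zone must match the instance you plan to attach to.

```shell
# Create a 150 GiB volume (1.5x a hypothetical 100 GiB dataset)
# in the same availability zone as the target instance.
aws ec2 create-volume \
    --size 150 \
    --availability-zone us-east-1a

# Tag it so it is easy to find later; vol-xxxxxxxx is the
# volume ID returned by the command above.
aws ec2 create-tags \
    --resources vol-xxxxxxxx \
    --tags Key=Name,Value=genome-archive
```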

2. Attach the volume to an EC2 instance.

From the Volumes window in the Management Console, select the new volume, then right-click and select “Attach Volume”. I attach devices starting at /dev/sdf.
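The attach step can also be done from the command line; here is a sketch, with placeholder volume and instance IDs.

```shell
# Attach the volume to a running instance as device /dev/sdf.
aws ec2 attach-volume \
    --volume-id vol-xxxxxxxx \
    --instance-id i-xxxxxxxx \
    --device /dev/sdf
```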

3. Format the volume.

Once you’ve created and attached a volume, you’ll need to format it. Fire up an EC2 instance (if you haven’t already) and SSH in.

> ssh -i your-key.pem user@yourdns.amazonaws.com
> sudo mkfs.ext3 /dev/sdf

Device names /dev/sdf through /dev/sdp are available for attached volumes.

4. Mount the volume

> sudo mkdir /mnt/data
> sudo mount -t ext3 /dev/sdf /mnt/data

If you are going to be dealing with many versions of data over time, you might want to version your mount points. This lets you attach multiple EBS volumes at different, sensibly named directories.
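For example, a minimal sketch of versioned mount points. The directory names and device letters here are hypothetical; the “current” symlink gives scripts a stable path that you can repoint at each new release.

```shell
# One mount point per data release.
sudo mkdir -p /mnt/data-2011.01 /mnt/data-2011.02
sudo mount -t ext3 /dev/sdf /mnt/data-2011.01
sudo mount -t ext3 /dev/sdg /mnt/data-2011.02

# A stable symlink so downstream scripts never hard-code a version.
sudo ln -sfn /mnt/data-2011.02 /mnt/data-current
```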

About Todd Harris

Do you have a data management, analysis, or visualization problem you need some help with? Do you need to connect with the best people to build out your team of data scientists, bioinformaticians, or curators? Drop me a line -- I'd be happy to chat with you about your project.
