Google has some pretty amazing big data computational “hammers” that they have been applying to search and video data for a long time. In this workshop we take those same hammers and apply them to whole genome sequences.

This will create a virtual machine on Google Cloud Platform with a locked-down network (only SSH port 22 open). Your local machine will securely connect to the VM via an SSH tunnel.

Within the Docker container, the directory /home/rstudio/data corresponds to the directory /mnt/data on the virtual machine, which is where the persistent data disk is attached to the VM. Store important files there. Docker containers are stateless, so if the container restarts for any reason, files you created outside that directory within the container will be lost.
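As a minimal sketch of what storing files on the persistent disk can look like from an R session, the snippet below saves an object under /home/rstudio/data and reads it back. The `results` data frame and the file name `results.rds` are hypothetical, used only for illustration. The fallback to a temporary directory is an assumption added so the sketch also runs outside the workshop container, and files written to that fallback are not persistent.

```r
# Persistent directory inside the container (corresponds to /mnt/data on the VM).
data_dir <- "/home/rstudio/data"

# Assumption: when run outside the workshop container this directory may not
# exist, so fall back to a temporary directory (which is NOT persistent).
if (!dir.exists(data_dir)) {
  data_dir <- tempdir()
}

# Hypothetical example result; any R object can be saved the same way.
results <- data.frame(sample = c("A", "B"), count = c(10L, 20L))

# Write the object to the data directory, then read it back to confirm.
results_path <- file.path(data_dir, "results.rds")
saveRDS(results, results_path)
restored <- readRDS(results_path)
stopifnot(identical(results, restored))
```

Anything written this way under /home/rstudio/data should still be there after a container restart, whereas files saved elsewhere in the container will not.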

Bioconductor maintains Docker containers with R, Bioconductor packages, and RStudio Server all ready to go! It's a great way to set up your R environment quickly and start working. The instructions are below, but if you want to learn more, see http://www.bioconductor.org/help/docker/.

If you prefer to set up R manually instead, use the following instructions:

    # Install BiocInstaller.
    source("http://bioconductor.org/biocLite.R")
    # See http://www.bioconductor.org/developers/how-to/useDevel/
    useDevel()
    # Install devtools which is needed for the special use of biocLite() below.
    biocLite("devtools")
    # Install the workshop material.
    biocLite("googlegenomics/bioconductor-workshop-r", build_vignettes=TRUE, dependencies=TRUE)