Researchers today have the ability to generate an incredible amount of biological data. Once you have this data, the next step is to refine it and analyze it for meaning. Whether you are developing your own algorithms or running common tools and workflows, you now have a large number of software packages to help you out.

Here we make a few recommendations for what technologies to consider. Your technology choice should be based on your own needs and experience. There is no “one size fits all” solution.

If your experience and preferred approach are to write Python (or Ruby or Perl) and orchestrate execution with a little bit of shell scripting, or you need to run some off-the-shelf command-line tools, then dsub may be the right answer for you. dsub is targeted towards computational biologists who often have experience with submitting tasks to a job scheduler (such as Grid Engine, Slurm, or LSF) after developing and testing algorithms on their workstation or laptop.
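
As a rough illustration of the kind of task script this approach involves, here is a minimal Python sketch that counts the records in a FASTA file. It assumes the job runner hands the script its input and output file paths through environment variables. The variable names `INPUT_FASTA` and `OUTPUT_COUNT`, and the helper `count_fasta_records`, are hypothetical and chosen only for this example.

```python
import os


def count_fasta_records(in_path):
    """Count sequence records in a FASTA file (header lines start with '>')."""
    with open(in_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def main(environ=os.environ):
    # INPUT_FASTA and OUTPUT_COUNT are hypothetical names for this sketch;
    # the assumption is that the job runner sets them to local file paths.
    in_path = environ["INPUT_FASTA"]
    out_path = environ["OUTPUT_COUNT"]

    n_records = count_fasta_records(in_path)

    # Write the count as a single line so downstream steps can read it easily.
    with open(out_path, "w") as out:
        out.write(f"{n_records}\n")
    return n_records


if __name__ == "__main__":
    # Only run the task when both paths have been supplied.
    if "INPUT_FASTA" in os.environ and "OUTPUT_COUNT" in os.environ:
        main()
    else:
        print("Set INPUT_FASTA and OUTPUT_COUNT to run this task.")
```

The same script can be tested on a workstation by exporting the two variables by hand before it is submitted as a job. Check the dsub documentation for the exact options used to map input and output files to variables like these.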

The Broad Institute has developed the Workflow Definition Language (WDL) and an associated runner called Cromwell. Together these have allowed the Broad to build, run at scale, and publish its best practices pipelines. If you want to run the Broad’s published GATK workflows or are interested in using the same technology stack, take a look at WDL and Cromwell.

Many computational biologists have experience running tasks on compute clusters using a job manager such as Grid Engine or Slurm. If you have existing tools that assume such an environment, then you can create a similar cluster on Google Cloud using Elasticluster.

If you want to develop brand new pipelines with the most sophisticated and scalable data processing infrastructure, then Apache Beam and Google Cloud Dataflow may be the right choice for you. Dataflow is a fully managed runner for Apache Beam.

To run tasks on Google Cloud, dsub uses the Google Genomics Pipelines API. For most uses, dsub is the recommended interface. Calling the Pipelines API directly is best suited to developers who want to build new job management tools (like dsub) or workflow systems (like Cromwell).