Revision as of 19:07, 13 April 2016

In this lab you will learn how to use the program BEAST, and its companion BEAUTi, written by Alexei Drummond, Andrew Rambaut, Marc Suchard, Remco Bouckaert, and numerous other contributors. BEAST specializes in estimating divergence times under uncorrelated relaxed clock models, estimating species trees using a model that accounts for the independent coalescent history of each gene tree, and Bayesian skyline analyses that estimate population growth or decline through time. While BEAST itself is written in C and Java and has no user interface, the companion program BEAUTi has a nice graphical user interface that allows you to create an XML data file that is consulted by BEAST when it runs.

In this lab, you will use BEAST to estimate divergence times. Rather than analyze a real data set, you will simulated a data set in which you know the true values of all parameters, and then see how to obtain estimates of those parameters in BEAST.

Login to the UConn Bioinformatics cluster

Once you login, type

qlogin

to obtain a session on a currently unused node.

Download BEAUTi and BEAST

While we will be using BEAST on the cluster, you will need to download the software locally in order to get BEAUTi, which cannot be easily run remotely on the cluster because of its interactive GUI (Graphical User Interface).

Download the version of BEAST 2.4.0 from the BEAST web site. You will also need FigTree and Tracer (but you should already have those).

BEAST and BEAUti form a pair. BEAUti is a user-friendly, graphical Java application that can be used to create the data file that is then fed to BEAST, which is neither user-friendly nor graphical, but is nevertheless also written in Java. Thus, to do anything with BEAST, you will need a working Java runtime environment. You will actually use the cluster to run BEAST, but you will need to run BEAUti on your own laptop. If you have trouble getting Java running on your laptop, pair with another student until you have an XML file. At that point, you can use your own computer to interact with the cluster.

Analysis of simulated data

Download the file simyule.nex to your hard drive and import it into BEAUTi using the File > Import alignment menu command. An entry called simyule should appear in the Partitions section.

The tree from which these data were simulated was itself simulated using the Yule model. Here are some facts about the true tree:

Number of leaves (species)

20

Per-lineage speciation rate

10

Tree length (sum of all edge times)

1.77537

Tree height (length of path from root to any tip)

0.279743

log(Yule model probability density)

25.9954

The last line requires some explanation. Simulating a Yule tree requires choosing n-1 times (where n is the number of tips) and, at each step, deciding which of the current lineages will speciate after the chosen sojourn time. The probability density of a Yule tree thus involves the product of n-1 Exponential(lambda*k) densities and n-1 Discrete Uniform(k) probabilities, where k is the number of lineages at each step. The value reported in the table above is the log of this probability density evaluated using the times actually chosen for this particular tree.

Sequence data (10000 sites) was simulated using this Yule tree. Here are some facts about the true substitution model:

Model

HKY85

Gamma rate heterogeneity shape

0.5

Equilibrium frequency for A

0.3

Equilibrium frequency for C

0.2

Equilibrium frequency for G

0.2

Equilibrium frequency for T

0.3

Kappa (trs./trv. rate ratio)

4.0

Setting up the Site Model

We will begin by estimating the parameters of the substitution model that we know was used to generate these data.

Click on the Site Model tab in BEAUTi and set Gamma Category Count to 4, the model to HKY, and check estimate for Shape and Kappa. The Frequencies drop-down list should be set to Estimated, Proportion Invariant should remain fixed to 0.0, and Substitution Rate should be fixed at 1.0.

Setting up the Clock Model

Click on the Clock Model tab in BEAUTi. We will leave everything here at the default values (Strict Clock with Clock.rate fixed to 1). Again, we know the truth, and the truth was that all 10000 sites were generated under a strict clock, with expected number of substitutions equal to the Yule branch lengths (which is why we keep clock rate set at 1, which says that a branch length of 1 means 1 expected substitution per site).

Setting up the Priors

Click on the Priors tab in BEAUTi and set up the priors as follows.

Tree

Keep this set to Yule Model and be sure estimate is checked for Birth Diff Rate (but leave Conditional On Root unchecked)

birthRate

Set this to an Exponential distribution with mean 100 (do not check estimate)

gammaShape

Set this to an Exponential distribution with mean 100 (do not check estimate)

kappa

Set this to an Exponential distribution with mean 100 (do not check estimate)

proportionInvariant

Keep this set to a Uniform(0,1) distribution (do not check estimate)

ucldMean

Keep this set to an Exponential distribution with mean 100 (do not check estimate)

ucldStdev

Keep this set to an Exponential distribution with mean 100 (do not check estimate)

Setting up MCMC

Click on the MCMC tab in BEAUTi and set the Chain Length to 1000000, leave Store Every set to -1, and set Pre Burnin to 10000.

Note (but do NOT check) the box at the bottom labeled Sample From Prior

Saving the XML file

You are now ready to save the XML file that BEAST will use. To do this, choose File > Save As from the main menu and save the file as simyule.xml in a directory of your choosing. Upload this file to your home directory on the cluster.

Running BEAST on the cluster

I am assuming that you are now logged into the cluster and have issued a qlogin command to acquire a free node. (You should also be able to see your simyule.xml file in your home directory when you type the ls command.)

The version of BEAST that was installed on the cluster was a little out of date, so I installed the latest version (2.4.0 at the time I wrote this) in the directory /common/phylogenetics. To further complicate things, this version of BEAST requires a more recent version of Java than we had available on the cluster, so in that same directory is a freshly downloaded Java Runtime Environment. To make all this work, we have to set an environmental variable so that the default version of Java is not used when we run BEAST.

export PATH="/common/phylogenetics/jre1.8.0_77/bin:$PATH"

This command puts the latest Java Runtime Environment's bin directory at the head of the PATH, where the PATH is a list of directories that the system will search when looking for a program (e.g. java).