MCMCTREE is a phylogenetic program for Bayesian estimation of species divergence times using soft fossil constraints under various molecular clock models. This is part of the PAML package. In this tutorial I will analyze an easy example modified from dataset of Inoue et al. (2010). Here we conduct a commonly used time estimation method, "Approximate Likelihood Method", for the datasets including more than 10 species.

I am assuming that you have some basic knowledge of command line in Windows or Unix system in Linux/MacOS．

Make sure you are using the latest version of PAML. Now I am using 4.8a [Sept. 2015]．
Here is an explanation by slide.
Japanese language version is here.

The approximate likelihood method consists of 2 steps (3 steps including preparation). We conduct each analysis as different job.

0. Rough Estimation of Substitution Rate1. Estimation of Gradient and Hessian of Branch Lengths2. Estimation of Time and Rate by MCMC Analysis

Caution： For Windows, sometimes we need to change "attribute" of example directory by right click. Then unlock "read only"．

0. Rough Estimation of Substitution Rate

We will estimate overall substitution rate in this dataset. This estimates will be used in rgene_gamma setting in mcmctree.ctl later.

Move into the mcmctreeEasyExam/examples directory by command prompt (Win) or terminal (Mac).

InfilesOpen baseml.ctl by your editor．Sequence file (Vertebrate6.phy) and tree file (Vertebrate6PE.tre) names are written．GTR+G model (model = 7) will be used for rate estimation．Strict clock model (clock = 1) is used because of quick estimation．

Execution of BASEML[Windows]From command prompt, type,

> baseml baseml.ctl

[Mac]
From terminal, type,

> ./baseml baseml.ctl

If the control file is named as baseml.ctl, baseml read it automatically without specifying.

Result
Result will be saved in mlb file．You can find the following result in mlb file:

Substitution rate is per time unit
0.084825 +- 0.008316

The exact value would differ depending on the program version. This value will be used in mcmctree.ctl file.

1. Estimation of Gradient and Hessian of Branch Lengths

The branch lengths are estimated by maximum likelihood together with the gradient and Hessian (i.e. vector of first derivatives and matrix of second derivatives) of the likelihood function at the maximum likelihood estimates. The gradient and Hessian contain information about the curvature of the likelihood surface (dos Reis and Yang, 2013, P10)．

InfilesOpen mcmctree.ctl file by your editor and make sure the following line:

usedata = 3

This option is for the estimation of the maximum likelihood estimates (MLE) of branch lengths, gradient (g), and Hessian (H).

Execution of MCMCTREE

Type,

> mcmctree mcmctree.ctl

Result
The estimated gradient and Hessian will be saved in out.BV file.

2. Estimation of Time and Rate by MCMC Analysis

We will estimate divergence times by Bayesian analysis.

InfilesChange the name "out.BV" to "in.BV"．

Open mcmctree.ctl file and change the usedata option as follows:

usedata = 2

This option will start the MCMC analysis using Approximate Likelihood Calculation. Also change the name of output file as follows:

outfile = out_usedata2

rgene_gammargene_gamma specifies the prior distribution of overall substitution rate μ with assuming the gamma distribution. Here we need to specify the shape and scale parameters of the gamma distribution.

The gamma distribution is described by the shape parameter (α) and scale parameter (β)．Let m be the mean and s be the standard deviation of the gamma distribution. These are related as follows: a = (m/s)^2 b = m/s^2.
For example, μ is 2/2 =1，then it means 1 change/site/time unit．If time unit is 100 MY，the overall substitution rate is 10^-8 substitution/site/year．
According to the rough estimation using BASEML (see above), we will use m = 0.08 and s = 0.08．The shape parameter α is α = (0.08/0.08)^2
= 1.
And the scale parameter is β = 0.08/0.08^2
=12.5.
So in the control file, we specify rgene_gamma as follows.

If the acceptance proportion are not stable, change "finetune = 1" to "finetune = 0" and modify the finetune values manually．In this case, you need to quit analysis using "Ctl+C". See the manual for detail.

ResultThe result will be saved in out_usedata2 file．You can find the estimated time and 95% confidence interval for each node.

The raw data is saved in mcmc.out file.

3. Figure Making

FigTree.tre (output file of MCMCTREE program) can be used for the infile of FigTree.