CRAN Task View: Phylogenetics, Especially Comparative Methods

The history of life unfolds within a phylogenetic context. Comparative phylogenetic methods are statistical approaches for analyzing historical patterns along phylogenetic trees. This task view describes R packages that implement a variety of different comparative phylogenetic methods. This is an active research area and much of the information is subject to change. One thing to note is that many important packages are not on CRAN: either they were formerly on CRAN and were later archived (for example, if they failed to incorporate necessary changes as R is updated) or they are developed elsewhere and have not been put on CRAN yet. Such packages may be found on GitHub, R-Forge, or authors' websites.

Getting trees into R:
Trees in R are usually stored in the S3 phylo class (implemented in
ape), though the S4 phylo4 class (implemented in
phylobase) is also available.
ape
can read trees from external files in Newick format (sometimes popularly known as PHYLIP format) or NEXUS format. It can also read trees input by hand as a Newick string (e.g., "(human,(chimp,bonobo));").
phylobase
and its lighter weight sibling
rncl
can use the
Nexus Class Library
to read NEXUS, Newick, and other tree formats.
treebase
can search for and load trees from the online tree repository TreeBASE, and
rdryad
can pull data from the online data repository Dryad.
RNeXML
can read, write, and process metadata for the
NeXML
format. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's
website
).
phyext2
can read and write various tree formats, including simmap formats.
rotl
can pull in a synthetic tree and individual study trees from the Open Tree of Life project.
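
As a minimal sketch of the hand-typed Newick input described above (assuming ape is installed; the object name tree and the file names in the comments are illustrative, not taken from any package documentation):

```r
library(ape)

# Parse a Newick string typed by hand; the result is an S3 "phylo" object.
tree <- read.tree(text = "(human,(chimp,bonobo));")

class(tree)      # the S3 class used by ape
Ntip(tree)       # number of tips in the tree
tree$tip.label   # tip names, in the order they were read

# Reading from a file follows the same pattern, for example
# read.tree("mytree.nwk") or read.nexus("mytree.nex");
# both file names are hypothetical.
```

The same phylo object can be passed on to most of the other packages listed in this task view.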

Utility functions:
These packages include functions for manipulating trees or associated data.
ape
has functions for randomly resolving polytomies, creating branch lengths, getting information about tree size or other properties, pulling in data from GenBank, and many more.
phylobase
has functions for traversing a tree (e.g., getting all descendants from a particular node specified by just two of its descendants).
geiger
can prune trees and data to an overlapping set of taxa.
treeplyr
can use dplyr-style functions (filter, mutate, reorder, etc.) on objects consisting of trees plus associated data.
evobiR
can do fuzzy matching of names (to allow some differences).
rphast
implements an R interface to the PHAST software, which can be used for many types of analysis in comparative and evolutionary genomics, such as estimating models of evolution from sequence data, scoring alignments for conservation or acceleration, and predicting elements based on conservation or custom phylogenetic hidden Markov models.
SigTree
finds branches that are responsive to some treatment, while allowing correction for multiple comparisons.
dendextend
can manipulate dendrograms, including subdividing trees, adding leaves, and more.
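
Pruning a tree and a trait vector to an overlapping set of taxa can be sketched with plain ape and base R. This is only an illustration of the idea, not the geiger implementation; the tree, the trait values, and all object names are made up for the example.

```r
library(ape)

# A four-tip tree and a trait vector that only partly overlap:
# tip D has no trait value, and taxon E is not in the tree.
tree  <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
trait <- c(A = 1.2, B = 0.7, C = 2.1, E = 3.3)

# Taxa present in both the tree and the data.
shared <- intersect(tree$tip.label, names(trait))

# Drop tips without data, then reorder the data to match the tip order.
pruned <- drop.tip(tree, setdiff(tree$tip.label, shared))
trait  <- trait[pruned$tip.label]

pruned$tip.label
trait
```

Keeping the data in the same order as the tip labels avoids silent mismatches in functions that do not match by name.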

Ancestral state reconstruction:
Continuous characters can be reconstructed using maximum likelihood, generalised least squares or independent contrasts in
ape. Root ancestral character states under Brownian motion or Ornstein-Uhlenbeck models can be reconstructed in
ouch, though ancestral states at the internal nodes are not. Discrete characters can be reconstructed using a variety of Markovian models that parameterize the transition rates among states using
ape.
markophylo
can fit a broad set of discrete character types with models that can incorporate constrained substitution rates, rate partitioning across sites, branch-specific rates, sampling bias, and non-stationary root probabilities.
phytools
can do stochastic character mapping of traits on trees.
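
A minimal sketch of continuous ancestral state reconstruction with ape, using the independent contrasts option mentioned above. The tree and trait are simulated purely for illustration, and the seed and object names are arbitrary choices.

```r
library(ape)

set.seed(1)

# Simulate an 8-tip tree and a continuous trait under Brownian motion.
tree <- rcoal(8)
x    <- rTraitCont(tree, model = "BM")

# Reconstruct states at internal nodes; method = "pic" uses
# independent contrasts (other options include "ML" and "GLS").
fit <- ace(x, tree, type = "continuous", method = "pic")

fit$ace   # one estimate per internal node, named by node number
```

The estimates can be displayed on a plotted tree with nodelabels() after plot(tree).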

Diversification Analysis:
Lineage through time plots can be done in
ape;
nLTT
can estimate the normalized lineage through time statistic, which can be used as a summary statistic in ABC approaches. A simple birth-death model for when you have extant species only (sensu Nee et al. 1994) can be fitted in ape as can survival models and goodness-of-fit tests (as applied to testing of models of diversification).
TESS
can calculate the likelihood of a tree under a model with time-dependent diversification, including mass extinctions. Net rates of diversification (sensu Magallón and Sanderson) can be calculated in
geiger.
diversitree
implements the BiSSE method (Maddison et al. 2007) and later improvements (FitzJohn et al. 2009).
TreePar
estimates speciation and extinction rates with models where rates can change as a function of time (e.g., at mass extinction events) or as a function of the number of species.
caper
can do the macrocaic test to evaluate the effect of a trait on diversity.
apTreeshape
also has tests for differential diversification (see
description
).
iteRates
can identify and visualize areas on a tree undergoing differential diversification.
DDD
can fit density dependent models as well as models with occasional escape from density-dependence.
BAMMtools
is an interface to the BAMM program to allow visualization of rate shifts, comparison of diversification models, and other functions.
DDD
implements maximum likelihood methods based on the diversity-dependent birth-death process to test whether speciation or extinction is diversity-dependent, and can also identify key innovations and simulate a density-dependent process.
expoTree
can calculate the likelihood of a tree under a density dependent model.
PBD
can calculate the likelihood of a tree under a protracted speciation model.
phyloTop
has functions for investigating tree shape, with special functions and datasets relating to trees of infectious diseases.

Phylogenetic Inference:
UPGMA, neighbour joining, bio-nj and fast ME methods of phylogenetic reconstruction are all implemented in the package
ape.
phangorn
can estimate trees using distance, parsimony, and likelihood.
ips
wraps several tree inference and other programs, including MrBayes, BEAST, and RAxML, allowing their easy use from within R.
phyclust
can cluster sequences.
phytools
can build trees using MRP supertree estimation and least squares.
Rphylip
wraps
PHYLIP
, a broad variety of programs for tree inference under parsimony, likelihood, and distance, bootstrapping, character evolution, and more.
phylotools
can build supermatrices for analyses in other software.
pastis
can use taxonomic information to make constraints for Bayesian tree searches.
RADami
can import RADseq data for use with
pyRAD
.
expands
can reconstruct phylogenies of tumors and cluster them into populations.
outbreaker
can infer transmission trees for diseases, as well as other parameters of disease spread;
OutbreakTools
can infer parameters of disease spread. For more information on importing sequence data, see the
Genetics
task view;
pegas
may also be of use.
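
As a small sketch of distance-based inference in ape, the following builds a neighbour-joining tree from a hand-made distance matrix. The matrix is a toy example chosen to be additive (so the tree reproduces the distances exactly); the taxon names are arbitrary.

```r
library(ape)

# Toy pairwise distances among five taxa (symmetric, zero diagonal).
taxa <- c("a", "b", "c", "d", "e")
d <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE, dimnames = list(taxa, taxa))

# Neighbour joining returns an unrooted tree with branch lengths.
tree <- nj(d)

# Path lengths on the inferred tree, reordered to match the input.
fitted <- cophenetic(tree)[taxa, taxa]
max(abs(fitted - d))   # discrepancy between tree and input distances
```

With real sequence data, the distance matrix would typically come from dist.dna() applied to an alignment.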

Time series/Paleontology:
Paleontological time series data can be analyzed within a likelihood-based framework for fitting and comparing models of phyletic evolution (based on the random walk or stasis model) using
paleoTS.
strap
can do stratigraphic analysis of phylogenetic trees.

Tree Simulations:
Trees can be simulated using constant-rate birth-death with various constraints in
TreeSim
and a birth-death process in
geiger. Random trees can be generated in
ape
by random splitting of edges (for non-parametric trees) or random clustering of tips (for coalescent trees).
paleotree
can simulate fossil deposition, sampling, and the tree arising from this as well as trees conditioned on observed fossil taxa.
TESS
can simulate trees with time-dependent speciation and/or extinction rates, including mass extinctions.
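
The two random-tree generators in ape mentioned above can be sketched as follows. The tip count and seed are arbitrary; this is an illustration rather than a recommended simulation design.

```r
library(ape)

set.seed(42)

# Random splitting of edges: a rooted binary tree, not ultrametric in general.
split_tree <- rtree(10)

# Random clustering of tips: a coalescent tree, which is ultrametric.
coal_tree <- rcoal(10)

is.binary(split_tree)
is.ultrametric(coal_tree)
```

Birth-death simulators such as those in TreeSim or geiger take rate parameters instead and are better suited to diversification studies.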

Trait evolution:
Independent contrasts for continuous characters can be calculated using
ape,
picante, or
caper
(which also implements the brunch and crunch algorithms). Analyses of discrete trait evolution, including models of unequal rates or rates changing at a given instant of time, as well as Pagel's transformations, can be performed in
geiger.
corHMM
can look for hidden rates in discrete traits as well as fit correlational models for two or three binary traits (similar to Pagel's old Discrete program) and complex models for multistate traits (similar to Pagel's old Multistate program). Brownian motion models can be fit in
geiger,
ape, and
paleotree. Multiple-rate Brownian motion can be fit
in motmot and RBrownie (both currently not on CRAN, but older versions can be obtained from the
archive
). Deviations from Brownian motion can be investigated in
geiger
and
OUwie.
mvMORPH
can fit Brownian motion, early burst, ACDC, OU, and shift models to univariate or multivariate data. Ornstein-Uhlenbeck (OU) models can be fitted in
geiger,
ape,
ouch
(with multiple means), and
OUwie
(with multiple means, rates, and attraction values).
surface
wraps
ouch
to infer shifts in the OU optimum;
bayou
also allows data-driven selection between different OU models.
geiger
fits only single-optimum models. Other continuous models, including Pagel's transforms and models with trends, can be fit with
geiger. ANOVAs and MANOVAs in a phylogenetic context can also be implemented in
geiger. Traditional GLS methods (sensu Grafen or Martins) can be implemented in
ape,
PHYLOGR, or
caper. Phylogenetic autoregression (sensu Cheverud et al.) and phylogenetic autocorrelation (Moran's I) can be implemented in
ape
or, if you wish the significance test of Moran's I to be calculated via a randomization procedure, in
adephylo. Correlation between traits using a GLMM can also be investigated using
MCMCglmm.
phylolm
can fit phylogenetic linear regression and phylogenetic logistic regression models using a fast algorithm, making it suitable for large trees.
phytools
can also investigate rates of trait evolution and do stochastic character mapping.
metafor
can perform meta-analyses accounting for phylogenetic structure.
pmc
evaluates the model adequacy of several trait models (from
geiger
and
ouch) using Monte Carlo approaches.
geomorph
can do geometric morphometric analysis in a phylogenetic context.
MPSEM
can predict features of one species based on information from related species using phylogenetic eigenvector maps.
Rphylip
wraps
PHYLIP
which can do independent contrasts, the threshold model, and more.
convevol
can test for convergent evolution on a phylogeny.
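
A minimal sketch of independent contrasts with ape, followed by the usual regression through the origin. The tree and both traits are simulated for illustration only; the slope of 0.5 and the noise level are arbitrary assumptions.

```r
library(ape)

set.seed(1)

# Simulate a 20-tip tree and two correlated traits under Brownian motion.
tree <- rcoal(20)
x <- rTraitCont(tree, model = "BM")
y <- 0.5 * x + rTraitCont(tree, model = "BM", sigma = 0.05)

# Standardized contrasts: one per internal node of a binary tree.
cx <- pic(x, tree)
cy <- pic(y, tree)

# Contrasts have expected mean zero, so the regression omits the intercept.
fit <- lm(cy ~ cx - 1)
coef(fit)
```

Because contrasts are computed at nodes, a fully resolved tree with branch lengths is required.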

Trait Simulations:
Continuous traits can be simulated using Brownian motion in
ouch,
geiger,
ape,
picante,
OUwie, and
caper; the Hansen model (a form of the OU model) in
ouch
and
OUwie; and a speciational model in
geiger. Discrete traits can be simulated using a continuous time Markov model in
geiger.
phangorn
can simulate DNA or amino acids. Both discrete and continuous traits can be simulated under models where rates change through time in
geiger.
phytools
can simulate discrete characters using stochastic character mapping.
phylolm
can simulate continuous or binary traits along a tree.

Tree Manipulation:
Branch length scaling using ACDC; Pagel's (1999) lambda, delta and kappa parameters; and the Ornstein-Uhlenbeck alpha parameter (for ultrametric trees only) is available in
geiger.
phytools
also allows branch length scaling, as well as several tree transformations (adding tips, finding subtrees). Rooting, resolving polytomies, dropping of tips, setting of branch lengths including Grafen's method can all be done using
ape. Extinct taxa can be pruned using
geiger.
phylobase
offers numerous functions for querying and using trees (S4). Tree rearrangements (NNI and SPR) can be performed with
phangorn.
paleotree
has functions for manipulating trees based on sampling issues that arise with fossil taxa as well as more universal transformations.
dendextend
can manipulate dendrograms, including subdividing trees, adding leaves, and more.
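
Two of the ape operations listed above, resolving polytomies and assigning branch lengths with Grafen's method, can be sketched as follows. The star tree is a toy input and the seed is arbitrary.

```r
library(ape)

set.seed(7)

# A star tree: a single four-way polytomy with no branch lengths.
star <- read.tree(text = "(A,B,C,D);")

# Randomly resolve the polytomy into a binary tree.
resolved <- multi2di(star)
is.binary(resolved)

# Assign branch lengths using Grafen's method; the result is ultrametric.
with_bl <- compute.brlen(resolved, method = "Grafen")
is.ultrametric(with_bl)
```

Because the resolution is random, repeated calls to multi2di() can give different topologies unless the seed is fixed.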

Phyloclimatic Modeling:
phyloclim
integrates several tools for combining phylogenetics with climatic niche modeling.

Phylogeography / Biogeography:
phyloland
implements a model of space colonization mapped on a phylogeny; it aims to estimate limited dispersal and competitive exclusion in a statistical phylogeographic framework.
jaatha
can infer demographic parameters for two species with multiple individuals per species.
BioGeoBEARS
implements a variety of models for discrete biogeography.

Tree Plotting and Visualization:
User trees can be plotted using
ape,
adephylo,
phylobase,
phytools,
ouch, and
dendextend; several of these have options for branch or taxon coloring based on some criterion (ancestral state, tree structure, etc.).
paleoPhylo
and
paleotree
are specialized for drawing paleobiological phylogenies. Trees can also be examined (zoomed) and viewed as correlograms using
ape. Ancestral state reconstructions can be visualized along branches using
ape
and
paleotree.
phytools
can project a tree into a morphospace.
BAMMtools
can visualize rate shifts calculated by BAMM on a tree. The popular R visualization package
ggplot2
can be extended by
ggtree
to visualize phylogenies. Trees can also be interactively explored (as dendrograms) using
idendr0.
phylocanvas
is a widget for "htmlwidgets" that enables embedding of phylogenetic trees using the phylocanvas javascript library.

Tree Comparison:
Tree-tree distances can be evaluated, and used in additional analyses, in
distory
and
Rphylip.
ape
can compute tree-tree distances and also create a plot showing two trees with links between associated tips.
kdetrees
implements a non-parametric method for identifying potential outlying observations in a collection of phylogenetic trees, which could represent inference problems or processes such as horizontal gene transfer.
dendextend
can evaluate multiple measures comparing dendrograms.
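
A minimal sketch of a tree-tree distance in ape, using two small unrooted trees that differ in one internal split. The trees and object names are made up for illustration; the default method of dist.topo() is the topological distance of Penny and Hendy (1985).

```r
library(ape)

# Two unrooted five-tip trees: both group D with E,
# but they disagree on whether A pairs with B or with C.
t1 <- read.tree(text = "((A,B),C,(D,E));")
t2 <- read.tree(text = "((A,C),B,(D,E));")

# Topological distance based on the internal splits not shared by both trees.
d_topo <- dist.topo(t1, t2)
d_topo
```

For a visual comparison, ape can also plot two trees side by side with links between matching tips.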

Taxonomy:
taxize
can interact with a suite of web APIs for taxonomic tasks, such as verifying species names, getting taxonomic hierarchies, and verifying name spelling.
evobiR
contains functions for making a tree at higher taxonomic levels, downloading a taxonomy tree from NCBI or ITIS, and various other miscellaneous functions (simulations of character evolution, calculating D-statistics, etc.).
pastis
can use taxonomic information to make constraints for Bayesian tree searches.

Gene tree - species tree:
HyPhy
can count the duplication and loss cost to reconcile a gene tree to a species tree. It can also sample histories of gene trees from within family trees.
rmetasim
can simulate loci and individuals across landscapes using the metasim simulation engine.

Notes:
At least ten packages in this domain have names starting with phy*, including two pairs of similarly named packages (phytools and phylotools, phylobase and phybase). This can easily lead to confusion, and future package authors are encouraged to consider such overlaps when naming packages. For clarification,
phytools
provides a wide array of functions, especially for comparative methods, and is maintained by Liam Revell;
phylotools
has functions for building supermatrices and is maintained by Jinlong Zhang.
phylobase
implements S4 classes for phylogenetic trees and associated data and is maintained by Francois Michonneau;
phybase
has tree utility functions and many functions for gene tree - species tree questions and is authored by Liang Liu, but no longer appears on CRAN.