April 1, 2012

Carnival of Evolution, Number 46 -- The Tree (structures) of Life

Welcome to Carnival of Evolution, Number 46. I am your host, Bradly Alicea. This month's theme is: the Tree (structures) of Life. Since this blog covers a mix of both biological and computational content, it is fitting that we explore this month's submissions in the context of trees (the computational kind) and biological classification (the biological kind). I will indulge in a historical and technical overview of trees used in evolutionary analysis, then present this month's posts.

What is a tree structure? In computational science, trees are a type of data structure often used to organize information hierarchically. In graph theory, a rooted tree is a special case of a directed acyclic graph (DAG). There are decision trees, factor trees, and classification trees, and even trees resulting from fractal growth (Figure A). Factorization of the number 46 can be used to illustrate the way tree structures are built: it can be directly factored to its primes (2 and 23) in a single bifurcation (Figure B).

Figure A. LEFT: An example of a decision tree, strict hierarchy. RIGHT: a tree that embodies fractal growth (built using a recursion process).

Figure B. A factor tree for the number "46".
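For the computationally inclined, here is a minimal sketch of how a factor tree like the one in Figure B could be built recursively. The function names (`factor_tree`, `leaves`) and the nested-tuple representation are my own illustrative choices, not anything taken from the figure, and the sketch assumes whole numbers of 2 or more.

```python
def factor_tree(n):
    """Return a nested tuple (n, left, right), or n itself if n is prime (a leaf).

    Illustrative sketch only; assumes n is an integer >= 2.
    """
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            # Split n at its smallest divisor and keep factoring both halves.
            return (n, factor_tree(d), factor_tree(n // d))
    return n  # no divisor found, so n is prime and becomes a tip of the tree


def leaves(tree):
    """Collect the prime factors found at the tips of a factor tree."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


print(factor_tree(46))          # (46, 2, 23): a single bifurcation
print(leaves(factor_tree(60)))  # [2, 2, 3, 5]: a deeper tree
```

The number 46 needs only one split because both of its factors are already prime; a number such as 60 produces a deeper, nested structure.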

In Darwin's notebooks, common descent was conceptualized as being represented by a directed acyclic graph (Figure C). Indeed, one of the primary signatures of common descent (shared, derived traits) is quite well suited to analysis using directed acyclic graphs, although that relationship was not appreciated by Darwin. This connection would not become clear until the rise of phylogenetic theory many years later.

Figure C. One of the first "trees of life", from Darwin's Notebooks.

Figure D. A later version of an evolutionary tree, by Ernst Haeckel (late 19th century).

The "tree of life" is often thought of as a "branching bush" (Figure D) -- meaning that taxa (e.g. species) do not arise from one another in linear fashion. The concept of common ancestry is key. Common ancestors, or one ancestral form giving rise to many descendants, are key enablers of the proliferation of biodiversity, represented by bifurcations (binary splits) in the tree (Figure E). In a biological context, then, "trees" are reconstructed from available data to infer a set of evolutionary relationships.

Figure E. Why do we use trees? To reveal a "black box" of evolutionary relatedness containing the common ancestor using a variety of character types (traits). In many cases, the more characters you have, the more likely you are to find the "correct" tree, but this comes with a computational cost. Inference can be done using many different types of data.

Modern phylogenetics (or as some people prefer, cladistics) provides a specialized language and protocol for understanding evolution from common descent. Tree structures are also used to reveal clades (i.e. phylogenetic sets) and nested relationships. This is the idea behind monophyly, which postulates that, given the data, a group of related species descends from one and only one internal node (hypothetical common ancestor) in the graph, and that the group includes all of that node's descendants (Figure E).
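To make the idea of monophyly concrete, here is a small sketch that checks whether a set of taxa forms a clade in a rooted tree. The nested-tuple tree format and the function names are assumptions made for illustration, and the four-ape tree is just a convenient toy example.

```python
def leaf_set(node):
    """Return the set of taxon names found at or below a node."""
    if isinstance(node, str):          # a tip is just a taxon name
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """Yield the leaf set of every node (tips included) in a rooted tree."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, taxa):
    """True if some node's descendants are exactly the given taxa."""
    return frozenset(taxa) in set(clades(tree))


# Toy rooted tree written as nested tuples (illustrative only).
tree = ((("human", "chimp"), "gorilla"), "orangutan")

print(is_monophyletic(tree, {"human", "chimp"}))             # True
print(is_monophyletic(tree, {"human", "chimp", "gorilla"}))  # True
print(is_monophyletic(tree, {"human", "gorilla"}))           # False
```

The last group fails because the only node that contains both human and gorilla also contains chimp, so the group leaves out one of that ancestor's descendants.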

In general, polyphyly (e.g. arising from parallel or convergent evolution) is considered an incorrect evolutionary hypothesis. However, in select cases, polyphyletic relationships may capture true evolutionary relationships [1]. Yet the work of Carl Woese [2] demonstrates that all three known domains of life (eukarya, bacteria, and archaea) can be classified as a series of monophyletic groups.

Figure F. A modern "consensus" (based on rRNA data) phylogeny demonstrating the three domains of life. COURTESY: Wikipedia.

Some people argue that in certain cases (hybridization or horizontal gene transfer) evolution is also reticulate (or in the parlance of graph theory, cyclical). Indeed, depending on natural history, some groups of species or particular traits can exhibit a cycle [3]. Even in the case of the universal tree (Figure F), reticulations in the form of horizontal gene transfer can violate the strict hierarchy of this tree topology.
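One way to see what a reticulation does to a tree is to treat the branches as undirected edges and ask whether any edge closes a loop. The sketch below uses a standard union-find check for this; the function name `has_reticulation`, the node labels, and the hypothetical transfer between taxa B and C are all made up for illustration.

```python
def has_reticulation(edges):
    """Return True if the undirected graph given by `edges` contains a cycle."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative of x's group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True   # a and b were already connected: this edge closes a loop
        parent[root_a] = root_b
    return False


# A strictly branching tree: root -> (X -> A, B) and root -> C.
tree_edges = [("root", "X"), ("root", "C"), ("X", "A"), ("X", "B")]
print(has_reticulation(tree_edges))                 # False

# The same tree plus one hypothetical gene transfer between B and C.
print(has_reticulation(tree_edges + [("B", "C")]))  # True
```

A strictly branching tree on a given set of nodes has no spare edges, so a single extra connection between two existing lineages is enough to create a cycle.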

Finally, phylogenetic analyses range from those that distinguish 2-3 species to complex intraspecific relationships and the three domains of life. From a computational point of view, this is not a trivial issue. In general, the greater the number of taxa (e.g. species) analyzed, the greater (by far) the number of possible evolutionary hypotheses (i.e. tree structures) there are to evaluate. This scaling is given by the equation in Figure G.

Figure G. Equation to find the total number of possible phylogenies (tree topologies) given a specific number of related taxa.

In Figure G, the number of possible trees (T) increases explosively (faster than exponentially) with the number of taxa (N) added to the analysis. As additional taxa are added, checking every possible tree quickly becomes infeasible, and finding the best tree under criteria such as maximum parsimony is an NP-hard problem (for the biologists, this means that no efficient exact method is known). Fortunately, we can use search heuristics to approximate the true tree. This approximation is of course subject to the type and amount of data added to the analysis.
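To get a feel for how quickly the search space grows, here is a short sketch using the standard counting formulas for strictly bifurcating trees: (2N-5)!! unrooted topologies and (2N-3)!! rooted topologies for N labeled taxa, where "!!" is the double factorial (the product of odd numbers). I am assuming these textbook formulas here; the exact form shown in Figure G may be written differently.

```python
def odd_product(last):
    """Product 3 * 5 * ... * last of odd numbers (1 if last < 3)."""
    result = 1
    for k in range(3, last + 1, 2):
        result *= k
    return result


def num_unrooted_trees(n):
    """Number of unrooted bifurcating topologies for n >= 3 labeled taxa: (2n-5)!!"""
    if n < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_product(2 * n - 5)


def num_rooted_trees(n):
    """Number of rooted bifurcating topologies for n >= 2 labeled taxa: (2n-3)!!"""
    if n < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_product(2 * n - 3)


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Four taxa give only 3 unrooted trees, but ten taxa already give 2,027,025, which is why exhaustive evaluation is abandoned in favor of heuristics for all but the smallest data sets.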

If you guessed from the equation that this is a combinatorics problem, you are correct, but you still have to complete the evolution crossword puzzle to claim your prize.

Now, on to the posts.........

For this version of Carnival of Evolution, I will be incorporating the submitted and other featured posts into a series of phylogenetic trees. Each tree will demonstrate a typical tree topology that one might encounter in the scientific literature. My basis for homology, character coding, etc. was conceptual more than systematic. In addition, the sampling was non-uniform over the course of March. Nevertheless, this should still be a fun (and potentially educational) experience.

Tree #1: two clades with an outgroup.

In tree #1, we have two clades (taxa that share some set of derived characteristics), as well as an outgroup (a distantly related taxon that helps to determine the polarity, or ancestral state, of traits that make up the tree).

* the outgroup for this tree topology is a post from PZ Myers at Pharyngula, and is a link to an educational video. PZ thinks this is a good way to teach 11-year-olds (or the uninitiated) about evolution. PZ's post also serves to root two clades of two posts each.

* the first clade features posts on taxonomy by Larry Moran at Sandwalk and Jerry Coyne at Why Evolution is True. Larry's post is a critique of a recent paper and press release on the taxonomic status of Pikaia (a chordate from the Burgess Shale). Jerry's post is a review of a recent PNAS paper on the inactivation of the genes for taste buds (which provide humans with the tastes for sweet, bitter, umami, salty, and sour) in certain carnivore species. In particular, the inactivation of S(sweet)-genes was found to involve multiple types of changes to the genes. For a much more in-depth take on this topic (the subspecies exemplar in our clade), please see Bjorn Ostman at Pleiotropy. In a post called "Carnivores have bad taste", he will help you understand the molecular evolution behind pseudogenes that are coupled to function.

* the second clade features two posts from John Hawks at the John Hawks weblog. If you've never been to this blog, visiting for his artwork on human evolution and diversity alone is worth the time. The first is one of many posts this month on sequence data and its evolutionary implications involving the Gorilla genome [4]. In this post, John is interested in genes that evolve with respect to hearing (particularly LOXHD1) and how those genes have diverged between Gorillas and Humans. The second post concerns the taxonomic status of humans. Commenting on a recent article about Richard Dawkins in the Washington Post, John argues that Humans should be considered Hominoids rather than apes, because Hominoidea represents a valid taxonomic (i.e. monophyletic) group.

Tree #2: the four taxon case.

In tree #2, we have a four-taxon case, which was originally used by Huelsenbeck and Hillis [5] to test search heuristics (i.e. algorithms) that allow us to determine maximum parsimony for a set of hypothesized evolutionary relationships. In this unrooted tree, we have two clades.

* the clade on the right involves posts on human evolution. The first post is from This Week in Evolution, and features insight on cumulative culture in primate species. The post focuses on two recent articles on solving puzzles and game playing in a cross-species context. What distinguishes humans from other primates: problem-solving ability, cumulative culture, or a bit of both? The second post, at 10,000 Birds, is a free-association-style essay on the paleoanthropology of human scavenging and its relationship to our modern energy behaviors/needs and the species around us.
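As an aside on the four-taxon case itself: with only four taxa there are just three possible unrooted topologies, so maximum parsimony can be found by scoring all of them instead of searching heuristically. The sketch below does this with the Fitch small-parsimony algorithm. It is not the procedure used in [5]; the taxon labels, the function names, and the four-site alignment are made up purely for illustration.

```python
def fitch(node, states):
    """Return (possible ancestral states, number of changes) for one character."""
    if isinstance(node, str):              # a tip: its observed state, no changes
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                             # children agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes required by `tree`, summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total


# The three unrooted four-taxon topologies, each rooted on its internal
# branch (the Fitch score does not depend on where the root is placed).
topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

# Made-up alignment: four taxa, four sites.
alignment = {"A": "AAGT", "B": "AAGC", "C": "GCGT", "D": "GCTT"}

scores = {name: parsimony_score(t, alignment) for name, t in topologies.items()}
best = min(scores, key=scores.get)
print(scores)  # {'AB|CD': 4, 'AC|BD': 6, 'AD|BC': 6}
print(best)    # AB|CD
```

Only the first two sites distinguish the topologies here; the last two sites cost one change on every tree, which is what makes them uninformative for parsimony.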

Tree #3: variable branch-length tree topology

In tree #3, we have a tree with variable branch lengths. Trees of this style are often used when time-of-divergence information (e.g. mutation rate) is available. Speaking of which, there was a post by Larry Moran from Sandwalk discussing a reconsideration of how mutation rates are calculated. This is based on a recent paper that suggests there is variation in genome-wide mutation rates within and between human families.

Tree #3 (based on no particular mutation rate) features two clades: one on human taxonomy and the other on human sociocultural evolution.

* the first clade features two different takes on human taxonomy. The first post, from Stephanie Zvan at Almost Diamonds, is a discussion about human subspecies (always a contentious topic) and their relationship to human variation. She asks Greg Laden (a fellow blogger) about this issue, and gets a very thoughtful response. The second post is from John Wilkins at Evolving Thoughts, with his thoughts and comments on John Hawks' post featured in tree #1.

* the second clade features a post on a social attribute of our species and one of our cultural products, both from an evolutionary perspective. Anne Buchanan at Mermaid's Tale gives her thoughts on human altruism in the context of Hamilton's rule and what it means to be cooperative. She proposes that we must look beyond kinship to understand the true nature of altruism. The second post, from This Week in Evolution, involves recent findings related to diet soda and how they might be explained by a model of evolutionary tradeoffs between fertility and longevity.

In tree #4, we have an example of reticulating branches (or evolutionary graph cycles, if you will). Our reticulations are due to related intellectual content or institutional affiliations rather than hybridization or horizontal gene transfer (HGT) events, but hopefully it still conveys the concept of evolution as a tangled bank [7]. Instead of calling out blog posts by clade, I will go clockwise from the upper-left portion of the graph.

* the first post is from yours truly at Synthetic Daisies (Use, reuse, and use again....), and discusses an instance of exaptation called neural recycling (which is one way the brain can acquire new functional architectures without growing accordingly). There is also discussion of homology in a neural context, and the techniques researchers use to map cross-species relationships. The second post is from the Beacon Center Blog, and features current work by Daniel Couvertier on simulating biased group selection and digital evolution. The third post is another post on the Gorilla Genome, this one from David Winter at The Atavism. In this post, the taxonomic relationship between Gorilla, Chimp, and Human is considered, as he shows why not every gene gives the same phylogenetic signal in a three-taxon relationship.

* Ken Weiss at the Mermaid's Tale posts (actually two posts) on the role of "slop" in describing how life works. By "slop", he means the role of stochastic processes, non-normally distributed phenomena, and chance events. In another post from the Beacon Center Blog, Eric Bruger gives his own take on the evolution of cooperative behaviors, this time in bacteria. And in another post from Mermaid's Tale (this one by Anne Buchanan), the role of random events in ordered biological systems is pondered. The subject of the post is a recent paper on stochastic gene expression, which Anne then relates to the role populations play in averaging out and otherwise resampling random events in biological systems and evolution. The last post on this tree is from EvoAnth, on reconsidering the evolution of monogamy among Primate species by using digit ratios (2D:4D) as an assay.

1) this year's Artificial Life conference (Alife XIII) will be held at Michigan State University in East Lansing from July 18-22. The conference is being hosted by the BEACON center, and the theme is experimental evolution. The program will cover cutting-edge work being done in biological theory, artificial life (the simulation of evolution), and the evolution of intelligence. This will be a very interesting and intellectually stimulating conference, so be sure to attend if you can.

2) here is a bonus for those of you inclined to puzzles and games. I have created an "evolutionary" crossword puzzle for you to ponder over the next month. It is fairly light, but also requires a fair amount of knowledge about evolutionary theory and biology. If you can answer all of the clues correctly by April 20, e-mail me proof of completion and I will post your name to Synthetic Daisies saying that you are a CoE 46 puzzle solver. Good luck!