Wednesday, June 13, 2007

Phylogenetic Analysis

A phylogenetic analysis of a family of related nucleic acid or protein sequences is a determination of how the family might have been derived during evolution. The evolutionary relationships among the sequences are depicted by placing the sequences as outer branches on a tree. The branching relationships on the inner part of the tree then reflect the degree to which different sequences are related. Two sequences that are very much alike will be located as neighboring outside branches and will be joined to a common branch beneath them. The object of phylogenetic analysis is to discover all of the branching relationships in the tree and the branch lengths.

Phylogenetic analysis of nucleic acid and protein sequences is, and will continue to be, an important area of sequence analysis. In addition to analyzing changes that have occurred in the evolution of different organisms, the evolution of a family of sequences may be studied. On the basis of the analysis, the most closely related sequences can be identified by their occupying neighboring branches on a tree. When a gene family is found in an organism or group of organisms, phylogenetic relationships among the genes can help to predict which ones might have an equivalent function. These functional predictions can then be tested by genetic experiments. Phylogenetic analysis may also be used to follow the changes occurring in a rapidly changing species, such as a virus. Analysis of the types of changes within a population can reveal, for example, whether or not a particular gene is under selection (McDonald and Kreitman 1991; Comeron and Kreitman 1998; Nielsen and Yang 1998), an important source of information in applications like epidemiology.

Phylogenetics is the study of evolutionary relationships; phylogenetic analysis is the means of estimating these relationships. The evolutionary history inferred from phylogenetic analysis is usually depicted as branching, treelike diagrams that represent an estimated pedigree of the inherited relationships among molecules, organisms, or both. Phylogenetics is sometimes also called cladistics, because the word ‘clade’ (a set of descendants of a single ancestor) is derived from the Greek word for branch. Strictly speaking, however, cladistics is a particular method of hypothesizing about evolutionary relationships.

Cladistic analysis is performed by comparing multiple characteristics, or characters, at once: either multiple phenotypic characters or multiple base pairs or amino acids in a sequence.

Cladistics rests on three basic assumptions:

Any group of organisms is related by descent from a common ancestor.

There is a bifurcating pattern of cladogenesis.

Change in characteristics occurs in lineages over time. This is a necessary condition for cladistics to work.

A clade is a monophyletic taxon. Clades are groups of organisms or genes that include the most recent common ancestor of all of their members and all of the descendants of that most recent common ancestor. Clade is derived from the Greek word ‘klados’, meaning branch or twig.

A taxon is any named group of organisms; it is not necessarily a clade.

In some analyses, branch lengths correspond to divergence (in the example above, mouse is slightly more closely related to fly than human is to fly).

Node: a bifurcating branch point.

Branch: defines the relationship between the taxa in terms of descent and ancestry.

Topology: the branching pattern.

Branch length: often represents the number of changes that have occurred in that branch.

Root: the common ancestor of all taxa.

Distance scale: a scale that represents the number of differences between sequences (e.g. 0.1 means a 10% difference between two sequences).
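To make the distance scale concrete, here is a minimal Python sketch of the simplest such measure, the proportion of aligned sites at which two sequences differ. The function name p_distance and the two toy sequences are made up for illustration, and the choice to skip gapped columns is an assumption; real analyses usually apply a substitution model on top of this raw proportion.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns where either sequence has a gap ('-') are skipped.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


# Toy sequences (hypothetical): one mismatch in ten aligned sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))
```

One mismatch in ten sites corresponds to 0.1 on the distance scale, that is, a 10% difference.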

Common Phylogenetic Tree Terminology

Phylogenetic trees diagram the evolutionary relationships between taxa.

The dimension along which branches are drawn can have no scale (for ‘cladograms’), can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time (for ‘ultrametric trees’ or true evolutionary trees).
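One practical consequence of a time-proportional scale is that every tip of an ultrametric tree lies at the same distance from the root. The Python sketch below checks that property. The (name, branch_length, children) tuple layout, the function names and the branch lengths are all assumptions chosen for illustration.

```python
def tip_depths(node, depth=0.0):
    """Yield (tip name, root-to-tip distance) for every tip below node.

    A node is a tuple (name, branch_length, children); tips have no children.
    """
    name, length, children = node
    depth += length
    if not children:
        yield name, depth
    for child in children:
        yield from tip_depths(child, depth)


def is_ultrametric(tree, tol=1e-9):
    """True if all tips are equally far from the root, within tol."""
    depths = [d for _, d in tip_depths(tree)]
    return max(depths) - min(depths) <= tol


# Hypothetical branch lengths: every tip is 3.0 units from the root.
clock_like = ("root", 0.0, [
    ("n1", 1.0, [("A", 2.0, []), ("B", 2.0, [])]),
    ("C", 3.0, []),
])

# Same topology, but B sits on a shorter branch, so the tree is only additive.
additive = ("root", 0.0, [
    ("n1", 1.0, [("A", 2.0, []), ("B", 0.5, [])]),
    ("C", 3.0, []),
])

print(is_ultrametric(clock_like))
print(is_ultrametric(additive))
```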

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses

This notation says that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
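The nested-parentheses notation can be read mechanically. The following Python sketch parses a string such as ((A,(B,C)),(D,E)) into nested tuples and lists the set of tips under every internal node, which is exactly the set of clades the string describes. The function names are made up for this example, and the parser deliberately handles only bare names, commas and parentheses (no branch lengths or internal labels).

```python
def parse_newick(text):
    """Parse nested parentheses like '((A,(B,C)),(D,E))' into nested tuples."""
    # A trailing ';' acts as a sentinel so indexing never runs off the end.
    text = text.strip().rstrip(";") + ";"
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return tuple(children)
        start = pos
        while text[pos] not in "(),;":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a name at position {pos}")
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def clades(node):
    """Return (tips under node, list of tip sets for each internal node)."""
    if isinstance(node, str):
        return frozenset([node]), []
    tips = frozenset()
    found = []
    for child in node:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found.extend(child_clades)
    found.append(tips)
    return tips, found


tree = parse_newick("((A,(B,C)),(D,E))")
_, all_clades = clades(tree)
for clade in all_clades:
    print(sorted(clade))
```

In this tree {B, C}, {A, B, C} and {D, E} are clades, whereas {A, B} is not, because no single internal node has exactly A and B as its descendants.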

Three types of trees

Tree Styles

This offers a choice of tree diagram: unrooted, or one of the rooted forms cladogram, phenogram, curvogram, eurogram and swoopogram. The styles are described as follows.

Rooted and Unrooted tree

Cladogram – Nodes are connected to other nodes and to tips by straight lines going directly from one to the other. This gives a V-shaped appearance.

Curvogram – Nodes are connected to other nodes and to tips by a curve, which is one fourth of an ellipse, starting out horizontally and then curving upwards to become vertical. John Rudd suggested this pattern.

Phenogram – Nodes are connected to other nodes and to tips by a horizontal and then a vertical line. This gives a particularly precise idea of horizontal levels.

Eurogram – So-called because it is a version of the cladogram diagram popular in Europe (name courtesy of David Maddison). Nodes are connected to other nodes and to tips by a diagonal line that goes outward and goes at most one-third of the way up to the next node, then turns sharply straight upwards and is vertical. Unfortunately it is nearly impossible to guarantee, when branch lengths are used, that the angles of divergence of lines are the same.

Swoopogram – This option (suggested by James Archie) connects two nodes or a node and a tip using two curves that are actually each one-quarter of an ellipse. The first part starts out vertical and then bends over to become horizontal. The second part, which is at least two-thirds of the total, starts out horizontal and then bends up to become vertical. The effect is that two lineages split apart gradually, then more rapidly, then both turn upwards.

Possible ways of drawing a tree:

Trees can be drawn in different ways. There are trees with unscaled branches and with scaled branches.

Unscaled branches: the length is not proportional to the number of changes. Sometimes the number of changes is indicated on the branches with numbers. The nodes represent divergence events on a time scale.

Scaled branches: the length of the branch is proportional to the number of changes. The distance between two species is the sum of the lengths of all branches connecting them.
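The sum-of-branches rule for scaled trees can be sketched in a few lines of Python. The tree is stored as a child-to-parent table, and the distance between two tips is found by walking each tip up to the root and adding the two path lengths at their most recent common ancestor. The node names (n1, n2, n3, root), the branch lengths and the function names are invented for illustration.

```python
# child -> (parent, branch length). All values are hypothetical.
PARENT = {
    "A": ("n1", 0.3),
    "B": ("n2", 0.1),
    "C": ("n2", 0.2),
    "n2": ("n1", 0.1),
    "n1": ("root", 0.2),
    "D": ("n3", 0.15),
    "E": ("n3", 0.25),
    "n3": ("root", 0.3),
}


def path_to_root(node):
    """Map each ancestor of node (and node itself) to its distance from node."""
    distances = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def tree_distance(a, b):
    """Sum of the lengths of all branches connecting tips a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # Among shared ancestors, the most recent one gives the smallest sum.
    shared = set(up_a) & set(up_b)
    return min(up_a[n] + up_b[n] for n in shared)


for pair in [("B", "C"), ("A", "B"), ("B", "D")]:
    print(pair, round(tree_distance(*pair), 2))
```

Walking upward from a tip also traces, in reverse, the unique path from the root to that species in a rooted tree.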

It is also possible to draw these trees with or without a root. For rooted trees, the root is the common ancestor. For each species, there is a unique path that leads from the root to that species. The direction of each path corresponds to evolutionary time. An unrooted tree specifies the relationships among species and does not define the evolutionary path.

Applications:

The objective of phylogenetic analysis is to discover all of the branching relationships in the tree and the branch lengths. Phylogenetic analysis of nucleic acid and protein sequences is, and will continue to be, an important area of sequence analysis. In addition to analysing changes that have occurred in the evolution of different organisms, the evolution of a family of sequences may be studied. On the basis of the analysis, sequences that are closely related can be identified by their occupying neighbouring branches on a tree. When a gene family is found in an organism or group of organisms, phylogenetic relationships among the genes can help to predict which ones might have an equivalent function. Phylogenetic analysis may also be used to follow the changes occurring in a rapidly changing species, such as a virus. Analysis of the types of changes within a population can reveal, for example, whether or not a particular gene is under selection, an important source of information in applications like epidemiology. With the aid of sequences, it should be possible to find the genealogical ties between organisms. Experience shows that closely related organisms have similar sequences, while more distantly related organisms have more dissimilar sequences. One objective is to reconstruct the evolutionary relationships between species. Another is to estimate the time of divergence between two organisms since they last shared a common ancestor.