Abstract

Pinus taeda L. (loblolly pine) and Arabidopsis thaliana differ greatly in form, ecological niche, evolutionary history, and genome size. Arabidopsis is a small, herbaceous, annual dicotyledon, whereas pines are large, long-lived, coniferous forest trees. Such diverse plants might be expected to differ in a large number of functional genes. We have obtained and analyzed 59,797 expressed sequence tags (ESTs) from wood-forming tissues of loblolly pine and compared them to the gene sequences inferred from the complete sequence of the Arabidopsis genome. Approximately 50% of pine ESTs have no apparent homologs in Arabidopsis or any other angiosperm in public databases. When evaluated by using contigs containing long, high-quality sequences, we find a higher level of apparent homology between the inferred genes of these two species. For those contigs 1,100 bp or longer, approximately 90% have an apparent Arabidopsis homolog (E value < 10⁻¹⁰). Pines and Arabidopsis last shared a common ancestor approximately 300 million years ago. Few genes would be expected to retain high sequence similarity for this time if they did not have essential functions. These observations suggest substantial conservation of gene sequence in seed plants.

Loblolly pine xylogenesis UniGene set, classified by cellular functional categories, compared with A. thaliana. The proportion of Arabidopsis genes in each functional category is relative to the 12,922 total predicted genes that were assigned by the Arabidopsis Genome Initiative to 1 of 12 major categories. The proportion of predicted loblolly pine genes in each functional category is relative to the total number of contigs and singlets (xylogenesis UniGene set) for which homology was found to an Arabidopsis gene (blastx E value < 10⁻⁵) that was assigned to at least one functional category.
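
The pine-side proportions described in this caption could be computed roughly as in the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the function name `category_proportions`, the dictionary layouts, the sequence and gene identifiers, and the category names are all hypothetical. It also assumes that a gene assigned to several categories is counted once in each of them. Only the blastx E-value cutoff of 10⁻⁵ and the choice of denominator come from the caption.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-5  # blastx threshold stated in the caption


def category_proportions(best_hits, gene_categories, cutoff=E_VALUE_CUTOFF):
    """Return {category: proportion} for pine contigs and singlets.

    best_hits: {sequence_id: (arabidopsis_gene_id, e_value)}   (hypothetical layout)
    gene_categories: {arabidopsis_gene_id: set of category names} (hypothetical layout)

    The denominator is the number of sequences whose best hit passes the
    cutoff AND whose hit gene has at least one functional category.
    """
    counts = Counter()
    denominator = 0
    for gene_id, e_value in best_hits.values():
        categories = gene_categories.get(gene_id, set())
        if e_value < cutoff and categories:
            denominator += 1
            # Assumption: a multi-category gene contributes to each category.
            counts.update(categories)
    if denominator == 0:
        return {}
    return {category: n / denominator for category, n in counts.items()}


# Toy data with made-up identifiers, for illustration only.
best_hits = {
    "contig_a": ("gene_1", 1e-40),
    "contig_b": ("gene_2", 1e-3),    # fails the E-value cutoff
    "singlet_a": ("gene_3", 1e-12),
    "singlet_b": ("gene_4", 1e-20),  # hit gene has no assigned category
}
gene_categories = {
    "gene_1": {"metabolism"},
    "gene_2": {"transcription"},
    "gene_3": {"metabolism", "energy"},
}

proportions = category_proportions(best_hits, gene_categories)
```

In this toy input only `contig_a` and `singlet_a` enter the denominator, so the proportions can sum to more than 1 whenever a gene belongs to more than one category.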

Sequence similarity (blastx score on y axis) of contig 6,593 at increasing lengths (from 5′ to 3′) compared with the Arabidopsis probable homolog. The left and right arrows indicate the beginning and end of the Arabidopsis gene coding sequence, respectively.