Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae. Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated.

Unresolved questions about evolution of the large and diverse legume family include the timing of polyploidy (whole-genome duplications; WGDs) relative to the origin of the major lineages within the Fabaceae and to the origin of symbiotic nitrogen fixation. Previous work has established that a WGD affects most lineages in the Papilionoideae and occurred sometime after the divergence of the papilionoid and mimosoid clades, but the exact timing has been unknown. The history of WGD has also not been established for legume lineages outside the Papilionoideae. We investigated the presence and timing of WGDs in the legumes by querying thousands of phylogenetic trees constructed from transcriptome and genome data from 20 diverse legumes and 17 outgroup species. The timing of duplications in the gene trees indicates that the papilionoid WGD occurred in the common ancestor of all papilionoids. The earliest diverging lineages of the Papilionoideae include both nodulating taxa, such as the genistoids (e.g., lupin), dalbergioids (e.g., peanut), phaseoloids (e.g., beans), and galegoids (=Hologalegina, e.g., clovers), and clades with nonnodulating taxa including Xanthocercis and Cladrastis (evaluated in this study). We also found evidence for several independent WGDs near the base of other major legume lineages, including the Mimosoideae-Cassiinae-Caesalpinieae (MCC), Detarieae, and Cercideae clades. Nodulation is found in the MCC and papilionoid clades, both of which experienced ancestral WGDs. However, there are numerous nonnodulating lineages in both clades, making it unclear whether the phylogenetic distribution of nodulation is due to independent gains or a single origin followed by multiple losses.

The plant hormone auxin is a conserved regulator of development which has been implicated in the generation of morphological novelty. PIN-FORMED (PIN) auxin efflux carriers are central to auxin function by regulating its distribution. PIN family members have divergent structures and cellular localizations, but the origin and evolutionary significance of this variation is unresolved. To characterize PIN family evolution, we have undertaken phylogenetic and structural analyses with a massive increase in taxon sampling over previous studies. Our phylogeny shows that following the divergence of the bryophyte and lycophyte lineages, two deep duplication events gave rise to three distinct lineages of PIN proteins in euphyllophytes. Subsequent independent radiations within each of these lineages were taxonomically asymmetric, giving rise to at least 21 clades of PIN proteins, of which 15 are revealed here for the first time. Although most PIN protein clades share a conserved canonical structure with a modular central loop domain, a small number of noncanonical clades dispersed across the phylogeny have highly divergent protein structure. We propose that PIN proteins underwent sub- and neofunctionalization with substantial modification to protein structure throughout plant evolution. Our results have important implications for plant evolution as they suggest that structurally divergent PIN proteins that arose in paralogous radiations contributed to the convergent evolution of organ systems in different land plant lineages.

Conventional methods used to detect and characterize influenza viruses in biological samples face multiple challenges due to the diversity of subtypes and high dissimilarity of emerging strains. Next-generation sequencing (NGS) is a powerful technique that can facilitate the detection and characterization of influenza; however, both the sequencing strategy and the data analysis procedures involve choices that require careful consideration.

Ferns are well known for their shade-dwelling habits. Their ability to thrive under low-light conditions has been linked to the evolution of a novel chimeric photoreceptor, neochrome, that fuses red-sensing phytochrome and blue-sensing phototropin modules into a single gene, thereby optimizing phototropic responses. Despite being implicated in facilitating the diversification of modern ferns, the origin of neochrome has remained a mystery. We present evidence for neochrome in hornworts (a bryophyte lineage) and demonstrate that ferns acquired neochrome from hornworts via horizontal gene transfer (HGT). Fern neochromes are nested within hornwort neochromes in our large-scale phylogenetic reconstructions of phototropin and phytochrome gene families. Divergence date estimates further support the HGT hypothesis, with fern and hornwort neochromes diverging 179 Mya, long after the split between the two plant lineages (at least 400 Mya). By analyzing the draft genome of the hornwort Anthoceros punctatus, we also discovered a previously unidentified phototropin gene that likely represents the ancestral lineage of the neochrome phototropin module. Thus, a neochrome originating in hornworts was transferred horizontally to ferns, where it may have played a significant role in the diversification of modern ferns.

It is commonly believed that gene duplications provide the raw material for morphological evolution. Both the number of genes and size of gene families have increased during the diversification of land plants. Several small proteins that regulate transcription factors have recently been identified in plants, including the LITTLE ZIPPER (ZPR) proteins. ZPRs are post-translational negative regulators, via heterodimerization, of class III Homeodomain Leucine Zipper (C3HDZ) proteins that play a key role in directing plant form and growth. We show that ZPR genes originated as a duplication of a C3HDZ transcription factor paralog in the common ancestor of euphyllophytes (ferns and seed plants). The ZPRs evolved by degenerative mutations resulting in loss of all the C3HDZ functional domains, except the leucine zipper that modulates dimerization. ZPRs represent a novel regulatory module of the C3HDZ network unique to the euphyllophyte lineage, and their origin correlates with a period of rapid morphological changes and increased complexity in land plants. The origin of the ZPRs illustrates the significance of gene duplications in creating developmental complexity during land plant evolution that likely led to morphological evolution.

CAM and C4 photosynthesis are two key plant adaptations that have evolved independently multiple times, and are especially prevalent in particular groups of plants, including the Caryophyllales. We investigate the origin of photosynthetic PEPC, a key enzyme of both the CAM and C4 pathways. We combine phylogenetic analyses of genes encoding PEPC with analyses of RNA sequence data of Portulaca, the only plants known to perform both CAM and C4 photosynthesis. Three distinct gene lineages encoding PEPC exist in eudicots (namely ppc-1E1, ppc-1E2 and ppc-2), one of which (ppc-1E1) was recurrently recruited for use in both CAM and C4 photosynthesis within the Caryophyllales. This gene is present in multiple copies in the cacti and relatives, including Portulaca. The PEPC involved in the CAM and C4 cycles of Portulaca are encoded by closely related yet distinct genes. The CAM-specific gene is similar to genes from related CAM taxa, suggesting that CAM has evolved before C4 in these species. The similar origin of PEPC and other genes involved in the CAM and C4 cycles highlights the shared early steps of evolutionary trajectories towards CAM and C4, which probably diverged irreversibly only during the optimization of CAM and C4 phenotypes.

Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughput and decrease in cost of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) from next-generation sequencing make de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge.

Transcription factors (TFs) are key players in evolution. Changes affecting their function can yield novel life forms but may also have deleterious effects. Consequently, gene duplication events that release one gene copy from selective pressure are thought to be the common mechanism by which TFs acquire new activities. Here, we show that LEAFY, a major regulator of flower development and cell division in land plants, underwent changes to its DNA binding specificity, even though plant genomes generally contain a single copy of the LEAFY gene. We examined how these changes occurred at the structural level and identified an intermediate LEAFY form in hornworts that appears to adopt all different specificities. This promiscuous intermediate could have smoothed the evolutionary transitions, thereby allowing LEAFY to evolve new binding specificities while remaining a single-copy gene.

Nucleomorphs are residual nuclei derived from eukaryotic endosymbionts in chlorarachniophyte and cryptophyte algae. The endosymbionts that gave rise to nucleomorphs and plastids in these two algal groups were green and red algae, respectively. Despite their independent origin, the chlorarachniophyte and cryptophyte nucleomorph genomes share similar genomic features such as extreme size reduction and a three-chromosome architecture. This suggests that similar reductive evolutionary forces have acted to shape the nucleomorph genomes in the two groups. Thus far, however, only a single chlorarachniophyte nucleomorph and plastid genome has been sequenced, making broad evolutionary inferences within the chlorarachniophytes and between chlorarachniophytes and cryptophytes difficult. We have sequenced the nucleomorph and plastid genomes of the chlorarachniophyte Lotharella oceanica in order to gain insight into nucleomorph and plastid genome diversity and evolution.

Optogenetic tools enable examination of how specific cell types contribute to brain circuit functions. A long-standing question is whether it is possible to independently activate two distinct neural populations in mammalian brain tissue. Such a capability would enable the study of how different synapses or pathways interact to encode information in the brain. Here we describe two channelrhodopsins, Chronos and Chrimson, discovered through sequencing and physiological characterization of opsins from over 100 species of alga. Chrimson's excitation spectrum is red shifted by 45 nm relative to previous channelrhodopsins and can enable experiments in which red light is preferred. We show minimal visual system-mediated behavioral interference when using Chrimson in neurobehavioral studies in Drosophila melanogaster. Chronos has faster kinetics than previous channelrhodopsins yet is effectively more light sensitive. Together these two reagents enable two-color activation of neural spiking and downstream synaptic transmission in independent neural populations without detectable cross-talk in mouse brain slice.

The genetic mechanisms regulating dry fruit development and opercular dehiscence have been identified in Arabidopsis thaliana. In the bicarpellate silique, valve elongation and differentiation is controlled by FRUITFULL (FUL) that antagonizes SHATTERPROOF1-2 (SHP1/SHP2) and INDEHISCENT (IND) at the dehiscence zone where they control normal lignification. SHP1/2 are also repressed by REPLUMLESS (RPL), responsible for replum formation. Similarly, FUL indirectly controls two other factors ALCATRAZ (ALC) and SPATULA (SPT) that function in the proper formation of the separation layer. FUL and SHP1/2 belong to the MADS-box family, IND and ALC belong to the bHLH family and RPL belongs to the homeodomain family, all of which are large transcription factor families. These families have undergone numerous duplications and losses in plants, likely accompanied by functional changes. Functional analyses of homologous genes suggest that this network is fairly conserved in Brassicaceae and less conserved in other core eudicots. Only the MADS box genes have been functionally characterized in basal eudicots and suggest partial conservation of the functions recorded for Brassicaceae. Here we do a comprehensive search of SHP, IND, ALC, SPT, and RPL homologs across core-eudicots, basal eudicots, monocots and basal angiosperms. Based on gene-tree analyses we hypothesize what parts of the network for fruit development in Brassicaceae, in particular regarding direct and indirect targets of FUL, might be conserved across angiosperms.

GSK3 (glycogen synthase kinase 3) genes encode signal transduction proteins with roles in a variety of biological processes in eukaryotes. In contrast to the low copy numbers observed in animals, GSK3 genes have expanded into a multi-gene family in land plants (embryophytes), and have also evolved functions in diverse plant specific processes, including floral development in angiosperms. However, despite previous efforts, the phylogeny of land plant GSK3 genes is currently unclear. Here, we analyze genes from a representative sample of phylogenetically pivotal taxa, including basal angiosperms, gymnosperms, and monilophytes, to reconstruct the evolutionary history and functional diversification of the GSK3 gene family in land plants.

Molecular phylogenetic investigations have revolutionized our understanding of the evolutionary history of ferns, the second-most species-rich major group of vascular plants and the sister clade to seed plants. The general absence of genomic resources available for this important group of plants, however, has resulted in the strong dependence of these studies on plastid data; nuclear or mitochondrial data have been rarely used. In this study, we utilize transcriptome data to design primers for nuclear markers for use in studies of fern evolutionary biology, and demonstrate the utility of these markers across the largest order of ferns, the Polypodiales.

We conducted an unbiased metagenomics survey using plasma from patients with chronic hepatitis B, chronic hepatitis C, autoimmune hepatitis (AIH), non-alcoholic steatohepatitis (NASH), and patients without liver disease (control). RNA and DNA libraries were sequenced from plasma filtrates enriched in viral particles to catalog virus populations. Hepatitis viruses were readily detected at high coverage in patients with chronic viral hepatitis B and C, but only a limited number of sequences resembling other viruses were found. The exception was a library from a patient diagnosed with hepatitis C virus (HCV) infection that contained multiple sequences matching GB virus C (GBV-C). Abundant GBV-C reads were also found in plasma from patients with AIH, whereas Torque teno virus (TTV) was found at high frequency in samples from patients with AIH and NASH. After taxonomic classification of sequences by BLASTn, a substantial fraction in each library, ranging from 35% to 76%, remained unclassified. These unknown sequences were assembled into scaffolds along with virus, phage and endogenous retrovirus sequences and then analyzed by BLASTx against the non-redundant protein database. Nearly the full genome of a heretofore-unknown circovirus was assembled, as were many scaffolds that encoded proteins with similarity to plant, insect and mammalian viruses. The presence of this novel circovirus was confirmed by PCR. BLASTx also identified many polypeptides resembling nucleo-cytoplasmic large DNA virus (NCLDV) proteins. We re-evaluated these alignments with a profile hidden Markov model method, HHblits, and observed inconsistencies in the target proteins reported by the different algorithms. This suggests that sequence alignments are insufficient to identify NCLDV proteins, especially when these alignments are only to small portions of the target protein. Nevertheless, we have now established a reliable protocol for the identification of viruses in plasma that can also be adapted to other patient samples such as urine, bile, saliva and other body fluids.

In the canonical version of evolution by gene duplication, one copy is kept unaltered while the other is free to evolve. This process of evolutionary experimentation can persist for millions of years. Since it is so short lived in comparison to the lifetime of the core genes that make up the majority of most genomes, a substantial fraction of the genome and the transcriptome may, in principle, be attributable to what we will refer to as "evolutionary transients", referring here to both the process and the genes that have gone through or are undergoing this process. Using the rice gene set as a test case, we argue that this phenomenon goes a long way towards explaining why there are so many more rice genes than Arabidopsis genes, and why most excess rice genes show low similarity to eudicots.

Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

In 2005, Wyckoff and coworkers described a surprisingly strong correlation between Ka/Ks and Ks in several data sets using the LPB93 algorithm. This finding indicated the possibility of a paradigm shift in the way selection strength can be measured using the Ka/Ks ratio. We carried out a calculation of Ka and Ks using six different algorithms on three cross-species orthologous data sets and found a highly variable correlation among the algorithms and lineages. Algorithms based on the GY-HKY substitution model exhibit a weaker positive correlation or a stronger negative correlation than those based on the K2P and JC69 substitution models. Even if one algorithm shows a positive correlation between Ka/Ks and Ks in a warm-blooded lineage, it may show no correlation in a cold-blooded lineage. This algorithm-related and evolutionary lineage-related correlation indicates the need for great caution in drawing conclusions when using only one Ka and Ks algorithm in a genomewide analysis of selection strength. Our results indicated that currently used algorithms for Ka and Ks calculations are flawed and need improvements.

The resolution of the chicken consensus linkage map has been dramatically improved in this study by genotyping 12,945 single nucleotide polymorphisms (SNPs) on three existing mapping populations in chicken: the Wageningen (WU), East Lansing (EL), and Uppsala (UPP) mapping populations. As many as 8599 SNPs could be included, bringing the total number of markers in the current consensus linkage map to 9268. The total length of the sex average map is 3228 cM, considerably smaller than previous estimates using the WU and EL populations, reflecting the higher quality of the new map. The current map consists of 34 linkage groups and covers at least 29 of the 38 autosomes. Sex-specific analysis and comparisons of the maps based on the three individual populations showed prominent heterogeneity in recombination rates between populations, but no significant heterogeneity between sexes. The recombination rates in the F1 Red Jungle fowl/White Leghorn males and females were significantly lower compared with those in the WU broiler population, consistent with a higher recombination rate in purebred domestic animals under strong artificial selection. The recombination rate varied considerably among chromosomes as well as along individual chromosomes. An analysis of the sequence composition at recombination hot and cold spots revealed a strong positive correlation between GC-rich sequences and high recombination rates. The GC-rich cohesin binding sites in particular stood out from other GC-rich sequences with a 3.4-fold higher density at recombination hot spots versus cold spots, suggesting a functional relationship between recombination frequency and cohesin binding.

Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥ 1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs. HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers.
