Rice is an important crop and major model plant for monocot functional genomics studies. With the establishment of various genetic resources for rice genomics, the next challenge is to systematically assign functions to predicted genes in the rice genome. Compared with the robustness of genome sequencing and bioinformatics techniques, progress in understanding the function of rice genes has lagged, hampering the utilization of rice genes for cereal crop improvement. The use of transfer DNA (T-DNA) insertional mutagenesis offers the advantage of uniform distribution throughout the rice genome, but preferentially in gene-rich regions, resulting in direct gene knockout or activation of genes within 20-30 kb up- and downstream of the T-DNA insertion site and high gene tagging efficiency. Here, we summarize the recent progress in functional genomics using the T-DNA-tagged rice mutant population. We also discuss important features of T-DNA activation- and knockout-tagging and promoter-trapping of the rice genome in relation to mutant and candidate gene characterizations and how to more efficiently utilize rice mutant populations and datasets for high-throughput functional genomics and phenomics studies by forward and reverse genetics approaches. These studies may facilitate the translation of rice functional genomics research to improvements of rice and other cereal crops.

Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003

The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de. PMID:28053165

Functional characterisation of proteins and large-scale, systems-level studies are enabled by extensive sets of cloned open reading frames (ORFs) in an easily-accessible format that enables many different applications. Here we report the release of the first stage of the Xenopus ORFeome, which contains 8673 ORFs from the Xenopus Gene Collection (XGC) for Xenopus laevis, cloned into a Gateway® donor vector enabling rapid in-frame transfer of the ORFs to expression vectors. This resource represents an estimated 7871 unique genes, approximately 40% of the non-redundant X. laevis gene complement, and includes 2724 genes where the human ortholog has an association with disease. Transfer into the Gateway system was validated by 5′ and 3′ end sequencing of the entire collection and protein expression of a set of test clones. In a parallel process, the underlying ORF predictions from the original XGC collection were re-analysed to verify quality and full-length status, identifying those proteins likely to exhibit truncations when translated. These data are integrated into Xenbase, the Xenopus community database, which associates genomic, expression, function and human disease model metadata to each ORF, enabling end-users to search for ORFeome clones with links to commercial distributors of the collection. When coupled with the experimental advantages of Xenopus eggs and embryos, the ORFeome collection represents a valuable resource for functional genomics and disease modelling. PMID:26391338

Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals.

Tomato (Solanum lycopersicum L., Solanaceae) is an excellent model plant for genomic research of solanaceous plants, as well as for studying the development, ripening, and metabolism of fruit. In 2003, the International Solanaceae Project (SOL, www.sgn.cornell.edu) was initiated by members from more than 30 countries, and the tomato genome-sequencing project is currently underway. Genome sequence of tomato obtained by this project will provide a firm foundation for forthcoming genomic studies such as the comparative analysis of genes conserved among the Solanaceae species and the elucidation of the functions of unknown tomato genes. To exploit the wealth of the genome sequence information, there is an urgent need for novel resources and analytical tools for tomato functional genomics. Here, we present an overview of the development of genetic and genomic resources of tomato in the last decade, with a special focus on the activities of Japan SOL and the National Bio-Resource Project in the development of functional genomic resources of a model cultivar, Micro-Tom. PMID:19506732

The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.

Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
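
As a rough illustration of what the reported genome equivalents imply, the sketch below computes fold coverage from clone count, mean insert size and genome size, and then uses the standard Poisson approximation to estimate the chance that any given locus is present in at least one clone. The abstract does not give clone counts or genome sizes, so the library in the example (100,000 clones, 115 kb inserts, a 2,300 Mb genome) is hypothetical, as are the function names; only the 4.7-fold and 8.4-fold bounds come from the text.

```python
import math


def genome_equivalents(n_clones: int, mean_insert_kb: float, genome_size_mb: float,
                       usable_fraction: float = 1.0) -> float:
    """Fold coverage = usable clones x mean insert size / haploid genome size.

    usable_fraction is an assumed correction for empty-vector and organellar clones.
    """
    if genome_size_mb <= 0:
        raise ValueError("genome_size_mb must be positive")
    return n_clones * usable_fraction * mean_insert_kb / (genome_size_mb * 1000.0)


def recovery_probability(coverage: float) -> float:
    """Poisson approximation: chance that a given locus lies in at least one clone."""
    return 1.0 - math.exp(-coverage)


# Hypothetical library (values assumed for illustration, not taken from the abstract).
example_coverage = genome_equivalents(100_000, 115.0, 2300.0)

# Coverage bounds reported for the ZMAP libraries.
low = recovery_probability(4.7)
high = recovery_probability(8.4)

if __name__ == "__main__":
    print(f"hypothetical library coverage: {example_coverage:.1f}-fold")
    print(f"P(locus recovered) at 4.7-fold: {low:.4f}")
    print(f"P(locus recovered) at 8.4-fold: {high:.4f}")
```

Under this approximation, even the lowest reported coverage would give roughly a 99% chance of recovering a particular locus, assuming clones are sampled uniformly across the genome, which real libraries only approximate.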

TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. ‘User Comments’ may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate. PMID:19843604

Recent technological innovations have ignited an explosion in virus genome sequencing that promises to fundamentally alter our understanding of viral biology and profoundly impact public health policy. Yet, any potential benefits from the billowing cloud of next generation sequence data hinge upon well implemented reference resources that facilitate the identification of sequences, aid in the assembly of sequence reads and provide reference annotation sources. The NCBI Viral Genomes Resource is a reference resource designed to bring order to this sequence shockwave and improve usability of viral sequence data. The resource can be accessed at http://www.ncbi.nlm.nih.gov/genome/viruses/ and catalogs all publicly available virus genome sequences and curates reference genome sequences. As the number of genome sequences has grown, so too have the difficulties in annotating and maintaining reference sequences. The rapid expansion of the viral sequence universe has forced a recalibration of the data model to better provide extant sequence representation and enhanced reference sequence products to serve the needs of the various viral communities. This, in turn, has placed increased emphasis on leveraging the knowledge of individual scientific communities to identify important viral sequences and develop well annotated reference virus genome sets.

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.

Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery. PMID:27903883

Background EST (expressed sequence tag) sequences and their annotation provide a highly valuable resource for gene discovery, genome sequence annotation, and other genomics studies that can be applied in genetics, breeding and conservation programs for non-model organisms. Conifers are long-lived plants that are ecologically and economically important globally, and have a large genome size. Black spruce (Picea mariana) is a transcontinental species of the North American boreal and temperate forests. However, there are limited transcriptomic and genomic resources for this species. The primary objective of our study was to develop a black spruce transcriptomic resource to facilitate on-going functional genomics projects related to growth and adaptation to climate change. Results We conducted bidirectional sequencing of cDNA clones from a standard cDNA library constructed from black spruce needle tissues. We obtained 4,594 high quality (2,455 5' end and 2,139 3' end) sequence reads, with an average read-length of 532 bp. Clustering and assembly of ESTs resulted in 2,731 unique sequences, consisting of 2,234 singletons and 497 contigs. Approximately two-thirds (63%) of unique sequences were functionally annotated. Genes involved in 36 molecular functions and 90 biological processes were discovered, including 24 putative transcription factors and 232 genes involved in photosynthesis. Most abundantly expressed transcripts were associated with photosynthesis, growth factors, stress and disease response, and transcription factors. A total of 216 full-length genes were identified. About 18% (493) of the transcripts were novel, representing an important addition to the Genbank EST database (dbEST). Fifty-seven di-, tri-, tetra- and penta-nucleotide simple sequence repeats were identified. Conclusions We have developed the first high quality EST resource for black spruce and identified 493 novel transcripts, which may be species-specific related to life history and

A vast majority of the burden from neglected tropical diseases results from helminth infections (nematodes and platyhelminthes). Parasitic helminthes infect over 2 billion, exerting a high collective burden that rivals high-mortality conditions such as AIDS or malaria, and cause devastation to crops and livestock. The challenges to improve control of parasitic helminth infections are multi-fold and no single category of approaches will meet them all. New information such as helminth genomics, functional genomics and proteomics coupled with innovative bioinformatic approaches provide fundamental molecular information about these parasites, accelerating both basic research as well as development of effective diagnostics, vaccines and new drugs. To facilitate such studies we have developed an online resource, HelmCoP (Helminth Control and Prevention), built by integrating functional, structural and comparative genomic data from plant, animal and human helminthes, to enable researchers to develop strategies for drug, vaccine and pesticide prioritization, while also providing a useful comparative genomics platform. HelmCoP encompasses genomic data from several hosts, including model organisms, along with a comprehensive suite of structural and functional annotations, to assist in comparative analyses and to study host-parasite interactions. The HelmCoP interface, with a sophisticated query engine as a backbone, allows users to search for multi-factorial combinations of properties and serves readily accessible information that will assist in the identification of various genes of interest. HelmCoP is publicly available at: http://www.nematode.net/helmcop.html.

An integrated database with a variety of Web-based systems named WheatGenome.info hosting wheat genome and genomic data has been developed to support wheat research and crop improvement. The resource includes multiple Web-based applications, which are implemented as a variety of Web-based systems. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This portal provides links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

Identification and elucidation of functions of plant genes is valuable for both basic and applied research. In addition to natural variation in model plants, numerous loss-of-function resources have been produced by mutagenesis with chemicals, irradiation, or insertions of transposable elements or T-DNA. However, we may be unable to observe loss-of-function phenotypes for genes with functionally redundant homologs and for those essential for growth and development. To offset such disadvantages, gain-of-function transgenic resources have been exploited. Activation-tagged lines have been generated using obligatory overexpression of endogenous genes by random insertion of an enhancer. Recent progress in DNA sequencing technology and bioinformatics has enabled the preparation of genomewide collections of full-length cDNAs (fl-cDNAs) in some model species. Using the fl-cDNA clones, a novel gain-of-function strategy, Fl-cDNA OvereXpressor gene (FOX)-hunting system, has been developed. A mutant phenotype in a FOX line can be directly attributed to the overexpressed fl-cDNA. Investigating a large population of FOX lines could reveal important genes conferring favorable phenotypes for crop breeding. Alternatively, a unique loss-of-function approach Chimeric REpressor gene Silencing Technology (CRES-T) has been developed. In CRES-T, overexpression of a chimeric repressor, composed of the coding sequence of a transcription factor (TF) and short peptide designated as the repression domain, could interfere with the action of endogenous TF in plants. Although plant TFs usually consist of gene families, CRES-T is effective, in principle, even for the TFs with functional redundancy. In this review, we focus on the current status of the gene-overexpression strategies and resources for identifying and elucidating novel functions of cereal genes. We discuss the potential of these research tools for identifying useful genes and phenotypes for application in crop breeding.

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...

With the completion of the zebrafish genome sequencing project, it becomes possible to analyze the function of zebrafish genes in a systematic way. The first step in such an analysis is to inactivate each protein-coding gene by targeted or random mutation. Here we describe a streamlined pipeline using proviral insertions coupled with high-throughput sequencing and mapping technologies to widely mutagenize genes in the zebrafish genome. We also report the first 6144 mutagenized and archived F1's predicted to carry up to 3776 mutations in annotated genes. Using in vitro fertilization, we have rescued and characterized ∼0.5% of the predicted mutations, showing mutation efficacy and a variety of phenotypes relevant to both developmental processes and human genetic diseases. Mutagenized fish lines are being made freely available to the public through the Zebrafish International Resource Center. These fish lines establish an important milestone for zebrafish genetics research and should greatly facilitate systematic functional studies of the vertebrate genome. PMID:23382537

This page provides links to research resources, compiled by the Epidemiology and Genomics Research Program, that may be of interest to genetic epidemiologists conducting cancer research; the list is not exhaustive.

The National Center for Biotechnology Information (NCBI) is well known for the nucleotide sequence archive GenBank and the sequence analysis tool BLAST. However, NCBI integrates many types of biomolecular data from a variety of sources and makes them available to the scientific community as interactive web resources as well as organized releases of bulk data. These tools are available to explore and compare fungal genomes. Searching all databases with Fungi [organism] at http://www.ncbi.nlm.nih.gov/ is the quickest way to find resources of interest with fungal entries. Some tools, though, are resource-specific and can be indirectly accessed from a particular database in the Entrez system. These include graphical viewers and comparative analysis tools such as TaxPlot, TaxMap and UniGene DDD (found via UniGene Homepage). Gene and BioProject pages also serve as portals to external data such as community annotation websites, BioGrid and UniProt. There are many different ways of accessing genomic data at NCBI. Depending on the focus and goal of research projects or the level of interest, a user would select a particular route for accessing genomic databases and resources. This review article describes methods of accessing fungal genome data and provides examples that illustrate the use of analysis tools.

Comparative genomics studies in primates are restricted due to our limited access to samples. In order to gain better insight into the genetic processes that underlie variation in complex phenotypes in primates, we must have access to faithful model systems for a wide range of cell types. To facilitate this, we generated a panel of 7 fully characterized chimpanzee induced pluripotent stem cell (iPSC) lines derived from healthy donors. To demonstrate the utility of comparative iPSC panels, we collected RNA-sequencing and DNA methylation data from the chimpanzee iPSCs and the corresponding fibroblast lines, as well as from 7 human iPSCs and their source lines, which encompass multiple populations and cell types. We observe much less within-species variation in iPSCs than in somatic cells, indicating the reprogramming process erases many inter-individual differences. The low within-species regulatory variation in iPSCs allowed us to identify many novel inter-species regulatory differences of small magnitude. DOI: http://dx.doi.org/10.7554/eLife.07103.001 PMID:26102527

Background The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the http://www.cosmoss.org model organism database. Conclusions Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes

Recent remarkable innovations in platforms for omics-based research and application development provide crucial resources to promote research in model and applied plant species. A combinatorial approach using multiple omics platforms and integration of their outcomes is now an effective strategy for clarifying molecular systems integral to improving plant productivity. Furthermore, promotion of comparative genomics among model and applied plants allows us to grasp the biological properties of each species and to accelerate gene discovery and functional analyses of genes. Bioinformatics platforms and their associated databases are also essential for the effective design of approaches making the best use of genomic resources, including resource integration. We review recent advances in research platforms and resources in plant omics together with related databases and advances in technology. PMID:20208064

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly, the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.

ABSTRACT Human genome-wide association studies (GWAS) have successfully identified thousands of susceptibility loci for common diseases with complex genetic etiologies. Although the susceptibility variants identified by GWAS usually have only modest effects on individual disease risk, they contribute to a substantial burden of trait variation in the overall population. GWAS also offer valuable clues to disease mechanisms that have long proven to be elusive. These insights could lead the way to breakthrough treatments; however, several challenges hinder progress, making innovative approaches to accelerate the follow-up of results from GWAS an urgent priority. Here, we discuss the largely untapped potential of the fruit fly, Drosophila melanogaster, for functional investigation of findings from human GWAS. We highlight selected examples where strong genomic conservation with humans along with the rapid and powerful genetic tools available for flies have already facilitated fine mapping of association signals, elucidated gene mechanisms, and revealed novel disease-relevant biology. We emphasize current research opportunities in this rapidly advancing field, and present bioinformatic analyses that systematically explore the applicability of Drosophila for interrogation of susceptibility signals implicated in more than 1000 human traits, based on all GWAS completed to date. Thus, our discussion is targeted at both human geneticists seeking innovative strategies for experimental validation of findings from GWAS, as well as the Drosophila research community, by whom ongoing investigations of the implicated genes will powerfully inform our understanding of human disease. PMID:28151408

Advances in genome engineering technologies have made the precise control over genome sequence and regulation possible across a variety of disciplines. These tools can expand our understanding of fundamental biological processes and create new opportunities for therapeutic designs. The rapid evolution of these methods has also catalyzed a new era of genomics that includes multiple approaches to functionally characterize and manipulate the regulation of genomic information. Here, we review the recent advances of the most widely adopted genome engineering platforms and their application to functional genomics. This includes engineered zinc finger proteins, TALEs/TALENs, and the CRISPR/Cas9 system as nucleases for genome editing, transcription factors for epigenome editing, and other emerging applications. We also present current and potential future applications of these tools, as well as their current limitations and areas for future advances. PMID:26430154

With the completion of the genome sequences of the model plants Arabidopsis and rice, and the continuing sequencing efforts of other economically important crop plants, an unprecedented amount of genome sequence data is now available for large-scale genomics studies and analyses, such as the identification and discovery of novel genes, comparative genomics, and functional genomics. Efficient utilization of these large data sets is critically dependent on the ease of access and organization of the data. The plant databases at The Institute for Genomic Research (TIGR) have been set up to maintain various data types including genomic sequence, annotation and analyses, expressed transcript assemblies and analyses, and gene expression profiles from microarray studies. We present here an overview of the TIGR database resources for plant genomics and describe methods to access the data.

OCG provides a variety of scientific and educational resources for both cancer researchers and members of the general public. These resources are divided into the following types: OCG-Supported Resources: Tools, databases, and reagents generated by initiated and completed OCG programs for researchers, educators, and students. (Note: Databases for current OCG programs are available through program-specific data matrices)

Gramene (www.gramene.org) is a curated genetic, genomic and comparative genome analysis resource for the major crop species, such as rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project, with all data and software freely downloadable through the ftp site ...

Maintenance of genome integrity is a fundamental requirement of all organisms. To address this, organisms have evolved extremely faithful modes of replication, DNA repair and chromosome segregation to combat the deleterious effects of an unstable genome. Nonetheless, a small amount of genome instability is the driver of evolutionary change and adaptation, and thus a low level of instability is permitted in populations. While defects in genome maintenance almost invariably reduce fitness in the short term, they can create an environment where beneficial mutations are more likely to occur. The importance of this fact is clearest in the development of human cancer, where genome instability is a well-established enabling characteristic of carcinogenesis. This raises the crucial question: what are the cellular pathways that promote genome maintenance and what are their mechanisms? Work in model organisms, in particular the yeast Saccharomyces cerevisiae, has provided the global foundations of genome maintenance mechanisms in eukaryotes. The development of pioneering genomic tools in S. cerevisiae, such as the systematic creation of mutants in all nonessential and essential genes, has enabled whole-genome approaches to identifying genes with roles in genome maintenance. Here, we review the extensive whole-genome approaches taken in yeast, with an emphasis on functional genomic screens, to understand the genetic basis of genome instability, highlighting a range of genetic and cytological screening modalities. By revealing the biological pathways and processes regulating genome integrity, these analyses contribute to the systems-level map of the yeast cell and inform studies of human disease, especially cancer.

Background Scleractinian corals are the foundation of reef ecosystems in tropical marine environments. Their great success is due to interactions with endosymbiotic dinoflagellates (Symbiodinium spp.), with which they are obligately symbiotic. To develop a foundation for studying coral biology and coral symbiosis, we have constructed a set of cDNA libraries and generated and annotated ESTs from two species of corals, Acropora palmata and Montastraea faveolata. Results We generated 14,588 (Ap) and 3,854 (Mf) high quality ESTs from five life history/symbiosis stages (spawned eggs, early-stage planula larvae, late-stage planula larvae either infected with symbionts or uninfected, and adult coral). The ESTs assembled into a set of primarily stage-specific clusters, producing 4,980 (Ap), and 1,732 (Mf) unigenes. The egg stage library, relative to the other developmental stages, was enriched in genes functioning in cell division and proliferation, transcription, signal transduction, and regulation of protein function. Fifteen unigenes were identified as candidate symbiosis-related genes as they were expressed in all libraries constructed from the symbiotic stages and were absent from all of the non symbiotic stages. These include several DNA interacting proteins, and one highly expressed unigene (containing 17 cDNAs) with no significant protein-coding region. A significant number of unigenes (25) encode potential pattern recognition receptors (lectins, scavenger receptors, and others), as well as genes that may function in signaling pathways involved in innate immune responses (toll-like signaling, NFkB p105, and MAP kinases). Comparison between the A. palmata and an A. millepora EST dataset identified ferritin as a highly expressed gene in both datasets that appears to be undergoing adaptive evolution. Five unigenes appear to be restricted to the Scleractinia, as they had no homology to any sequences in the nr databases nor to the non-scleractinian cnidarians

The primary function of the genome is to store, propagate, and express the genetic information that gives rise to a cell's architectural and functional machinery. However, the genome is also a major structural component of the cell. Besides its genetic roles, the genome affects cellular functions by nongenetic means through its physical and structural properties, particularly by exerting mechanical forces and by serving as a scaffold for binding of cellular components. Major cellular processes affected by nongenetic functions of the genome include establishment of nuclear structure, signal transduction, mechanoresponses, cell migration, and vision in nocturnal animals. We discuss the concept, mechanisms, and implications of nongenetic functions of the genome.

The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413

PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

The Apiaceae family includes carrot, celery, cilantro, dill, fennel and numerous other spice and medicinal crops. Carrot is the most economically important member of the Apiaceae with an annual value of $600 M in the United States alone. There are few genomic resources for carrot or other Apiaceae, ...

Thermoanaerobacterium saccharolyticum is a hemicellulose-degrading thermophilic anaerobe that was previously engineered to produce ethanol at high yield. For this research, a major project was undertaken to develop this organism into an industrial biocatalyst, but the lack of genome information and resources was recognized early on as a key limitation.

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome. PMID:19226437

The National Institute of Mental Health (NIMH) has made sustained investments in the development of genomic resources over the last two decades. These investments have led to the development of the largest biorepository for psychiatric genetics as a centralized national resource. In the realm of genomic resources, NIMH has been supporting large team science (TS) consortia focused on gene discovery, fine mapping of loci, and functional genomics using state-of-the-art technologies. The scientific output from these efforts has not only begun to transform our understanding of the genetic architecture of neuropsychiatric disorders, but it has also led to a broader cultural change among the investigator community towards deeper collaborations and broad pre-publication sharing of data and resources. The NIMH supported efforts have led to a vast increase in the amount of genetic and genomic resources available to the mental health research community. Here we provide an account of the existing resources and estimates of the scale and scope of what will be available in the near future. All biosamples and data described are intended for broad sharing with researchers worldwide, as allowed by the subject consent and applicable laws. Molecular Psychiatry advance online publication, 21 March 2017; doi:10.1038/mp.2017.29.

Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

During the genomic era, a large number of whole-genome sequences accumulated, revealing many hypothetical proteins of unknown function. Functional genomics, the research domain that assigns a function to a given gene product, developed rapidly in response. Functional genomics of intracellular pathogenic bacteria presents specific difficulties due to the fastidious growth of most of these intracellular micro-organisms, their close interaction with the host cell, the risk of contaminating experiments with host cell proteins and, for some strict intracellular bacteria such as Chlamydia, the absence of a simple genetic system to manipulate the bacterial genome. To identify virulence factors of intracellular pathogenic bacteria, functional genomics often relies on bioinformatic comparisons with model organisms such as Escherichia coli and Bacillus subtilis. The use of heterologous expression is another common approach. Given the intracellular lifestyle and the many effectors that intracellular bacteria use to corrupt host cell functions, functional genomics also often targets the identification of new effectors such as those of the T4SS of Brucella and Legionella.

Summary: 1. Analyses based on utilization distributions (UDs) have been ubiquitous in animal space use studies, largely because they are computationally straightforward and relatively easy to employ. Conventional applications of resource utilization functions (RUFs) suggest that estimates of UDs can be used as response variables in a regression involving spatial covariates of interest. 2. It has been claimed that contemporary implementations of RUFs can yield inference about resource selection, although to our knowledge, an explicit connection has not been described. 3. We explore the relationships between RUFs and resource selection functions (RSFs) from a heuristic and simulation perspective. We investigate several sources of potential bias in the estimation of resource selection coefficients using RUFs (e.g. the spatial covariance modelling that is often used in RUF analyses). 4. Our findings illustrate that RUFs can, in fact, serve as approximations to RSFs and are capable of providing inference about resource selection, but only with some modification and under specific circumstances. 5. Using real telemetry data as an example, we provide guidance on which methods for estimating resource selection may be more appropriate and in which situations. In general, if telemetry data are assumed to arise as a point process, then RSF methods may be preferable to RUFs; however, modified RUFs may provide less biased parameter estimates when the data are subject to location error.

Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

Fungal infections are challenging to diagnose and often difficult to treat, with only a handful of drug classes existing. Understanding the molecular mechanisms by which pathogenic fungi cause human disease is imperative. Here, we discuss how the development and use of genome-scale genetic resources, such as whole-genome knockout collections, can address this unmet need. Using work in Saccharomyces cerevisiae as a guide, studies of Cryptococcus neoformans and Candida albicans have shown how the challenges of large-scale gene deletion can be overcome, and how such collections can be effectively used to obtain insights into mechanisms of pathogenesis. We conclude that, with concerted efforts, full genome-wide functional analysis of human fungal pathogen genomes is within reach. PMID:25377143

MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010, in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from JGI Fungal Genomics Program, its users, and integration with external resources used by fungal community.

The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data—previously only browseable through our FTP site—by focusing on particular samples, populations or data sets of interest. PMID:27638885

Flatfishes are a group of teleosts of high commercial and environmental interest, whose biology is still poorly understood. The recent rapid development of different 'omic' technologies is, however, enhancing the knowledge of the complex genetic control underlying different physiological processes of flatfishes. This review describes the different functional genomic approaches and resources currently available for flatfish research and summarizes different areas where microarray-based gene expression analysis has been applied. The increase in genome sequencing data has also allowed the construction of genetic linkage maps in different flatfish species; these maps are invaluable for investigating genome organization and identifying genetic traits of commercial interest. Despite the significant progress in this field, the genomic resources currently available for flatfish are still scarce. Further intensive research should be carried out to develop larger genomic sequence databases, high-density microarrays and more detailed, complete linkage maps, using second-generation sequencing platforms. These tools will be crucial for further expanding the knowledge of flatfish physiology, and it is predicted that they will have important implications for wild fish population management, improved fish welfare and increased productivity in aquaculture.

The Kiwifruit Information Resource (KIR) is dedicated to maintaining and integrating comprehensive datasets on genomics, functional genomics and transcriptomics of kiwifruit (Actinidiaceae). KIR serves as a central access point for existing and new genomic and genetic data. KIR also provides researchers with a variety of visualization and analysis tools. Current developments include the updated genome structure of Actinidia chinensis cv. Hongyang and its newest genome annotation, putative transcripts, gene expression, physical markers of genetic traits as well as relevant publications based on the latest genome assembly. Nine thousand five hundred and forty-seven new transcripts are detected and 21 132 old transcripts are changed. In the present release, the next-generation transcriptome sequencing data has been incorporated into gene models and splice variants. Protein-protein interactions are also identified based on experimentally determined orthologous interactions. Furthermore, the experimental results reported in peer-reviewed literature are manually extracted and integrated within a well-developed query page. In total, 122 identifications are currently associated, including commonly used gene names and symbols. All KIR datasets facilitate a broad range of kiwifruit research topics and are freely available to the research community. Database URL: http://bdg.hfut.edu.cn/kir/index.html.

Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

Mulberry is an important cultivated plant that has received the attention of biologists interested in sericulture and plant-insect interaction. Morus notabilis, a wild mulberry species with a minimal chromosome number, is an ideal material for whole-genome sequencing and assembly. The genome and transcriptome of M. notabilis were sequenced and analyzed. In this article, a web-based and open-access database, the Morus Genome Database (MorusDB), was developed to enable easy access and data mining. The MorusDB provides an integrated data source and easy access to mulberry large-scale genomic sequencing and assembly, predicted genes and functional annotations, expressed sequence tags (ESTs), transposable elements (TEs), Gene Ontology (GO) terms, horizontal gene transfers between mulberry and silkworm and ortholog and paralog groups. Transcriptome sequencing data for M. notabilis root, leaf, bark, winter bud and male flower can also be searched and downloaded. Furthermore, MorusDB provides an analytical workbench with some built-in tools and pipelines, such as BLAST, Search GO, Mulberry GO and Mulberry GBrowse, to facilitate genomic studies and comparative genomics. The MorusDB provides important genomic resources for scientists working with mulberry and other Moraceae species, which include many important fruit crops. Designed as a basic platform and accompanied by the SilkDB, MorusDB strives to be a comprehensive platform for the silkworm-mulberry interaction studies. Database URL: http://morus.swu.edu.cn/morusdb.

Microbial diseases remain the commonest cause of global mortality and morbidity. Automated-DNA sequencing has revolutionized the investigation of pathogenic microbes by making the immense fund of information contained in their genomes available at reasonable cost. The challenge is how this information can be used to increase current understanding of the biology of commensal and virulence behaviour of pathogens with particular emphasis on in vivo function and novel approaches to prevention. One example of the application of whole-genome-sequence information is afforded by investigations of the pathogenic role of Haemophilus influenzae lipopolysaccharide and its candidacy as a vaccine. PMID:11839188

Among the legume family, mungbean (Vigna radiata) has become one of the important crops in Asia, showing a steady increase in global production. It provides a good source of protein and contains most notably folate and iron. Beyond the nutritional value of mungbean, certain features make it a well-suited model organism among legume plants: its small genome size, short life cycle, self-pollination, and close genetic relationship to other legumes. In the past, there have been several efforts to develop molecular markers and linkage maps associated with agronomic traits for the genetic improvement of mungbean and, ultimately, breeding for cultivar development to increase the average yields of mungbean. The recent release of a reference genome of the cultivated mungbean (V. radiata var. radiata VC1973A) and an additional de novo sequencing of a wild relative mungbean (V. radiata var. sublobata) has provided a framework for mungbean genetic and genome research that can further be used for genome-wide association and functional studies to identify genes related to specific agronomic traits. Moreover, the diverse gene pool of wild mungbean comprises valuable genetic resources of beneficial genes that may be helpful in widening the genetic diversity of cultivated mungbean. This review paper covers the research progress on molecular and genomics approaches and the current status of breeding programs that have been developed to move toward the ultimate goal of mungbean improvement. PMID:26322067

As research on parasitic helminths is moving into the post-genomic era, an enormous effort is directed towards deciphering gene function and achieving gene annotation. The sequences that are available in public databases undoubtedly hold information that can be utilized for new interventions and control, but the exploitation of these resources has until recently remained difficult. Only now, with the emergence of methods to genetically manipulate and transform parasitic worms, will it be possible to gain a comprehensive understanding of the molecular mechanisms involved in nutrition, metabolism, developmental switches/maturation and interaction with the host immune system. This review focuses on functional genomics approaches in parasitic helminths that are currently used, to highlight potential applications of these technologies in the areas of cell biology, systems biology and immunobiology of parasitic helminths.

This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

Mammalian gestation and pregnancy are fast evolving processes that involve the interaction of the fetal, maternal and paternal genomes. Version 1.0 of the GEneSTATION database (http://genestation.org) integrates diverse types of omics data across mammals to advance understanding of the genetic basis of gestation and pregnancy-associated phenotypes and to accelerate the translation of discoveries from model organisms to humans. GEneSTATION is built using tools from the Generic Model Organism Database project, including the biology-aware database CHADO, new tools for rapid data integration, and algorithms that streamline synthesis and user access. GEneSTATION contains curated life history information on pregnancy and reproduction from 23 high-quality mammalian genomes. For every human gene, GEneSTATION contains diverse evolutionary (e.g. gene age, population genetic and molecular evolutionary statistics), organismal (e.g. tissue-specific gene and protein expression, differential gene expression, disease phenotype), and molecular data types (e.g. Gene Ontology Annotation, protein interactions), as well as links to many general (e.g. Entrez, PubMed) and pregnancy disease-specific (e.g. PTBgene, dbPTB) databases. By facilitating the synthesis of diverse functional and evolutionary data in pregnancy-associated tissues and phenotypes and enabling their quick, intuitive, accurate and customized meta-analysis, GEneSTATION provides a novel platform for comprehensive investigation of the function and evolution of mammalian pregnancy. PMID:26567549

The genomes of dozens of placental mammal species are now publicly available. These genome sequences have the potential to provide insight into the development and evolution of the placenta. In particular, the variable anatomy of the placenta has likely been affected by natural selection on the genomes of living and extinct mammals. In this note the current availability of mammal genome sequences is reviewed, and strengths and limitations of these data are discussed. Additionally, museums, zoos, and commercial entities are available to provide genomic resources to the placental research community. Recommendations for tissue storage conditions of placentas in genomic research are given. PMID:18155141

As a result of the genome sequencing and structural genomics initiatives, we have a wealth of protein sequence and structural data. However, only about 1% of these proteins have experimental functional annotations. As a result, computational approaches that can predict protein functions are essential in bridging this widening annotation gap. This article reviews the current approaches of protein function prediction using structure and sequence based classification of protein domain family resources with a special focus on functional families in the CATH-Gene3D resource.

One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced are leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data as it reframes a long list of genes or gene products (mRNA, ncRNA, and proteins) by grouping based upon function, be it individual molecular functions or interactions between these molecules or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling. As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extendibility and scalability for storing and

Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al. 2009). Here we demonstrate a freely available Internet resource -- the Genomic MRI program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition. PMID:21610667
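As a rough illustration of what a scan for compositionally enriched regions can look like, the sketch below slides a fixed-size window along a sequence and merges overlapping windows whose content of a chosen base set meets a threshold. It is not the algorithm of the Genomic MRI package, which the abstract does not specify; the function name `find_enriched_windows`, the 30-nucleotide window (taken from the lower length bound quoted above) and the 0.8 threshold are all assumptions made for the example.

```python
def find_enriched_windows(seq, bases="GT", window=30, threshold=0.8):
    """Return merged half-open (start, end) intervals in which every
    contributing window has a fraction of `bases` >= `threshold`.

    Hypothetical helper for illustration only; parameters are assumptions.
    """
    seq = seq.upper()
    target = set(bases.upper())
    n = len(seq)
    if n < window:
        return []

    # Number of target bases in the first window.
    count = sum(1 for ch in seq[:window] if ch in target)
    regions = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one: drop the base that left, add the base that entered.
            count -= seq[start - 1] in target
            count += seq[start + window - 1] in target
        if count / window >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: a (G+T)-rich stretch flanked by A/C-only sequence.
    demo = "ACAC" * 10 + "GT" * 20 + "ACAC" * 10
    print(find_enriched_windows(demo))
```

The running count keeps the scan linear in the sequence length, which matters when the input is a long stretch of non-coding DNA; a real analysis would also need to handle ambiguous bases and compare against randomized sequences, as the abstract describes.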

The fraction of intronless genes (IGs) varies between 2.7 and 97.7% in eukaryotic genomes. Although many databases on exons and introns exist, there was no curated database for such genes that allowed their study in a concerted manner. Such a database would be useful to identify the functional features and the distribution of these genes across the genome. Here, a new database of IGs in eukaryotes based on GenBank data is described. This database, called IGD (Intronless Gene Database), is a collection of gene sequences that were annotated and curated. The current version of IGD contains 687 human intronless genes with their protein and CDS sequences. Some features of the entries are given in this paper. Data were extracted from GenBank release 183 using a Perl script. Data extraction was followed by a manual curation step. Intronless genes were then analyzed based on their RefSeq annotation and Gene Ontology functional class. IGD represents a useful resource for retrieval and in silico study of intronless genes. IGD is available at http://www.bioinfo-cbs.org/igd with comprehensive help and FAQ pages that illustrate the main uses of this resource.
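The abstract does not say which criteria the IGD extraction script applied, so the following is only a minimal sketch of one plausible first-pass filter: treat a CDS as a candidate intronless gene when its GenBank-style location string contains a single coordinate interval rather than a `join(...)` of several. The function names `interval_count` and `is_candidate_intronless` are hypothetical, and single-base locations or cross-record references are deliberately not handled.

```python
import re

# Matches one coordinate interval such as 12..78, <345..500 or 10..>99.
INTERVAL = re.compile(r"<?\d+\.\.>?\d+")


def interval_count(location):
    """Count coordinate intervals in a GenBank-style feature location."""
    return len(INTERVAL.findall(location))


def is_candidate_intronless(location):
    """True when the CDS location consists of exactly one interval.

    Assumption for illustration: one interval is read as 'no introns'.
    A curated database would add further checks on top of this.
    """
    return interval_count(location) == 1


if __name__ == "__main__":
    for loc in ("complement(100..900)", "join(12..78,134..202)"):
        print(loc, is_candidate_intronless(loc))
```

A filter of this kind only proposes candidates; the manual curation step mentioned in the abstract would still be needed to exclude partial records and annotation errors.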

Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html.

These past few years of scientific discovery will undoubtedly be remembered as the "genomics era," the period in which biologists succeeded in enumerating the sequence of nucleotides making up all, or at least most, of human DNA. And while this achievement has been heralded as a technological feat equal to the moon landing, it is only the first of many advances in DNA technology. Scientists are now faced with the task of understanding the meaning of the DNA sequence. Specifically, they want to learn how the DNA code relates to protein function. An important tool in the study of "functional genomics" is the cDNA microarray, also known as the gene chip. Inspired by computer microchips, gene chips allow scientists to monitor the expression of hundreds, even thousands, of genes in a fraction of the time it used to take to monitor the expression of a single one. By altering the conditions under which a particular tissue expresses genes (say, by exposing it to toxins or growth factors), scientists can determine the suite of genes expressed in different situations and hence start to get a handle on the function of these genes. The authors discuss this important new technology and some of its practical applications.

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended.

Genetic improvement in the industrially important guar (Cyamopsis tetragonoloba, L. Taub.) crop has been hindered by the lack of sufficient genomic or transcriptomic resources. In this study, RNA-Seq technology was employed to characterize the transcriptome of leaf tissues from two guar varieties, namely, M-83 and RGC-1066. Approximately 30 million high-quality paired-end reads for each variety, generated on the Illumina HiSeq platform, were used for de novo assembly with the Trinity program. A total of 62,146 non-redundant unigenes with an average length of 679 bp were obtained. The quality assessment of assembled unigenes revealed 87.50% of complete and 97.18% of partial core eukaryotic genes (CEGs). Sequence similarity analyses and annotation of the unigenes against non-redundant protein (Nr) and Gene Ontology (GO) databases identified 175,882 GO annotations. A total of 11,308 guar unigenes were annotated with various enzyme codes (EC) and categorized in six categories with 55 subclasses. The annotation of biochemical pathways resulted in a total of 11,971 unigenes assigned with 145 KEGG maps and 1759 enzyme codes. The species distribution analysis of the unigenes showed highest similarity with Glycine max genes. A total of 5773 potential simple sequence repeats (SSRs) and 3594 high-quality single nucleotide polymorphisms (SNPs) were identified. Out of 20 randomly selected SSRs for wet laboratory validation, 13 showed consistent PCR amplification in both guar varieties. In silico studies identified 145 polymorphic SSR markers in two varieties. To the best of our knowledge, this is the first report on transcriptome analysis and SNP identification in guar to date. PMID:28210265

Bovine respiratory disease (BRD) is the most common economically important disease affecting cattle. For developing accurate diagnostics that can predict disease susceptibility/resistance and stratification, it is necessary to identify the molecular mechanisms that underlie BRD. To study the complex interactions among the bovine host and the multitude of viral and bacterial pathogens, as well as the environmental factors associated with BRD etiology, genome-scale high-throughput functional genomics methods such as microarrays, RNA-seq, and proteomics are helpful. In this review, we summarize the progress made in our understanding of BRD using functional genomics approaches. We also discuss some of the available bioinformatics resources for analyzing high-throughput data, in the context of biological pathways and molecular interactions. Although resources for studying host response to infection are available, the corresponding information is lacking for the majority of BRD pathogens, impeding progress in identifying diagnostic signatures for BRD using functional genomics approaches. PMID:26526746

Almost a century ago, Wittgenstein pointed out that theory in science is intricately connected to language. This connection is not a frequent topic in the genomics literature. But a case can be made that functional genomics is today hindered by the paradoxes that Wittgenstein identified. If this is true, until these paradoxes are recognized and addressed, functional genomics will continue to be limited in its ability to extrapolate information from genomic sequences.

CottonDB (http://cottondb.org/) is a database and web resource for cotton genomic and genetic research. Created in 1995, CottonDB was among the first plant genome databases established by the USDA-ARS. Accessed through a website interface, the database aims to be a convenient, inclusive medium of ...

A grand challenge in the post-genomic era is a complete computer representation of the cell and the organism, which will enable computational prediction of higher-level complexity of cellular processes and organism behavior from genomic information. Toward this end we have been developing a knowledge-based approach for network prediction, which is to predict, given a complete set of genes in the genome, the protein interaction networks that are responsible for various cellular processes. KEGG at http://www.genome.ad.jp/kegg/ is the reference knowledge base that integrates current knowledge on molecular interaction networks such as pathways and complexes (PATHWAY database), information about genes and proteins generated by genome projects (GENES/SSDB/KO databases) and information about biochemical compounds and reactions (COMPOUND/GLYCAN/REACTION databases). These three types of database actually represent three graph objects, called the protein network, the gene universe and the chemical universe. New efforts are being made to abstract knowledge, both computationally and manually, about ortholog clusters in the KO (KEGG Orthology) database, and to collect and analyze carbohydrate structures in the GLYCAN database.

The Functional Genomics Initiative at the Oak Ridge National Laboratory integrates outstanding capabilities in mouse genetics, bioinformatics, and instrumentation. The 50-year investment by the DOE in mouse genetics/mutagenesis has created a one-of-a-kind resource for generating mutations and understanding their biological consequences. It is generally accepted that, through the mouse as a surrogate for human biology, we will come to understand the function of human genes. In addition to this world-class program in mammalian genetics, ORNL has also been a world leader in developing bioinformatics tools for the analysis, management and visualization of genomic data. Combining this expertise with new instrumentation technologies will provide a unique capability to understand the consequences of mutations in the mouse at both the organism and molecular levels. The goal of the Functional Genomics Initiative is to develop the technology and methodology necessary to understand gene function on a genomic scale and apply these technologies to megabase regions of the human genome. The effort is scoped so as to create an effective and powerful resource for functional genomics. ORNL is partnering with the Joint Genome Institute and other large scale sequencing centers to sequence several multimegabase regions of both human and mouse genomic DNA, to identify all the genes in these regions, and to conduct fundamental surveys to examine gene function at the molecular and organism level. The Initiative is designed to be a pilot for larger scale deployment in the post-genome era. Technologies will be applied to the examination of gene expression and regulation, metabolism, gene networks, physiology and development.

The genus Drosophila has been the subject of intense comparative phylogenomics characterization to provide insights into genome evolution under diverse biological and ecological contexts and to functionally annotate the Drosophila melanogaster genome, a model system for animal and insect genetics. Recent sequencing of 11 additional Drosophila species from various divergence points of the genus is a first step in this direction. However, to fully reap the benefits of this resource, the Drosophila community is faced with two critical needs: i.e., the expansion of genomic resources from a much broader range of phylogenetic diversity and the development of additional resources to aid in finishing the existing draft genomes. To address these needs, we report the first synthesis of a comprehensive set of bacterial artificial chromosome (BAC) resources for 19 Drosophila species from all three subgenera. Ten libraries were derived from the exact source used to generate 10 of the 12 draft genomes, while the rest were generated from a strategically selected set of species on the basis of salient ecological and life history features and their phylogenetic positions. The majority of the new species have at least one sequenced reference genome for immediate comparative benefit. This 19-BAC library set was rigorously characterized and shown to have large insert sizes (125–168 kb), low nonrecombinant clone content (0.3–5.3%), and deep coverage (9.1–42.9×). Further, we demonstrated the utility of this BAC resource for generating physical maps of targeted loci, refining draft sequence assemblies and identifying potential genomic rearrangements across the phylogeny. PMID:21321134

Fungal research and education have for many years been supported by public service genetic resource centres, whose roles have been to maintain, preserve and supply living cultures to the research community. In the genomic era, genetic resource centres are perhaps more important than ever before. The cultures held, many of which are described and validated by expert biosystematists, are valuable resources for the future. There is a need to supply genomic and proteomic research programmes with fully characterised organisms, as usage of organisms from unreliable sources can prove disastrous, not least in economic terms. However, mycologists often require more than just the organisms: for example, the associated information is vital for bioinformatic applications, and some researchers may require only genomic DNA from the organism rather than the organism per se. Genetic resource centres are continually adapting to meet the needs of their users and the wider mycological research community; this, together with OECD international initiatives, should ensure they exist to support research for many years to come. This review considers the impact of such initiatives, the current roles of fungal genetic resource centres, the mechanisms used to preserve organisms in a stable manner and the range of resources that are offered for genomic research.

Cellular processes mediated through nuclear DNA must contend with chromatin. Chromatin structural assays can efficiently integrate information across diverse regulatory elements, revealing the functional noncoding genome. In this study, we use a differential nuclease sensitivity assay based on micrococcal nuclease (MNase) digestion to discover open chromatin regions in the maize genome. We find that maize MNase-hypersensitive (MNase HS) regions localize around active genes and within recombination hotspots, focusing biased gene conversion at their flanks. Although MNase HS regions map to less than 1% of the genome, they consistently explain a remarkably large amount (∼40%) of heritable phenotypic variance in diverse complex traits. MNase HS regions are therefore on par with coding sequences as annotations that demarcate the functional parts of the maize genome. These results imply that less than 3% of the maize genome (coding and MNase HS regions) may give rise to the overwhelming majority of phenotypic variation, greatly narrowing the scope of the functional genome. PMID:27185945

The microbial genome database (MBGD) for comparative analysis is a platform for microbial comparative genomics based on automated ortholog group identification. A prominent feature of MBGD is that it allows users to create ortholog groups using a specified subgroup of organisms. The database is constantly updated and now contains almost 1000 genomes. To utilize the MBGD database as a comprehensive resource for investigating microbial genome diversity, we have developed the following advanced functionalities: (i) enhanced assignment of functional annotation, including external database links to each orthologous group, (ii) interface for choosing a set of genomes to compare based on phenotypic properties, (iii) the addition of more eukaryotic microbial genomes (fungi and protists) and some higher eukaryotes as references and (iv) enhancement of the MyMBGD mode, which allows users to add their own genomes to MBGD and now accepts raw genomic sequences without any annotation (in such a case, it runs a gene-finding procedure before identifying the orthologs). Some analysis functions, such as the function to find orthologs with similar phylogenetic patterns, have also been improved. MBGD is accessible at http://mbgd.genome.ad.jp/.

Background: The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed in various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. Description: To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information have been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. Conclusion: The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the genomes stored in the database.

PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species, that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungi, protists (oomycetes) and bacterial plant pathogens that have genomes that have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species. PMID:26476449

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.
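The contiguity statistics named above can be illustrated with a short sketch. This is not the Assembly database's own code; it assumes the common definition of contig N50 (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembled bases). The function names and the example lengths are hypothetical.

```python
def contig_n50(lengths):
    """Return the contig N50 under the common definition (an assumption here):
    sort lengths from longest to shortest and return the length at which the
    running total first reaches half of the summed length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def assembly_stats(lengths, total_gap_length=0):
    """Collect the simple per-assembly statistics listed in the abstract.
    The dictionary keys are illustrative, not the database's field names."""
    return {
        "num_contigs": len(lengths),
        "total_sequence_length": sum(lengths),
        "total_gap_length": total_gap_length,
        "contig_n50": contig_n50(lengths),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    print(assembly_stats([100, 80, 60, 40, 20]))
```

For the hypothetical lengths above the total is 300 bp, so the running sum reaches the halfway point (150 bp) at the second-longest contig, giving an N50 of 80 bp.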

We have generated a BAC library from the Indonesian coelacanth, Latimeria menadoensis. This library was generated using genomic DNA of nuclei isolated from heart tissue, and has an average insert size of 171 kb. There are a total of 288 384-well microtiter dishes in the library (110,592 clones) and its genomic representation is estimated to encompass ≥7× coverage based on the amount of DNA presumably cloned in the library as well as via hybridization with probes to a small set of single copy genes. This genomic resource has been made available to the public and should prove useful to the scientific community for many applications, including comparative genomics, molecular evolution and conservation genetics.
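The library arithmetic in this abstract can be checked with a few lines of code. The sketch below assumes the simplest definition of coverage (total cloned DNA divided by genome size), which ignores empty or organellar clones; the abstract does not state the genome size, so the last function only derives the largest genome size consistent with at least 7-fold coverage. All function names are hypothetical.

```python
WELLS_PER_PLATE = 384  # 384-well microtiter dishes, as stated in the abstract


def clones_in_library(num_plates, wells_per_plate=WELLS_PER_PLATE):
    """Number of clones if every well holds one clone."""
    return num_plates * wells_per_plate


def total_cloned_mb(num_clones, mean_insert_kb):
    """Total cloned DNA in megabases, assuming every clone carries an
    insert of the average size."""
    return num_clones * mean_insert_kb / 1000


def max_genome_size_mb(cloned_mb, fold_coverage):
    """Largest genome size (Mb) for which the library still reaches the given
    fold coverage, under coverage = cloned DNA / genome size."""
    return cloned_mb / fold_coverage


if __name__ == "__main__":
    clones = clones_in_library(288)        # 288 plates
    cloned = total_cloned_mb(clones, 171)  # 171 kb average insert
    print(clones, cloned, max_genome_size_mb(cloned, 7))
```

Under these assumptions, 288 plates give 110,592 clones, matching the figure in the abstract, and about 18,911 Mb of cloned DNA, so coverage of at least 7-fold corresponds to a genome of roughly 2,700 Mb or smaller.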

Background: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives.

Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.

Flatfish have high market acceptance and therefore represent a profitable aquaculture product. The main farmed species is the turbot (Scophthalmus maximus), followed by Japanese flounder (Paralichthys olivaceus) and tongue sole (Cynoglossus semilaevis), but other species such as Atlantic halibut (Hippoglossus hippoglossus), Senegalese sole (Solea senegalensis) and common sole (Solea solea) also register important production and are very promising for farming. Important genomic resources are available for most of these species, including whole genome sequencing projects, genetic maps and transcriptomes. In this work, we integrate all available genomic information of these species within a common framework, taking as reference the whole assembled genomes of turbot and tongue sole (>210× coverage). New insights related to the genetic basis of productive traits and new data useful to understand the evolutionary origin and diversification of this group were obtained. Despite a general 1:1 chromosome syntenic relationship between species, the comparison of turbot and tongue sole genomes showed huge intrachromosomal reorganizations. The integration of available mapping information supported specific chromosome fusions along flatfish evolution and facilitated the comparison between species of previously reported genetic associations for productive traits. When comparing transcriptomic resources of the six species, a common set of ~2500 orthologues and ~150 common miRNAs were identified, and specific sets of putative missing genes were detected in flatfish transcriptomes, likely reflecting their evolutionary diversification.

Background: Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. Description: The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides a graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. Conclusions: The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at . PMID:15357877

Background: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories. Results: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene-Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to the current best approach while our total computation time was faster by a factor of 4000. We discuss the unique challenges in performance evaluation of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance. Conclusion: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important gene ontology information to improve predictions. PMID:16630343

This week, Scientific Data published a collection of eight papers that describe datasets from high-throughput functional genomics screens, primarily utilizing RNA interference (RNAi). The publications explore host-pathogen dependencies, innate immune response, disease pathways, and cell morphology and motility at the genome level. All data, including raw images from the high content screens, are publicly available in PubChem BioAssay, figshare, Harvard Dataverse or the Image Data Resource (IDR). Detailed data descriptors enable use of these data for analysis algorithm design, machine learning, data comparisons, as well as generating new scientific hypotheses. PMID:28248922

Corynebacteria are used for a wide variety of industrial purposes, but some species are associated with human diseases. With an increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discovery of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research of corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes, aimed to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers access to a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/. PMID:24466021

Genome editing with engineered nucleases (zinc finger nucleases, TAL effector nucleases and clustered regularly interspaced short palindromic repeats/CRISPR-associated nucleases) has recently been shown to have great promise in a variety of therapeutic and biotechnological applications. However, their exploitation in genetic analysis and clinical settings largely depends on their specificity for the intended genomic target. Large and complex genomes often contain highly homologous/repetitive sequences, which limit the specificity of genome editing tools and could result in off-target activity. Over the past few years, various computational approaches have been developed to assist the design process and predict/reduce the off-target activity of these nucleases. These tools could be efficiently used to guide the design of constructs for engineered nucleases and evaluate results after genome editing. This review provides a comprehensive overview of various databases, tools, web servers and resources for genome editing and compares their features and functionalities. It also describes tools that have been developed to analyse post-genome editing results. The article also discusses important design parameters that could be considered while designing these nucleases. This review is intended to be a quick reference guide for experimentalists as well as computational biologists working in the field of genome editing with engineered nucleases.

With high productivity and stress tolerance, numerous grass genera of the Andropogoneae have emerged as candidates for bioenergy production. To optimize these candidates, research examining the genetic architecture of yield, carbon partitioning, and composition is required to advance breeding objectives. Significant progress has been made developing genetic and genomic resources for Andropogoneae, and advances in comparative and computational genomics have enabled research examining the genetic basis of photosynthesis, carbon partitioning, composition, and sink strength. To provide a pivotal resource aimed at developing a comparative understanding of key bioenergy traits in the Andropogoneae, we have established and characterized an association panel of 390 racially, geographically, and phenotypically diverse Sorghum bicolor accessions with 232,303 genetic markers. Sorghum bicolor was selected because of its genomic simplicity, phenotypic diversity, significant genomic tools, and its agricultural productivity and resilience. We have demonstrated the value of sorghum as a functional model for candidate gene discovery for bioenergy Andropogoneae by performing genome-wide association analysis for two contrasting phenotypes representing key components of structural and non-structural carbohydrates. We identified potential genes, including a cellulase enzyme and a vacuolar transporter, associated with increased non-structural carbohydrates that could lead to bioenergy sorghum improvement. Although our analysis identified genes with potentially clear functions, other candidates did not have assigned functions, suggesting novel molecular mechanisms for carbon partitioning traits. These results, combined with our characterization of phenotypic and genetic diversity and the public accessibility of each accession and genomic data, demonstrate the value of this resource and provide a foundation for future improvement of sorghum and related grasses for bioenergy production.

As modern agriculture faces the predicted increase in population and general environmental changes, securing high quality food remains a major challenge. Vegetable crops include a large number of species, characterized by multiple geographical origins, large genetic variability and diverse reproductive features. Due to their nutritional value, they have an important place in the human diet. In recent years, many crop genomes have been sequenced, permitting the identification of genes and superior alleles associated with desirable traits. Furthermore, innovative biotechnological approaches are a step forward towards the development of new improved cultivars harboring precise genome modifications. Sequence-based knowledge coupled with advanced biotechnologies is supporting the widespread application of new plant breeding techniques to enhance the success in modification and transfer of useful alleles into target varieties. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 system, zinc-finger nucleases, and transcription activator-like effector nucleases represent the main methods available for plant genome engineering through targeted modifications. Such technologies, however, require efficient transformation protocols as well as extensive genomic resources and accurate knowledge before they can be efficiently exploited in practical breeding programs. In this review, we review the state of the art in relation to availability of such scientific and technological resources in various groups of vegetables, describe genome editing results obtained so far and discuss the implications for future applications. PMID:28275380

The pronouncements of the ENCODE Project Consortium regarding “junk DNA” exposed the need for an evolutionary classification of genomic elements according to their selected-effect function. In the classification scheme presented here, we divide the genome into “functional DNA,” that is, DNA sequences that have a selected-effect function, and “rubbish DNA,” that is, sequences that do not. Functional DNA is further subdivided into “literal DNA” and “indifferent DNA.” In literal DNA, the order of nucleotides is under selection; in indifferent DNA, only the presence or absence of the sequence is under selection. Rubbish DNA is further subdivided into “junk DNA” and “garbage DNA.” Junk DNA neither contributes to nor detracts from the fitness of the organism and, hence, evolves under selective neutrality. Garbage DNA, on the other hand, decreases the fitness of its carriers. Garbage DNA exists in the genome only because natural selection is neither omnipotent nor instantaneous. Each of these four functional categories can be 1) transcribed and translated, 2) transcribed but not translated, or 3) not transcribed. The affiliation of a DNA segment to a particular functional category may change during evolution: Functional DNA may become junk DNA, junk DNA may become garbage DNA, rubbish DNA may become functional DNA, and so on; however, determining the functionality or nonfunctionality of a genomic sequence must be based on its present status rather than on its potential to change (or not to change) in the future. Changes in functional affiliation are divided into pseudogenes, Lazarus DNA, zombie DNA, and Jekyll-to-Hyde DNA. PMID:25635041

Every cellular process mediated through nuclear DNA must contend with chromatin. As results from ENCODE show, open chromatin assays can efficiently integrate across diverse regulatory elements, revealing functional non-coding genome. In this study, we use a MNase hypersensitivity assay to discover o...

Current advances in sequencing technologies and bioinformatics have revealed the genomic background of rice, a staple food for poor people, and provided the basis to develop large genomic variation databases for thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles for the gene GBSSI, we could identify novel genetic markers for selection of varieties with high amylose content. We expect that both the analysis methods and the genomic information described here will be of great use for the rice research community and for other groups carrying out similar sequencing efforts in other crops. PMID:25923345

Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

Using next-generation sequencing, we developed the first whole-genome resources for two hybridizing Nothofagus species of the Patagonian forests that crucially lack genomic data, despite their ecological and industrial value. A de novo assembly strategy combining base quality control and optimization of the putative chloroplast gene map yielded ~32,000 contigs from 43% of the reads produced. With 12.5% of assembled reads, we covered ~96% of the chloroplast genome and ~70% of the mitochondrial gene content, providing functional and structural annotations for 112 and 52 genes, respectively. Functional annotation was possible on 15% of the contigs, with ~1750 potentially novel nuclear genes identified for Nothofagus species. We estimated that the new resources (13.41 Mb in total) included ~4000 gene regions representing ~6.5% of the expected genic partition of the genome, the remaining contigs potentially being nongenic DNA. A high-quality single nucleotide polymorphism resource was developed by comparing various filtering methods, and preliminary results indicate a strong conservation of cpDNA genomes in contrast to numerous exclusive nuclear polymorphisms in both species. Finally, we characterized 2274 potential simple sequence repeat (SSR) loci, designed primers for 769 of them and validated nine of 29 loci in 42 individuals per species. Nothofagus obliqua had more alleles (4.89) on average than N. nervosa (2.89), eight SSRs discriminated the species efficiently, and three were successfully transferred to three other Nothofagus species. These resources will greatly help future inferences of demographic, adaptive and hybridizing events in Nothofagus species, and the conservation and management of natural populations.

With the predicted trends in climate change, drought will increasingly impose a grand challenge to biomass production. Most of the bioenergy crops have some degree of drought susceptibility with low water-use efficiency (WUE). It is imperative to improve drought tolerance and WUE in bioenergy crops for sustainable biomass production in arid and semi-arid regions with minimal water input. Genetics and functional genomics can play a critical role in generating knowledge to inform and aid genetic improvement of drought tolerance in bioenergy crops. The molecular aspect of drought response has been extensively investigated in model plants like Arabidopsis, yet our understanding of the molecular mechanisms underlying drought tolerance in bioenergy crops is limited. Crops exhibit various responses to drought stress depending on species and genotype. A rational strategy for studying drought tolerance in bioenergy crops is to translate the knowledge from model plants and pinpoint the unique features associated with individual species and genotypes. In this review, we summarize the general knowledge about drought responsive pathways in plants, with a focus on the identification of commonality and specialty in drought responsive mechanisms among different species and/or genotypes. We describe the genomic resources developed for bioenergy crops and discuss genetic and epigenetic regulation of drought responses. We also examine comparative and evolutionary genomics to leverage the ever-increasing genomics resources and provide new insights beyond what has been known from studies on individual species. Finally, we outline future exploration of drought tolerance using the emerging new technologies.

To facilitate the practical application of highly efficient semi-automated methods for general application in genomic analyses, we have developed a fluorescence-based marker resource. Ninety highly polymorphic simple tandem repeat markers were combined to provide a rapid, accurate, and highly efficient initial genome-wide screening system. These markers are spaced on average every 33 recombination units, with a mean heterozygosity of 81% (range 65-94%), covering 22 autosomes and the X and Y chromosomes. Less than 3% of the genome lies beyond 30 cM of the nearest marker. Markers were placed in a vertical ladder that we have termed a SET according to the size of the PCR fragments they produce during electrophoresis. Each SET was designed to avoid overlap between loci during gel separations to assure accuracy when scoring genotypes. We have constructed 15 SETS of markers. Three SETS, each labelled with one of three fluors, were combined into what we have termed a GROUP, which is co-electrophoresed with internal size standards that are labelled with a fourth fluor. Five GROUPS of markers were assembled that contain a total of 15 SETS of markers. Each GROUP covers 18 regions of the genome that can be detected simultaneously, since this genomic analysis system is fully compatible with automated fragment analyzers using simultaneous four-color fluorescence-based detection systems. This allows for multiplex detection and a throughput of 1,944 genotypes daily per instrument. This system will be highly beneficial in a number of clinical and research applications including linkage, cancer genetics, forensics, and cytogenetics.
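The panel layout described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original work: the marker, SET, and GROUP counts come from the abstract, while the function name `panel_layout` and the interpretation of daily throughput as "one GROUP per sample per lane" are assumptions made only for illustration. Under that assumption, the stated 1,944 genotypes per day would correspond to 108 lanes per instrument per day, a figure the abstract does not state.

```python
# Figures stated in the abstract.
TOTAL_MARKERS = 90        # simple tandem repeat markers
TOTAL_SETS = 15           # size ladders of non-overlapping loci
SETS_PER_GROUP = 3        # one SET per marker fluor; a fourth fluor labels size standards
GENOTYPES_PER_DAY = 1944  # stated throughput per instrument


def panel_layout():
    """Derive the panel structure implied by the stated figures."""
    markers_per_set = TOTAL_MARKERS // TOTAL_SETS
    groups = TOTAL_SETS // SETS_PER_GROUP
    markers_per_group = markers_per_set * SETS_PER_GROUP
    # Assumption (not in the abstract): each lane carries one GROUP for one
    # sample, so every lane yields markers_per_group genotypes.
    lanes_per_day = GENOTYPES_PER_DAY // markers_per_group
    return {
        "markers_per_set": markers_per_set,
        "groups": groups,
        "markers_per_group": markers_per_group,
        "lanes_per_day": lanes_per_day,
    }


if __name__ == "__main__":
    for key, value in panel_layout().items():
        print(f"{key}: {value}")
```

The derived 18 markers per GROUP matches the "18 regions of the genome" detected simultaneously, and five GROUPS account for all 15 SETS.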

This ERIC Digest identifies how the human genome project fits into the "National Science Education Standards" and lists Human Genome Project Web sites found on the World Wide Web. It is a resource companion to "Learning about the Human Genome. Part 1: Challenge to Science Educators" (Haury 2001). The Web resources and…

Rice, one of the most important cereal crops for mankind, feeds more than half the world population. Rice has been heralded as a model cereal owing to its small genome size, amenability to easy transformation, high synteny to other cereal crops and availability of complete genome sequence. Moreover, sequence wealth in rice is getting more refined and precise due to resequencing efforts. This humungous resource of sequence data has confronted the research fraternity with a herculean challenge as well as an excellent opportunity to functionally validate expressed as well as regulatory portions of the genome. This will not only help us in understanding the genetic basis of plant architecture and physiology but would also steer us towards developing improved cultivars. No single technique can achieve such a mammoth task. Functional genomics, through its diverse tools, viz. loss- and gain-of-function mutants and multifarious omics strategies like transcriptomics, proteomics, metabolomics and phenomics, provides us with the necessary handle. A paradigm shift in technological advances in functional genomics strategies has been instrumental in generating a considerable amount of information with respect to the functionality of the rice genome. We now have several databases and online resources for functionally validated genes, but despite that we are far from reaching the desired milestone of functionally characterizing each and every rice gene. There is an urgent need for a common platform for information already available in rice, and for concerted collaborative efforts between researchers as well as healthy public-private partnership, for genetic improvement of a rice crop better able to handle the pressures of climate change and an exponentially increasing population. PMID:27252584

Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast. PMID:23852385

Functional genomics is an experimental approach that incorporates genome-wide or system-wide experimentation, expanding the scope of biological investigation from studying single genes to studying potentially all genes at once in a systematic manner. This technology is highly appealing because of its high throughput and relatively low cost. Furthermore, analysis of gene expression using microarrays is likely to be more biologically relevant than the conventional paradigm of reductionism, because it has the potential to uncover new biological connections between genes and biochemical pathways. However, functional genomics is still in its infancy, especially with regard to the study of pig reproduction. Currently, efforts are centred on developing the necessary resources to enable high throughput evaluation and comparison of gene expression. However, it is clear that in the near future functional genomics will be applied on a large scale to study the biology and physiology of reproduction in pigs, and to understand better the complex nature of genetic control over polygenic characteristics, such as ovulation rate and litter size. We can look forward to generating a significant amount of new data on differences in gene expression between genotypes, treatments, or at various temporal and spatial coordinates within a variety of reproductively relevant systems. Along with this capability will be the challenge of collating, analysing and interpreting datasets that are orders of magnitude more extensive and complex than those currently used. Furthermore, integration of functional genomics with traditional genetic approaches and with detailed analysis of the proteome and relevant whole animal phenotypes will be required to make full use of this powerful new experimental paradigm as a beneficial research tool.

The Mouse Genome Database (MGD: http://www.informatics.jax.org) is the primary community data resource for the laboratory mouse. It provides a highly integrated and highly curated system offering a comprehensive view of current knowledge about mouse genes, genetic markers and genomic features as well as the associations of those features with sequence, phenotypes, functional and comparative information, and their relationships to human diseases. MGD continues to enhance access to these data, to extend the scope of data content and visualizations, and to provide infrastructure and user support that ensures effective and efficient use of MGD in the advancement of scientific knowledge. Here, we report on recent enhancements made to the resource and new features. PMID:27899570

The order Entomophthorales, which formerly contained c. 280 species, has recently been recognized as a separate phylum, Entomophthoromycota, consisting of three recognized classes and six families. Many genera in this group contain obligate insect-pathogenic species with narrow host ranges, capable of producing epizootics in natural insect populations. Available sequence information from the phylum Entomophthoromycota can be classified into three main categories: first, partial gene regions (exons+introns) used for phylogenetic inference; second, protein coding gene regions obtained using degenerate primers, expressed sequence tag methodology or de novo transcriptome sequencing with molecular function inferred by homology analysis; and third, primarily forthcoming whole-genome sequencing data sets. Here we summarize the current genetic resources for Entomophthoromycota and identify research areas that are likely to be significantly advanced by the availability of new whole-genome resources.

A specialized orchid database, named Orchidstra (URL: http://orchidstra.abrc.sinica.edu.tw), has been constructed to collect, annotate and share genomic information for orchid functional genomics studies. The Orchidaceae is a large family of Angiosperms that exhibits extraordinary biodiversity in terms of both the number of species and their distribution worldwide. Orchids exhibit many unique biological features; however, investigation of these traits is currently constrained due to the limited availability of genomic information. Transcriptome information for five orchid species and one commercial hybrid has been included in the Orchidstra database. Altogether, these comprise >380,000 non-redundant orchid transcript sequences, of which >110,000 are protein-coding genes. Sequences from the transcriptome shotgun assembly (TSA) were obtained either from output reads from next-generation sequencing technologies assembled into contigs, or from conventional cDNA library approaches. An annotation pipeline using Gene Ontology, KEGG and Pfam was built to assign gene descriptions and functional annotation to protein-coding genes. Deep sequencing of small RNA was also performed for Phalaenopsis aphrodite to search for microRNAs (miRNAs), extending the information archived for this species to miRNA annotation, precursors and putative target genes. The P. aphrodite transcriptome information was further used to design probes for an oligonucleotide microarray, and expression profiling analysis was carried out. The intensities of hybridized probes derived from microarray assays of various tissues were incorporated into the database as part of the functional evidence. In the future, the content of the Orchidstra database will be expanded with transcriptome data and genomic information from more orchid species.

Functional genomic screening has emerged as a powerful approach for understanding complex biological phenomena. Of the available tools, genome-wide RNA interference (RNAi) technology is unquestionably the most incisive, as it directly probes gene function. Recent applications of RNAi screening have been impressive. Notable amongst these are its use in elucidating mechanisms of signal transduction, various aspects of cell biology, tumourigenesis and metastasis, resistance to cancer therapeutics, and the host's response to a pathogen. Herein we discuss how recent RNAi screening efforts have helped turn our attention to the targetability of non-oncogene support pathways for cancer treatment, with a particular focus on a recent study that identified a non-oncogene addiction to the ER stress response as a synergistic target for oncolytic virus therapy (OVT). Moreover, we give our thoughts on the future of RNAi screening as a tool to enhance OVT and describe recent technical improvements that are poised to make genome-scale RNAi experiments more sensitive, less noisy, more applicable in vivo, and more easily validated in clinically relevant animal models.

Rice [Oryza sativa (L.)] feeds more than half of the world's population. Rhizoctonia solani is a major fungal pathogen of rice causing extreme crop losses in all rice-growing regions of the world. R. solani AG1 IA is a major cause of sheath blight in rice. In this study, we constructed a comprehensive and user-friendly web-based database, RSIADB, to analyse its draft genome and transcriptome. The database was built using the genome sequence (10,489 genes) and annotation information for R. solani AG1 IA. A total of six RNAseq samples of R. solani AG1 IA were also analysed, corresponding to 10, 18, 24, 32, 48 and 72 h after infection of rice leaves. The RSIADB database enables users to search, browse, and download gene sequences for R. solani AG1 IA, and mine the data using BLAST, Sequence Extractor, Browse and Construction Diagram tools that were integrated into the database. RSIADB is an important genomic resource for scientists working with R. solani AG1 IA and will assist researchers in analysing the annotated genome and transcriptome of this pathogen. This resource will facilitate studies on gene function, pathogenesis factors and secreted proteins, as well as provide an avenue for comparative analyses of genes expressed during different stages of infection. Database URL: http://genedenovoweb.ticp.net:81/rsia/index.php.

Genome-Wide Association Studies are widely used to correlate phenotypic traits with genetic variants. These studies usually compare the genetic variation between two groups to single out certain Single Nucleotide Polymorphisms (SNPs) that are linked to a phenotypic variation in one of the groups. However, a sufficiently large sample size is needed to find statistically significant correlations. Direct-To-Consumer (DTC) genetic testing can supply additional data: DTC companies offer the analysis of a large number of SNPs for an individual at low cost without the need to consult a physician or geneticist. Over 100,000 people have already been genotyped through Direct-To-Consumer genetic testing companies. However, these data are not public for a variety of reasons and thus cannot be used in research. It seems reasonable to create a central open data repository for such data. Here we present the web platform openSNP, an open database which allows participants of Direct-To-Consumer genetic testing to publish their genetic data at no cost along with phenotypic information. Through this crowdsourced effort of collecting genetic and phenotypic information, openSNP has become a resource for a wide area of studies, including Genome-Wide Association Studies. openSNP is hosted at http://www.opensnp.org, and the code is released under MIT-license at http://github.com/gedankenstuecke/snpr. PMID:24647222
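The two-group comparison mentioned above can be illustrated for a single SNP with an allelic chi-square test. This is a minimal sketch under stated assumptions, not a description of openSNP's code or of any particular study: the function name `allele_chi_square` and all allele counts are hypothetical, and a real analysis would also correct for the very large number of SNPs tested and for population structure.

```python
import math


def allele_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are the two groups (e.g. with and without the phenotype), columns are
    the alternate and reference allele counts. Every row and column total must
    be non-zero, otherwise an expected count is zero and the division fails.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 alleles sampled per group.
    chi2, p = allele_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
    print(f"chi2 = {chi2:.3f}, p = {p:.4g}")
```

Because the statistic shrinks as counts fall, the same frequency difference in a smaller sample gives a larger p-value, which is the sample-size limitation the abstract refers to.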

The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. PMID:27899624

Background: Soybean Knowledge Base (SoyKB) is a comprehensive all-inclusive web resource for soybean translational genomics. SoyKB is designed to handle the management and integration of soybean genomics, transcriptomics, proteomics and metabolomics data along with annotation of gene function and biological pathway. It contains information on four entities, namely genes, microRNAs, metabolites and single nucleotide polymorphisms (SNPs). Methods: SoyKB has many useful tools such as Affymetrix probe ID search, gene family search, multiple gene/ metabolite search supporting co-expression analysis, and protein 3D structure viewer as well as download and upload capacity for experimental data and annotations. It has four tiers of registration, which control different levels of access to public and private data. It allows users of certain levels to share their expertise by adding comments to the data. It has a user-friendly web interface together with genome browser and pathway viewer, which display data in an intuitive manner to the soybean researchers, producers and consumers. Conclusions: SoyKB addresses the increasing need of the soybean research community to have a one-stop-shop functional and translational omics web resource for information retrieval and analysis in a user-friendly way. SoyKB can be publicly accessed at http://soykb.org/.

With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of

The availability of complete bacterial genome sequences has significantly furthered our understanding of the genetics, physiology and biochemistry of the microorganisms in question, particularly those that have commercially important applications. Bifidobacteria are among such microorganisms, as they constitute mammalian commensals of biotechnological significance due to their perceived role in maintaining a balanced gastrointestinal (GIT) microflora. Bifidobacteria are therefore frequently used as health-promoting or probiotic components in functional food products. A fundamental understanding of the metabolic activities employed by these commensal bacteria, in particular their capability to utilize a wide range of complex oligosaccharides, can reveal ways to provide in vivo growth advantages relative to other competing gut bacteria or pathogens. Furthermore, an in depth analysis of adaptive responses to nutritional or environmental stresses may provide methodologies to retain viability and improve functionality during commercial preparation, storage and delivery of the probiotic organism.

The Mouse Genome Informatics (MGI) resource (www.informatics.jax.org) has existed for over 25 years, and over this time its data content, informatics infrastructure, and user interfaces and tools have undergone dramatic changes (Eppig et al., Mamm Genome 26:272-284, 2015). Change has been driven by scientific methodological advances, rapid improvements in computational software, growth in computer hardware capacity, and the ongoing collaborative nature of the mouse genomics community in building resources and sharing data. Here we present an overview of the current data content of MGI, describe its general organization, and provide examples using simple and complex searches, and tools for mining and retrieving sets of data.

Mitochondrial genomes compete for transmission from mother to progeny. We explored this competition by introducing a second genome into Drosophila melanogaster to follow transmission. Competitions between closely related genomes favored those functional in electron transport, resulting in a host-beneficial purifying selection. Contrastingly, matchups between distant genomes often favored those with negligible, negative or lethal consequences, indicating selfish selection. Exhibiting powerful selfish selection, a genome carrying a detrimental mutation displaced a complementing genome leading to population death after several generations. In a different pairing, opposing selfish and purifying selection counterbalanced to give stable transmission of two genomes. Sequencing of recombinant mitochondrial genomes revealed that the non-coding region, containing origins of replication, governs selfish transmission. Uniparental inheritance prevents encounters between distantly related genomes. Nonetheless, within each maternal lineage, constant competition among sibling genomes selects for super-replicators. We suggest that this relentless competition drives positive selection promoting change in the sequences influencing transmission. PMID:27270106

Mitochondrial genomes compete for transmission from mother to progeny. We explored this competition by introducing a second genome into Drosophila melanogaster to follow transmission. Competitions between closely related genomes favored those functional in electron transport, resulting in a host-beneficial purifying selection. In contrast, matchups between distantly related genomes often favored those with negligible, negative or lethal consequences, indicating selfish selection. Exhibiting powerful selfish selection, a genome carrying a detrimental mutation displaced a complementing genome, leading to population death after several generations. In a different pairing, opposing selfish and purifying selection counterbalanced to give stable transmission of two genomes. Sequencing of recombinant mitochondrial genomes showed that the noncoding region, containing origins of replication, governs selfish transmission. Uniparental inheritance prevents encounters between distantly related genomes. Nonetheless, in each maternal lineage, constant competition among sibling genomes selects for super-replicators. We suggest that this relentless competition drives positive selection, promoting change in the sequences influencing transmission.

CCAP, the largest European protistan culture collection, is based at the Scottish Association for Marine Science near Oban, Scotland (http://www.ccap.ac.uk). The Collection comprises more than 2700 strains in the public domain, of which 1050 are marine algae, 1300 freshwater algae, and 350 protozoa. The primary mission of CCAP is to maintain and distribute defined cultures and their associated information to its customers. It also has a support and advisory function on all aspects of protistan science. In addition, it is involved in the training of students and researchers in algal identification and culture techniques. In light of the increasing number of fully sequenced protists, the CCAP is striving to provide targeted services and support to workers involved in all aspects of genomic research. At present, the Collection holds several hundred strains of genomic model taxa including: Acanthamoeba, Cafeteria, Cercomonas, Chlamydomonas, Chlorella, Cyanophora, Dictyostelium, Dunaliella, Ectocarpus, Emiliania, Euglena, Micromonas, Naegleria, Nephroselmis, Paramecium, Pavlova, Phaeodactylum, Porphyra, Pseudendoclonium, Pylaiella, Rhodomonas, Scenedesmus, Staurastrum, Tetrahymena, Thalassiosira, Volvox and Zygnema. These strains provide a defined representation of natural variation within model organisms, an increasingly useful resource for post-genomics approaches. Our aim over the next 2-5 years is to add value to the Collection by increasing the number of genome model species, and by offering an integrated, up-to-date, easy-to-use resource that would provide curated information on our strain holdings. In collaboration with other major Biological Resource Centres worldwide, we intend to build a hub providing access to both protistan cultures and their associated bioinformatics data.

Background. As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results. We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprising 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page. Conclusion. Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. The sequences presented here will provide much needed genomic

The Drosophila genome contains >13000 protein-coding genes, the majority of which remain poorly investigated. Important reasons include the lack of antibodies or reporter constructs to visualise these proteins. Here, we present a genome-wide fosmid library of 10000 GFP-tagged clones, comprising tagged genes and most of their regulatory information. For 880 tagged proteins, we created transgenic lines, and for a total of 207 lines, we assessed protein expression and localisation in ovaries, embryos, pupae or adults by stainings and live imaging approaches. Importantly, we visualised many proteins at endogenous expression levels and found a large fraction of them localising to subcellular compartments. By applying genetic complementation tests, we estimate that about two-thirds of the tagged proteins are functional. Moreover, these tagged proteins enable interaction proteomics from developing pupae and adult flies. Taken together, this resource will boost systematic analysis of protein expression and localisation in various cellular and developmental contexts.

Objectives. We investigated how fluctuations and linear changes in health and cognitive resources influence the motivation to engage in complex cognitive activity and the extent to which motivation mediated the relationship between changing resources and cognitively demanding activities. Method. Longitudinal data from 332 adults aged 20–85 years were examined. Motivation was assessed using a composite of Need for Cognition and Personal Need for Structure and additional measures of health, sensory functioning, cognitive ability, and self-reported activity engagement. Results. Multilevel modeling revealed that age-typical changes in health, sensory functions, and ability were associated with changes in motivation, with the impact of declining health on motivation being particularly strong in older adulthood. Changes in motivation, in turn, predicted involvement in cognitive and social activities as well as changes in cognitive ability. Finally, motivation was observed to partially mediate the relationship between changes in resources and cognitively demanding activities. Discussion. Our results suggest that motivation may play an important role in determining the course of cognitive change and involvement in cognitively demanding everyday activities in adulthood. PMID:21926400

Mitochondria and chloroplasts are energy-transducing organelles of the cytoplasm of eukaryotic cells. They originated as bacterial symbionts whose host cells acquired respiration from the precursor of the mitochondrion, and oxygenic photosynthesis from the precursor of the chloroplast. The host cells also acquired genetic information from their symbionts, eventually incorporating much of it into their own genomes. Genes of the eukaryotic cell nucleus now encode most mitochondrial and chloroplast proteins. Genes are copied and moved between cellular compartments with relative ease, and there is no obvious obstacle to successful import of any protein precursor from the cytosol. So why are any genes at all retained in cytoplasmic organelles? One proposal is that these small but functional genomes provide a location for genes that is close to, and in the same compartment as, their gene products. This co-location facilitates rapid and direct regulatory coupling. Redox control of synthesis de novo is put forward as the common property of those proteins that must be encoded and synthesized within mitochondria and chloroplasts. This testable hypothesis is termed CORR, for co-location for redox regulation. Principles, predictions and consequences of CORR are examined in the context of competing hypotheses and current evidence. PMID:12594916

Recent exponential growth in the throughput of next-generation DNA sequencing platforms has dramatically spurred the use of accessible and scalable targeted resequencing approaches. This includes candidate region diagnostic resequencing and novel variant validation from whole genome or exome sequencing analysis. We have previously demonstrated that selective genomic circularization is a robust in-solution approach for capturing and resequencing thousands of target human genome loci such as exons and regulatory sequences. To facilitate the design and production of customized capture assays for any given region in the human genome, we developed the Human OligoGenome Resource (http://oligogenome.stanford.edu/). This online database contains over 21 million capture oligonucleotide sequences. It enables one to create customized and highly multiplexed resequencing assays of target regions across the human genome and is not restricted to coding regions. In total, this resource provides 92.1% in silico coverage of the human genome. The online server allows researchers to download a complete repository of oligonucleotide probes and design customized capture assays to target multiple regions throughout the human genome. The website has query tools for selecting and evaluating capture oligonucleotides from specified genomic regions.

Gramene is an integrated informatics resource for accessing, visualizing, and comparing plant genomes and biological pathways. Originally targeting grasses, Gramene has grown to host annotations for economically important and research model crops, including wheat, potato, tomato, banana, grape, poplar, and Chlamydomonas. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. This chapter outlines system requirements for end users and database hosting, data types and basic navigation within Gramene, and provides examples of how to (1) view a phylogenetic tree for a family of transcription factors, (2) explore genetic variation in the orthologues of a gene with a known trait association, and (3) upload, visualize, and privately share end user data into a new genome browser track. Moreover, this is the first publication describing Gramene's new web interface, intended to provide a simplified portal to the most complete and up-to-date set of plant genome and pathway annotations.

Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db.

Viruses are the most abundant and diverse biological entities on earth, and while most of this diversity remains completely unexplored, advances in genome sequencing have provided unprecedented glimpses into the virosphere. The Prokaryotic Virus Orthologous Groups (pVOGs, formerly called Phage Orthologous Groups, POGs) resource has aided in this task over the past decade by using automated methods to keep pace with the rapid increase in genomic data. The uses of pVOGs include functional annotation of viral proteins, identification of genes and viruses in uncharacterized DNA samples, phylogenetic analysis, large-scale comparative genomics projects, and more. The pVOGs database represents a comprehensive set of orthologous gene families shared across multiple complete genomes of viruses that infect bacterial or archaeal hosts (viruses of eukaryotes will be added at a future date). The pVOGs are constructed within the Clusters of Orthologous Groups (COGs) framework that is widely used for orthology identification in prokaryotes. Since the previous release of the POGs, the size has tripled to nearly 3000 genomes and 300 000 proteins, and the number of conserved orthologous groups doubled to 9518. User-friendly webpages are available, including multiple sequence alignments and HMM profiles for each VOG. These changes provide major improvements to the pVOGs database, at a time of rapid advances in virus genomics. The pVOGs database is hosted jointly at the University of Iowa at http://dmk-brain.ecn.uiowa.edu/pVOGs and the NCBI at ftp://ftp.ncbi.nlm.nih.gov/pub/kristensen/pVOGs/home.html. PMID:27789703

Advances in high-throughput sequencing have facilitated large-scale surveys of genomic variation in the budding yeast, Saccharomyces cerevisiae. These surveys have revealed extensive sequence variation between yeast strains. However, much less is known about how such variation influences the amount and nature of variation for functional genomic traits within and between yeast lineages. We review population-level studies of functional genomic variation, with a particular focus on how population functional genomic approaches can provide insights into both genome function and the evolutionary process. Although variation in functional genomics phenotypes is pervasive, our understanding of the consequences of this variation, either in physiological or evolutionary terms, is still rudimentary and thus motivates increased attention to appropriate null models. To date, much of the focus of population functional genomic studies has been on gene expression variation, but other functional genomic data types are just as likely to reveal important insights at the population level, suggesting a pressing need for more studies that go beyond transcription. Finally, we discuss how a population functional genomic perspective can be a powerful approach for developing a mechanistic understanding of the processes that link genomic variation to organismal phenotypes through gene networks.

The SPAcecraft SIMulator (SPASIM) simulates the functions and resources of a spacecraft to quickly perform Phase A trade-off analyses and uncover any operational bottlenecks during any part of the mission. Failure modes and operational contingencies can be evaluated allowing optimization for a range of mission scenarios. The payloads and subsystems are simulated, using a hierarchy of graphical models, in terms of how their functions affect resources such as propellant, power, and data. Any of the inputs and outputs of the payloads and subsystems can be plotted during the simulation. Most trade-off analyses, including those that compare current versus advanced technology, can be performed by changing values in the parameter menus. However, when a component is replaced by one with a different functional architecture, its graphical model can also be modified or replaced by drawing from a component library. SPASIM has been validated using several spacecraft designs which were at least at the Critical Design Review level. The user and programmer guide, including figures, is available on line as a hyper text document. This is an easy-to-use and expand tool which is based on MATLAB and SIMULINK. It runs on SGI workstations and PCs under Windows 95 or NT.

MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest.

Understanding cellular life requires a comprehensive knowledge of the essential cellular functions, the components involved, and their interactions. Minimized genomes are an important tool to gain this knowledge. We have constructed strains of the model bacterium, Bacillus subtilis, whose genomes have been reduced by ∼36%. These strains are fully viable, and their growth rates in complex medium are comparable to those of wild type strains. An in-depth multi-omics analysis of the genome reduced strains revealed how the deletions affect the transcription regulatory network of the cell, translation resource allocation, and metabolism. A comparison of gene counts and resource allocation demonstrates drastic differences in the two parameters, with 50% of the genes using as little as 10% of translation capacity, whereas the 6% essential genes require 57% of the translation resources. Taken together, the results are a valuable resource on gene dispensability in B. subtilis, and they suggest the roads to further genome reduction to approach the final aim of a minimal cell in which all functions are understood.

lines with an average deletion frequency of ~10% were identified for developing high density marker scaffolds of the D-genome. Conclusions. The RH panel reported here is the first developed for any wild ancestor of a major cultivated plant species. The results provided insight into various aspects of RH mapping in plants, including the genetically effective cell number for wheat (for the first time) and the potential implementation of this technique in other plant species. This RH panel will be an invaluable resource for mapping gene based markers, developing a complete marker scaffold for the whole genome sequence assembly, fine mapping of markers and functional characterization of genes and gene networks present on the D-genome. PMID:23127207

The Victorian Centre for Functional Genomics (VCFG) is an RNAi screening facility housed at the Peter MacCallum Cancer Centre in Melbourne, Australia. The Peter Mac is Australia's largest dedicated Cancer Research Institute, home to a team of over 520 scientists that focus on understanding the genetic risk of cancer, the molecular events regulating cancer growth and dissemination and improving detection through new diagnostic tools (www.petermac.org). Peter Mac is a well recognised technology leader and established the VCFG with a view to enabling researchers Australia and New Zealand-wide access to cutting edge functional genomics technology, infrastructure and expertise. This review documents the technology platforms operated within the VCFG and provides insight into the workflows and analysis pipelines currently in operation.

The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

The National Center for Biotechnology Information (NCBI) provides a data-rich environment in support of genomic research by collecting the biological data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains and integrating the data with analytical, search, and retrieval resources through the NCBI Web site. Entrez, an integrated search and retrieval system, enables text searches across various diverse biological databases maintained at NCBI. Map Viewer, the genome browser developed at NCBI, displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allows maps from all plant genomes available in the Map Viewer to be searched to produce a display of aligned maps from several species. Customized Plant Basic Local Alignment Search Tool (PlantBLAST) allows the user to perform sequence similarity searches in a special collection of mapped plant sequence data and to view the resulting alignments within a genomic context using Map Viewer. In addition, pre-computed sequence similarities, such as those for proteins offered by BLAST Link (BLink), enable fluid navigation from un-annotated to annotated sequences, quickening the pace of discovery. Plant Genome Central (PGC) is a Web portal that provides centralized access to all NCBI plant genome resources. Also, there are links to plant-specific Web resources external to NCBI such as organism-specific databases, genome-sequencing project Web pages, and homepages of genomic bioinformatics organizations.

Pearl millet is one of the most important small-grained C4 Panicoid crops with a large genome size (∼2352 Mb), short life cycle and outbreeding nature. It is highly resilient in areas with scanty rain and high temperature. Pearl millet is a nutritionally superior staple crop for people inhabiting hot, drought-prone arid and semi-arid regions of South Asia and Africa where it is widely grown and used for food, hay, silage, bird feed, building material, and fuel. Its excellent nutrient composition and exceptional buffering capacity against variable climatic conditions and pathogen attack make pearl millet a wonderful model crop for stress tolerance studies. Pearl millet germplasm shows a large range of genotypic and phenotypic variations including tolerance to abiotic and biotic stresses. Conventional breeding for enhancing abiotic and biotic stress resistance in pearl millet has met with considerable success; however, in the last few years various novel approaches including functional genomics and molecular breeding have been attempted in this crop for augmenting yield under adverse environmental conditions, and there is still a lot of scope for further improvement using genomic tools. Discovery and use of various DNA-based markers such as EST-SSRs, DArT, CISP, and SSCP-SNP in pearl millet not only help in determining population structure and genetic diversity but also prove to be important for developing strategies for crop improvement at a faster rate and greater precision. Molecular marker-based genetic linkage maps and identification of genomic regions determining yield under abiotic stresses, particularly terminal drought, have paved the way for marker-assisted selection and breeding of pearl millet cultivars. Reference collections and marker-assisted backcrossing have also been used to improve biotic stress resistance in pearl millet, specifically to downy mildew. Whole genome sequencing of pearl millet genome will give new insights for processing of functional

Summary. Background. The Clinical Genome Resource (ClinGen) Electronic Health Record (EHR) Workgroup aims to integrate ClinGen resources with EHRs. A promising option to enable this integration is through the Health Level Seven (HL7) Infobutton Standard. EHR systems that are certified according to the US Meaningful Use program provide HL7-compliant infobutton capabilities, which can be leveraged to support clinical decision-making in genomics. Objectives. To integrate genomic knowledge resources using the HL7 infobutton standard. Two tactics to achieve this objective were: (1) creating an HL7-compliant search interface for ClinGen, and (2) proposing guidance for genomic resources on achieving HL7 Infobutton standard accessibility and compliance. Methods. We built a search interface utilizing OpenInfobutton, an open source reference implementation of the HL7 Infobutton standard. ClinGen resources were assessed for readiness towards HL7 compliance. Finally, based upon our experiences we provide recommendations for publishers seeking to achieve HL7 compliance. Results. Eight genomic resources and two sub-resources were integrated with the ClinGen search engine via OpenInfobutton and the HL7 infobutton standard. Resources we assessed have varying levels of readiness towards HL7-compliance. Furthermore, we found that adoption of standard terminologies used by EHR systems is the main gap to achieve compliance. Conclusion. Genomic resources can be integrated with EHR systems via the HL7 Infobutton standard using OpenInfobutton. Full compliance of genomic resources with the Infobutton standard would further enhance interoperability with EHR systems. PMID:27579472

The 1000 genomes project changed the way we see the human genome. The rapid development of deep sequencing technologies is raising several practical questions, and the way we answer these questions will deeply affect the future of oncological research in Hungary. In our manuscript we give a short overview of the results of the 1000 genomes project and we present the place of functional genomic investigations among other genomic tools. Based on recent developments in the field, we summarize the challenges that have to be addressed in the next couple of years.

Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
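The step in which substitutions and short indels are incorporated into an updated reference before the second round of mapping can be illustrated with a toy sketch. This is not the authors' pipeline: the function name apply_variants, the (position, ref, alt) tuple layout with 0-based positions, and the assumption that variants do not overlap are all illustrative choices made here.

```python
from typing import List, Tuple

# Hypothetical variant layout: (0-based position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return a copy of `reference` with substitutions and short indels applied.

    Variants are applied from right to left so that an indel never shifts
    the coordinates of a variant that has not been applied yet.
    Assumes the variants do not overlap one another.
    """
    updated = reference
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        # Guard against coordinates that do not match the reference sequence.
        if updated[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        updated = updated[:pos] + alt + updated[pos + len(ref):]
    return updated


# One substitution (C>T at position 1) and one single-base deletion (AC>A at position 4).
round_one_calls = [(1, "C", "T"), (4, "AC", "A")]
updated_reference = apply_variants("ACGTACGT", round_one_calls)
```

In a real workflow the first-round calls would come from a variant caller and the updated sequence would be re-indexed for the second alignment round; both steps are outside the scope of this sketch.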

Pacific herring (Clupea pallasii) support commercially and culturally important fisheries but have experienced significant additional pressure from a variety of anthropogenic and environmental sources. In order to provide genomic resources to facilitate organismal and population level research, high-throughput pyrosequencing (Roche 454) was carried out on transcriptome libraries from liver and testes samples taken in Prince William Sound, the Bering Sea, and the Gulf of Alaska. Over 40,000 contigs were identified with an average length of 728 bp. We describe an annotated transcriptome as well as a workflow for single nucleotide polymorphism (SNP) discovery and validation. A subset of 96 candidate SNPs chosen from 10,933 potential SNPs was tested using a combination of Sanger sequencing and high-resolution melt-curve analysis. Five SNPs supported between-ocean-basin differentiation, while one SNP associated with immune function provided high differentiation between Prince William Sound and Kodiak Island within the Gulf of Alaska. These genomic resources provide a basis for environmental physiology studies and opportunities for marker development and subsequent population structure analysis. PMID:22383979

The genome must be highly compacted to fit within eukaryotic nuclei but must be accessible to the transcriptional machinery to allow appropriate expression of genes in different cell types and throughout developmental pathways. A growing body of work has shown that the genome, analogously to proteins, forms an ordered, hierarchical structure that closely correlates and may even be causally linked with regulation of functions such as transcription. This review describes our current understanding of how these functional genomic "secondary and tertiary structures" form a blueprint for global nuclear architecture and the potential they hold for understanding and manipulating genomic regulation.

Since the completion of the Human Genome Project (HGP) in 2003, the understanding of genetics and its influence on disease, particularly cancer, has increased dramatically. The initial focus after the completion of HGP was on identifying single-gene disorders, such as many hereditary cancer syndromes (e.g., BRCA1, BRCA2, HNPCC). As research continues, the major impact that genetics and genomics have across the healthcare continuum is only beginning to become clear.

Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net. PMID:19858098

The development of functional genomics including transcriptomics, proteomics and metabolomics allows us to monitor a large number of key cellular pathways simultaneously. Several technology-specific data models have been introduced for the representation of functional genomics experimental data, including the MicroArray Gene Expression-Object Model (MAGE-OM), the Proteomics Experiment Data Repository (PEDRo), and the Tissue MicroArray-Object Model (TMA-OM). Despite the increasing number of cancer studies using multiple functional genomics technologies, there is still no integrated data model for multiple functional genomics experimental and clinical data. We propose an object-oriented data model for cancer genomics research, Cancer Genomics Object Model (CaGe-OM). We reference four data models: Functional Genomic-Object Model, MAGE-OM, TMA-OM and PEDRo. The clinical and histopathological information models are created by analyzing cancer management workflow and referencing the College of American Pathology Cancer Protocols and National Cancer Institute Common Data Elements. The CaGe-OM provides a comprehensive data model for integrated storage and analysis of clinical and multiple functional genomics data.

Rice is the first cereal genome to be completely sequenced. Since the completion of its genome sequencing, considerable progress has been made in multiple areas including the whole genome annotation, gene expression profiling, mutant collection, etc. Here, we summarize the current status of rice genome annotation and review the methodology of assigning biological functions to hundreds of thousands of rice genes as well as discuss the major limitations and the future perspective in rice functional genomics. Available data analysis shows that the rice genome encodes around 32,000 protein-coding genes. Expression analysis revealed at least 31,000 genes with expression evidence from full-length cDNA/EST collection or other transcript profiling. In addition, we have summarized various strategies to generate mutant populations including natural, physical, chemical, T-DNA, transposon/retrotransposon or gene silencing based mutagenesis. Currently, more than 1 million mutants have been generated and 27,551 of them have their flanking sequence tags. To assign biological functions to hundreds of thousands of rice genes, global co-operations are required, various genetic resources should be more easily accessible and diverse data from transcriptomics, proteomics, epigenetics, comparative genomics and bioinformatics should be integrated to better understand the functions of these genes and their regulatory mechanisms.

In the post-genome era, a major goal in molecular biology is to determine the function of the many thousands of genes present in the vertebrate genome. The zebrafish (Danio rerio) provides an almost ideal genetic model to identify the biological roles of these novel genes, in part because its embryos are transparent and develop rapidly. The zebrafish has many advantages over the mouse for genome-wide mutagenesis studies, allowing for easier, cheaper and faster functional characterization of novel genes in the vertebrate genome. Many molecular research tools such as chemical mutagenesis, transgenesis, gene trapping, gene knockdown, TILLING, gene targeting, RNAi and chemical genetic screens are now available in zebrafish. By combining forward, reverse, and chemical genetic tools, zebrafish is expected to make an invaluable contribution to vertebrate functional genomics in the functional annotation of genes, modeling of human diseases and drug discovery.

The tomato is a model species for fleshy fruit development and ripening, as well as for genomics studies of other Solanaceae. Many genetic and genomic resources, including databases for sequencing, transcriptomics and metabolomics data, have been developed and are available today. The purpose of the present work was to uncover new genes and/or alleles that determine ascorbic acid and carotenoid accumulation, by exploiting one Solanum pennellii introgression line (IL7-3) harboring quantitative trait loci (QTL) that increase the content of these metabolites in the fruit. The higher ascorbic acid and carotenoid content in IL7-3 was confirmed at three fruit developmental stages. The tomato genome reference sequence and the recently released S. pennellii genome sequence were investigated to identify candidate genes (CGs) that might control ascorbic acid and carotenoid accumulation. First, the borders of the wild region in IL7-3 were refined by analyzing CAPS markers designed in our laboratory. Afterward, six CGs associated with ascorbic acid metabolism and one with carotenoid metabolism were identified by exploring the annotation and the Gene Ontology terms of genes included in the region. Variants between the sequences of the wild and the cultivated alleles of these genes were investigated for their functional relevance, and their potential effects on the protein sequences were predicted. Transcriptional levels of CGs in the introgression region were extracted from RNA-Seq data available for the entire S. pennellii introgression line collection and verified by Real-Time qPCR. Finally, seven IL7-3 sub-lines were genotyped using 28 species-specific markers and then evaluated for metabolite content. These analyses revealed a significant decrease in transcript abundance for one 9-cis-epoxycarotenoid dioxygenase and one L-ascorbate oxidase homolog, whose role in the accumulation of carotenoids and ascorbic acid is discussed. Comprehensively, the reported

AtPID (Arabidopsis thaliana Protein Interactome Database, available at http://www.megabionet.org/atpid) is an integrated database resource for protein interaction network and functional annotation. In the past few years, we collected 5564 mutants with significant morphological alterations and manually curated them into 167 plant ontology (PO) morphology categories. These single/multiple-gene mutants were indexed and linked to 3919 genes. After integrating these genotype–phenotype associations with the comprehensive protein interaction network in AtPID, we developed a Naïve Bayes method and predicted 4457 novel high-confidence gene-PO pairs with 1369 genes as the complement. Along with the accumulated novel data for protein interaction and functional annotation, and the updated visualization toolkits, we present a genome-scale resource for genotype–phenotype associations for Arabidopsis in AtPID 5.0. In our updated website, all the new genotype–phenotype associations from mutants, the protein network, and the protein annotation information can be displayed in a comprehensive network view, which will greatly enhance systematic studies of plant protein function and genotype–phenotype associations. PMID:27899679

Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the “Reptilian Transcriptomes Database 2.0,” which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be enquired using a wwwblast server installed at the University of Geneva. PMID:26133641

Advances in genetics and genomics have fuelled a revolution in discovery-based, or hypothesis-generating, research that provides a powerful complement to the more directly hypothesis-driven molecular, cellular and systems neuroscience. Genetic and functional genomic studies have already yielded important insights into neuronal diversity and function, as well as disease. One of the most exciting and challenging frontiers in neuroscience involves harnessing the power of large-scale genetic, genomic and phenotypic data sets, and the development of tools for data integration and mining. Methods for network analysis and systems biology offer the promise of integrating these multiple levels of data, connecting molecular pathways to nervous system function.

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247

Consumer-resource interactions are a central issue in evolutionary and community ecology because they play important roles in selection and population regulation. Most consumers encounter resource variation at multiple scales, and respond through phenotypic plasticity in the short term or evolutionary divergence in the long term. The key traits for these responses may influence resource acquisition, assimilation, and/or allocation. To identify relevant candidate genes, we experimentally assayed genome-wide gene expression in pond and lake Daphnia ecotypes exposed to alternate resource environments. One was a simple, high-quality laboratory diet, Ankistrodesmus falcatus. The other was the complex natural seston from a large lake. In temporary ponds, Daphnia generally experience high-quality, abundant resources, whereas lakes provide low-quality, seasonally shifting resources that are chronically limiting. For both ecotypes, we used replicate clones drawn from a number of separate populations. Fourteen genes were differentially regulated with respect to resources, including genes involved in gut processes, resource allocation, and activities with no obvious connection to resource exploitation. Three genes were differentially regulated in both ecotypes; the others may play a role in ecological divergence. Genes clearly linked to gut processes include two peritrophic matrix proteins, a Niemann-Pick type C2 gene, and a chymotrypsin. A pancreatic lipase, an epoxide hydrolase, a neuroparsin, and an UDP-dependent glucuronyltransferase are potentially involved in resource allocation through effects on energy processing and storage or hormone pathways. We performed quantitative rt-PCR for eight genes in independent samples of three clones of each of the two ecotypes. Though these largely confirmed observed differential regulation, some genes' expression was highly variable among clones. Our results demonstrate the value of matching the level of biological replication in

The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at .

Shiga toxin-producing Escherichia coli O157:H7 primarily resides in cattle asymptomatically, and can be transmitted to humans through food. A study by Lupolova et al applied a machine-learning approach to complex pan-genome information and predicted that only a small subset of bovine isolates have t...

Genome-wide association studies (GWAS) have provided a rich collection of ~58 CAD loci that suggest the existence of previously unsuspected new biology relevant to atherosclerosis. However, these studies only identify genomic loci associated with CAD and many questions remain even after a genomic locus is definitively implicated, including the nature of the causal variant(s) and the causal gene(s), as well as the directionality of effect. There are a number of tools that can be employed for investigation of the functional genomics of these loci, and progress has been made on a limited number of novel CAD loci. New biology regarding atherosclerosis and CAD will be learned through the functional genomics of these loci and the hope is that at least some of these new pathways relevant to CAD pathogenesis will yield new therapeutic targets for the prevention and treatment of CAD. PMID:26892960

With the development of new technologies in genome sequencing, gene expression profiling, genotyping, and high-throughput screening of chemical compound libraries, small molecules are playing increasingly important roles in studying gene expression regulation, gene-gene interaction, and gene function. Here we briefly review and discuss some recent advancements in drug target identification and phenotype characterization using combinations of high-throughput screening of small-molecule libraries and various genome-wide methods such as whole genome sequencing, genome-wide association studies, and genome-wide expressional analysis. These approaches can be used to search for new drugs against parasitic infections, to identify drug targets or drug-resistance genes, and to infer gene function. PMID:24215777

Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment.
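The idea of combining mutually exclusive alterations that together account for a phenotype can be illustrated with a small sketch. This is not REVEALER's scoring procedure, which the abstract does not detail; it is a simplified illustration that assumes binary alteration calls and a binary phenotype per sample, and the function names and toy data are hypothetical.

```python
def mutually_exclusive(a, b):
    """True if no sample carries both alterations (binary vectors, one entry per sample)."""
    return not any(x and y for x, y in zip(a, b))


def union_coverage(features, phenotype):
    """Fraction of phenotype-positive samples carrying at least one of the alterations."""
    positives = [i for i, p in enumerate(phenotype) if p]
    if not positives:
        return 0.0
    covered = sum(1 for i in positives if any(f[i] for f in features))
    return covered / len(positives)


# Hypothetical toy data: six samples, the first four show the phenotype.
phenotype = [1, 1, 1, 1, 0, 0]
mut_a = [1, 1, 0, 0, 0, 0]
mut_b = [0, 0, 1, 0, 0, 0]
mut_c = [1, 0, 0, 0, 1, 0]

exclusive_ab = mutually_exclusive(mut_a, mut_b)
exclusive_ac = mutually_exclusive(mut_a, mut_c)
coverage_a = union_coverage([mut_a], phenotype)
coverage_ab = union_coverage([mut_a, mut_b], phenotype)
```

In this toy case, adding the mutually exclusive alteration `mut_b` to `mut_a` raises the share of phenotype-positive samples explained, which is the intuition behind searching for complementary alterations rather than single best features.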

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594

The Drosophila genome contains >13000 protein-coding genes, the majority of which remain poorly investigated. Important reasons include the lack of antibodies or reporter constructs to visualise these proteins. Here, we present a genome-wide fosmid library of 10000 GFP-tagged clones, comprising tagged genes and most of their regulatory information. For 880 tagged proteins, we created transgenic lines, and for a total of 207 lines, we assessed protein expression and localisation in ovaries, embryos, pupae or adults by stainings and live imaging approaches. Importantly, we visualised many proteins at endogenous expression levels and found a large fraction of them localising to subcellular compartments. By applying genetic complementation tests, we estimate that about two-thirds of the tagged proteins are functional. Moreover, these tagged proteins enable interaction proteomics from developing pupae and adult flies. Taken together, this resource will boost systematic analysis of protein expression and localisation in various cellular and developmental contexts. DOI: http://dx.doi.org/10.7554/eLife.12068.001 PMID:26896675

A collection of 5006 full-length (FL) cDNA sequences was developed in barley. Fifteen mRNA samples from various organs and treatments were pooled to develop a cDNA library using the CAP trapper method. More than 60% of the clones were confirmed to have complete coding sequences, based on comparison with rice amino acid and UniProt sequences. Blastn homologies (E<1E-5) to rice genes and Arabidopsis genes were 89 and 47%, respectively. Of the 5028 possible amino acid sequences derived from the 5006 FLcDNAs, 4032 (80.2%) were classified into 1678 GreenPhyl multigenic families. There were 555 cDNAs showing low homology to both rice and Arabidopsis. Gene ontology annotation by InterProScan indicated that many of these cDNAs (71%) have no known molecular functions and may be unique to barley. The cDNAs showed high homology to Barley 1 GeneChip oligo probes (81%) and the wheat gene index (84%). The high homology between FLcDNAs (27%) and mapped barley expressed sequence tags enabled assigning linkage map positions to 151–233 FLcDNAs on each of the seven barley chromosomes. These comprehensive barley FLcDNAs provide a strong platform to connect pre-existing genomic and genetic resources and accelerate gene identification and genome analysis in barley and related species. PMID:19150987

The goal of the Human Genome Project is to characterize and sequence the entire genomes of human and several model organisms, thus providing complete sets of information on the entire structure of transcribed, regulatory and other functional regions for these organisms. In the past years, a number of useful genetic and physical markers on the human and mouse genomes have been made available along with the advent of BAC library resources for these organisms. The advances in technology and resource development have made it feasible to efficiently construct genome-wide physical BAC contigs for human and other genomes. Currently, over 30,000 mapped STSs and 27,000 mapped Unigenes are available for human genome mapping. ESTs and cDNAs are excellent resources for building contig maps for two reasons. First, they exist in two alternative forms--as both sequence information for PCR primer pairs, and cDoreen genomic libraries efficiently for a large number of DNA probes by combining over 100 cDNA probes in each hybridization. Second, the linkage and order of genes are rather conserved among human, mouse and other model organisms. Therefore, gene markers have advantages over random anonymous STSs in building maps for comparative genomic studies.

The pig, a representative of the Artiodactyla clade, was one of the first animals to be domesticated, and has become an important agricultural animal as one of the major human nutritional sources of animal-based protein. The pig is also a valuable biomedical model organism for human health. The pig's importance to human health and nutrition is reflected in the decision to sequence its genome (3X). As an animal species whose wild ancestors are still present in the world, the pig provides a unique opportunity for tracing mammalian evolutionary history and defining signatures of selection resulting from both domestication and natural selection. Completion of the pig genome sequencing project will have significant impacts on both agriculture and human health. Following the pig whole genome sequence drafts, along with large-scale polymorphism data, it will be possible to conduct genome sweeps using association mapping, and identify signatures of selection. Here, we provide a description of the pig genome sequencing project and perspectives on utilizing genomic technologies to exploit pig genome evolution and the molecular basis for phenotypic traits for improving pig production and health. PMID:17384734

Growth traits represent a main goal in aquaculture breeding programs and may be related to adaptive variation in wild fisheries. Integrating quantitative trait loci (QTL) mapping and next generation sequencing can greatly help to identify variation in candidate genes, which can result in marker-assisted selection and better genetic structure information. Turbot is a commercially important flatfish in Europe and China, with available genomic information on QTLs and genome mapping. Muscle and liver RNA-seq from 18 individuals was carried out to obtain gene sequences and markers functionally related to growth, resulting in a total of 20,447 genes and 85,344 single nucleotide polymorphisms (SNPs). Many growth-related genes and SNPs were identified and placed in the turbot genome and genetic map to explore their co-localization with growth-QTL markers. Forty-five SNPs on growth-related genes were selected based on QTL co-localization and relevant function for growth traits. Forty-three SNPs were technically feasible and validated in a wild Atlantic population, where 91% were polymorphic. The integration of functional and structural genomic resources in turbot provides a practical approach for QTL mining in this species. Validated SNPs represent a useful set of growth-related gene markers for future association, functional and population studies in this flatfish species. PMID:26901189
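Selecting SNPs by QTL co-localization amounts to an interval lookup on the genetic map. The following is a minimal sketch under stated assumptions: the linkage group names, map positions, QTL intervals and SNP identifiers are all hypothetical, and the abstract does not specify how co-localization was actually scored.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A QTL confidence interval on one linkage group (positions in map units)."""
    group: str
    start: float
    end: float


def colocalized(snps, qtls):
    """Return the sorted names of SNPs lying inside any QTL interval on the same linkage group.

    snps maps SNP name -> (linkage group, map position).
    """
    hits = []
    for name, (group, pos) in snps.items():
        if any(q.group == group and q.start <= pos <= q.end for q in qtls):
            hits.append(name)
    return sorted(hits)


# Hypothetical example data, for illustration only.
qtls = [Interval("LG1", 10.0, 25.0), Interval("LG5", 40.0, 55.0)]
snps = {
    "snp_001": ("LG1", 12.3),   # inside the LG1 interval
    "snp_002": ("LG1", 30.0),   # same group, outside the interval
    "snp_003": ("LG5", 55.0),   # on the interval boundary, counted as inside
    "snp_004": ("LG7", 45.0),   # no QTL on this group
}
selected = colocalized(snps, qtls)
```

A real analysis would also need to account for map uncertainty and for the functional filter the authors mention; this sketch covers only the positional step.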

We describe the organization of a nascent international effort - the "Functional Annotation of ANimal Genomes" project - whose aim is to produce comprehensive maps of functional elements in the genomes of domesticated animal species....

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
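The phylogenetic profile method described above can be sketched in a few lines. This is an illustrative sketch rather than the patented procedure: it assumes presence or absence of each protein family per genome has already been determined (for example by homology search), and the genome names, family names and mismatch threshold are hypothetical.

```python
from itertools import combinations


def build_profiles(presence):
    """Build a presence/absence profile per protein family.

    presence maps genome name -> set of protein families found in that genome.
    Returns family -> tuple of 0/1 values, ordered by sorted genome name.
    """
    genomes = sorted(presence)
    families = sorted(set().union(*presence.values()))
    return {fam: tuple(int(fam in presence[g]) for g in genomes) for fam in families}


def linked_pairs(profiles, max_mismatches=0):
    """Return family pairs whose profiles differ in at most max_mismatches genomes."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        mismatches = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs


# Hypothetical toy input: four genomes and three protein families.
presence = {
    "genome1": {"famA", "famB", "famC"},
    "genome2": {"famA", "famB"},
    "genome3": {"famC"},
    "genome4": {"famA", "famB", "famC"},
}
profiles = build_profiles(presence)
links = linked_pairs(profiles)
```

Here `famA` and `famB` share an identical profile and are reported as a candidate functional link. Families present in every genome carry no discriminating signal, so a practical implementation would filter such uninformative profiles before comparing pairs.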

The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

Rapidly accumulating data from genome-wide association studies (GWASs) and other large-scale studies are most useful when synthesized with existing databases. To address this opportunity, we developed the Phenotype-Genotype Integrator (PheGenI), a user-friendly web interface that integrates various National Center for Biotechnology Information (NCBI) genomic databases with association data from the National Human Genome Research Institute GWAS Catalog and supports downloads of search results. Here, we describe the rationale for and development of this resource. Integrating over 66,000 association records with extensive single nucleotide polymorphism (SNP), gene, and expression quantitative trait loci data already available from the NCBI, PheGenI enables deeper investigation and interrogation of SNPs associated with a wide range of traits, facilitating the examination of the relationships between genetic variation and human diseases.

Background The green peach aphid, Myzus persicae (Sulzer), is a world-wide insect pest capable of infesting more than 40 plant families, including many crop species. However, despite the significant damage inflicted by M. persicae in agricultural systems through direct feeding damage and by its ability to transmit plant viruses, limited genomic information is available for this species. Results Sequencing of 16 M. persicae cDNA libraries generated 26,669 expressed sequence tags (ESTs). Aphids for library construction were raised on Arabidopsis thaliana, Nicotiana benthamiana, Brassica oleracea, B. napus, and Physalis floridana (with and without Potato leafroll virus infection). The M. persicae cDNA libraries include ones made from sexual and asexual whole aphids, guts, heads, and salivary glands. In silico comparison of cDNA libraries identified aphid genes with tissue-specific expression patterns, and gene expression that is induced by feeding on Nicotiana benthamiana. Furthermore, 2423 genes that are novel to science and potentially aphid-specific were identified. Comparison of cDNA data from three aphid lineages identified single nucleotide polymorphisms that can be used as genetic markers and, in some cases, may represent functional differences in the protein products. In particular, non-conservative amino acid substitutions in a highly expressed gut protease may be of adaptive significance for M. persicae feeding on different host plants. The Agilent eArray platform was used to design an M. persicae oligonucleotide microarray representing over 10,000 unique genes. Conclusion New genomic resources have been developed for M. persicae, an agriculturally important insect pest. These include previously unknown sequence data, a collection of expressed genes, molecular markers, and a DNA microarray that can be used to study aphid gene expression. These resources will help elucidate the adaptations that allow M. persicae to develop compatible interactions with its

Bamboo, one of the most important non-timber forest products and among the fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. The recent completion of the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights into bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies on the basis of previous achievements, we have developed BambooGDB, a bamboo genome database with a functional annotation and analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation of the bamboo genome was made. In addition, an analytical platform comprising comparative genomic analysis, a protein–protein interaction network, pathway analysis and visualization of genomic data was constructed. As a discovery tool for understanding and identifying biological mechanisms of bamboo, the platform can be used as a systematic framework for designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. By integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide researchers worldwide with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org PMID:24602877

The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies. PMID:24425782

Over 95% of all metazoan (animal) species comprise the “invertebrates,” but very few genomes from these organisms have been sequenced. We have, therefore, formed a “Global Invertebrate Genomics Alliance” (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture. PMID:24336862

Background It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. Conclusions Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone. PMID:12537575

Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/. PMID:28261563
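
The core/shared protein set described in the P. gingivalis abstract can be pictured as a set intersection across strains. The following is only a minimal sketch under stated assumptions: the function name `core_families` and the family identifiers are hypothetical toy data, and the abstract does not say how proteins were clustered or which detection parameters were used. The final lines simply re-derive the 16S rRNA count stated in the abstract (four copies in each of eight complete genomes, one in each of 11 drafts).

```python
def core_families(strain_to_families):
    """Return the protein families present in every strain.

    strain_to_families maps a strain name to an iterable of family IDs.
    How proteins are grouped into families is outside this sketch.
    """
    family_sets = [set(families) for families in strain_to_families.values()]
    if not family_sets:
        return set()
    # The core set is whatever survives intersection over all strains.
    return set.intersection(*family_sets)


# Toy data: strain names are from the abstract, family IDs are invented.
toy = {
    "W83": ["famA", "famB", "famC"],
    "ATCC 33277": ["famA", "famB", "famD"],
    "TDC60": ["famB", "famA"],
}
core = core_families(toy)

# 16S rRNA copies as stated: 4 per complete genome, 1 per draft genome.
total_16s = 4 * 8 + 1 * 11
```

With real data, the size of the returned set depends heavily on how strictly families are defined, which is presumably why the abstract qualifies its figure as "at least" 1,037 under the most stringent parameters.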

Summary: Apollo is a genome annotation-editing tool with an easy to use graphical interface. It is a component of the GMOD project, with ongoing development driven by the community. Recent additions to the software include support for the generic feature format version 3 (GFF3), continuous transcriptome data, a full Chado database interface, integration with remote services for on-the-fly BLAST and Primer BLAST analyses, graphical interfaces for configuring user preferences and full undo of all edit operations. Apollo's user community continues to grow, including its use as an educational tool for college and high-school students. Availability: Apollo is a Java application distributed under a free and open source license. Installers for Windows, Linux, Unix, Solaris and Mac OS X are available at http://apollo.berkeleybop.org, and the source code is available from the SourceForge CVS repository at http://gmod.cvs.sourceforge.net/gmod/apollo. Contact: elee@berkeleybop.org PMID:19439563

Genome analysis using next generation sequencing technologies has revolutionized the characterization of lactic acid bacteria and complete genomes of all major groups are now available. Comparative genomics has provided new insights into the natural and laboratory evolution of lactic acid bacteria and their environmental interactions. Moreover, functional genomics approaches have been used to understand the response of lactic acid bacteria to their environment. The results have been instrumental in understanding the adaptation of lactic acid bacteria in artisanal and industrial food fermentations as well as their interactions with the human host. Collectively, this has led to a detailed analysis of genes involved in colonization, persistence, interaction and signaling towards the human host and its health. Finally, massively parallel genome re-sequencing has provided new opportunities in applied genomics, specifically in the characterization of novel non-GMO strains that have potential to be used in the food industry. Here, we provide an overview of the state of the art of these functional genomics approaches and their impact in understanding, applying and designing lactic acid bacteria for food and health. PMID:25186768

Livestock conservation practice is changing rapidly in light of policy developments, climate change and diversifying market demands. The last decade has seen a step change in technology and analytical approaches available to define, manage and conserve Farm Animal Genomic Resources (FAnGR). However, these rapid changes pose challenges for FAnGR conservation in terms of technological continuity, analytical capacity and integrative methodologies needed to fully exploit new, multidimensional data. The final conference of the ESF Genomic Resources program aimed to address these interdisciplinary problems in an attempt to contribute to the agenda for research and policy development directions during the coming decade. By 2020, according to the Convention on Biodiversity's Aichi Target 13, signatories should ensure that "…the genetic diversity of …farmed and domesticated animals and of wild relatives …is maintained, and strategies have been developed and implemented for minimizing genetic erosion and safeguarding their genetic diversity." However, the real extent of genetic erosion is very difficult to measure using current data. Therefore, this challenging target demands better coverage, understanding and utilization of genomic and environmental data, the development of optimized ways to integrate these data with social and other sciences and policy analysis to enable more flexible, evidence-based models to underpin FAnGR conservation. At the conference, we attempted to identify the most important problems for effective livestock genomic resource conservation during the next decade. Twenty priority questions were identified that could be broadly categorized into challenges related to methodology, analytical approaches, data management and conservation. It should be acknowledged here that while the focus of our meeting was predominantly around genetics, genomics and animal science, many of the practical challenges facing conservation of genomic resources are

Richard Lewontin proposed that the ability of a scientific field to create a narrative for public understanding garners it social relevance. This article applies Lewontin's conceptual framework of the functions of science (manipulatory and explanatory) to compare and explain the current differences in perceived societal relevance of genetics/genomics and proteomics. We provide three examples to illustrate the social relevance and strong cultural narrative of genetics/genomics for which no counterpart exists for proteomics. We argue that the major difference between genetics/genomics and proteomics is that genomics has a strong explanatory function, due to the strong cultural narrative of heredity. Based on qualitative interviews and observations of proteomics conferences, we suggest that the nature of proteins, lack of public understanding, and theoretical complexity exacerbate this difference for proteomics. Lewontin's framework suggests that social scientists may find that omics sciences affect social relations in different ways than past analyses of genetics. PMID:27134568

I will divide my remarks into three parts. First, I will give a brief summary of the Human Genome Project. Second, I will describe our work on human chromosome 7 to illustrate how we could contribute to the Project and to disease research. Third, I would like to put forward the argument that the study of genetic disease is an integral component of the Human Genome Project. In particular, I will use cystic fibrosis as an example to explain why I consider disease study a part of functional genomics.

Generating a contiguous, ordered reference sequence of a complex genome such as hexaploid wheat (2n = 6x = 42; approximately 17 Gb) is a challenging task due to its large, highly repetitive, and allopolyploid genome. In wheat, ordering of whole-genome or hierarchical shotgun sequencing contigs is primarily based on recombination and comparative genomics-based approaches. However, comparative genomics approaches are limited to syntenic inference and recombination is suppressed within the pericentromeric regions of wheat chromosomes; thus, precise ordering of physical maps and sequenced contigs across the whole genome using these approaches is nearly impossible. We developed a whole-genome radiation hybrid (WGRH) resource and tested it by genotyping a set of 115 randomly selected lines on a high-density single nucleotide polymorphism (SNP) array. At the whole-genome level, 26,299 SNP markers were mapped on the RH panel and provided an average mapping resolution of approximately 248 Kb/cR1500 with a total map length of 6866 cR1500. The 7296 unique mapping bins provided a five- to eight-fold higher resolution than genetic maps used in similar studies. Most strikingly, the RH map had uniform bin resolution across the entire chromosome(s), including pericentromeric regions. Our research provides a valuable and low-cost resource for anchoring and ordering sequenced BAC and next generation sequencing (NGS) contigs. The WGRH developed for the reference wheat line Chinese Spring (CS-WGRH) will be useful for anchoring and ordering sequenced BAC and NGS-based contigs for assembling a high-quality reference sequence of hexaploid wheat. Additionally, this study provides an excellent model for developing similar resources for other polyploid species.

An intron is an extended genomic feature whose function requires multiple constrained positions—donor and acceptor splice sites, a branch point, a polypyrimidine tract and suitable splicing enhancers—that may be distributed over hundreds or thousands of nucleotides. New introns are therefore unlikely to emerge by incremental accumulation of functional sub-elements. Here we demonstrate that a functional intron can be created de novo in a single step by a segmental genomic duplication. This experiment recapitulates in vivo the birth of an intron that arose in the ancestral jawed vertebrate lineage nearly half-a-billion years ago. PMID:21878908

Targeting-induced local lesions in genomes (TILLING) is a general strategy for identifying induced point mutations that can be applied to almost any organism. Here, we describe the basic methodology for high-throughput TILLING. Gene segments are amplified using fluorescently tagged primers, and products are denatured and reannealed to form heteroduplexes between the mutated sequence and its wild-type counterpart. These heteroduplexes are substrates for cleavage by the endonuclease CEL I. Following cleavage, products are analyzed on denaturing polyacrylamide gels using the LI-COR DNA analyzer system. High-throughput TILLING has been adopted by the Arabidopsis TILLING Project (ATP) to provide allelic series of point mutations for the general Arabidopsis community.
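
The cleavage step in the TILLING abstract has a simple size bookkeeping behind it, sketched here under stated assumptions. The sketch assumes an idealized single cut at one mismatch and assumes that each primer carries its own fluorescent label, so that one labelled fragment is read from each end of the amplicon; the abstract only says the primers are fluorescently tagged. The function name `cleavage_fragments` and the coordinate convention are hypothetical.

```python
def cleavage_fragments(amplicon_len, cut_site):
    """Sizes of the two end-labelled fragments from a single cleavage.

    amplicon_len: length of the amplified gene segment, in bases.
    cut_site: number of bases between the forward primer's 5' end
              and the cut (an idealized, hypothetical coordinate).
    Returns (forward_labelled_size, reverse_labelled_size).
    """
    if not 0 < cut_site < amplicon_len:
        raise ValueError("cut site must fall inside the amplicon")
    forward_fragment = cut_site
    reverse_fragment = amplicon_len - cut_site
    return forward_fragment, reverse_fragment


# A 1000-base amplicon cut 300 bases from the forward primer end.
fwd, rev = cleavage_fragments(1000, 300)
```

Under these assumptions the two fragment sizes always sum to the amplicon length, which gives a quick consistency check when reading a pair of bands from a gel image.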

The sequencing of the human genome has generated a drug discovery process that is based on sequence analysis and hypothesis-driven (inductive) prediction of gene function. This approach, which we term inductive genomics, is currently dominating the efforts of the pharmaceutical industry to identify new drug targets. According to recent studies, this sequence-driven discovery process is paradoxically increasing the average cost of drug development, thus falling short of the promise of the Human Genome Project to simplify the creation of much needed novel therapeutics. In the early stages of discovery, the flurry of new gene sequences makes it difficult to pick and prioritize the most promising product candidates for product development, as with existing technologies important decisions have to be based on circumstantial evidence that does not strongly predict therapeutic potential. This is because the physiological function of a potential target cannot be predicted by gene sequence analysis and in vitro technologies alone. In contrast, deductive genomics, or large-scale forward genetics, bridges the gap between sequence and function by providing a function-driven in vivo screen of a highly orthologous mammalian model genome for medically relevant physiological functions and drug targets. This approach allows drug discovery to move beyond the focus on sequence-driven identification of new members of classical drug-able protein families towards the biology-driven identification of innovative targets and biological pathways.

Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value. PMID:27200191

Motivation: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. Results: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat’s extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. Availability and implementation: The Naked Mole Rat Genome Resource is freely available online at http://www.naked-mole-rat.org. This resource is open source and the source code is available at https://github.com/maglab/naked-mole-rat-portal. Contact: jp@senescence.info PMID:25172923
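
The N50 statistic mentioned in the naked mole rat abstract is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below uses that common definition; the function name `n50` and the example lengths are illustrative only and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length


# Toy lengths totalling 300: the two longest (80 + 70) reach 150.
toy_n50 = n50([80, 70, 50, 40, 30, 20, 10])
```

A higher N50 for the same total length means the assembly is split into fewer, longer pieces, which is the sense in which the new assembly is reported as improved.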

Background The Oomycete genus Aphanomyces comprises devastating plant and animal pathogens. However, little is known about the molecular mechanisms underlying pathogenicity of Aphanomyces species. In this study, we report on the development of a public database called AphanoDB which is dedicated to Aphanomyces genomic data. As a first step, a large collection of Expressed Sequence Tags was obtained from the legume pathogen A. euteiches, which was then processed and collected into AphanoDB. Description Two cDNA libraries of A. euteiches were created: one from mycelium grown on synthetic medium and one from mycelium grown in contact with root tissues of the model legume Medicago truncatula. From these libraries, 18,684 expressed sequence tags were obtained and assembled into 7,977 unigenes which were compared to public databases for annotation. Queries on AphanoDB allow users to retrieve information for each unigene including similarity to known protein sequences, protein domains and Gene Ontology classification. Statistical analysis of EST frequency from the two different growth conditions was also added to the database. Conclusion AphanoDB is a public database with a user-friendly web interface. The sequence report pages are the main web interface, which provides all annotation details for each unigene. These interactive sequence report pages are easily accessible through text, BLAST, Gene Ontology and expression profile search utilities. AphanoDB is available from URL: . PMID:18096036

VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.

Thermoanaerobacterium saccharolyticum is a hemicellulose-degrading thermophilic anaerobe that was previously engineered to produce ethanol at high yield. For this research, a major project was undertaken to develop this organism into an industrial biocatalyst, but the lack of genome information and resources was recognized early on as a key limitation.

Conifers have been understudied at the genomic level despite their worldwide ecological and economic importance, but the situation is rapidly changing with the development of next generation sequencing (NGS) technologies. With NGS, genomics research has simultaneously gained in speed, magnitude and scope. In just a few years, genomes of 20-24 gigabases have been sequenced for several conifers, with several others expected in the near future. Biological insights have resulted from recent sequencing initiatives as well as genetic mapping, gene expression profiling and gene discovery research over nearly two decades. We review the knowledge arising from conifer genomics research emphasizing genome evolution and the genomic basis of adaptation, and outline emerging questions and knowledge gaps. We discuss future directions in three areas with potential inputs from NGS technologies: the evolutionary impacts of adaptation in conifers based on the adaptation-by-speciation model; the contributions of genetic variability of gene expression in adaptation; and the development of a broader understanding of genetic diversity and its impacts on genome function. These research directions promise to sustain research aimed at addressing the emerging challenges of adaptation that face conifer trees.

Chickpea is an important grain legume used as a rich source of protein in the human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides a comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea is available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression, have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that the integrated information content of this database will accelerate functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html. PMID:26322998

This is an interim report on the Functional Genomics Experiment (FuGE) Object Model. FuGE is a framework for creating data standards for high-throughput biological experiments, developed by a consortium of researchers from academia and industry. FuGE supports rich annotation of samples, protocols, instruments, and software, as well as providing extension points for technology specific details. It has been adopted by microarray and proteomics standards bodies as a basis for forthcoming standards. It is hoped that standards developers for other omics techniques will join this collaborative effort; widespread adoption will allow uniform annotation of common parts of functional genomics workflows, reduce standard development and learning times through the sharing of consistent practice, and ease the construction of software for accessing and integrating functional genomics data.

Candida species are the most prevalent human fungal pathogens, with Candida albicans being the most clinically relevant species. Candida albicans resides as a commensal of the human gastrointestinal tract but is a frequent cause of opportunistic mucosal and systemic infections. Investigation of C. albicans virulence has traditionally relied on candidate gene approaches, but recent advances in functional genomics have now facilitated global, unbiased studies of gene function. Such studies include comparative genomics (both between and within Candida species), analysis of total RNA expression, and regulation and delineation of protein-DNA interactions. Additionally, large collections of mutant strains have begun to aid systematic screening of clinically relevant phenotypes. Here, we will highlight the development of functional genomics in C. albicans and discuss the use of these approaches in addressing both commensalism and pathogenesis in this species.

The volume of human genome sequence and the variety of web-based tools to access it continue to grow at an impressive rate, but a working knowledge of certain key resources can be sufficient to get the most from your genome. This article provides an update to Genome Biology 2000, 1(4):reviews2001.1-2001.5. PMID:11423014

The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication, with the consequence that data is being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fragmentation poses serious problems for the model organism community, which increasingly relies on data mining and computational approaches that require gathering of data from a range of sources. In the light of these problems, the European Commission has funded a coordination action, CASIMIR (coordination and sustainability of international mouse informatics resources), with a remit to assess the technical and social aspects of database interoperability that currently prevent the full realization of the potential of data integration in mouse functional genomics. In this article, we assess the current problems with interoperability, with particular reference to mouse functional genomics, and critically review the technologies that can be deployed to overcome them. We describe a typical use-case where an investigator wishes to gather data on variation, genomic context and metabolic pathway involvement for genes discovered in a genome-wide screen. We go on to develop an automated approach involving an in silico experimental workflow tool, Taverna, using web services, BioMart and MOLGENIS technologies for data retrieval. Finally, we focus on the current impediments to adopting such an approach in a wider context, and strategies to overcome them.

The WW domain-containing oxidoreductase (WWOX) spans one of the most active common fragile sites (CFSs) involved in cancer, FRA16D. WWOX encodes a 46-kDa protein that contains two N-terminal WW domains and a central short-chain dehydrogenase/reductase (SDR) domain. Through its WW domain, Wwox interacts with its partners and modulates their functions. Our data indicate that Wwox suppresses the transactivation function of several transcription factors implicated in neoplasia by sequestering them in the cytoplasm. Work from our laboratory and other research groups has demonstrated that Wwox participates in a number of cellular processes including growth, differentiation, apoptosis, and tumor suppression. Targeted deletion of the Wwox gene in mice causes increased spontaneous and chemically induced tumor incidence, supporting a bona fide tumor suppressor function of WWOX. Moreover, generation of the Wwox-deficient mice uncovers, at least in part, some of the physiological in vivo functions of the WWOX gene. This review focuses on recent progress that elucidates Wwox functions in biology and pathology.

Forest trees are an unparalleled group of organisms in their combined ecological, economic and societal importance. With widespread distributions, predominantly random mating systems and large population sizes, most tree species harbour extensive genetic variation both within and among populations. At the same time, demographic processes associated with Pleistocene climate oscillations and land-use change have affected contemporary range-wide diversity and may impinge on the potential for future adaptation. Understanding how these adaptive and neutral processes have shaped the genomes of tree species is therefore central to their management and conservation. As for many other taxa, the advent of high-throughput sequencing methods is expected to yield an understanding of the interplay between the genome and environment at a level of detail and depth not possible only a few years ago. An international conference entitled 'Genomics and Forest Tree Genetics' was held in May 2016, in Arcachon (France), and brought together forest geneticists with a wide range of research interests to disseminate recent efforts that leverage contemporary genomic tools to probe the population, quantitative and evolutionary genomics of trees. An important goal of the conference was to discuss how such data can be applied to both genome-enabled breeding and the conservation of forest genetic resources under land use and climate change. Here, we report discoveries presented at the meeting and discuss how the ecological genomic toolkit can be used to address both basic and applied questions in tree biology.

Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

Retinoids and rexinoids, as all other ligands of the nuclear receptor (NR) family, act as ligand-regulated trans-acting transcription factors that bind to cis-acting DNA regulatory elements in the promoter regions of target genes (for reviews see [12, 22, 23, 26, 36]). Ligand binding modulates the communication functions of the receptor with the intracellular environment, which essentially entails receptor-protein and receptor-DNA or receptor-chromatin interactions. In this communication network, the receptor simultaneously serves as both intracellular sensor and regulator of cell/organ functions. Receptors are "intelligent" mediators of the information encoded in the chemical structure of a nuclear receptor ligand, as they interpret this information in the context of cellular identity and cell-physiological status and convert it into a dynamic chain of receptor-protein and receptor-DNA interactions. To process input and output information, they are composed of a modular structure with several domains that have evolved to exert particular molecular recognition functions. As detailed in other chapters in this volume, the main functional domains are the DNA-binding (DBD) and ligand-binding (LBD) domains [5-7, 38, 56, 71]. The LBD serves as a dual input-output information processor. Inputs, such as ligand binding or receptor phosphorylations, induce allosteric changes in receptor surfaces that serve as docking sites for outputs, such as subunits of transcription and epigenetic machineries or enzyme complexes. The complexity of input and output signals and their interdependencies is far from being understood.

Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description in GeneRIFs is annotated by the text mining tool Open Biomedical Annotator (OBA), and each Entrez gene is mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the GeneRIF records for human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. The consistency between GeneRIFs and GOA in the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and in the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) supports the reliability of our annotation resource. PMID:27635398
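
The consistency check reported above rests on the Pearson correlation between two frequency vectors, for example the number of terms attached to each gene in GeneRIFs versus in GOA. The following is a minimal sketch of that statistic only; the function name and the toy counts are hypothetical and do not reproduce the Gene2Function data or its p-values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products and centred squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical term counts for the same five genes in two resources.
terms_per_gene_generif = [12, 3, 7, 1, 9]
terms_per_gene_goa = [10, 4, 6, 2, 11]
r = pearson(terms_per_gene_generif, terms_per_gene_goa)
```

A value near 1 indicates that genes with many terms in one resource also tend to have many terms in the other; a value near 0 indicates little linear relationship.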

Alterations in cancer genomes strongly influence clinical responses to treatment and in many instances are potent biomarkers for response to drugs. The Genomics of Drug Sensitivity in Cancer (GDSC) database (www.cancerRxgene.org) is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response. Data are freely available without restriction. GDSC currently contains drug sensitivity data for almost 75 000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines. To identify molecular markers of drug response, cell line drug sensitivity data are integrated with large genomic datasets obtained from the Catalogue of Somatic Mutations in Cancer database, including information on somatic mutations in cancer genes, gene amplification and deletion, tissue type and transcriptional data. Analysis of GDSC data is through a web portal focused on identifying molecular biomarkers of drug sensitivity based on queries of specific anticancer drugs or cancer genes. Graphical representations of the data are used throughout with links to related resources and all datasets are fully downloadable. GDSC provides a unique resource incorporating large drug sensitivity and genomic datasets to facilitate the discovery of new therapeutic biomarkers for cancer therapies.

The National Cancer Institute's Cancer Genome Anatomy Project (CGAP) is developing publicly accessible information, technology, and material resources that provide a platform for the interface of cancer research and genomics. CGAP's efforts have focused toward (1) building and annotating catalogues of genes expressed during cancer development, (2) identifying polymorphisms in those genes, and (3) developing resources for the molecular characterization of cancer-related chromosomal aberrations. To date, CGAP has produced more than 1,000,000 expressed sequence tags, approximately 3,300,000 serial analysis of gene expression tags, and identified more than 10,000 human gene-based single-nucleotide polymorphisms. To enhance access to these datasets by the research community, a new Cancer Genome Project web site (http://cgap.nci.nih.gov/) is being introduced. The web site includes genomic data for humans and mice, including transcript sequence, gene expression patterns, single-nucleotide polymorphisms, clone resources, and cytogenetic information. Descriptions of the methods and reagents used in deriving the CGAP datasets are also provided. An extensive suite of informatics tools facilitates queries and analysis of the CGAP data by the community. One of the newest features of the CGAP web site is an electronic version of the Mitelman Database of Chromosome Aberrations in Cancer.

This article illustrates the use of the Encyclopedia of DNA Elements (ENCODE) resource to generate or refine hypotheses from genomic data on disease and other phenotypic traits. First, the goals and history of ENCODE and related epigenomics projects are reviewed. Second, the rationale for ENCODE and the major data types used by ENCODE are briefly described, as are some standard heuristics for their interpretation. Third, the use of the ENCODE resource is examined. Standard use cases for ENCODE, accessing the ENCODE resource, and accessing data from related projects are discussed. Although the focus of this article is the use of ENCODE data, some of the same approaches can be used with data from other projects.

We have implemented a ligand-alignment algorithm into our developed computational pipeline for identifying specificity-determining features (SDFs) in protein-ligand complexes. Given a set of protein-ligand complex structures, the algorithm aligns the complexes by ligand rather than by the standard Cα-RMSD approach, providing a single reference frame for extracting SDFs. We anticipate that this ligand-alignment capability will be highly useful for protein function prediction. We already have a database containing more than 20,000 ligand-protein complex crystal structures taken from the Protein Data Bank. By aligning these proteins to single reference frames using ligand alignment, we can submit the complexes to our pipeline for SDF extraction. The SDFs derived from this training procedure can be used as thumbprints that are hallmarks of individual enzyme classes. These SDF thumbprints may then serve as guides to the prediction of function of new unknown proteins.

A major goal of our laboratory is to understand the molecular mechanisms that underlie the development and functions of diverse macrophage phenotypes in health and disease. Recent studies using genetic and genomic approaches suggest a relatively simple model of collaborative and hierarchical interactions between lineage-determining and signal-dependent transcription factors that enable selection and activation of transcriptional enhancers that specify macrophage identity and function. In addition, we have found that it is possible to use natural genetic variation as a powerful tool for advancing our understanding of how the macrophage deciphers the information encoded by the genome to attain specific phenotypes in a context-dependent manner. Here, I will describe our recent efforts to extend genetic and genomic approaches to investigate the roles of distinct tissue environments in determining the phenotypes of different resident populations of macrophages.

Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp. PMID:27867804
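
The way a 95% ANI threshold can expose mislabeled genomes or intraspecific subgroups can be sketched as follows. This is an illustration only, not DFAST's implementation: it assumes pairwise ANI values have already been computed by some external tool, treats a value at or above the threshold as "same species", and uses hypothetical genome labels and ANI values.

```python
ANI_SPECIES_THRESHOLD = 95.0  # percent; the commonly used species boundary


def flag_label_conflicts(pairs, threshold=ANI_SPECIES_THRESHOLD):
    """Return genome pairs whose species labels disagree with their ANI.

    pairs: iterable of (label_a, label_b, ani_percent) tuples.
    A pair is flagged when the labels say "same species" but ANI is below
    the threshold, or the labels differ but ANI is at or above it.
    """
    conflicts = []
    for label_a, label_b, ani in pairs:
        same_by_label = label_a == label_b
        same_by_ani = ani >= threshold
        if same_by_label != same_by_ani:
            conflicts.append((label_a, label_b, ani))
    return conflicts


# Hypothetical pairwise comparisons (values invented for illustration).
toy_pairs = [
    ("Lactobacillus gasseri", "Lactobacillus gasseri", 93.8),
    ("Lactobacillus gasseri", "Lactobacillus jensenii", 78.0),
    ("Pediococcus acidilactici", "Pediococcus acidilactici", 98.9),
]
toy_conflicts = flag_label_conflicts(toy_pairs)
```

In this toy input only the first pair is flagged: both genomes carry the same species label, yet their ANI falls below 95%, which is the pattern described above for the subgroups within L. gasseri and L. jensenii.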

Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes

ABSTRACT Acinetobacter baumannii is a Gram-negative bacterial pathogen notorious for causing serious nosocomial infections that resist antibiotic therapy. Research to identify factors responsible for the pathogen's success has been limited by the resources available for genome-scale experimental studies. This report describes the development of several such resources for A. baumannii strain AB5075, a recently characterized wound isolate that is multidrug resistant and displays robust virulence in animal models. We report the completion and annotation of the genome sequence, the construction of a comprehensive ordered transposon mutant library, the extension of high-coverage transposon mutant pool sequencing (Tn-seq) to the strain, and the identification of the genes essential for growth on nutrient-rich agar. These resources should facilitate large-scale genetic analysis of virulence, resistance, and other clinically relevant traits that make A. baumannii a formidable public health threat. IMPORTANCE Acinetobacter baumannii is one of six bacterial pathogens primarily responsible for antibiotic-resistant infections that have become the scourge of health care facilities worldwide. Eliminating such infections requires a deeper understanding of the factors that enable the pathogen to persist in hospital environments, establish infections, and resist antibiotics. We present a set of resources that should accelerate genome-scale genetic characterization of these traits for a reference isolate of A. baumannii that is highly virulent and representative of current outbreak strains. PMID:25845845

Variation in resource availability commonly exerts strong effects on fitness-related traits in wild animals. However, we know little about the molecular mechanisms that mediate these effects, or about their persistence over time. To address these questions, we profiled genome-wide whole-blood DNA methylation levels in two sets of wild baboons: (i) 'wild-feeding' baboons that foraged naturally in a savanna environment and (ii) 'Lodge' baboons that had ready access to spatially concentrated human food scraps, resulting in high feeding efficiency and low daily travel distances. We identified 1014 sites (0.20% of sites tested) that were differentially methylated between wild-feeding and Lodge baboons, providing the first evidence that resource availability shapes the epigenome in a wild mammal. Differentially methylated sites tended to occur in contiguous stretches (i.e., in differentially methylated regions or DMRs), in promoters and enhancers, and near metabolism-related genes, supporting their functional importance in gene regulation. In agreement, reporter assay experiments confirmed that methylation at the largest identified DMR, located in the promoter of a key glycolysis-related gene, was sufficient to causally drive changes in gene expression. Intriguingly, all dispersing males carried a consistent epigenetic signature of their membership in a wild-feeding group, regardless of whether males dispersed into or out of this group as adults. Together, our findings support a role for DNA methylation in mediating ecological effects on phenotypic traits in the wild and emphasize the dynamic environmental sensitivity of DNA methylation levels across the life course.

The budding yeast has long served as a model eukaryote for the functional genomic analysis of highly conserved signaling pathways, cellular processes and mechanisms underlying human disease. The collection of reagents available for genomics in yeast is extensive, encompassing a growing diversity of mutant collections beyond gene deletion sets in the standard wild-type S288C genetic background. We review here three main types of mutant allele collections: transposon mutagen collections, essential gene collections and overexpression libraries. Each collection provides unique and identifiable alleles that can be utilized in genome-wide, high-throughput studies. These genomic reagents are particularly informative in identifying synthetic phenotypes and functions associated with essential genes, including those modeled most effectively in complex genetic backgrounds. Several examples of genomic studies in filamentous/pseudohyphal backgrounds are provided here to illustrate this point. Additionally, the limitations of each approach are examined. Collectively, these mutant allele collections in Saccharomyces cerevisiae and the related pathogenic yeast Candida albicans promise insights toward an advanced understanding of eukaryotic molecular and cellular biology.

Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought-prone climates, and a primary source of protein in sub-Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K-499-35 include a whole-genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi-parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited-input small-holder farming and climate stress.

Generation of biologic diversity is a cornerstone of immunity, yet the tools to investigate the causal influence of genetic and environmental factors have been greatly limited. Studies from the Human Functional Genomics Project, presented in Cell and other Cell Press journals, integrate environmental and genetic factors with the direction and magnitude of immune responses to decipher inflammatory disease pathogenesis.

There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine), obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity. PMID:18493600

We performed a series of quality control (QC) analyses to assess the usability of DNA collected and processed in different countries, using different DNA extraction techniques, prior to genome-wide association studies (GWAS). The quality of DNA obtained with four different DNA extraction techniques and the impact of shipping DNA at different temperatures on array performance were evaluated. Fifteen maternal-fetal pairs from four countries were used. DNA was extracted using four approaches: whole blood, blood spots with whole genome amplification (WGA), saliva and buccal swab. Samples were sent to a genotyping facility, either on dry ice or at room temperature, and genotyped using the Affymetrix SNP array 6.0. QC measures included extraction technique, effect of shipping temperature, accuracy and Mendelian concordance. Significantly fewer (50%) single nucleotide polymorphisms (SNPs) passed QC metrics for buccal swab DNA (P < 0.0001) due to missing genotype data (P < 0.0001). Whole blood or saliva DNA had the highest call rates (99.2 ± 0.4% and 99.3 ± 0.2%, respectively) and Mendelian concordance. Shipment temperature had no effect. DNA from blood or saliva had the highest call rate accuracy, and buccal swabs had the lowest. DNA extracted from blood, saliva and blood spots was found suitable for GWAS in our study.
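
The two headline QC metrics in the abstract above, call rate and Mendelian concordance, can be illustrated with a minimal Python sketch. This is not the study's pipeline: the genotype encoding (a pair of allele letters, or None for a no-call), the function names, and the simplified concordance rule for a mother-child duo (the pair must share at least one allele) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding: a genotype is a pair of alleles; None marks a no-call.
Genotype = Optional[Tuple[str, str]]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of SNPs that received a genotype call."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def mendelian_concordance(mother: Sequence[Genotype],
                          child: Sequence[Genotype]) -> float:
    """Fraction of jointly called SNPs where mother and child share an allele.

    Simplified duo rule (an assumption): a child must inherit one allele
    from the mother, so a pair with no allele in common is discordant.
    """
    if len(mother) != len(child):
        raise ValueError("genotype lists must cover the same SNPs")
    informative = 0
    consistent = 0
    for m, c in zip(mother, child):
        if m is None or c is None:
            continue  # skip SNPs with a missing call in either sample
        informative += 1
        if set(m) & set(c):
            consistent += 1
    if informative == 0:
        raise ValueError("no SNP was called in both samples")
    return consistent / informative


if __name__ == "__main__":
    mother = [("A", "A"), ("A", "G"), ("C", "C"), None]
    child = [("A", "G"), ("G", "G"), ("T", "T"), ("A", "A")]
    print(call_rate(mother))
    print(mendelian_concordance(mother, child))
```

In practice such metrics are computed per sample and per SNP across the whole array; the sketch only shows the definitions on a toy duo.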

Wayne Reeve of Murdoch University on "Genomics Encyclopedia of Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB): a resource for microsymbiont genomes" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

The giant panda is one of the most critically endangered species due to the fragmentation and loss of its habitat. Studying the functions of proteins in this animal, especially specific trait-related proteins, is therefore necessary to protect the species. In this work, the functions of these proteins were investigated using the genome sequence of the giant panda. Data on 21,001 proteins and their functions were stored in the Giant Panda Protein Database, in which the proteins were divided into two groups: 20,179 proteins whose functions can be predicted by GeneScan formed the known-function group, whereas 822 proteins whose functions cannot be predicted by GeneScan comprised the unknown-function group. For the known-function group, we further classified the proteins by molecular function, biological process, cellular component, and tissue specificity. For the unknown-function group, we developed a strategy in which the proteins were filtered by cross-Blast to identify panda-specific proteins under the assumption that proteins related to the panda-specific traits in the unknown-function group exist. After this filtering procedure, we identified 32 proteins (2 of which are membrane proteins) specific to the giant panda genome as compared against the dog and horse genomes. Based on their amino acid sequences, these 32 proteins were further analyzed by functional classification using SVM-Prot, motif prediction using MyHits, and interacting protein prediction using the Database of Interacting Proteins. Nineteen proteins were predicted to be zinc-binding proteins, thus affecting the activities of nucleic acids. The 32 panda-specific proteins will be further investigated by structural and functional analysis.

Recent advances in next-generation sequencing technology have enabled fast and low-cost detection of genetic variants across the entire human genome, making whole-genome sequencing an increasingly common approach in the study of disease-causing genetic variants. Nevertheless, a repository that collects predictions of the functionally damaging effects of human genetic variants is still lacking, although it is well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases.

In modern healthcare systems, the available resources may influence morbidity, mortality and, consequently, the level of healthcare provided in every country. This is of particular interest in developing countries, where resources are limited and must be spent wisely to address social justice and the right to equal access to healthcare services for all citizens in economically viable terms. In this light, the current allocation is, in practice, inefficient and rests mostly on each country's individual political and historical context and, thus, does not always incorporate decision-making enabled by economic models. In this study, we present a new economic model, specifically for resource allocation for genomic medicine, based on performance ratio, with potential applications in diverse healthcare sectors, which are particularly appealing for developing countries and low-resource environments. The model proposes a new method for resource allocation taking into account (1) the size of innovation of a new technology, (2) the relative effectiveness in comparison with social preferences, and (3) the cost of the technology, which permits the measurement of effectiveness to be determined differently in the context of a specific disease and then to be expressed in a relative form using a common performance ratio. The present work expands on previous work for innovation in economic models pertaining to genomic medicine and supports translational science.

Chondrichthyan fishes are a diverse class of gnathostomes that provide a valuable perspective on fundamental characteristics shared by all jawed and limbed vertebrates. Studies of phylogeny, species diversity, population structure, conservation, and physiology are accelerated by genomic, transcriptomic and protein sequence data. These data are widely available for many Sarcopterygii (coelacanth, lungfish and tetrapods) and Actinopterygii (ray-finned fish including teleosts) taxa, but limited for chondrichthyan fishes. In this study, we summarize available data for Chondrichthyes and describe resources for one of the largest projects to characterize one of these fish, Leucoraja erinacea, the little skate. SkateBase (http://skatebase.org) serves as the skate genome project portal linking data, research tools, and teaching resources. PMID:25309735

Background: The availability of genetic and genomic resources for melon has increased significantly, but functional genomics resources are still limited for this crop. TILLING is a powerful reverse genetics approach that can be utilized to generate novel mutations in candidate genes. A TILLING resource is available for cantalupensis melons, but not for inodorus melons, the other main commercial group. Results: A new ethyl methanesulfonate-mutagenized (EMS) melon population was generated for the first time in an andromonoecious non-climacteric inodorus Piel de Sapo genetic background. Diverse mutant phenotypes in seedlings, vines and fruits were observed, some of which were of possible commercial interest. The population was first screened for mutations in three target genes involved in disease resistance and fruit quality (Cm-PDS, Cm-eIF4E and Cm-eIFI(iso)4E). The same genes were also tilled in the available monoecious and climacteric cantalupensis EMS melon population. The overall mutation density in this first Piel de Sapo TILLING platform was estimated to be 1 mutation/1.5 Mb by screening four additional genes (Cm-ACO1, Cm-NOR, Cm-DET1 and Cm-DHS). Thirty-three point mutations were found for the seven gene targets, six of which were predicted to have an impact on the function of the protein. The genotype/phenotype correlation was demonstrated for a loss-of-function mutation in the Phytoene desaturase gene, which is involved in carotenoid biosynthesis. Conclusions: The TILLING approach was successful at providing new mutations in the genetic background of Piel de Sapo in most of the analyzed genes, even in genes for which natural variation is extremely low. This new resource will facilitate reverse genetics studies in non-climacteric melons, contributing materially to future genomic and breeding studies. PMID:21834982

Background: Elucidating the genomic basis of adaptation and speciation is a major challenge in natural systems with large quantities of environmental and phenotypic data, mostly because of the scarcity of genomic resources for non-model organisms. The Atlantic molly (Poecilia mexicana, Poeciliidae) is a small livebearing fish that has been extensively studied for evolutionary ecology research, particularly because this species has repeatedly colonized extreme environments in the form of caves and toxic hydrogen sulfide containing springs. In such extreme environments, populations show strong patterns of adaptive trait divergence and the emergence of reproductive isolation. Here, we used RNA-sequencing to assemble and annotate the first transcriptome of P. mexicana to facilitate ecological genomics studies in the future and aid the identification of genes underlying adaptation and speciation in the system. Description: We provide the first annotated reference transcriptome of P. mexicana. Our transcriptome shows high congruence with other published fish transcriptomes, including those of the guppy, medaka, zebrafish, and stickleback. Transcriptome annotation uncovered the presence of candidate genes relevant in the study of adaptation to extreme environments. We describe general and oxidative stress response genes as well as genes involved in pathways induced by hypoxia or involved in sulfide metabolism. To facilitate future comparative analyses, we also conducted quantitative comparisons between P. mexicana from different river drainages. 106,524 single nucleotide polymorphisms were detected in our dataset, including potential markers that are putatively fixed across drainages. Furthermore, specimens from different drainages exhibited some consistent differences in gene regulation. Conclusions: Our study provides a valuable genomic resource to study the molecular underpinnings of adaptation to extreme environments in replicated sulfide spring and cave environments.

Striped catfish (Pangasianodon hypophthalmus) is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The culture industry, however, is facing a significant challenge from saltwater intrusion into many low topographical coastal provinces across the Mekong Delta as a result of predicted climate change impacts. Developing genomic resources for this species can facilitate the production of improved culture lines that can withstand raised salinity conditions, and so we have applied high-throughput Ion Torrent sequencing of transcriptome libraries from six target osmoregulatory organs from striped catfish as a genomic resource for use in future selection strategies. We obtained 12,177,770 reads after trimming and processing with an average length of 97 bp. De novo assemblies were generated using CLC Genomic Workbench, Trinity and Velvet/Oases with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 66,451 contigs with an average length of 478 bp and N50 length of 506 bp. A total of 37,969 contigs (57%) possessed significant similarity with proteins in the non-redundant database. Comparative analyses revealed that a significant number of contigs matched sequences reported in other teleost fishes, ranging in similarity from 45.2% with Atlantic cod to 52% with zebrafish. In addition, 28,879 simple sequence repeats (SSRs) and 55,721 single nucleotide polymorphisms (SNPs) were detected in the striped catfish transcriptome. The sequence collection generated in the current study represents the most comprehensive genomic resource for P. hypophthalmus available to date. Our results illustrate the utility of next-generation sequencing as an efficient tool for constructing a large genomic database for marker development in non-model species.
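
The N50 statistic quoted for the assembly above has a simple definition: it is the length of the contig at which, going from longest to shortest, the running total first reaches at least half of the summed contig length. The following Python sketch shows that standard definition; the function name and the toy contig lengths are illustrative assumptions, not data from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths supplied")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the full sum always reaches half")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths in bp
    print(n50(toy_contigs))
```

An N50 only slightly above the mean contig length, as reported here (506 bp versus 478 bp), indicates that the assembly is dominated by short contigs of fairly uniform size.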

The increasing availability of insect genomes has revealed a large number of genes with unknown functions and the resulting problem of how to discover these functions. The RNA interference (RNAi) technique, which generates loss-of-function phenotypes by depletion of a chosen transcript, can help to overcome this challenge. RNAi can unveil the functions of new genes, lead to the discovery of new functions for old genes, and find the genes for old functions. Moreover, the possibility of studying the functions of homologous genes in different species can allow comparisons of the genetic networks regulating a given function in different insect groups, thereby facilitating an evolutionary insight into developmental processes. RNAi also has drawbacks and obscure points, however, such as those related to differences in species sensitivity. Disentangling these differences is one of the main challenges in the RNAi field.

Linking functional traits to bacterial phylogeny remains a fundamental but elusive goal of microbial ecology [1]. Without this information, it becomes impossible to resolve meaningful units of diversity and the mechanisms by which bacteria interact with each other and adapt to environmental change. Ecological adaptations among bacterial populations have been linked to genomic islands, strain-specific regions of DNA that house functionally adaptive traits [2]. In the case of environmental bacteria, these traits are largely inferred from bioinformatic or gene expression analyses [2], thus leaving few examples in which the functions of island genes have been experimentally characterized. Here we report the complete genome sequences of Salinispora tropica and S. arenicola, the first cultured, obligate marine Actinobacteria [3]. These two species inhabit benthic marine environments and dedicate 8-10% of their genomes to the biosynthesis of secondary metabolites. Despite a close phylogenetic relationship, 25 of 37 secondary metabolic pathways are species-specific and located within 21 genomic islands, thus providing new evidence linking secondary metabolism to ecological adaptation. Species-specific differences are also observed in CRISPR sequences, suggesting that variations in phage immunity provide fitness advantages that contribute to the cosmopolitan distribution of S. arenicola [4]. The two Salinispora genomes have evolved by complex processes that include the duplication and acquisition of secondary metabolite genes, the products of which provide immediate opportunities for molecular diversification and ecological adaptation. Evidence that secondary metabolic pathways are exchanged by Horizontal Gene Transfer (HGT) yet are fixed among globally distributed populations [5] supports a functional role for their products and suggests that pathway acquisition represents a previously unrecognized force driving bacterial diversification.

The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my. PMID:27138013

A decapod crustacean model is needed for understanding the molecular mechanisms underlying physiological processes, such as reproduction, sex determination, molting and growth, immunity, regeneration, and response to stress. Criteria for selection are: life-history traits, adult size, availability and ease of culture, and genomics and genetic manipulation. Three freshwater species are considered: cherry shrimp, Neocaridina denticulata; red swamp crayfish, Procambarus clarkii; and redclaw crayfish, Cherax quadricarinatus. All three are readily available, reproduce year round, and grow rapidly. The crayfish species require more space for culture than does N. denticulata. The transparent cuticle of cherry shrimp provides for direct assessment of reproductive status, stage of molt, and tissue-specific expression of reporter genes, and facilitates screening of mutations affecting phenotype. Moreover, a preliminary genome of N. denticulata is available and efforts toward complete genome sequencing and transcriptome sequencing have been initiated. Neocaridina denticulata possesses the best combination of traits that make it most suitable as a model for functional genomics. The next step is to obtain the complete genome sequence and to develop molecular technologies for the screening of mutants and for manipulating tissue-specific gene expression.

Lipases are key enzymes involved in lipid digestion, storage and mobilization of reserves during fasting or heightened metabolic demand. This is a highly conserved process, essential for survival. The genomes of five marine invertebrate species with distinctive digestive systems were screened for the six major lipase families. The two most common families in marine invertebrates, the neutral and acid lipases, are also the main families in mammals and insects. The number of lipases varies two-fold across analyzed genomes. A high degree of orthology with mammalian lipases was observed. Interestingly, 19% of the marine invertebrate lipases have lost motifs required for catalysis. Analysis of the lid and loop regions of the neutral lipases suggests that many marine invertebrates have a functional triacylglycerol hydrolytic activity as well as some acid lipases. A review of the expression profiles and functional activity on sequences in databases and scientific literature provided information regarding the function of these families of enzymes in marine invertebrates.

Tomato unquestionably occupies a significant position in world vegetable production owing to its world-wide consumption. With the tomato genome sequencing efforts recently concluded, it has become imperative to identify important functional genes from this wealth of generated information for improving tomato yield. While much progress has been made in conventional tomato breeding, post-transcriptional gene silencing (PTGS) offers an alternative approach for advancement of tomato functional genomics. In particular, virus-induced gene silencing (VIGS) is increasingly being used as a rapid, reliable, and lucrative screening strategy to elucidate gene function. In this review, we focus on the recent advances made by exploiting the potential of this technique for manipulating different agronomically important traits in tomato by discussing several case studies.

We provide a means to formally explain the relationship between HTTP URLs and the representations returned when they are requested. According to existing World Wide Web architecture, the URL serves as an identifier for a semiotic referent while the document returned via HTTP serves as a representation of the same referent. This begins with two sides of a semiotic triangle; the third side is the relationship between the URL and the representation received. We complete this description by extending the library science resource model Functional Requirements for Bibliographic Resources (FRBR) with cryptographic message and content digests to create a Functional Requirements for Information Resources (FRIR). We show how applying the FRIR model to HTTP GET and POST transactions disambiguates the many relationships between a given URL and all representations received from its request, provides fine-grained explanations that are complementary to existing explanations of web resources, and integrates easily into the emerging W3C provenance standard.
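
The idea of using cryptographic digests to tell apart the representations returned for one URL can be sketched in a few lines of Python. This is only an illustration of the general technique, not the FRIR model itself: the abstract does not say which hash function or which digest inputs FRIR uses, so the choice of SHA-256 over the raw response body, the function names, and the example URL and payloads are all assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def content_digest(body: bytes) -> str:
    """Hex digest of a response body (SHA-256 is an assumed choice)."""
    return hashlib.sha256(body).hexdigest()


def group_representations(
    responses: Iterable[Tuple[str, bytes]]
) -> Dict[str, Dict[str, int]]:
    """Map each URL to the distinct body digests seen and their counts.

    `responses` is an iterable of (url, body) pairs, e.g. collected from
    repeated GET requests. Identical bodies collapse onto one digest, so
    the number of keys per URL is the number of distinct representations.
    """
    groups: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for url, body in responses:
        groups[url][content_digest(body)] += 1
    return {url: dict(digests) for url, digests in groups.items()}


if __name__ == "__main__":
    # Hypothetical observations of one URL at three points in time.
    observed = [
        ("http://example.org/report", b"<html>version 1</html>"),
        ("http://example.org/report", b"<html>version 1</html>"),
        ("http://example.org/report", b"<html>version 2</html>"),
    ]
    for url, digests in group_representations(observed).items():
        print(url, "->", len(digests), "distinct representation(s)")
```

Hashing the body alone identifies byte-identical content; a digest that also covers headers would distinguish individual messages, which is a separate design choice.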

The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data.

Recent improvements in high-throughput sequencing technologies are having a strong impact on the detection of genetic variations associated with cancer. Several institutions worldwide have been sequencing the whole exomes and/or genomes of thousands of cancer patients, thereby providing an invaluable collection of new somatic mutations in different cancer types. These initiatives promoted the development of methods and tools for the analysis of cancer genomes that are aimed at studying the relationship between genotype and phenotype in cancer. In this article we review the online resources and computational tools for the analysis of cancer genomes. First, we describe the available repositories of cancer genome data. Next, we provide an overview of the methods for the detection of genetic variation and computational tools for the prioritization of cancer-related genes and causative somatic variations. Finally, we discuss the future perspectives in cancer genomics focusing on the impact of computational methods and quantitative approaches for defining personalized strategies to improve the diagnosis and treatment of cancer. PMID:26111056

Although the composition of the gut microbiota and its symbiotic contribution to key host physiological functions are well established, little is known as yet about the bacterial factors that account for this symbiosis. We selected Lactobacillus casei as a model microorganism to proceed to genome-wide identification of the functions required for a symbiont to establish colonization in the gut. As a result of our recent development of a transposon-mutagenesis tool that overcomes the barrier that had prevented L. casei random mutagenesis, we developed a signature-tagged mutagenesis approach combining whole-genome reverse genetics using a set of tagged transposons and in vivo screening using the rabbit ligated ileal loop model. After sequencing transposon insertion sites in 9,250 random mutants, we assembled a library of 1,110 independent mutants, all disrupted in a different gene, that provides a representative view of the L. casei genome. By determining the relative quantity of each of the 1,110 mutants before and after the in vivo challenge, we identified a core of 47 L. casei genes necessary for its establishment in the gut. They are involved in housekeeping functions, metabolism (sugar, amino acids), cell wall biogenesis, and adaptation to environment. Hence we provide what is, to our knowledge, the first global functional genomics analysis of L. casei symbiosis. PMID:25024222
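
The screening step described above, comparing the relative quantity of each tagged mutant before and after the in vivo challenge, can be illustrated with a small Python sketch. The abstract does not give the authors' quantification or statistics, so everything here is an assumption for illustration: the per-mutant counts, the pseudocount, the log2 ratio of relative abundances, the cutoff, and the mutant names.

```python
import math
from typing import Dict, List


def log2_fold_changes(before: Dict[str, float],
                      after: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Log2 ratio of each mutant's relative abundance, after vs before.

    Counts are converted to fractions of the pool so that differences in
    total signal between the two samples cancel out. The pseudocount
    (an assumed value) keeps mutants that vanish in vivo finite.
    """
    if not before:
        raise ValueError("input pool is empty")
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(m, 0.0) for m in before) + pseudocount * n
    changes = {}
    for mutant, count in before.items():
        frac_before = (count + pseudocount) / total_before
        frac_after = (after.get(mutant, 0.0) + pseudocount) / total_after
        changes[mutant] = math.log2(frac_after / frac_before)
    return changes


def depleted_mutants(changes: Dict[str, float],
                     cutoff: float = -2.0) -> List[str]:
    """Mutants whose log2 fold change falls at or below a chosen cutoff."""
    return sorted(m for m, lfc in changes.items() if lfc <= cutoff)


if __name__ == "__main__":
    # Hypothetical tag counts for three mutants.
    pool_in = {"mutant_a": 100, "mutant_b": 100, "mutant_c": 100}
    pool_out = {"mutant_a": 150, "mutant_b": 150, "mutant_c": 0}
    lfc = log2_fold_changes(pool_in, pool_out)
    print(depleted_mutants(lfc))
```

A mutant that is strongly depleted after the challenge points to a disrupted gene that may be needed for establishment in the gut; a real screen would add replicates and a statistical test rather than a fixed cutoff.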

Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my. PMID:27017950

The Rice TOGO Browser is an online public resource designed to facilitate integration and visualization of mapping data of bacterial artificial chromosome (BAC)/P1-derived artificial chromosome (PAC) clones, genes, restriction fragment length polymorphism (RFLP)/simple sequence repeat (SSR) markers and phenotype data represented as quantitative trait loci (QTLs) onto the genome sequence, and to provide a platform for more efficient utilization of genome information from the point of view of applied genomics as well as functional genomics. Three search options, namely keyword search, region search and trait search, generate various types of data in a user-friendly interface with three distinct viewers, a chromosome viewer, an integrated map viewer and a sequence viewer, thereby providing the opportunity to view the position of genes and/or QTLs at the chromosomal level and to retrieve any sequence information in a user-defined genome region. Furthermore, the gene list, marker list and genome sequence in a specified region delineated by RFLP/SSR markers and any sequences designed as primers can be viewed and downloaded to support forward genetics approaches. An additional feature of this database is the graphical viewer for BLAST search to reveal information not only for regions with significant sequence similarity but also for regions adjacent to those with similarity but with no hits between sequences. An easy to use and intuitive user interface can help a wide range of users in retrieving integrated mapping information including agronomically important traits on the rice genome sequence. The database can be accessed at http://agri-trait.dna.affrc.go.jp/.

Candida albicans is an important etiological agent of superficial and life-threatening infections in individuals with compromised immune systems. To date, we know of several overlapping genetic networks that govern virulence attributes in this fungal pathogen. Classical use of deletion mutants has led to the discovery of numerous virulence factors over the years, and genome-wide functional analysis has propelled gene discovery at an even faster pace. Indeed, a number of recent studies using large-scale genetic screens followed by genome-wide functional analysis have allowed for the unbiased discovery of many new genes involved in C. albicans biology. Here we share our perspectives on the role of these studies in analyzing fundamental aspects of C. albicans virulence properties.

Mitochondria are essential for cellular energy production in most eukaryotic organisms. However, when glucose is abundant, yeast species that underwent whole-genome duplication (WGD) mostly conduct fermentation even under aerobic conditions, and most can survive without a functional mitochondrial genome. In this study, we show that the rate of evolution for the nuclear-encoded mitochondrial genes was greater in post-WGD species than pre-WGD species. Furthermore, codon usage bias was relaxed for these genes in post-WGD yeast species. The codon usage pattern and the distribution of a particular transcription regulatory element suggest that the change to an efficient aerobic fermentation lifestyle in this lineage might have emerged after WGD between the divergence of Kluyveromyces polysporus and Saccharomyces castellii from their common ancestor. This new energy production strategy could have led to the relaxation of mitochondrial function in the relevant yeast species. PMID:18669479

Respect for human life--a notion of worth uniting all members of the human race--constitutes a sense of anthropocentrism that has long been the justification for the enrollment of animals in experimentation executed to develop therapies to alleviate human suffering. Currently, however, advances in functional genomics are causing a qualitative transformation of the rationale for medical research performed on animals. The notion of human distinctness is being fundamentally challenged when gene sequences similar to those found in humans are identified in different species. In this Opinion article, we would like to highlight an inherent tension brought about by the current developments in functional genomics: a tension between the scientific and the ethical status of gene sequences. Is it reasonable to argue that they are the same for all practical purposes but different in ethical status?

leads for therapeutic targeting in breast cancer. We are employing the high throughput functional genomic screens using epithelial mesenchymal...of sequencing from in vitro and in vivo hits in stream. We anticipate completion in the coming year. Body Task 1: To identify gene products...focus on Vim induction at the invasive edge of formed tumors generated by shRNA transduction. Task 2: To identify gene products that may

The presence of chromosomes with diffuse centromeres (holocentric chromosomes) has been reported in several taxa for more than fifty years, but a full understanding of their origin is still lacking. Comparative and functional genomics are now furnishing new data to better understand holocentric chromosome evolution, thus opening new perspectives for analysing karyotype rearrangements in species with holocentric chromosomes, in particular by revealing unusual common features such as the uniform GC content and gene distribution along chromosomes. PMID:23372420

John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

The quest to characterize each of the genes of the yeast Saccharomyces cerevisiae has propelled the development and application of novel high-throughput (HTP) experimental techniques. To handle the enormous amount of information generated by these techniques, new bioinformatics tools and resources are needed. Gene Ontology (GO) annotations curated by the Saccharomyces Genome Database (SGD) have facilitated the development of algorithms that analyze HTP data and help predict functions for poorly characterized genes in S. cerevisiae and other organisms. Here, we describe how published results are incorporated into GO annotations at SGD and why researchers can benefit from using these resources wisely to analyze their HTP data and predict gene functions. PMID:19577472

With dwindling fossil oil resources and increased economic growth of many developing countries due to globalization, energy derived sustainably from an alternative source such as bio-energy is the need of the hour. However, production of energy from biological sources is relatively expensive due to the low starch and sugar contents of bioenergy plants, leading to lower oil yield and reduced quality along with lower conversion efficiency of feedstock. In this context, genetic improvement of bio-energy plants offers a viable solution. In this manuscript, we review the current status of functional genomics studies and related patent activities in bio-energy plants. Currently, the genomes of a considerable number of bio-energy plants have been sequenced or are in progress, and a large number of expressed sequence tags (ESTs) or cDNA sequences are also available for them. These studies provide fundamental data for more reliable genome annotation and, as a result, several genomes have been annotated at a genome-wide level. In addition to this effort, various mutagenesis tools have also been employed to develop mutant populations for characterization of genes that are involved in bioenergy quantitative traits. With the progress made in functional genomics of important bio-energy plants, more patents have been filed, a significant number of them focusing on genes and DNA sequences which may be involved in the improvement of bio-energy traits, including higher yield and quality of starch, sugar and oil. We also believe that these studies will lead to the generation of genetically altered plants with improved tolerance to various abiotic and biotic stresses.

Genetic conditions are individually rare but are common in aggregate, and they often present in the neonatal and early pediatric periods. These conditions are often severe, can be difficult to diagnose and manage, and may heavily affect patients, families, health care systems, and society. Because of recent technological advances, the availability and uptake of genetic and genomic testing are increasing rapidly. However, there is a dearth of trained geneticists and genetic counselors to help guide and explain these conditions and relevant tests. To help hospitalists, neonatologists, and related practitioners navigate this complex and evolving field, we have compiled a list of free (mostly Web-based) resources relevant to the diagnosis and management of genetic conditions and related disorders. These resources, which we describe individually, can be useful for nongeneticist clinicians, and some also include material that can be used to explain concepts and conditions to patients or families. The resources presented are divided into the following categories (which overlap): general information, databases of genetic conditions, resources that can help generate differential diagnoses, databases of genetic testing laboratories (to help with logistics of ordering tests), information on newborn screening, and other resources. We also include a separate list of helpful textbooks and manuals. We conclude with 2 examples describing how some of these resources would be used by a pediatric hospitalist or neonatologist during the inpatient management of a child with a suspected genetic condition.

Background Mind-body therapies (MBTs) are used throughout the world in treatment, disease prevention, and health promotion. However, the mechanisms by which MBTs exert their positive effects are not well understood. Investigations into MBTs using functional genomics have revolutionized the understanding of MBT mechanisms and their effects on human physiology. Methods We searched the literature for the effects of MBTs on functional genomics determinants using MEDLINE, supplemented by a manual search of additional journals and a reference list review. Results We reviewed 15 trials that measured global or targeted transcriptomic, epigenomic, or proteomic changes in peripheral blood. Sample sizes ranged from small pilot studies (n=2) to large trials (n=500). While the reliability of individual genes from trial to trial was often inconsistent, genes related to inflammatory response, particularly those involved in the nuclear factor-kappa B (NF-κB) pathway, were consistently downregulated across most studies. Conclusion In general, existing trials focusing on gene expression changes brought about by MBTs have revealed intriguing connections to the immune system through the NF-κB cascade, to telomere maintenance, and to apoptotic regulation. However, these findings are limited to a small number of trials and relatively small sample sizes. More rigorous randomized controlled trials of healthy subjects and specific disease states are warranted. Future research should investigate functional genomics areas both upstream and downstream of MBT-related gene expression changes, from epigenomics to proteomics and metabolomics. PMID:25598735

We analyzed functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, the area rich with regulatory regions. PMID:27774999

The ability to measure and quantify the fitness of an entire organism requires considerably more complex approaches than simply using traditional "omic" methods that examine, for example, the abundance of RNA transcripts, proteins, or metabolites. The yeast deletion collections represent the only systematic, comprehensive set of null alleles for any organism in which such fitness measurements can be assayed. Generated by the Saccharomyces Genome Deletion Project, these collections allow the systematic and parallel analysis of gene functions using any measurable phenotype. The unique 20-bp molecular barcodes engineered into the genome of each deletion strain facilitate the massively parallel analysis of individual fitness. Here, we present functional genomic protocols for use with the yeast deletion collections. We describe how to maintain, propagate, and store the deletion collections and how to perform growth fitness assays on single and parallel screening platforms. Phenotypic fitness analyses of the yeast mutants, described in brief here, provide important insights into biological functions, mechanisms of drug action, and response to environmental stresses. It is important to bear in mind that the specific assays described in this protocol represent some of the many ways in which these collections can be assayed, and in this description particular attention is paid to maximizing throughput using growth as the phenotypic measure.
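As a rough illustration of how barcode abundance can be turned into a per-strain fitness score in a pooled growth assay, the sketch below computes a log2 ratio of normalized barcode counts between a treatment and a control pool. This is not the protocol's own analysis: the function name `log2_fitness`, the pseudocount, the strain labels, and the counts are all hypothetical, and real pipelines typically add replicate handling and statistical testing.

```python
import math


def log2_fitness(control_counts, treatment_counts, pseudocount=1.0):
    """Return {strain: log2(treatment_freq / control_freq)} from barcode counts.

    Counts are converted to within-pool frequencies so that pools sequenced
    to different depths are comparable. The pseudocount (an assumption of
    this sketch) avoids division by zero for strains that drop out.
    """
    n = len(control_counts)
    ctrl_total = sum(control_counts.values()) + pseudocount * n
    trt_total = sum(treatment_counts.get(s, 0) for s in control_counts) + pseudocount * n
    scores = {}
    for strain, ctrl in control_counts.items():
        trt = treatment_counts.get(strain, 0)
        ctrl_freq = (ctrl + pseudocount) / ctrl_total
        trt_freq = (trt + pseudocount) / trt_total
        scores[strain] = math.log2(trt_freq / ctrl_freq)
    return scores


# Hypothetical barcode counts for two deletion strains.
control = {"strainA": 100, "strainB": 100}
treatment = {"strainA": 100, "strainB": 25}
scores = log2_fitness(control, treatment)
```

A negative score marks a strain that is depleted in the treatment pool relative to the control, which is the usual signature of a fitness defect under that condition.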

With the completion of genome sequencing projects, the next challenge is to close the gap between gene annotation and gene functional assignment. Genomic tools to identify gene functions are based on the analysis of phenotypic variations between a wild-type and its mutant; hence, mutant collections are a valuable resource. In this sense, T-DNA collections allow for an easy and straightforward identification of the tagged gene, serving as the basis of both forward and reverse genetic strategies. This study reports on the phenotypic and molecular characterization of an enhancer trap T-DNA collection in tomato (Solanum lycopersicum L.), which has been produced by Agrobacterium-mediated transformation using a binary vector bearing a minimal promoter fused to the uidA reporter gene. Two genes have been isolated from different T-DNA mutants; one of these genes codes for a UTP-glucose-1-phosphate uridylyltransferase involved in programmed cell death and leaf development, a gene function not previously reported in tomato. Together, our results support that enhancer trapping is a powerful tool to identify novel genes and regulatory elements in tomato and that this T-DNA mutant collection represents a highly valuable resource for functional analyses in this fleshy-fruited model species.

The increasing availability of complete genomic sequences for cultured phototrophic bacteria and assembled metagenomes from environments dominated by phototrophs has reinforced the need for a "post-genomic" analytical effort to test models of cellular structure and function proposed from genomic data. Comparative genomics has produced a testable model for pathways of sulfur compound oxidation in the phototrophic bacteria. In the case of sulfide, two enzymes are predicted to oxidize sulfide: sulfide:quinone oxidoreductase and flavocytochrome c sulfide dehydrogenase. However, these models do not predict which enzyme is important under what conditions. In Chlorobaculum tepidum, a model green sulfur bacterium, a combination of genetics and physiological analysis of mutant strains has led to the realization that this organism contains at least two active sulfide:quinone oxidoreductases and that there is significant interaction between sulfide oxidation and light harvesting. In the case of elemental sulfur, an organothiol intermediate of unknown structure has been proposed to activate elemental sulfur for transport into the cytoplasm where it can be oxidized or assimilated, and recent approaches using classical metabolite analysis have begun to shed light on this issue both in C. tepidum and the purple sulfur bacterium Allochromatium vinosum.

The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a better understanding of individual human genetic variation. The browser provides full access to the underlying reads with sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms. The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation in a diploid context. The browser is available online at http://huref.jcvi.org. PMID:19036787

The goal of this article is to examine differential aging in everyday functioning between resource-rich and resource-poor older adults. Four groups of older adults were identified on the basis of 2 distinct resource factors: a Sensorimotor-Cognitive factor and a Social-Personality factor. The resource-richest group consisted of those participants who were above the median in both factors; those falling below the median in both factors comprised the resource-poorest group; and 2 additional groups consisted of older adults who were above the median in either 1 of the 2 factors. At the level of mean differences, the 4 groups differed in the length of the waking day, the variability in activities, the frequency of intellectual-cultural and social-relational activities, and resting times. Considering age differences there are more and larger negative age effects in the resource-poorest group than in the resource-richest one. The metamodel of selective optimization with compensation is used to interpret the findings.
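The four-group design described above amounts to a median split on two factor scores. A minimal sketch follows, assuming each participant has one numeric score per factor; the function name, group labels for the two mixed groups, and the example scores are hypothetical, and treating scores equal to the median as "not above" is an assumption of this sketch rather than something the abstract specifies.

```python
from statistics import median


def resource_groups(scores):
    """Assign each participant to one of four groups by a two-factor median split.

    scores maps participant id -> (sensorimotor_cognitive, social_personality).
    """
    med_sc = median(s[0] for s in scores.values())
    med_sp = median(s[1] for s in scores.values())
    groups = {}
    for pid, (sc, sp) in scores.items():
        high_sc = sc > med_sc  # ties count as "not above" (assumption)
        high_sp = sp > med_sp
        if high_sc and high_sp:
            groups[pid] = "resource-richest"
        elif not high_sc and not high_sp:
            groups[pid] = "resource-poorest"
        elif high_sc:
            groups[pid] = "sensorimotor-cognitive only"
        else:
            groups[pid] = "social-personality only"
    return groups


# Hypothetical factor scores for four participants.
example = {"a": (10, 10), "b": (1, 1), "c": (10, 1), "d": (1, 10)}
groups = resource_groups(example)
```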

We report here the whole-genome shotgun sequences of the strain UASWS1009 of the species Mesorhizobium hungaricum sp. nov., which are different from any other known Mesorhizobium species. This is the first genome registered for this new species, which could be considered as a potential resource for agriculture and environmental uses. PMID:27738050

The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...

We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents. PMID:20075255

The SOL Genomics Network (SGN; http://sgn.cornell.edu) is a rapidly evolving comparative resource for the plants of the Solanaceae family, which includes important crop and model plants such as potato (Solanum tuberosum), eggplant (Solanum melongena), pepper (Capsicum annuum), and tomato (Solanum lycopersicum). The aim of SGN is to relate these species to one another using a comparative genomics approach and to tie them to the other dicots through the fully sequenced genome of Arabidopsis (Arabidopsis thaliana). SGN currently houses map and marker data for Solanaceae species, a large expressed sequence tag collection with computationally derived unigene sets, an extensive database of phenotypic information for a mutagenized tomato population, and associated tools such as real-time quantitative trait loci. Recently, the International Solanaceae Project (SOL) was formed as an umbrella organization for Solanaceae research in over 30 countries to address important questions in plant biology. The first cornerstone of the SOL project is the sequencing of the entire euchromatic portion of the tomato genome. SGN is collaborating with other bioinformatics centers in building the bioinformatics infrastructure for the tomato sequencing project and implementing the bioinformatics strategy of the larger SOL project. The overarching goal of SGN is to make information available in an intuitive comparative format, thereby facilitating a systems approach to investigations into the basis of adaptation and phenotypic diversity in the Solanaceae family, other species in the Asterid clade such as coffee (Coffea arabica), Rubiaceae, and beyond. PMID:16010005

To facilitate the practical application of highly-efficient semiautomated methods for general application in genomic analyses, the authors have developed a fluorescence-based microsatellite marker resource. Ninety highly polymorphic microsatellite markers were combined to provide a rapid, accurate, and highly efficient initial genome-wide screening system. These markers are spaced on average every 33 cM, with a mean heterozygosity of 81% (range 65-94%), covering 22 autosomes and the X and Y chromosomes. Less than 10% of the genome lies beyond 20 cM of the nearest marker. Since this genomic analysis system is fully compatible with automated fragment analyzers using simultaneous four-color fluorescence-based detection systems, the 5 groups of 18 markers can be detected concurrently. This multiplex detection provides a throughput of 1944 genotypes daily per instrument. This system will be highly beneficial in a number of clinical and research applications including linkage, cancer genetics, forensics, and cytogenetics.

The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365

A study explored current practice in organizing and resourcing training and development (T&D) using survey responses from over 100 major private and public sector employers and case studies of T&D functions in 6 organizations. Business drivers for T&D were senior management as customers; diagnosis of training as "the…

Understanding how environmental changes influence the pathogenicity and virulence of infectious agents is critical for predicting epidemiological patterns of disease. Thraustochytrids, part of the larger taxonomic class Labyrinthulomycetes, contain several highly pathogenic species, including the hard clam pathogen quahog parasite unknown (QPX). QPX has been associated with large-scale mortality events along the northeastern coast of North America. Growth and physiology of QPX is temperature-dependent, and changes in local temperature profiles influence pathogenicity. In this study we characterize the partial genome of QPX and examine the influence of temperature on gene expression. Genes involved in several biological processes are differentially expressed upon temperature change, including those associated with altered growth and metabolism and virulence. The genomic and transcriptomic resources developed in this study provide a foundation for better understanding virulence, pathogenicity and life history of thraustochytrid pathogens. PMID:24069279

Chickpea (Cicer arietinum) is the second most widely grown legume crop after soybean, accounting for a substantial proportion of human dietary nitrogen intake and playing a crucial role in food security in developing countries. We report the ∼738-Mb draft whole genome shotgun sequence of CDC Frontier, a kabuli chickpea variety, which contains an estimated 28,269 genes. Resequencing and analysis of 90 cultivated and wild genotypes from ten countries identifies targets of both breeding-associated genetic sweeps and breeding-associated balancing selection. Candidate genes for disease resistance and agronomic traits are highlighted, including traits that distinguish the two main market classes of cultivated chickpea--desi and kabuli. These data comprise a resource for chickpea improvement through molecular breeding and provide insights into both genome diversity and domestication.

Nuclear bodies contribute to non-random organization of the human genome and nuclear function. Using a major prototypical nuclear body, the Cajal body, as an example, we suggest that these structures assemble at specific gene loci located across the genome as a result of high transcriptional activity. Subsequently, target genes are physically clustered in close proximity in Cajal body-containing cells. However, Cajal bodies are observed in only a limited number of human cell types, including neuronal and cancer cells. Ultimately, Cajal body depletion perturbs splicing kinetics by reducing target small nuclear RNA (snRNA) transcription and limiting the levels of spliceosomal snRNPs, including their modification and turnover following each round of RNA splicing. As such, Cajal bodies are capable of shaping the chromatin interaction landscape and the transcriptome by influencing spliceosome kinetics. Future studies should concentrate on characterizing the direct influence of Cajal bodies upon snRNA gene transcriptional dynamics.

The use of CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein) for targeted genome editing has been widely adopted and is considered a "game changing" technology. The ease and rapidity by which this approach can be used to modify endogenous loci in a wide spectrum of cell types and organisms makes it a powerful tool for customizable genetic modifications as well as for large-scale functional genomics. The development of retrovirus-based expression platforms to simultaneously deliver the Cas9 nuclease and single guide (sg) RNAs provides unique opportunities by which to ensure stable and reproducible expression of the editing tools and a broad cell targeting spectrum, while remaining compatible with in vivo genetic screens. Here, we describe methods and highlight considerations for designing and generating sgRNA libraries in all-in-one retroviral vectors for such applications.

A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil and identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. In total, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The degradation half-life of DDT increased with substrate concentration from 0.1 to 10.0 mg/l, but decreased with temperature from 15 °C to 35 °C. Neutral conditions were the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which DDT is sequentially converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment.
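For readers unfamiliar with how a summary statistic such as the GC content quoted above is obtained, the following minimal sketch computes it from a nucleotide string. It is illustrative only and is not the authors' annotation pipeline; the function name and the choice to ignore ambiguous bases such as N are assumptions of this sketch.

```python
def gc_content(seq):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequences; a real genome would be read from a FASTA file.
balanced = gc_content("ATGC")
gc_rich = gc_content("GGCCAT")
```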

Some areas in plant abiotic stress research are not frequently addressed by genomic and molecular tools. One such area is the interaction of gravitational force with the upward capillary pull of water and the mechanical-functional trade-off in plant vasculature. Although frost, drought and flooding stress greatly impact these physiological processes and consequently plant performance, the genomic and molecular basis of this trade-off is only sporadically addressed, and so is its adaptive value. Embolism resistance is an important trait for withstanding multiple stresses. It offers scope for critical insight into the contribution of living cells to the process, and biotechnological intervention in these cells may be of great importance. Vascular plants employ different physiological strategies to cope with embolism, and variation is observed across the kingdom. Genomic resources in this area have started to emerge and open up possibilities for synthesis, validation and utilization of the new knowledge base. This review article assesses the research to date on this issue, discusses new possibilities for bridging the physiology and genomics of a plant, and foresees its implementation in crop science. PMID:24904619

Phomopsis seed decay of soybean is caused primarily by the seed-borne fungal pathogen Phomopsis longicolla (syn. Diaporthe longicolla). This disease severely decreases soybean seed quality, reduces seedling vigor and stand establishment, and suppresses yield. It is one of the most economically important soybean diseases. In this study we annotated the entire genome of P. longicolla isolate MSPL 10-6, which was isolated from field-grown soybean seed in Mississippi, USA. This study represents the first reported genome-wide functional annotation of a seed-borne fungal pathogen in the Diaporthe-Phomopsis complex. The P. longicolla genome annotation will enable research into the genetic basis of fungal infection of soybean seed and provide information for the study of soybean-fungal interactions. The genome annotation will also be a valuable resource for the research and agricultural communities and will aid in the development of new control strategies for this pathogen. The annotations are available at http://bioinformatics.towson.edu/phomopsis_longicolla/download.html. The NCBI accession number is AYRD00000000.

Chronic respiratory disorders are important contributors to the global burden of disease. Genome-wide association studies (GWASs) of lung function measures have identified several trait-associated loci, but explain only a modest portion of the phenotypic variability. We postulated that integrating pathway-based methods with GWASs of pulmonary function and airflow obstruction would identify a broader repertoire of genes and processes influencing these traits. We performed two independent GWASs of lung function, applied gene set enrichment analysis to one of the studies, and validated the results using the second GWAS. We identified 131 significantly enriched gene sets associated with lung function and clustered them into larger biological modules involved in diverse processes including development, immunity, cell signaling, proliferation and arachidonic acid. We found that enrichment of gene sets was not driven by GWAS-significant variants or loci, but instead by those with less stringent association P-values. Next, we applied pathway enrichment analysis to a meta-analyzed GWAS of airflow obstruction. We identified several biologic modules that functionally overlapped with those associated with pulmonary function. However, differences were also noted, including enrichment of extracellular matrix (ECM) processes specifically in the airflow obstruction study. Network analysis of the ECM module implicated a candidate gene, matrix metalloproteinase 10 (MMP10), as a putative disease target. We used a knockout mouse model to functionally validate the role of MMP10 in influencing the lung's susceptibility to cigarette smoke-induced emphysema. By integrating pathway analysis with population-based genomics, we unraveled biologic processes underlying pulmonary function traits and identified a candidate gene for obstructive lung disease. PMID:26395457

Photolithotrophs are divided between those that use water as their electron donor (Cyanobacteria and the photosynthetic eukaryotes) and those that use a different electron donor (the anoxygenic photolithotrophs, all of them Bacteria). Photolithotrophs with the most reduced genomes have more genes than do the corresponding chemoorganotrophs, and the fastest-growing photolithotrophs have significantly lower specific growth rates than the fastest-growing chemoorganotrophs. Slower growth results from diversion of resources into the photosynthetic apparatus, which accounts for about half of the cell protein. There are inherent dangers in (especially oxygenic) photosynthesis, including the formation of reactive oxygen species (ROS) and blue light sensitivity of the water-splitting apparatus. The extent to which photolithotrophs incur greater DNA damage and repair, and faster protein turnover with increased rRNA requirement, needs further investigation. A related source of environmental damage is ultraviolet B (UVB) radiation (280–320 nm), whose flux at the Earth's surface decreased as oxygen (and ozone) increased in the atmosphere. This oxygenation led to the requirement for defence against ROS, and to decreasing availability to organisms of combined (non-dinitrogen) nitrogen and ferrous iron, and (indirectly) phosphorus, in the oxygenated biosphere. Differential codon usage in the genome and, especially, the proteome can lead to economies in the use of potentially growth-limiting elements. PMID:23754816

Enhancers establish spatial or temporal patterns of gene expression that are critical for development, yet our understanding of how these DNA cis-regulatory elements function from a distance to increase transcription of their target genes and shape the cellular transcriptome has been gleaned primarily from studies of individual genes or gene families. High-throughput sequencing studies place enhancer-gene interactions within the 3D context of chromosome folding, inviting a new look at enhancer function and stimulating provocative new questions. Here, we integrate these whole-genome studies with recent mechanistic studies to illuminate how enhancers physically interact with target genes, how enhancer activity is regulated during development, and the role of noncoding RNAs transcribed from enhancers in their function.

The Algal Functional Annotation Tool is a bioinformatics resource to visualize pathway maps, identify enriched biological terms, or convert gene identifiers to elucidate biological function in silico. These types of analysis are tailored to lists of gene identifiers, such as those coming from transcriptome gene expression analysis. By analyzing the functional annotation of an interesting set of genes, common biological motifs may be elucidated and a first-pass analysis can point further research in the right direction. Currently, the following databases have been parsed, processed, and added to the tool: 1) Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways Database, 2) MetaCyc Encyclopedia of Metabolic Pathways, 3) Panther Pathways Database, 4) Reactome Pathways Database, 5) Gene Ontology, 6) MapMan Ontology, 7) KOG (Eukaryotic Clusters of Orthologous Groups), 8) Pfam, 9) InterPro.

The falling cost of and rapid progress in next-generation sequencing techniques, coupled with high performance computational approaches, have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries, but research efforts to understand the genetics of the crop and generate genomic information for it have been limited. The availability of a large number of genomic resources, including genome-wide molecular markers, will accelerate breeding efforts and the application of genomic selection in yams. In the present study, several methods including expressed sequence tag (EST) sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiling were applied to two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) to generate genomic resources for use in yam improvement programs. These include a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic, showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom-made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic, indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were identified between these two genotypes. Genotyping-by-sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome. Our efforts in using different approaches

Rice is one of the main pillars of food security in India. Its improvement for higher yield in sustainable agricultural systems is also vital to meet the energy and nutritional needs of a growing world population, expected to reach more than 9 billion by 2050. The high quality genome sequence of rice has provided a rich resource to mine information about the diversity of genes and alleles that can contribute to the improvement of useful agronomic traits. Defining the function of each gene and regulatory element of rice remains a challenge for the rice community in the coming years. Subsequent to its participation in IRGSP, India has continued to contribute in the areas of diversity analysis, transcriptomics, functional genomics, marker development, QTL mapping and molecular breeding, through national and multi-national research programs. These efforts have helped generate resources for rice improvement, some of which have already been deployed to mitigate losses due to environmental stress and pathogens. With renewed efforts, Indian researchers are making new strides, along with the international scientific community, in both basic research and realization of its translational impact.

Jatropha (Jatropha curcas L.) is an economically important species with great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total, 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembly and clustering resulted in 115,611 uniquely assembled sequences (UASs), including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs was annotated: 59,903 (51.81%) were assigned gene ontology (GO) terms, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The set also contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered, with AG/CT as the most frequent (45.8%) SSR motif type. Further, 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
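The annotation percentages quoted in the jatropha abstract follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable and function names are illustrative assumptions and are not taken from the study.

```python
# Counts as reported in the abstract; names below are illustrative only.
TOTAL_UAS = 115_611
annotated = {"GO": 59_903, "KOG": 12_584, "KEGG": 8_822}


def percent_of_total(count, total=TOTAL_UAS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Recompute the share of uniquely assembled sequences in each category.
percentages = {name: percent_of_total(n) for name, n in annotated.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

The recomputed values agree with the 51.81%, 10.88%, and 7.63% given in the abstract.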

Differences between countries in national income, growth, human development and many other factors are used to classify countries as developed or developing. There are several classification systems that use different sets of measures and criteria. The most common classifications are the United Nations (UN) and the World Bank (WB) systems. The UN classification system uses the UN Human Development Index (HDI), an indicator that combines statistics on life expectancy, education, and income per capita. The WB system instead uses gross national income (GNI) per capita, calculated using the World Bank Atlas method. According to the UN and WB classification systems, there are 151 and 134 developing countries, respectively, with 89% overlap between the two systems. Developing countries have limited human development and limited expenditure on education and research, among several other limitations. The biggest challenge facing genomic researchers and clinicians is limited resources. As a result, genomic tools, specifically genome sequencing technologies, which are rapidly becoming indispensable, are not widely available. In this report, we explore the current status of sequencing technologies in developing countries, describe the associated challenges and emphasize potential solutions.

Populations of Australia's largest terrestrial marsupial carnivore, the Tasmanian devil (Sarcophilus harrisii), are rapidly declining in the wild due to Tasmanian Devil Facial Tumour Disease (TDFTD). One tool that can reduce the loss of genetic diversity is genome resource banking. This study examines the application of an oocyte vitrification protocol, initially developed in a model marsupial carnivore, to the endangered Tasmanian devil. Ovarian tissue was transported on ice from Tasmania to the laboratory, which took up to 48 h. Individual granulosa oocyte complexes (GOC) were isolated enzymatically, and the viability of oocytes from primary GOC was assessed immediately following isolation, after exposure to cold shock, after exposure to vitrification and thawing media without exposure to liquid nitrogen, or after the full vitrification and thawing process. There was no decline in oocyte viability following cold shock or exposure to the vitrification and thawing media. Following the full vitrification and thawing process there was a decline in oocyte viability (χ² = 20.0, P < 0.001), but approximately 70% of oocytes remained viable. This study provides further evidence that oocyte vitrification is a promising strategy for genome resource banking in carnivorous marsupials and suggests that it should be considered in conservation plans for the survival of the iconic Tasmanian devil.

Enterprise resource planning (ERP) software applications are designed to facilitate the systemwide integration of complex processes and functions across a large enterprise consisting of many internal and external constituents. Although most currently available ERP applications generally are tailored to the needs of the manufacturing industry, many large healthcare systems are investigating these applications. Due to the significant differences between manufacturing and patient care, ERP-based systems do not easily translate to the healthcare setting. In particular, the lack of clinical standardization impedes the use of ERP systems for clinical integration. Nonetheless, an ERP-based system can help a healthcare organization integrate many functions, including patient scheduling, human resources management, workload forecasting, and management of workflow, that are not directly dependent on clinical decision making.

Background: Vegetables of the genus Allium are widely consumed but remain poorly understood genetically. Genetic mapping has been conducted in intraspecific crosses of onion (Allium cepa L.), A. fistulosum and interspecific crosses between A. roylei and these two species, but it has not been possible to access genetic maps and underlying data from these studies easily. Description: An online comparative genomics database, AlliumMap, has been developed based on the GMOD CMap tool at http://alliumgenetics.org. It has been populated with curated data linking genetic maps with underlying markers and sequence data from multiple studies. It includes data from multiple onion mapping populations as well as the most closely related species A. roylei and A. fistulosum. Further onion EST-derived markers were evaluated in the A. cepa x A. roylei interspecific population, enabling merging of the AFLP-based maps. In addition, data concerning markers assigned in multiple studies to the Allium physical map using A. cepa-A. fistulosum alien monosomic addition lines have been compiled. The compiled data reveal extensive synteny between onion and A. fistulosum. Conclusions: The database is the first online resource providing genetic map and marker data from multiple Allium species and populations. The additional markers placed on the interspecific Allium map confirm the value of A. roylei as a bridge between the genetics of onion and A. fistulosum and as a means to conduct efficient mapping of expressed sequence markers in Allium. The data presented suggest that comparative approaches will be valuable for genetic and genomic studies of onion and A. fistulosum. This online resource will provide a valuable means to integrate genetic and sequence-based explorations of Allium genomes. PMID:22559261

The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens.
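The dataset intersection described in the abstract above can be illustrated with a minimal Python sketch. The gene identifiers are invented placeholders and the function name is an assumption for illustration; the abstract does not specify how the authors computed the overlap.

```python
def candidate_virulence_genes(differentially_expressed, required_for_infection):
    """Return genes found in both screens, sorted for stable output."""
    return sorted(set(differentially_expressed) & set(required_for_infection))


# Hypothetical identifiers, invented for illustration only.
expressed_during_infection = {"funA", "funB", "funC", "funD"}  # transcriptomics hits
required_in_mutagenesis = {"funB", "funD", "funE"}              # mutagenesis hits

candidates = candidate_virulence_genes(
    expressed_during_infection, required_in_mutagenesis
)
print(candidates)
```

Only genes that appear in both screens survive, which is what narrows hundreds of FUN genes down to a small candidate list.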

The genome of budding yeast (Saccharomyces cerevisiae) contains approximately 5800 protein-encoding genes, the majority of which are associated with some known biological function. Yet the extent of amino acid sequence conservation of these genes over all phyla has only been partially examined. Here we provide a more comprehensive overview and visualization of the conservation of yeast genes and a means for browsing and exploring the data in detail, down to the individual yeast gene, at http://yeast-phylogroups.princeton.edu. We used data from the OrthoMCL database, which has defined orthologs from approximately 150 completely sequenced genomes, including diverse representatives of the archaeal, bacterial, and eukaryotic domains. By clustering genes based on similar patterns of conservation, we organized and visualized all the protein-encoding genes in yeast as a single heat map. Most genes fall into one of eight major clusters, called “phylogroups.” Gene ontology analysis of the phylogroups revealed that they were associated with specific, distinct trends in gene function, generalizations likely to be of interest to a wide range of biologists. PMID:23749449
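The idea of grouping genes by their pattern of conservation can be sketched in Python as follows. This is a deliberately simplified stand-in: it groups genes whose presence/absence profiles are exactly identical, whereas the abstract does not state which clustering method was used across the roughly 150 genomes. The gene names and the three-column profile layout are hypothetical.

```python
from collections import defaultdict


def group_by_conservation(profiles):
    """Group gene names that share an identical presence/absence pattern."""
    groups = defaultdict(list)
    for gene, pattern in profiles.items():
        groups[tuple(pattern)].append(gene)
    return {pattern: sorted(genes) for pattern, genes in groups.items()}


# Hypothetical profiles: ortholog present in (archaea, bacteria, eukaryotes).
profiles = {
    "geneA": (True, True, True),
    "geneB": (False, False, True),
    "geneC": (True, True, True),
    "geneD": (False, False, True),
    "geneE": (False, True, True),
}

groups = group_by_conservation(profiles)
for pattern, genes in groups.items():
    print(pattern, genes)
```

With real data, a similarity-based method would tolerate small differences between profiles; exact matching is used here only to keep the example short.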

Animals sense light primarily by an opsin-based photopigment present in a photoreceptor cell. Cnidaria are arguably the most basal phylum containing a well-developed visual system. The evolutionary history of opsins in the animal kingdom has not yet been resolved. Here, we study the evolution of animal opsins by genome-wide analysis of the cubozoan jellyfish Tripedalia cystophora, a cnidarian possessing complex lens-containing eyes and minor photoreceptors. A large number of opsin genes with distinct tissue- and stage-specific expression were identified. Our phylogenetic analysis unequivocally classifies cubozoan opsins as a sister group to c-opsins and documents lineage-specific expansion of the opsin gene repertoire in the cubozoan genome. Functional analyses provided evidence for the use of the Gs-cAMP signaling pathway in a small set of cubozoan opsins, indicating the possibility that the majority of other cubozoan opsins signal via distinct pathways. Additionally, these tests uncovered subtle differences among individual opsins, suggesting possible fine-tuning for specific photoreceptor tasks. Based on phylogenetic, expression and biochemical analysis we propose that rapid lineage- and species-specific duplications of the intron-less opsin genes and their subsequent functional diversification promoted evolution of a large repertoire of both visual and extraocular photoreceptors in cubozoans. PMID:26154478

The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater is the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question.

Current Zika virus (ZIKV) outbreaks, which have spread through several areas of Africa, Southeast Asia, and the Pacific islands, have been declared a global health emergency by the World Health Organization (WHO). The virus causes Zika fever and illness ranging from severe autoimmune to neurological complications in humans. To facilitate research on this virus, we have developed an integrative multi-omics platform, ZikaVR (http://bioinfo.imtech.res.in/manojk/zikavr/), dedicated to ZIKV genomic, proteomic and therapeutic knowledge. It comprises whole genome sequences and their respective functional information regarding proteins, genes, and structural content. Additionally, it delivers sophisticated analyses such as whole-genome alignments, conservation and variation, CpG islands, codon context, usage bias and phylogenetic inferences at whole genome and proteome level in a user-friendly visual environment. Further, glycosylation sites and molecular diagnostic primers were analyzed. Most importantly, we also propose potential therapeutically important constituents, namely vaccine epitopes, siRNAs, miRNAs, sgRNAs and repurposing drug candidates. PMID:27633273

Multi-scale resource selection modeling is used to identify factors that limit species distributions across scales of space and time. This multi-scale nature of habitat suitability complicates the translation of inferences to single, spatial depictions of habitat required for conservation of species. We estimated resource selection functions (RSFs) across three scales for a threatened ungulate, woodland caribou (Rangifer tarandus caribou), with two objectives: (1) to infer the relative effects of two forms of anthropogenic disturbance (forestry and linear features) on woodland caribou distributions at multiple scales and (2) to estimate scale-integrated resource selection functions (SRSFs) that synthesize results across scales for management-oriented habitat suitability mapping. We found a previously undocumented scale-specific switch in woodland caribou response to two forms of anthropogenic disturbance. Caribou avoided forestry cut-blocks at broad scales according to first- and second-order RSFs and avoided linear features at fine scales according to third-order RSFs, corroborating predictions developed according to predator-mediated effects of each disturbance type. Additionally, a single SRSF validated as well as each of three single-scale RSFs when estimating habitat suitability across three different spatial scales of prediction. We demonstrate that a single SRSF can be applied to predict relative habitat suitability at both local and landscape scales in support of critical habitat identification and species recovery.

Unicellular marine algae have promise for providing sustainable and scalable biofuel feedstocks, although no single species has emerged as a preferred organism. Moreover, adequate molecular and genetic resources prerequisite for the rational engineering of marine algal feedstocks are lacking for most candidate species. Heterokonts of the genus Nannochloropsis naturally have high cellular oil content and are already in use for industrial production of high-value lipid products. First success in applying reverse genetics by targeted gene replacement makes Nannochloropsis oceanica an attractive model to investigate the cell and molecular biology and biochemistry of this fascinating organism group. Here we present the assembly of the 28.7 Mb genome of N. oceanica CCMP1779. RNA sequencing data from nitrogen-replete and nitrogen-depleted growth conditions support a total of 11,973 genes, of which in addition to automatic annotation some were manually inspected to predict the biochemical repertoire for this organism. Among others, more than 100 genes putatively related to lipid metabolism, 114 predicted transcription factors, and 109 transcriptional regulators were annotated. Comparison of the N. oceanica CCMP1779 gene repertoire with the recently published N. gaditana genome identified 2,649 genes likely specific to N. oceanica CCMP1779. Many of these N. oceanica–specific genes have putative orthologs in other species or are supported by transcriptional evidence. However, because similarity-based annotations are limited, functions of most of these species-specific genes remain unknown. Aside from the genome sequence and its analysis, protocols for the transformation of N. oceanica CCMP1779 are provided. The availability of genomic and transcriptomic data for Nannochloropsis oceanica CCMP1779, along with efficient transformation protocols, provides a blueprint for future detailed gene functional analysis and genetic engineering of Nannochloropsis species by a growing

Altering gene dosage through variation in gene copy number is a powerful approach to addressing questions regarding gene regulation, quantitative trait loci, and heterosis, but one that is not easily applied to sexually propagated species. Elite poplar (Populus spp) varieties are created through interspecific hybridization, followed by clonal propagation. Altered gene dosage relationships are believed to contribute to hybrid performance. Clonal propagation allows for replication and maintenance of meiotically unstable ploidy or structural variants and provides an alternative approach to investigating gene dosage effects not possible in sexually propagated species. Here, we built a genome-wide structural variation system for dosage-based functional genomics and breeding of poplar. We pollinated Populus deltoides with gamma-irradiated Populus nigra pollen to produce >500 F1 seedlings containing dosage lesions in the form of deletions and insertions of chromosomal segments (indel mutations). Using high-precision dosage analysis, we detected indel mutations in ∼55% of the progeny. These indels varied in length, position, and number per individual, cumulatively tiling >99% of the genome, with an average of 10 indels per gene. Combined with future phenotype and transcriptome data, this population will provide an excellent resource for creating and characterizing dosage-based variation in poplar, including the contribution of dosage to quantitative traits and heterosis. PMID:26320226

The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs. PMID:18273390

The advent of deep sequencing technologies has resulted in the deciphering of tremendous amounts of genetic information. These data have led to major discoveries, and many anecdotes now exist of individual patients whose clinical outcomes have benefited from novel, genetically guided therapeutic strategies. However, the majority of genetic events in cancer are currently undrugged, leading to a biological gap between understanding of tumor genetic etiology and translation to improved clinical approaches. Functional screening has made tremendous strides in recent years with the development of new experimental approaches to studying ex vivo and in vivo drug sensitivity. Numerous discoveries and anecdotes also exist for translation of functional screening into novel clinical strategies; however, the current clinical application of functional screening remains largely confined to small clinical trials at specific academic centers. The intersection between genomic and functional approaches represents an ideal modality to accelerate our understanding of drug sensitivities as they relate to specific genetic events and further understand the full mechanisms underlying drug sensitivity patterns. PMID:28299357

Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications.
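The gene context-based screen described above can be illustrated with a minimal sketch. It assumes a toy, ordered gene list for a single contig and a simple "within N neighbouring genes" rule; the `Gene` class, the `candidates_near_known` function, and the window size are hypothetical names and choices for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus: str        # hypothetical locus tag
    annotation: str   # product description from automatic annotation


def candidates_near_known(genes, is_known, is_hypothetical, window=5):
    """Return locus tags of hypothetical genes lying within `window`
    gene positions of a gene flagged by `is_known`.

    `genes` must be ordered as they appear along one contig.
    """
    known_idx = [i for i, g in enumerate(genes) if is_known(g)]
    hits = []
    for i, g in enumerate(genes):
        if is_hypothetical(g) and any(abs(i - k) <= window for k in known_idx):
            hits.append(g.locus)
    return hits


# Toy contig (invented data): one known cellulase flanked by hypothetical proteins.
genes = [
    Gene("g1", "ABC transporter"),
    Gene("g2", "hypothetical protein"),
    Gene("g3", "cellulase"),
    Gene("g4", "hypothetical protein"),
    Gene("g5", "ribosomal protein L2"),
    Gene("g6", "hypothetical protein"),
]

hits = candidates_near_known(
    genes,
    is_known=lambda g: "cellulase" in g.annotation,
    is_hypothetical=lambda g: g.annotation == "hypothetical protein",
    window=1,
)

# Success rate reported in the abstract: 11 active enzymes out of 17 tested.
success_percent = round(11 / 17 * 100)
```

With a window of one gene, only the two hypothetical proteins directly adjacent to the cellulase are returned; a real screen would presumably also use strand, operon structure, and intergenic distance.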

Increasing food production is essential to meet the demands of a growing human population, with its rising income levels and nutritional expectations. To address the demand, plant breeders seek new sources of genetic variation to enhance the productivity, sustainability and resilience of crop varieties. Here we launch a high-resolution, open-access research platform to facilitate genome-wide association mapping in rice, a staple food crop. The platform provides an immortal collection of diverse germplasm, a high-density single-nucleotide polymorphism data set tailored for gene discovery, well-documented analytical strategies, and a suite of bioinformatics resources to facilitate biological interpretation. Using grain length, we demonstrate the power and resolution of our new high-density rice array, the accompanying genotypic data set, and an expanded diversity panel for detecting major and minor effect QTLs and subpopulation-specific alleles, with immediate implications for rice improvement. PMID:26842267

Pectin acetylation influences the gelling ability of this important plant polysaccharide for the food industry. Plant apoplastic pectinacetylesterases (PAEs) play a key role in regulating the degree of pectin acetylation, and modifying their expression thus represents one way to engineer plant polysaccharides for food applications. Identifying the major active enzymes within the PAE gene family will aid in our understanding of this biological phenomenon as well as provide the tools for direct trait manipulation. Using comparative genomics we propose that there is a minimal set of 4 distinct PAEs in plants. Possible functional diversification of the PAE family in the grasses is also explored with the identification of 3 groups of PAE genes specific to grasses. PMID:26237162

Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

We report here the complete annotated genome sequence of Flavobacterium psychrophilum OSU THCO2-90, isolated from Coho salmon (Oncorhynchus kisutch) in Oregon. The genome consists of a circular chromosome with 2,343 predicted open reading frames. This strain has proved to be a valuable tool for functional genomics.

Background: The horn fly, Haematobia irritans (Linnaeus, 1758) (Diptera: Muscidae), is one of the most important ectoparasites of pastured cattle. Horn fly infestations reduce cattle weight gain and milk production. Additionally, horn flies are mechanical vectors of different pathogens that cause disease in cattle. The aim of this study was to conduct a functional genomics study in female horn flies using Expressed Sequence Tags (EST) analysis and RNA interference (RNAi). Results: A cDNA library was made from whole abdominal tissues collected from partially fed adult female horn flies. High quality horn fly ESTs (2,160) were sequenced and assembled into 992 unigenes (178 contigs and 814 singlets) representing molecular functions such as serine proteases, cell metabolism, mitochondrial function, transcription and translation, transport, chromatin structure, vitellogenesis, cytoskeleton, DNA replication, cell response to stress and infection, cell proliferation and cell-cell interactions, intracellular trafficking and secretion, and development. Functional analyses were conducted using RNAi for the first time in horn flies. Gene knockdown by RNAi resulted in higher horn fly mortality (protease inhibitor functional group), reduced oviposition (vitellogenin, ferritin and vATPase groups) or both (immune response and 5'-NUC groups) when compared to controls. Silencing of ubiquitination ESTs did not affect horn fly mortality and oviposition while gene knockdown in the ferritin and vATPase functional groups reduced mortality when compared to controls. Conclusions: These results advanced the molecular characterization of this important ectoparasite and suggested candidate protective antigens for the development of vaccines for the control of horn fly infestations. PMID:21310032

A high-quality draft genome for Proctacanthus coquilletti (Insecta: Diptera: Asilidae) is presented along with transcriptomes for 16 Diptera species from five families: Asilidae, Apioceridae, Bombyliidae, Mydidae, and Tabanidae. Genome sequencing reveals that P. coquilletti has a genome size of approximately 210 Mbp and remarkably low heterozygosity (0.47%) and few repeats (15%). These characteristics helped produce a highly contiguous (N50 = 862 kbp) assembly, particularly given that only a single 2 × 250 bp PCR-free Illumina library was sequenced. A phylogenomic hypothesis is presented based on thousands of putative orthologs across the 16 transcriptomes. Phylogenetic relationships support the sister group relationship of Apioceridae + Mydidae to Asilidae. A time-calibrated phylogeny is also presented, with seven fossil calibration points, which suggests an older age of the split among Apioceridae, Asilidae, and Mydidae (158 mya) and Apioceridae and Mydidae (135 mya) than proposed in the AToL FlyTree project. Future studies will be able to take advantage of the resources presented here in order to produce large scale phylogenomic and evolutionary studies of assassin fly phylogeny, life histories, or venom. The bioinformatics tools and workflow presented here will be useful to others wishing to generate de novo genomic resources in species-rich taxa without a closely-related reference genome. PMID:28168115
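For readers unfamiliar with the contiguity statistic quoted above, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not values from the P. coquilletti assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence added is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy assembly of seven contigs totalling 300 bp (invented numbers).
toy_contigs = [100, 80, 50, 30, 20, 10, 10]
toy_n50 = n50(toy_contigs)  # 100 + 80 = 180 >= 150, so the N50 is 80
```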

The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vent). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants i.e. halogenated compounds, making this group highly applicable for the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing of the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through the joint effort and with support from Department of Energy S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. The data has been applied to analysis of experimental data (i.e. gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate, and to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the

A great deal of data in functional genomics studies needs to be annotated with low-resolution anatomical terms. For example, gene expression assays based on manually dissected samples (microarray, SAGE, etc.) need high-level anatomical terms to describe sample origin. First-pass annotation in high-throughput assays (e.g. large-scale in situ gene expression screens or phenotype screens) and bibliographic applications, such as selection of keywords, would also benefit from a minimum set of standard anatomical terms. Although only simple terms are required, the researcher faces serious practical problems of inconsistency and confusion, given the different aims and the range of complexity of existing anatomy ontologies. A Standards and Ontologies for Functional Genomics (SOFG) group therefore initiated discussions between several of the major anatomical ontologies for higher vertebrates. As we report here, one result of these discussions is a simple, accessible, controlled vocabulary of gross anatomical terms, the SOFG Anatomy Entry List (SAEL). The SAEL is available from http://www.sofg.org and is intended as a resource for biologists, curators, bioinformaticians and developers of software supporting functional genomics. It can be used directly for annotation in the contexts described above. Importantly, each term is linked to the corresponding term in each of the major anatomy ontologies. Where the simple list does not provide enough detail or sophistication, therefore, the researcher can use the SAEL to choose the appropriate ontology and move directly to the relevant term as an entry point. The SAEL links will also be used to support computational access to the respective ontologies. PMID:18629134

Understanding how the metazoan genome is used during development and cell differentiation is one of the major challenges in the postgenomic era. Early studies in Drosophila suggested that three-dimensional (3D) chromosome organization plays important regulatory roles in this process and recent technological advances started to reveal connections at the molecular level. Here we will consider general features of the architectural organization of the Drosophila genome, providing historical perspective and insights from recent work. We will compare the linear and spatial segmentation of the fly genome and focus on the two key regulators of genome architecture: insulator components and Polycomb group proteins. With its unique set of genetic tools and a compact, well annotated genome, Drosophila is poised to remain a model system of choice for rapid progress in understanding principles of genome organization and to serve as a proving ground for development of 3D genome-engineering techniques. PMID:28049701

We report an update of the Bovine Genome Database (BGD) (http://BovineGenome.org). The goal of BGD is to support bovine genomics research by providing genome annotation and data mining tools. We have developed new genome and annotation browsers using JBrowse and WebApollo for two Bos taurus genome assemblies, the reference genome assembly (UMD3.1.1) and the alternate genome assembly (Btau_4.6.1). Annotation tools have been customized to highlight priority genes for annotation, and to aid annotators in selecting gene evidence tracks from 91 tissue specific RNAseq datasets. We have also developed BovineMine, based on the InterMine data warehousing system, to integrate the bovine genome, annotation, QTL, SNP and expression data with external sources of orthology, gene ontology, gene interaction and pathway information. BovineMine provides powerful query building tools, as well as customized query templates, and allows users to analyze and download genome-wide datasets. With BovineMine, bovine researchers can use orthology to leverage the curated gene pathways of model organisms, such as human, mouse and rat. BovineMine will be especially useful for gene ontology and pathway analyses in conjunction with GWAS and QTL studies.

Autism spectrum disorders (ASD) are highly heritable complex neurodevelopmental disorders with a 4:1 male:female ratio. Common genetic variation could explain 40–60% of the variance in liability to autism. Because of their small effect, genome-wide association studies (GWASs) have only identified a small number of individual single-nucleotide polymorphisms (SNPs). To increase the power of GWASs in complex disorders, methods like convergent functional genomics (CFG) have emerged to extract true association signals from noise and to identify and prioritize genes from SNPs using a scoring strategy combining statistics and functional genomics. We adapted and applied this approach to analyze data from a GWAS performed on families with multiple children affected with autism from Autism Speaks Autism Genetic Resource Exchange (AGRE). We identified a set of 133 candidate markers that were localized in or close to genes with functional relevance in ASD from a discovery population (545 multiplex families); a gender specific genetic score (GS) based on these common variants explained 1% (P = 0.01 in males) and 5% (P = 8.7 × 10⁻⁷ in females) of genetic variance in an independent sample of multiplex families. Overall, our work demonstrates that prioritization of GWAS data based on functional genomics identified common variants associated with autism and provided additional support for a common polygenic background in autism. PMID:24600472
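The abstract does not say how the genetic score (GS) was computed. One common form for such scores is a weighted sum of risk-allele counts over the selected markers, sketched below purely as an assumption; the marker names, weights, and the `genetic_score` function are hypothetical and are not taken from the study.

```python
def genetic_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per marker).

    Both arguments are dicts keyed by marker name; they must cover the
    same markers so that no marker is silently dropped.
    """
    if set(allele_counts) != set(weights):
        raise ValueError("allele_counts and weights must share the same markers")
    return sum(weights[m] * allele_counts[m] for m in weights)


# Invented example: three markers with illustrative weights.
example_weights = {"markerA": 0.5, "markerB": 1.0, "markerC": 0.25}
example_counts = {"markerA": 2, "markerB": 0, "markerC": 1}
example_score = genetic_score(example_counts, example_weights)  # 1.0 + 0.0 + 0.25
```

A sex-specific score of the kind described would, under this assumption, simply use a different weight table for males and females.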

The availability of the complete DNA sequence of the bacterial genome of Nitrosomonas europaea offered the opportunity for unprecedented and detailed investigations of function. We studied the function of genes involved in carbohydrate and Fe metabolism. N. europaea has genes for the synthesis and degradation of glycogen and sucrose but cannot grow on substrates other than ammonia and CO2. Granules of glycogen were detected in whole cells by electron microscopy and quantified in cell-free extracts by enzymatic methods. The cellular glycogen and sucrose content varied depending on the composition of the growth medium and cellular growth stage. N. europaea also depends heavily on iron for metabolism of ammonia, is particularly interesting since it lacks genes for siderophore production, and has genes with only low similarity to known iron reductases, yet grows relatively well in medium containing low Fe. By comparing the transcriptomes of cells grown in iron-replete medium versus iron-limited medium, 247 genes were identified as differentially expressed. Mutant strains deficient in genes for sucrose, glycogen and iron metabolism were created and are being used to further our understanding of ammonia oxidizing bacteria.

The Octamer-binding proteins (Oct) are a group of highly conserved transcription factors that specifically bind to the octamer motif (ATGCAAAT) and closely related sequences that are found in promoters and enhancers of a wide variety of both ubiquitously expressed and cell type-specific genes. Oct factors belong to the larger family of POU domain factors that are characterized by the presence of a highly conserved bipartite DNA binding domain, consisting of an amino-terminal specific subdomain (POUS) and a carboxyl-terminal homeo-subdomain (POUH). Eleven Oct proteins have been named (Oct1-11), and currently, eight genes encoding Oct proteins (Oct1, Oct2, Oct3/4, Oct6, Oct7, Oct8, Oct9, and Oct11) have been cloned and characterized. Oct1 and Oct2 are widely expressed in adult tissues, while other Oct proteins are much more restricted in their expression patterns. Oct proteins are implicated in crucial and versatile biological events, such as embryogenesis, neurogenesis, immunity, and body glucose and amino acid metabolism. The aberrant expression and null function of Oct proteins have also been linked to various diseases, including deafness, diabetes and cancer. In this review, I will report both the genomic structure and major functions of individual Oct proteins in physiological and pathological processes. PMID:23747866

The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources such as SSRs, SNPs and InDels in several model and non-model plant species. Yam (Dioscorea spp.) i...

A prototype oligonucleotide "functional chip" has been developed to screen novel DNA repair proteins for their ability to bind or alter different forms of DNA. This chip has been developed as a functional genomics screen for analysis of protein-DNA interactions for novel proteins identified from the Human Genome Project. The process of novel gene identification that has ensued as a consequence of available sequence information is remarkable. The challenge now lies in determining the function of newly identified gene products in a time- and cost-effective high-throughput manner. The functional chip is generated by the robotic application of DNA spotted in a microarray format onto a glass slide. Individual proteins are then analyzed against the different forms of DNA bound to the slide. Several prototype functional chips were designed to contain various DNA fragments tethered to a glass slide for analysis of protein-DNA binding or enzymatic activity of known proteins. The technology has been developed to screen novel, putative DNA repair proteins for their ability to bind various types of DNA alone and in concert with protein partners. An additional scheme has been devised to screen putative repair enzymes for their ability to process different types of DNA molecules. Current methods to analyze gene expression primarily utilize either of two technologies. The oligonucleotide chip, pioneered by Fodor and co-workers and Affymetrix, Inc., consists of greater than 64,000 oligonucleotides attached in situ to a glass support. The oligonucleotide chip has been used primarily to identify specific mutations in a given gene by hybridization against a fluorescently-labeled substrate. The second method is the microarray, whereby DNA targets are systematically arranged on a glass slide and then hybridized with fluorescently-labeled complex targets for gene expression analysis (Jordan, 1998). By this technique, a large amount of information can be obtained examining global

The availability of genomic information significantly increases the number of potential targets available for drug discovery, although the function of many targets and their relationship to disease is unknown. In a chemical genomic research approach, ultra-high throughput screening (uHTS) of genomic targets takes place early in the drug discovery process, before target validation. Target-selective modulators then provide drug leads and pharmacological research tools to validate target function. Effective implementation of a chemical genomic strategy requires assays that can perform uHTS for large numbers of genomic targets. Cell-based functional assays are capable of the uHTS throughput required for chemical genomic research, and their functional nature provides distinct advantages over ligand-binding assays in the identification of target-selective modulators.

A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from a hundred bases to a hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org.
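To make the idea of matching on genomic regions rather than on text concrete, the sketch below scores the similarity of two peak sets by base-pair Jaccard overlap. This is a deliberately simple stand-in and not GeNemo's Markov Chain Monte Carlo-based method; the function names and the example coordinates are assumptions, and intervals are taken to be half-open, sorted, and non-overlapping on a single chromosome.

```python
def overlap_bp(a, b):
    """Total overlapping bases between two sorted lists of
    non-overlapping half-open (start, end) intervals."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """Base-pair Jaccard index: shared bases divided by bases covered by either set."""
    inter = overlap_bp(a, b)
    union = sum(e - s for s, e in a) + sum(e - s for s, e in b) - inter
    return inter / union if union else 0.0


# Invented peak sets on one chromosome.
query_peaks = [(100, 200), (500, 600)]
dataset_peaks = [(150, 250), (550, 650), (900, 1000)]

shared = overlap_bp(query_peaks, dataset_peaks)    # 50 + 50 bases
similarity = jaccard(query_peaks, dataset_peaks)   # 100 / (200 + 300 - 100)
```

Ranking candidate datasets by such a score is one naive way to retrieve datasets with similar binding patterns; it ignores signal intensity, which GeNemo's bigWig-based searches can use.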

Genomic features such as rate of recombination and differentiation have been suggested to play a role in species divergence. However, the relationship of these phenomena to functional organization of the genome in the context of reproductive isolation remains unexplored. Here, we examine genomic characteristics of the species boundaries between two house mouse subspecies (Mus musculus musculus/M. m. domesticus). These taxa form a narrow semipermeable zone of secondary contact across Central Europe. Due to the incomplete nature of reproductive isolation, gene flow in the zone varies across the genome. We present an analysis of genomic differentiation, rate of recombination, and functional composition of genes relative to varying amounts of introgression. We assessed introgression using 1,316 autosomal single nucleotide polymorphism markers, previously genotyped in hybrid populations from three transects. We found a significant relationship between amounts of introgression and both genomic differentiation and rate of recombination with genomic regions of reduced introgression associated with higher genomic differentiation and lower rates of recombination, and the opposite for genomic regions of extensive introgression. We also found a striking functional polarization of genes based on where they are expressed in the cell. Regions of elevated introgression exhibit a disproportionate number of genes involved in signal transduction functioning at the cell periphery, among which olfactory receptor genes were found to be the most prominent group. Conversely, genes expressed intracellularly and involved in DNA binding were the most prevalent in regions of reduced introgression. We hypothesize that functional organization of the genome is an important driver of species divergence.

... can be found on the web, through local libraries, your health care provider, and the yellow pages under "social service organizations." AIDS - resources Alcoholism - resources Allergy - resources ...

Pregnancy success and life-long health depend on a cooperative interaction between the mother and the fetus in the allocation of resources. As the site of materno-fetal nutrient transfer, the placenta is central to this interplay; however, the relative importance of the maternal versus fetal genotypes in modifying the allocation of resources to the fetus is unknown. Using genetic inactivation of the growth and metabolism regulator, Pik3ca (encoding PIK3CA also known as p110α, α/+), we examined the interplay between the maternal genome and the fetal genome on placental phenotype in litters of mixed genotype generated through reciprocal crosses of WT and α/+ mice. We demonstrate that placental growth and structure were impaired and associated with reduced growth of α/+ fetuses. Despite its defective development, the α/+ placenta adapted functionally to increase the supply of maternal glucose and amino acid to the fetus. The specific nature of these changes, however, depended on whether the mother was α/+ or WT and related to alterations in endocrine and metabolic profile induced by maternal p110α deficiency. Our findings thus show that the maternal genotype and environment programs placental growth and function and identify the placenta as critical in integrating both intrinsic and extrinsic signals governing materno-fetal resource allocation.

Mammals, including human beings, have evolved a unique viviparous reproductive system and a highly developed central nervous system. How did these unique characteristics emerge in mammalian evolution, and what kinds of changes occurred in the mammalian genomes as evolution proceeded? A key conceptual term in approaching these issues is “mammalian-specific genomic functions”, a concept covering both mammalian-specific epigenetics and genetics. Genomic imprinting and LTR retrotransposon-derived genes are reviewed as the representative, mammalian-specific genomic functions that are essential not only for the current mammalian developmental system, but also for mammalian evolution itself. First, the essential roles of genomic imprinting in mammalian development, especially related to viviparous reproduction via placental function, as well as the emergence of genomic imprinting in mammalian evolution, are discussed. Second, we introduce the novel concept of “mammalian-specific traits generated by mammalian-specific genes from LTR retrotransposons”, based on the finding that LTR retrotransposons served as a critical driving force in mammalian evolution by generating mammalian-specific genes. PMID:26666304

Genes important to aphid biology, survival and reproduction were successfully identified by use of a genomics approach. We report the sequencing, compilation, and annotation of the approximately 525 Mb nuclear genome of the pea aphid, Acyrthosiphon pisum, which represents an important ...

Technological advancement has opened the door to systematic genetics in mammalian cells. Genome-scale loss-of-function screens can assay fitness defects induced by partial gene knockdown, using RNA interference, or complete gene knockout, using new CRISPR techniques. These screens can reveal the basic blueprint required for cellular proliferation. Moreover, comparing healthy to cancerous tissue can uncover genes that are essential only in the tumor; these genes are targets for the development of specific anticancer therapies. Unfortunately, progress in this field has been hampered by off-target effects of perturbation reagents and poorly quantified error rates in large-scale screens. To improve the quality of information derived from these screens, and to provide a framework for understanding the capabilities and limitations of CRISPR technology, we derive gold-standard reference sets of essential and nonessential genes, and provide a Bayesian classifier of gene essentiality that outperforms current methods on both RNAi and CRISPR screens. Our results indicate that CRISPR technology is more sensitive than RNAi and that both techniques have nontrivial false discovery rates that can be mitigated by rigorous analytical methods. PMID:24987113
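To make the idea of a classifier trained on reference sets concrete, the sketch below computes a log Bayes factor for one gene from its guide-level fold changes. The abstract does not specify the likelihood model, so the Gaussian fit, the use of log fold changes as input, and all names and numbers here are assumptions for illustration rather than the authors' method.

```python
import math
from statistics import mean, stdev
from typing import List


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_bayes_factor(
    observations: List[float],
    essential_train: List[float],
    nonessential_train: List[float],
) -> float:
    """Sum of log likelihood ratios (essential vs. nonessential) over observations.

    Positive values favour the 'essential' model, negative values the
    'nonessential' model. Each model is a Gaussian fitted to a reference set.
    """
    mu_e, sd_e = mean(essential_train), stdev(essential_train)
    mu_n, sd_n = mean(nonessential_train), stdev(nonessential_train)
    return sum(
        gaussian_logpdf(x, mu_e, sd_e) - gaussian_logpdf(x, mu_n, sd_n)
        for x in observations
    )


# Hypothetical log fold changes for the two reference sets (illustrative only).
essential_ref = [-3.0, -2.5, -2.0, -3.5]
nonessential_ref = [0.2, -0.1, 0.0, 0.3]

bf_dropout = log_bayes_factor([-2.8, -2.6], essential_ref, nonessential_ref)
bf_neutral = log_bayes_factor([0.1, 0.0], essential_ref, nonessential_ref)
```

Summing over reagents targeting the same gene lets several weak observations reinforce one another, which is one plausible way such a classifier could tolerate noisy individual reagents.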

The development of bioadhesives inspired from marine animals is a promising approach to generate new tissue-compatible medical components. A number of marine species, through their adhesive properties, also represent significant foulers that become increasingly problematic to aquaculture, shipping or local biodiversity. In order to develop more sophisticated man-made glues and/or efficient fouling resistant surfaces, it is important to understand the mechanical, structural and molecular properties of adhesive organs in selected species. Ascidians are marine invertebrates with larvae that opportunistically attach to almost any type of submerged surface to undergo metamorphosis into permanently sessile adults. Not only do they represent a globally important fouling organism, but they are becoming increasingly popular as model organisms for developmental biology. The latter is due to their phylogenetic position as the sister group to the vertebrates and their cellular and molecular accessibility for experimentation. In this paper, we review the mechanisms of larval adhesion in ascidians and draw conclusions from comparative analyses of selected species. We further discuss how knowledge from a developmental and functional genomics point of view can advance our understanding of cellular and molecular signatures and their hierarchical usage in animal adhesive organs. PMID:25657840

Background Several data formats have been developed for large scale biological experiments, using a variety of methodologies. Most data formats contain a mechanism for allowing extensions to encode unanticipated data types. Extensions to data formats are important because the experimental methodologies tend to be fairly diverse and rapidly evolving, which hinders the creation of formats that will be stable over time. Results In this paper we review the data formats that exist in functional genomics, some of which have become de facto or de jure standards, with a particular focus on how each domain has been modelled, and how each format allows extensions. We describe the tasks that are frequently performed over data formats and analyse how well each task is supported by a particular modelling structure. Conclusion From our analysis, we make recommendations as to the types of modelling structure that are most suitable for particular types of experimental annotation. There are several standards currently under development that we believe could benefit from systematically following a set of guidelines. PMID:16188029

Strigolactones (SLs) represent an important new plant hormone class marked by their multifunctional role in plant and rhizosphere interactions. These compounds stimulate hyphal branching in arbuscular mycorrhizal fungi (AMF) and seed germination of root parasitic plants. In addition, they are involved in the control of plant architecture by inhibiting bud outgrowth as well as many other morphological and developmental processes together with other plant hormones such as auxins and cytokinins. The biosynthetic pathway of SLs, which are derived from carotenoids, was partially deciphered based on the identification of mutants from a variety of plant species. Only a few SL biosynthetic and regulated genes and related regulatory transcription factors have been identified. However, functional genomics and epigenetic studies have started to give first insights into how SL-related genes are regulated. Since they control plant architecture and plant-rhizosphere interaction, SLs are starting to be used for agronomical and biotechnological applications. Furthermore, the genes involved in the SL biosynthetic pathway and genes regulated by SL constitute interesting targets for plant breeding. Therefore, it is necessary to decipher and better understand the genetic determinants of their regulation at different levels.

Advances in pig gene identification, mapping and functional analysis have continued to make rapid progress. The porcine genetic linkage map now has nearly 3000 loci, including several hundred genes, and is likely to expand considerably in the next few years, with many more genes and amplified fragment length polymorphism (AFLP) markers being added to the map. The physical genetic map is also growing rapidly and has over 3000 genes and markers. Several recent quantitative trait loci (QTL) scans and candidate gene analyses have identified important chromosomal regions and individual genes associated with traits of economic interest. The commercial pig industry is actively using this information and traditional performance information to improve pig production by marker-assisted selection (MAS). Research to study the co-expression of thousands of genes is now advancing and methods to combine these approaches to aid in gene discovery are under way. The pig's role in xenotransplantation and biomedical research makes the study of its genome important for the study of human disease. This review will briefly describe advances made, directions for future research and the implications for both the pig industry and human health. PMID:18629119

Here, we describe the construction of a phylogenetically deep, whole-genome alignment of 20 flowering plants, along with an analysis of plant genome conservation. Each included angiosperm genome was aligned to a reference genome, Arabidopsis thaliana, using the LASTZ/MULTIZ paradigm and tools from the University of California-Santa Cruz Genome Browser source code. In addition to the multiple alignment, we created a local genome browser displaying multiple tracks of newly generated genome annotation, as well as annotation sourced from published data of other research groups. An investigation into A. thaliana gene features present in the aligned A. lyrata genome revealed better conservation of start codons, stop codons, and splice sites within our alignments (51% of features from A. thaliana conserved without interruption in A. lyrata) when compared with previous publicly available plant pairwise alignments (34% of features conserved). The detailed view of conservation across angiosperms revealed not only high coding-sequence conservation but also a large set of previously uncharacterized intergenic conservation. From this, we annotated the collection of conserved features, revealing dozens of putative noncoding RNAs, including some with recorded small RNA expression. Comparing conservation between kingdoms revealed a faster decay of vertebrate genome features when compared with angiosperm genomes. Finally, conserved sequences were searched for folding RNA features, including but not limited to noncoding RNA (ncRNA) genes. Among these, we highlight a double hairpin in the 5'-untranslated region (5'-UTR) of the PRIN2 gene and a putative ncRNA with homology targeting the LAF3 protein.

The cytochrome P450 superfamily is responsible primarily for human drug metabolism, which is of critical importance for drug discovery and development. Rapid advancement of bioinformatics, functional genomics and metabolomics has been made over the last decade. These disciplines are essential in target identification, lead discovery and optimization. In this review, we summarize the recent progress on cytochrome P450 and its role in drug metabolism in the context of bioinformatics, functional genomics and metabolomics. Data are integrated into various databases and web-based platforms on cytochrome P450. These research tools and resources are playing an increasingly important role in drug discovery, and are helping in achieving the ultimate goal of personalized medicine, that is, to prescribe personalized drugs according to each person's genetic makeup, metabolic level, and drug disposition.

Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.

Comparative genomics combined with phylogenetic reconstructions are powerful approaches to study the evolution of genes and genomes. However, the current rapid expansion of the volume of genomic information makes it increasingly difficult to interrogate, integrate and synthesize comparative genome data while taking into account the maximum breadth of information available. GenomicusPlants (http://www.genomicus.biologie.ens.fr/genomicus-plants) is an extension of the Genomicus webserver that addresses this issue by allowing users to explore flowering plant genomes in an intuitive way, across the broadest evolutionary scales. Extant genomes of 26 flowering plants can be analyzed, as well as 23 ancestral reconstructed genomes. Ancestral gene order provides a long-term chronological view of gene order evolution, greatly facilitating comparative genomics and evolutionary studies. Four main interfaces ('views') are available where: (i) PhyloView combines phylogenetic trees with comparisons of genomic loci across any number of genomes; (ii) AlignView projects loci of interest against all other genomes to visualize its topological conservation; (iii) MatrixView compares two genomes in a classical dotplot representation; and (iv) Karyoview visualizes chromosome karyotypes 'painted' with colours of another genome of interest. All four views are interconnected and benefit from many customizable features.

The function annotation process in computational biology has increasingly shifted from the traditional characterization of individual biochemical roles of protein molecules to the system-wide detection of entire metabolic pathways and genomic structures. The so-called genome-aware methods broaden misannotation inconsistencies in genome sequences beyond protein function assignments, encompassing phylogenetic anomalies and artifactual genomic regions. We outline three categories of error propagation in databases by providing striking examples - at various levels of appreciation by the community from traditional to emerging, thus raising awareness for future solutions.

The 3D structure of the genome plays a vital role in biological processes such as gene interaction, gene regulation, DNA replication and genome methylation. Advanced chromosomal conformation capture techniques, such as Hi-C and tethered conformation capture, can generate chromosomal contact data that can be used to computationally reconstruct 3D structures of the genome. We developed a novel restraint-based method that is capable of reconstructing 3D genome structures utilizing both intra- and inter-chromosomal contact data. Our method was robust to noise and performed well in comparison with a panel of existing methods on a controlled simulated data set. On a real Hi-C data set of the human genome, our method produced chromosome and genome structures that are consistent with 3D FISH data and existing knowledge about the human chromosome and genome, such as chromosome territories and the clustering of small chromosomes in the nucleus center, with the exception of chromosome 18. The tool and experimental data are available at https://missouri.box.com/v/LorDG.

The understanding of marine microbial ecology and metabolism has been hampered by the paucity of sequenced reference genomes. To this end, we report the sequencing of 137 diverse marine isolates collected from around the world. We analysed these sequences, along with previously published marine prokaryotic genomes, in the context of marine metagenomic data, to gain insights into the ecology of the surface ocean prokaryotic picoplankton (0.1-3.0 μm size range). The results suggest that the sequenced genomes define two microbial groups: one composed of only a few taxa that are nearly always abundant in picoplanktonic communities, and the other consisting of many microbial taxa that are rarely abundant. The genomic content of the second group suggests that these microbes are capable of slow growth and survival in energy-limited environments, and rapid growth in energy-rich environments. By contrast, the abundant and cosmopolitan picoplanktonic prokaryotes for which there is genomic representation have smaller genomes, are probably capable of only slow growth and seem to be relatively unable to sense or rapidly acclimate to energy-rich conditions. Their genomic features also lead us to propose that one method used to avoid predation by viruses and/or bacterivores is by means of slow growth and the maintenance of low biomass.

The discovery of cis-regulatory elements is a challenging problem in bioinformatics, owing to distal locations and context-specific roles of these elements in controlling gene regulation. Here we review the current bioinformatics methodologies and resources available for systematic discovery of cis-acting regulatory elements and conserved transcription factor binding sites in the chick genome. In addition, we propose and make available, a novel workflow using computational tools that integrate CTCF analysis to predict putative insulator elements, enhancer prediction and TFBS analysis. To demonstrate the usefulness of this computational workflow, we then use it to analyze the locus of the gene Sox2 whose developmental expression is known to be controlled by a complex array of cis-acting regulatory elements. The workflow accurately predicts most of the experimentally verified elements along with some that have not yet been discovered. A web version of the CTCF tool, together with instructions for using the workflow can be accessed from http://toolshed.g2.bx.psu.edu/view/mkhan1980/ctcf_analysis. For local installation of the tool, relevant Perl scripts and instructions are provided in the directory named “code” in the supplementary materials. PMID:23355428

We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible on a cloud platform and through a controlled-access internet portal. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertions and deletions or copy number variations per ASD subject. We identified 18 new candidate ASD-risk genes and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (P = 6 × 10⁻⁴). In 294 of 2,620 (11.2%) ASD cases, a molecular basis could be determined and 7.2% of these carried copy number variations and/or chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD.

Genomic imprinting, an inherently epigenetic phenomenon defined by parent of origin-dependent gene expression, is observed in mammals and flowering plants. Genome-scale surveys of imprinted expression and the underlying differential epigenetic marks have led to the discovery of hundreds of imprinted plant genes and confirmed DNA and histone methylation as key regulators of plant imprinting. However, the biological roles of the vast majority of imprinted plant genes are unknown, and the evolutionary forces shaping plant imprinting remain rather opaque. Here, we review the mechanisms of plant genomic imprinting and discuss theories of imprinting evolution and biological significance in light of recent findings. PMID:26680300

Plant genomes can be modified via current biotechnology with high specificity and excellent efficiency. Zinc finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN) and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system are the key engineered nucleases used in genome editing. Genome editing techniques enable gene targeted mutagenesis, gene knock-out, gene insertion or replacement at the target sites during the endogenous DNA repair process, including non-homologous end joining (NHEJ) and homologous recombination (HR), triggered by the induction of a DNA double-strand break (DSB). Genome editing has been successfully applied in the genome modification of diverse plant species, such as Arabidopsis thaliana, Oryza sativa, and Nicotiana tabacum. In this review, we summarize the application of genome editing in the identification of plant gene function and in crop breeding. We also discuss aspects of genome editing that need improvement for precise genetic improvement of crops in future studies.

The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabidopsis thaliana that provides a global research community with centralized access to data for over 30,000 Arabidopsis genes. TAIR’s biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high throughput datasets to present a high quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips and other experimental resources. TAIR actively engages with its users who contribute expertise and data that augments the work of the curatorial staff. TAIR’s focus in an extensive and evolving ecosystem of online resources for plant biology is on the critically important role of extracting experimentally-based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinformatics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription-based funding while still keeping its data relatively open and accessible. PMID:26201819

With the recent advances in genomic sciences, a knowledge-based approach can now be taken to optimize the bioremediation of trichloroethene (TCE). During the bioremediation of a heterogeneous subsurface, it is vital to identify and quantify the functionally important microorganisms present, characterize the microbial community and measure their physiological activity. In our field experiments, quantitative PCR (qPCR) was coupled with reverse-transcription (RT) to analyze both copy numbers and transcripts expressed by the 16S rRNA gene and three reductive dehalogenase (RDase) genes as biomarkers of Dehalococcoides spp. in the groundwater of a TCE-DNAPL site at Ft. Lewis (WA) that was serially subjected to biostimulation and bioaugmentation. Genes in the Dehalococcoides genus were targeted as they are the only known organisms that can completely dechlorinate TCE to the innocuous product ethene. Biomarker quantification revealed an overall increase of more than three orders of magnitude in the total Dehalococcoides population and quantification of the more labile and stringently regulated mRNAs confirmed that Dehalococcoides spp. were active. Parallel with our field experiments, laboratory studies were conducted to explore the physiology of Dehalococcoides isolates in order to develop relevant biomarkers that are indicative of the metabolic state of cells. Recently, we verified the function of the nitrogenase operon in Dehalococcoides sp. strain 195, and nitrogenase-encoding genes are ideal biomarker targets to assess cellular nitrogen requirement. To characterize the microbial community, we applied a high-density phylogenetic microarray (16S PhyloChip) that simultaneously monitors over 8,700 unique taxa to track the bacterial and archaeal populations through different phases of treatment. As a measure of species richness, 1,300 to 1,520 taxa were detected in groundwater samples extracted during different stages of treatment as well as in the bioaugmentation culture. We

A growing body of evidence suggests that insulators have a primary role in orchestrating the topological arrangement of higher-order chromatin architecture. Insulator-mediated long-range interactions can influence the epigenetic status of the genome and, in certain contexts, may have important effects on gene expression. Here we discuss higher-order chromatin organization as a unifying mechanism for diverse insulator actions across the genome. PMID:23706817

Background Microorganisms able to grow under artificial culture conditions comprise only a small proportion of the biosphere's total microbial community. Until recently, scientists have been unable to perform thorough analyses of difficult-to-culture microorganisms due to limitations in sequencing technology. As modern techniques have dramatically increased sequencing rates and rapidly expanded the number of sequenced genomes, classifications of the genomes based on alternative points of view, in addition to traditional taxonomic classifications which focus on the evolutionary relationships of organisms, may help advance our understanding of the delicate relationships of organisms. Results We have developed a proteome-based method for classifying microbial species. This classification method uses a set of probes comprising short, highly conserved amino acid sequences. For each genome, in silico translation is performed to obtain its proteome, based on which a probe-set frequency pattern is generated. Then, the probe-set frequency patterns are used to cluster the proteomes/genomes. Conclusions Features of the proposed method include a high running speed when challenged with a large number of genomes, and high applicability for classifying organisms with incomplete genome sequences. Moreover, the probe-set clustering method is sensitive to the metabolic phenotypic similarities/differences among species and thus may have potential for the classification or differentiation of closely-related organisms. PMID:22537274
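The probe-set frequency pattern described above can be sketched as follows: count how often each short probe occurs across a proteome, normalize the counts, and compare the resulting vectors between genomes. The probe sequences, toy proteomes, Euclidean distance and nearest-neighbour comparison are all assumptions chosen for illustration; the abstract does not state which probes or clustering algorithm were used.

```python
import math
from typing import Dict, List


def count_overlapping(text: str, probe: str) -> int:
    """Count occurrences of probe in text, allowing overlapping matches."""
    count, start = 0, text.find(probe)
    while start != -1:
        count += 1
        start = text.find(probe, start + 1)
    return count


def frequency_pattern(proteome: List[str], probes: List[str]) -> List[float]:
    """Relative frequency of each probe across all proteins of one proteome."""
    counts = [sum(count_overlapping(p, probe) for p in proteome) for probe in probes]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def euclidean(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(name: str, patterns: Dict[str, List[float]]) -> str:
    """Name of the proteome whose pattern is closest to the given one."""
    others = [n for n in patterns if n != name]
    return min(others, key=lambda n: euclidean(patterns[name], patterns[n]))


# Hypothetical probes and toy proteomes, made up for illustration only.
probes = ["GKS", "DEAD"]
proteomes = {
    "A": ["MGKSGKST", "DEADX"],
    "B": ["GKSAA", "GKSDEAD"],
    "C": ["DEADDEAD"],
}
patterns = {name: frequency_pattern(seqs, probes) for name, seqs in proteomes.items()}
closest_to_a = nearest("A", patterns)
```

Because only probe counts are needed, a pattern can still be computed from an incomplete set of protein sequences, which fits the abstract's claim of applicability to incomplete genomes.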

The functional analysis of microbial genomes often requires gene inactivation. We constructed a set of cassettes consisting of single antibiotic resistance genes flanked by the attL and attR sites resulting from site-specific integration of the Streptomyces pSAM2 element. These cassettes can easily be used to inactivate genes by in-frame deletion in Streptomyces by a three-step strategy. In the first step, in Escherichia coli, the cassette is inserted into a cloned copy of the gene to be inactivated. In the second step, the gene is replaced by homologous recombination in Streptomyces, allowing substitution of the wild-type target gene with its inactivated counterpart. In the third step, the cassette can be removed by expression of the pSAM2 genes xis and int. The resulting strains are marker-free and contain an “attB-like” sequence of 33, 34, or 35 bp with no stop codon if the cassette is correctly chosen. Thus, a gene can be disrupted by creating an in-frame deletion, avoiding polar effects if downstream genes are cotranscribed with the target gene. A set of cassettes was constructed to contain a hygromycin or gentamicin resistance gene flanked by the attL and attR sites. The initial constructions carrying convenient cloning sites allow the insertion of any other marker gene. We tested insertion and excision by inserting a cassette into orf3, the third gene of an operon involved in spiramycin biosynthesis. We verified that the cassette exerted a polar effect on the transcription of downstream genes but that, after excision, complementation with orf3 alone restored spiramycin production. PMID:16820478
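One way to read "correctly chosen" is that the scar left after excision must keep the reading frame and contain no stop codon in that frame. The sketch below checks both conditions. This is an interpretation rather than the authors' stated procedure, and the function names and the example sequence are hypothetical.

```python
from typing import List, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_preserving_scars(
    deleted_len: int, scar_lengths: Sequence[int] = (33, 34, 35)
) -> List[int]:
    """Scar lengths for which the net change in gene length is a multiple of 3.

    deleted_len is the number of bases of the target gene replaced by the scar.
    """
    return [n for n in scar_lengths if (deleted_len - n) % 3 == 0]


def has_in_frame_stop(seq: str, offset: int = 0) -> bool:
    """True if any complete codon read from the given offset is a stop codon."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


# Because 33, 34 and 35 cover all three residues modulo 3, exactly one of
# them preserves the frame for any deletion length.
options_300 = frame_preserving_scars(300)
options_301 = frame_preserving_scars(301)
options_302 = frame_preserving_scars(302)

# Hypothetical sequence (not the real attB-like scar), used to show the check.
example = "ATGTAAGGG"
stop_frame0 = has_in_frame_stop(example, 0)
stop_frame1 = has_in_frame_stop(example, 1)
```

Under this reading, offering three consecutive scar lengths means a suitable cassette exists whatever the length of the region removed from the target gene.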

Lipases have key roles in insect lipid acquisition, storage and mobilisation and are also fundamental to many physiological processes underpinning insect reproduction, development, defence from pathogens and oxidative stress, and pheromone signalling. We have screened the recently sequenced genomes of five species from four orders of holometabolous insects, the dipterans Drosophila melanogaster and Anopheles gambiae, the hymenopteran Apis mellifera, the moth Bombyx mori and the beetle Tribolium castaneum, for the six major lipase families that are also found in other organisms. The two most numerous families in the insects, the neutral and acid lipases, are also the main families in mammals, albeit not in Caenorhabditis elegans, plants or microbes. Total numbers of the lipases vary two-fold across the five insect species, from numbers similar to those in mammals up to numbers comparable to those seen in C. elegans. Whilst there is a high degree of orthology with mammalian lipases in the other four families, the great majority of the insect neutral and acid lipases have arisen since the insect orders themselves diverged. Intriguingly, about 10% of the insect neutral and acid lipases have lost motifs critical for catalytic function. Examination of the length of lid and loop regions of the neutral lipase sequences suggest that most of the insect lipases lack triacylglycerol (TAG) hydrolysis activity, although the acid lipases all have intact cap domains required for TAG hydrolysis. We have also reviewed the sequence databases and scientific literature for insights into the expression profiles and functions of the insect neutral and acid lipases and the orthologues of the mammalian adipose triglyceride lipase which has a pivotal role in lipid mobilisation. These data suggest that some of the acid and neutral lipase diversity may be due to a requirement for rapid accumulation of dietary lipids. The different roles required of lipases at the four discrete life stages of

Given the ever-expanding number of model plant species for which complete genome sequences are available and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. This protocol describes the annotation of plant gene function using combined co-expression gene analysis, metabolomics and informatics (Figure 1). The approach uses target genes of known function to identify non-annotated genes likely to be involved in a given metabolic process, with target compounds identified via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although neither is effortless. As a corollary, the approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in particular tissues, plant species or stress treatments, which is currently an important challenge in understanding plant metabolism.

Evolutionary solutions to the physiological challenges of life in highly variable habitats can span the continuum from evolution of a cosmopolitan plastic phenotype to the evolution of locally adapted phenotypes. Killifish (Fundulus sp.) have evolved both highly plastic and locally adapted phenotypes within different selective contexts, providing a comparative system in which to explore the genomic underpinnings of physiological plasticity and adaptive variation. Importantly, extensive variation exists among populations and species for tolerance to a variety of stressors, and we exploit this variation in comparative studies to yield insights into the genomic basis of evolved phenotypic variation. Notably, species of Fundulus occupy the continuum of osmotic habitats from freshwater to marine and populations within Fundulus heteroclitus span far greater variation in pollution tolerance than across all species of fish. Here, we explore how transcriptome regulation underpins extreme physiological plasticity on osmotic shock and how genomic and transcriptomic variation is associated with locally evolved pollution tolerance. We show that F. heteroclitus quickly acclimate to extreme osmotic shock by mounting a dramatic rapid transcriptomic response including an early crisis control phase followed by a tissue remodeling phase involving many regulatory pathways. We also show that convergent evolution of locally adapted pollution tolerance involves complex patterns of gene expression and genome sequence variation, which is confounded with body-weight dependence for some genes. Similarly, exploiting the natural phenotypic variation associated with other established and emerging model organisms is likely to greatly accelerate the pace of discovery of the genomic basis of phenotypic variation.

Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERVs/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.

The FAANG (Functional Annotation of Animal Genomes) Consortium recently held a Gathering On FAANG (GO-FAANG) Workshop in Washington, DC on October 7-8, 2015. This consortium is a grass-roots organization formed to advance the annotation of newly assembled genomes of non-model organisms (www.faang.or...

Chronic kidney disease (CKD) is an important public health problem with a genetic component. We performed genome-wide association studies in up to 130,600 European ancestry participants overall, and stratified for key CKD risk factors. We uncovered 6 new loci in association with estimated glomerular filtration rate (eGFR), the primary clinical measure of CKD, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. Morpholino knockdown of mpped2 and casp9 in zebrafish embryos revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. By providing new insights into genes that regulate renal function, these results could further our understanding of the pathogenesis of CKD. PMID:22479191

R loops are nucleic acid structures composed of an RNA-DNA hybrid and a displaced single-stranded DNA. Recently, evidence has emerged that R loops occur more often in the genome and have greater physiological relevance, including roles in transcription and chromatin structure, than was previously predicted. Importantly, however, R loops are also a major threat to genome stability. For this reason, several DNA and RNA metabolism factors prevent R-loop formation in cells. Dysfunction of these factors causes R-loop accumulation, which leads to replication stress, genome instability, chromatin alterations or gene silencing, phenomena that are frequently associated with cancer and a number of genetic diseases. We review the current knowledge of the mechanisms controlling R loops and their putative relationship with disease.

Knowledge of the expression profile of a gene is a critical piece of information required to build an understanding of the normal and essential functions of that gene and any role it may play in the development or progression of disease. High-throughput, large-scale efforts are on-going internationally to characterise reporter-tagged knockout mouse lines. As part of that effort, we report an open access adult mouse expression resource, in which the expression profile of 424 genes has been assessed in up to 47 different organs, tissues and sub-structures using a lacZ reporter gene. Many specific and informative expression patterns were noted. Expression was most commonly observed in the testis and brain and was most restricted in white adipose tissue and mammary gland. Over half of the assessed genes presented with an absent or localised expression pattern (categorised as 0-10 positive structures). A link between complexity of expression profile and viability of homozygous null animals was observed; inactivation of genes expressed in ≥21 structures was more likely to result in reduced viability by postnatal day 14 compared with more restricted expression profiles. For validation purposes, this mouse expression resource was compared with Bgee, a federated composite of RNA-based expression data sets. Strong agreement was observed, indicating a high degree of specificity in our data. Furthermore, there were 1207 observations of expression of a particular gene in an anatomical structure where Bgee had no data, indicating a large amount of novelty in our data set. Examples of expression data corroborating and extending genotype-phenotype associations and supporting disease gene candidacy are presented to demonstrate the potential of this powerful resource. PMID:26398943

The role of the top human resources executive in major corporations has changed during the past decade into a directly involved business advisor, strategist, and implementer of business objectives. Intense competition has overridden previous human resources concerns and forced priorities toward internal, business-driven issues. Since cost-cutting…

FAT1, FAT2, FAT3 and FAT4 are human homologs of Drosophila Fat, which is involved in tumor suppression and planar cell polarity (PCP). FAT1 and FAT4 undergo the first proteolytic cleavage by Furin and are predicted to undergo the second cleavage by γ-secretase to release the intracellular domain (ICD). Ena/VASP-binding to FAT1 induces actin polymerization at lamellipodia and filopodia to promote cell migration, while Scribble-binding to FAT1 induces phosphorylation and functional inhibition of YAP1 to suppress cell growth. FAT1 is repressed in oral cancer owing to homozygous deletion or epigenetic silencing and is preferentially downregulated in invasive breast cancer. On the other hand, FAT1 is upregulated in leukemia and prognosis of preB-ALL patients with FAT1 upregulation is poor. FAT4 directly interacts with MPDZ/MUPP1 to recruit membrane-associated guanylate kinase MPP5/PALS1. FAT4 is involved in the maintenance of PCP and inhibition of cell proliferation. FAT4 mRNA is repressed in breast cancer and lung cancer due to promoter hypermethylation. The FAT4 gene is recurrently mutated in several types of human cancers, such as melanoma, pancreatic cancer, gastric cancer and hepatocellular carcinoma. FAT1 and FAT4 suppress tumor growth via activation of Hippo signaling, whereas FAT1 promotes tumor migration via induction of actin polymerization. FAT1 is tumor suppressive or oncogenic in a context-dependent manner, while FAT4 is tumor suppressive. Copy number aberration, translocation and point mutation of FAT1, FAT2, FAT3, FAT4, FRMD1, FRMD6, NF2, WWC1, WWC2, SAV1, STK3, STK4, MOB1A, MOB1B, LATS1, LATS2, YAP1 and WWTR1/TAZ genes should be comprehensively investigated in various types of human cancers to elucidate the mutation landscape of the FAT-Hippo signaling cascades. Because YAP1 and WWTR1 are located at the crossroads of adhesion, GPCR, RTK and stem-cell signaling network, cancer genomics of the FAT signaling cascades could be applied for diagnostics, prognostics

Cicer reticulatum L. is the wild progenitor of the fourth most important legume crop chickpea (C. arietinum L.). We assembled short-read sequences into a 416 Mb draft genome of C. reticulatum and anchored 78% (327 Mb) of this assembly to eight linkage groups. Genome annotation predicted 25,680 protein-coding genes covering more than 90% of predicted gene space. The genome assembly shared a substantial synteny and conservation of gene orders with the genome of the model legume Medicago truncatula. Resistance gene homologs of wild and domesticated chickpeas showed high sequence homology and conserved synteny. Comparison of gene sequences and nucleotide diversity using 66 wild and domesticated chickpea accessions suggested that the desi type chickpea was genetically closer to the wild species than the kabuli type. Comparative analyses predicted gene flow between the wild and the cultivated species during domestication. Molecular diversity and population genetic structure determination using 15,096 genome-wide single nucleotide polymorphisms revealed an admixed domestication pattern among cultivated (desi and kabuli) and wild chickpea accessions belonging to three population groups reflecting significant influence of parentage or geographical origin for their cultivar-specific population classification. The assembly and the polymorphic sequence resources presented here would facilitate the study of chickpea domestication and targeted use of wild Cicer germplasms for agronomic trait improvement in chickpea. PMID:27567261

The genetic networks that govern the differentiation and growth of major tissues of economic importance in the chicken are largely unknown. Under a functional genomics project, our consortium has generated 30 609 expressed sequence tags (ESTs) and developed several chicken DNA microarrays, which represent the Chicken Metabolic/Somatic (10 K) and Neuroendocrine/Reproductive (8 K) Systems (http://udgenome.ags.udel.edu/cogburn/). One of the major challenges facing functional genomics is the development of mathematical models to reconstruct functional gene networks and regulatory pathways from vast volumes of microarray data. In initial studies with liver-specific microarrays (3.1 K), we have examined gene expression profiles in liver during the peri-hatch transition and during a strong metabolic perturbation (fasting and re-feeding) in divergently selected broiler chickens (fast vs. slow-growth lines). The expression of many genes controlling metabolic pathways is dramatically altered by these perturbations. Our analysis has revealed a large number of clusters of functionally related genes (mainly metabolic enzymes and transcription factors) that control major metabolic pathways. Currently, we are conducting transcriptional profiling studies of multiple tissues during development of two sets of divergently selected broiler chickens (fast vs. slow growing and fat vs. lean lines). Transcriptional profiling across multiple tissues should permit construction of a detailed genetic blueprint that illustrates the developmental events and hierarchy of genes that govern growth and development of chickens. This review will briefly describe the recent acquisition of chicken genomic resources (ESTs and microarrays) and our consortium's efforts to help launch the new era of functional genomics in the chicken.

low recall (33.0%). Our consensus algorithm for GO annotation is based on the computation and propagation of likelihood scores associated with GO terms. The test results suggest that, for a given recall, the application of the consensus algorithm yields higher precision than when consensus is not used. Conclusion: The algorithms implemented in PIPA provide automated genome-wide protein function annotation based on reconciled predictions from multiple resources. PMID:18221520

Genome editing with engineered endonucleases is rapidly becoming a staple method in developmental biology studies. Engineered nucleases permit random or designed genomic modification at precise loci through the stimulation of endogenous double-strand break repair. Homology-directed repair following targeted DNA damage is mediated by co-introduction of a custom repair template, allowing the derivation of knock-out and knock-in alleles in animal models previously refractory to classic gene targeting procedures. Currently there are three main types of customizable site-specific nucleases delineated by the source mechanism of DNA binding that guides nuclease activity to a genomic target: zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR). Among these genome engineering tools, characteristics such as the ease of design and construction, mechanism of inducing DNA damage, and DNA sequence specificity all differ, making their application complementary. By understanding the advantages and disadvantages of each method, one may make the best choice for their particular purpose.

The yeast deletion collections comprise >21,000 mutant strains that carry precise start-to-stop deletions of ∼6000 open reading frames. This collection includes heterozygous and homozygous diploids, and haploids of both MATa and MATα mating types. The yeast deletion collection, or yeast knockout (YKO) set, represents the first and only complete, systematically constructed deletion collection available for any organism. Conceived during the Saccharomyces cerevisiae sequencing project, work on the project began in 1998 and was completed in 2002. The YKO strains have been used in numerous laboratories in >1000 genome-wide screens. This landmark genome project has inspired development of numerous genome-wide technologies in organisms from yeast to man. Notable spinoff technologies include synthetic genetic array and HIPHOP chemogenomics. In this retrospective, we briefly describe the yeast deletion project and some of its most noteworthy biological contributions and the impact that these collections have had on the yeast research community and on genomics in general. PMID:24939991

Endozoicomonas bacteria are globally distributed and often abundantly associated with diverse marine hosts including reef-building corals, yet their function remains unknown. In this study we generated novel Endozoicomonas genomes from single cells and metagenomes obtained directly from the corals Stylophora pistillata, Pocillopora verrucosa, and Acropora humilis. We then compared these culture-independent genomes to existing genomes of bacterial isolates acquired from a sponge, sea slug, and coral to examine the functional landscape of this enigmatic genus. Sequencing and analysis of single cells and metagenomes resulted in four novel genomes with 60–76% and 81–90% genome completeness, respectively. These data also confirmed that Endozoicomonas genomes are large and are not streamlined for an obligate endosymbiotic lifestyle, implying that they have free-living stages. All genomes show an enrichment of genes associated with carbon sugar transport and utilization and protein secretion, potentially indicating that Endozoicomonas contribute to the cycling of carbohydrates and the provision of proteins to their respective hosts. Importantly, besides these commonalities, the genomes showed evidence for differential functional specificity and diversification, including genes for the production of amino acids. Given this metabolic diversity of Endozoicomonas we propose that different genotypes play disparate roles and have diversified in concert with their hosts. PMID:28094347

Given the mounting convergent evidence implicating many more genes in complex disorders such as bipolar disorder than the small number identified unambiguously by the first-generation Genome-Wide Association studies (GWAS) to date, there is a strong need for improvements in methodology. One strategy is to include in the next generation GWAS larger numbers of subjects, and/or to pool independent studies into meta-analyses. We propose and provide proof of principle for the use of a complementary approach, convergent functional genomics (CFG), as a way of mining the existing GWAS datasets for signals that are there already, but did not reach significance using a genetics-only approach. With the CFG approach, the integration of genetics with genomics, of human and animal model data, and of multiple independent lines of evidence converging on the same genes offers a way of extracting signal from noise and prioritizing candidates. In essence our analysis is the most comprehensive integration of genetics and functional genomics to date in the field of bipolar disorder, yielding a series of novel (such as Klf12, Aldh1a1, A2bp1, Ak3l1, Rorb, Rora) and previously known (such as Bdnf, Arntl, Gsk3b, Disc1, Nrg1, Htr2a) candidate genes, blood biomarkers, as well as a comprehensive identification of pathways and mechanisms. These become prime targets for hypothesis driven follow-up studies, new drug development and personalized medicine approaches.

Background: Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description: Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion: For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address. PMID:19725959
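The idea of a functional profile and a profile-to-profile dissimilarity score can be sketched in a few lines of Python. This is a minimal illustration, not the UFO server's method: the abstract does not state which dissimilarity measure is used, so half the L1 distance between relative-frequency profiles is assumed here. The function names and the family identifiers in the usage note are placeholders.

```python
from collections import Counter


def functional_profile(domain_hits):
    """Turn a list of domain-family assignments into relative frequencies."""
    counts = Counter(domain_hits)
    total = sum(counts.values())
    return {fam: n / total for fam, n in counts.items()} if total else {}


def dissimilarity(p, q):
    """Half the L1 distance between two profiles (assumed measure).

    0.0 means identical profiles; 1.0 means no family is shared.
    """
    families = set(p) | set(q)
    return 0.5 * sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in families)


def rank_references(query, references):
    """Sort reference profiles (name -> profile) from most to least similar."""
    return sorted((dissimilarity(query, prof), name) for name, prof in references.items())
```

A query built with `functional_profile(["PF00005", "PF00072"])` could then be ranked against a dictionary of precomputed reference profiles with `rank_references`, giving a first guess at functionally related organisms.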

The wealth of new information coming from the many genome sequencing projects is providing unprecedented opportunities for major advances in all areas of biology, including the environmental health sciences. To facilitate this discovery process, experts in the fields of function...

The budding yeast Saccharomyces cerevisiae has been used extensively for the study of cell polarity, owing to both its experimental tractability and the high conservation of cell polarity and other basic biological processes among eukaryotes. The budding yeast has also served as a pioneer model organism for virtually all genome-scale approaches, including functional genomics, which aims to define gene function and biological pathways systematically through the analysis of high-throughput experimental data. Here, we outline the contributions of functional genomics and high-throughput methodologies to the study of cell polarity in the budding yeast. We integrate data from published genetic screens that use a variety of functional genomics approaches to query different aspects of polarity. Our integrated dataset is enriched for polarity processes, as well as some processes that are not intrinsically linked to cell polarity, and may provide new areas for future study.

Chlamydomonas reinhardtii, a unicellular green alga, has been exploited as a reference organism for identifying proteins and activities associated with the photosynthetic apparatus and the functioning of chloroplasts. Recently, the full genome sequence of Chlamydomonas was generated and a set of gene models, representing all genes on the genome, was developed. Using these gene models, and gene models developed for the genomes of other organisms, a phylogenomic, comparative analysis was performed to identify proteins encoded on the Chlamydomonas genome which were likely involved in chloroplast functions (or specifically associated with the green algal lineage); this set of proteins has been designated the GreenCut. Further analyses of those GreenCut proteins with uncharacterized functions and the generation of mutant strains aberrant for these proteins are beginning to unmask new layers of functionality/regulation that are integrated into the workings of the photosynthetic apparatus. PMID:20490922

Fragaria vesca, a diploid strawberry species commonly known as the alpine or woodland strawberry, is a versatile experimental plant system that is an emerging model for the Rosaceae family. An ancestral F. vesca genome contributed to the genome of the octoploid dessert strawberry (F. xananassa) and...

We have proposed an algorithm to smooth the transmission of a pre-recorded VBR media stream. It has O(n) time complexity; when n is large, this algorithm is not suitable for online resource management and admission control in media servers. To resolve this drawback, we have explored the optimal tradeoff among resources with an O(n log n) algorithm. Based on the pre-computed resource tradeoff function, the resource management and admission control procedure is as simple as table hashing. However, this approach requires O(n) space to store and maintain the resource tradeoff function. In this paper, a linear-time algorithm is proposed that, given some extra resources, approximates the resource tradeoff function by piecewise line segments. We can prove that the number of line segments in the obtained approximation function is minimized for the given extra resources. The proposed algorithm has been applied to approximate the bandwidth-buffer tradeoff function of the real-world Star Wars movie. When an extra 0.1 Mbps of bandwidth is given, the storage space required for the approximation function is over 2000 times smaller than that required for the original function. When an extra 10 KB of buffer is given, the storage space for the approximation function is over 2200 times smaller than that required for the original function. The proposed algorithm is useful for resource management and admission control in real-world media servers.
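To make the idea of trading a little extra resource for a much smaller table concrete, the Python sketch below approximates a sampled tradeoff curve by line segments. It is not the linear-time, provably minimal algorithm of the abstract: a simple greedy chord-fitting heuristic is swapped in, which can take quadratic time and is not guaranteed to use the fewest segments. The function names and the tolerance `eps` (the extra resource allowed) are assumptions made for illustration.

```python
def chord_ok(points, i, k, eps, tol=1e-12):
    """True if the chord from points[i] to points[k] never falls below the
    sampled curve and never exceeds it by more than eps in between."""
    x0, y0 = points[i]
    x1, y1 = points[k]
    for x, y in points[i + 1:k]:
        y_hat = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        if not (-tol <= y_hat - y <= eps + tol):
            return False
    return True


def approximate_tradeoff(points, eps):
    """Greedy piecewise-linear approximation of a sampled tradeoff function.

    points: list of (x, y) pairs sorted by strictly increasing x.
    Returns the retained breakpoints; consecutive breakpoints define the
    line segments. Each segment is extended until the first sample at which
    the chord test fails, so the result is valid but not necessarily minimal.
    """
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    i, n = 0, len(points)
    while i < n - 1:
        j = i + 1  # the chord to the adjacent sample is always valid
        k = i + 2
        while k < n and chord_ok(points, i, k, eps):
            j = k
            k += 1
        kept.append(points[j])
        i = j
    return kept
```

With the convex samples `[(0, 16), (1, 9), (2, 4), (3, 1), (4, 0)]`, a tolerance of 1 keeps three breakpoints, while a tolerance of 0 keeps all five. Only the breakpoints need to be stored, which is where the space saving comes from.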

Research on executive functioning and on self-regulation have each identified a critical resource that is central to that domain and is susceptible to depletion. In addition, studies have shown that self-regulation tasks and executive-functioning tasks interact with each other, suggesting that they may share resources. Other research has focused specifically on restoring what we propose is the shared resource between self-regulation and executive functioning. Utilizing a theory-based natural environment intervention, these studies have found improvements in executive-functioning performance and self-regulation effectiveness, suggesting that the natural environment intervention restores this shared resource.

Functional classification of proteins is central to comparative genomics. The need for algorithms tuned to enable integrative interpretation of analytical data is felt globally. The availability of general, automated software with built-in flexibility will significantly aid this activity. We have prepared ARC (Automated Resource Classifier), which is open source software meeting the user requirements of flexibility. The default classification scheme based on keyword match is agglomerative and directs entries into any of the 7 basic non-overlapping functional classes: Cell wall, Cell membrane and Transporters (C), Cell division (D), Information (I), Translocation (L), Metabolism (M), Stress (R), Signal and communication (S) and 2 ancillary classes: Others (O) and Hypothetical (H). The keyword library of ARC was built serially by first drawing keywords from Bacillus subtilis and Escherichia coli K12. In subsequent steps, this library was further enriched by collecting terms from the archaeal representative Archaeoglobus fulgidus, Gene Ontology, and Gene Symbols. ARC is 94.04% successful on 6,75,663 annotated proteins from 348 prokaryotes. Three examples are provided to illuminate the current perspectives on mycobacterial physiology and costs of proteins in 333 prokaryotes. ARC is available at http://arc.igib.res.in.
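A keyword-match classifier of the kind described can be sketched as follows. This is a toy illustration, not ARC's code: the class codes are taken from the abstract, but the keyword lists are invented here and far smaller than a real library, and the rule that the first matching class wins is an assumption.

```python
# Hypothetical, deliberately tiny keyword library keyed by the class codes
# named in the abstract. Dictionary order sets the matching priority.
KEYWORDS = {
    "C": ("transporter", "permease", "cell wall", "membrane"),
    "D": ("cell division", "septum"),
    "I": ("ribosomal", "polymerase", "trna"),
    "L": ("translocase", "secretion"),
    "M": ("dehydrogenase", "synthase"),
    "R": ("heat shock", "chaperone", "stress"),
    "S": ("sensor", "response regulator", "chemotaxis"),
}


def classify(description: str) -> str:
    """Assign a protein description to one class code by keyword match.

    'Hypothetical' entries go to H; descriptions matching no keyword go to O.
    """
    text = description.lower()
    if "hypothetical" in text:
        return "H"
    for code, words in KEYWORDS.items():
        if any(word in text for word in words):
            return code
    return "O"
```

For instance, `classify("30S ribosomal protein S1")` returns `"I"`, and an unmatched description such as `"phage tail fiber"` falls into the ancillary class `"O"`.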

Deep characterization of molecular function of genetic variants in the human genome is becoming increasingly important for understanding genetic associations to disease and for learning to read the regulatory code of the genome. In this paper, I discuss how recent advances in both quantitative genetics and molecular biology have contributed to understanding functional effects of genetic variants, lessons learned from eQTL studies, and future challenges in this field.

The resistance (R) genes and defense response (DR) genes have become very important resources for the development of disease resistant cultivars. In the present investigation, genome-wide identification, expression, phylogenetic and synteny analysis was done for R- and DR-genes across three species of rice, viz. Oryza sativa ssp indica cv 93-11, Oryza sativa ssp japonica and the wild rice species Oryza brachyantha. We used the in silico approach to identify and map 786 R-genes and 167 DR-genes, 672 R-genes and 142 DR-genes, 251 R-genes and 86 DR-genes in the japonica, indica and O. brachyantha genomes, respectively. Our analysis showed that 60.5% and 55.6% of the R-genes are tandemly repeated within clusters and distributed over all the rice chromosomes in indica and japonica genomes, respectively. The phylogenetic analysis along with motif distribution shows a high degree of conservation of R- and DR-genes in clusters. In silico expression analysis of R-genes and DR-genes showed more than 85% were expressed genes showing corresponding EST matches in the databases. This study gave special emphasis to mechanisms of gene evolution and duplication for R- and DR-genes across species. Analysis of paralogs across rice species indicated 17% and 4.38% R-genes, 29% and 11.63% DR-genes duplication in indica and Oryza brachyantha, as compared to 20% and 26% duplication of R-genes and DR-genes in japonica respectively. We found that during the course of duplication only 9.5% of R- and DR-genes changed their function and the rest of the genes have maintained their identity. Syntenic relationship across three genomes inferred that more orthology is shared between indica and japonica genomes as compared to the brachyantha genome. Genome wide identification of R-genes and DR-genes in the rice genome will help in allele mining and functional validation of these genes, and to understand the molecular mechanism of disease resistance and their evolution in rice and related species.

The Caliciviridae family of small positive sense RNA viruses contains a diverse range of pathogens of both man and animals. The molecular mechanisms of calicivirus genome replication and translation have not been as widely studied as many other RNA viruses. With the relatively recent development of robust cell culture and reverse genetics systems for several members of the Caliciviridae family, a more in-depth analysis of the finer detail of the viral life cycle has now been obtained. As a result, the identification and characterization of the role of RNA structures in the calicivirus life cycle has also been possible. This review aims to summarize the current state of knowledge with respect to the role of RNA structures at the termini of calicivirus genomes.

The multigenic nature of human tumours presents a fundamental challenge for cancer drug discovery. Here we use Drosophila to generate 32 multigenic models of colon cancer using patient data from The Cancer Genome Atlas. These models recapitulate key features of human cancer, often as emergent properties of multigenic combinations. Multigenic models such as ras p53 pten apc exhibit emergent resistance to a panel of cancer-relevant drugs. Exploring one drug in detail, we identify a mechanism of resistance for the PI3K pathway inhibitor BEZ235. We use these data to identify a combinatorial therapy that circumvents this resistance through a two-step process of emergent pathway dependence and sensitivity we term 'induced dependence'. This approach is effective in cultured human tumour cells, xenografts and mouse models of colorectal cancer. These data demonstrate how multigenic animal models that reference cancer genomes can provide an effective approach for developing novel targeted therapies. PMID:27897178

Phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. In total, 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. Our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
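
The parsimony score and retention index mentioned above can be illustrated with a minimal sketch. The abstract does not say which algorithm or software was used; the code below assumes Fitch parsimony on a strictly binary tree and a presence/absence (1/0) character per pathway, and the tree, genome names and pathway patterns are invented for illustration.

```python
def fitch_score(tree, states):
    """Minimum number of state changes (Fitch parsimony) on a binary tree.

    tree   : nested 2-tuples whose leaves are genome names (hypothetical format)
    states : dict mapping genome name -> 1 (pathway present) or 0 (absent)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: one change
        return left | right

    visit(tree)
    return changes


def retention_index(tree, states):
    """Retention index RI = (g - s) / (g - m) for one binary character.

    s = observed parsimony score, m = minimum possible steps (1 when both
    states occur), g = maximum possible steps (size of the rarer state).
    Returns None when the index is undefined (g == m).
    """
    values = list(states.values())
    ones = sum(values)
    g = min(ones, len(values) - ones)
    m = 1 if 0 < ones < len(values) else 0
    if g == m:
        return None
    return (g - fitch_score(tree, states)) / (g - m)


# Toy tree of four genomes: ((A, B), (C, D))
tree = (("A", "B"), ("C", "D"))
clade_specific = {"A": 1, "B": 1, "C": 0, "D": 0}   # pathway confined to one clade
scattered = {"A": 1, "B": 0, "C": 1, "D": 0}        # pathway dispersed across clades

print(fitch_score(tree, clade_specific), retention_index(tree, clade_specific))
print(fitch_score(tree, scattered), retention_index(tree, scattered))
```

In this toy case the clade-specific pathway needs fewer changes and has a higher retention index than the scattered one, which is the contrast the abstract draws between the two types of cellular function.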

We have developed a general probabilistic system for query-based discovery of pathway-specific networks through integration of diverse genome-wide data. This framework was validated by accurately recovering known networks for 31 biological processes in Saccharomyces cerevisiae and experimentally verifying predictions for the process of chromosomal segregation. Our system, bioPIXIE, a public, comprehensive system for integration, analysis, and visualization of biological network predictions for S. cerevisiae, is freely accessible over the World Wide Web. PMID:16420673

The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economic value as a rare relict species, genomic resources for Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high-confidence SNPs, were identified. This is the first large-scale set of expressed sequence tags generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.

Gene dosage plays a critical role in a range of cellular phenotypes, yet most cellular expression systems use heterologous cDNA-based vectors which express proteins well above physiological levels. In contrast, genomic DNA expression vectors generate physiologically-relevant levels of gene expression by carrying the whole genomic DNA locus of a gene including its regulatory elements. Here we describe the first genomic DNA expression library generated using the high-capacity herpes simplex virus-1 amplicon technology to deliver bacterial artificial chromosomes (BACs) into cells by viral transduction. The infectious BAC (iBAC) library contains 184,320 clones with an average insert size of 134.5 kb. We show in a Chinese hamster ovary (CHO) disease model cell line and mouse embryonic stem (ES) cells that this library can be used for genetic rescue studies in a range of contexts including the physiological restoration of Ldlr deficiency, and viral receptor expression. The iBAC library represents an important new genetic analysis tool openly available to the research community. PMID:27353647

Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

The completion of the first draft of the human genome presents both a tremendous opportunity and enormous challenge to the pharmaceutical industry since the whole community, with few exceptions, will soon have access to the same pool of candidate gene sequences from which to select future therapeutic targets. The commercial imperative to select and pursue therapeutically relevant genes from within the overall content of the genome will be particularly intense for those gene families that currently represent the chemically tractable or 'drugable' gene targets. As a consequence the emphasis within exploratory research has shifted towards the evaluation and adoption of technology platforms that can add additional value to the gene selection process, either through functional studies or direct/indirect measures of disease alignment e.g., genetics, differential gene expression, proteomics, tissue distribution, comparative species data etc. The selection of biological targets for the development of potential new medicines relies, in part, on the quality of the in vivo biological data that correlates a particular molecular target with the underlying pathophysiology of a disease. Within the pharmaceutical industry, studies employing transgenic animals and, in particular, animals with specific gene deletions are playing an increasingly important role in the therapeutic target gene selection, drug candidate selection and product development phases of the overall drug discovery process. The potential of phenotypic information from gene knock-outs to contribute to a high-throughput target selection/validation strategy has hitherto been limited by the resources required to rapidly generate and characterise a large number of knock-out transgenics in a timely fashion. The offerings of several companies that provide an opportunity to overcome these hurdles, albeit at a cost, are assessed with respect to the strategic business needs of the pharmaceutical industry.

RNA interference (RNAi) is a powerful new technology in the discovery of genetic sequence functions and has become a valuable tool for functional genomics of cotton (Gossypium spp.). The rapid adoption of RNAi has replaced previous antisense technology. RNAi has aided in the discovery of function ...

The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean, is currently the world's most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii, applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely to other freshwater prawn species in the genus Macrobrachium. PMID:27164098

Background: Root and bulb vegetables (RBV) include carrots, celeriac (root celery), parsnips (Apiaceae), onions, garlic, and leek (Alliaceae), food crops grown globally and consumed worldwide. Few data analysis platforms are currently available where data collection, annotation and integration initiatives are focused on RBV plant groups. Scientists working on RBV include breeders, geneticists, taxonomists, plant pathologists, and plant physiologists who use genomic data for a wide range of activities including the development of molecular genetic maps, delineation of taxonomic relationships, and investigation of molecular aspects of gene expression in biochemical pathways and disease responses. With genomic data coming from such diverse areas of plant science, availability of a community resource focused on these RBV data types would be of great interest to this scientific community. Description: The RoBuST database has been developed to initiate a platform for collecting and organizing genomic information useful for RBV researchers. The current release of RoBuST contains genomics data for 294 Alliaceae and 816 Apiaceae plant species and has the following features: (1) comprehensive sequence annotations of 3663 genes, 5959 RNAs, 22,723 ESTs and 11,438 regulatory sequence elements from Apiaceae and Alliaceae plant families; (2) graphical tools for visualization and analysis of sequence data; (3) access to traits, biosynthetic pathways, genetic linkage maps and molecular taxonomy data associated with Alliaceae and Apiaceae plants; and (4) a comprehensive plant splice signal repository of 659,369 splice signals collected from 6015 plant species for comparative analysis of plant splicing patterns. Conclusions: RoBuST, available at http://robust.genome.com, provides an integrated platform for researchers to effortlessly explore and analyze genomic data associated with root and bulb vegetables. PMID:20691054

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
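
As a rough illustration of the phylogenetic profile idea described above, the sketch below builds a presence/absence vector for each protein across a set of genomes and reports protein pairs with matching profiles. The protein names, genome names, input format and the mismatch threshold are all hypothetical; the abstract does not specify how profiles are compared.

```python
from itertools import combinations


def build_profiles(presence, genomes):
    """Turn {protein: set of genomes with a homolog} into 0/1 tuples.

    The order of `genomes` fixes the positions in every profile.
    """
    return {
        protein: tuple(int(genome in found) for genome in genomes)
        for protein, found in presence.items()
    }


def linked_pairs(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        mismatches = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs


# Invented toy data: which genomes contain a homolog of each protein.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},          # same pattern as P1: candidate functional link
    "P3": {"g2", "g3", "g4"},
    "P4": {"g1", "g2", "g3", "g4"},
}

profiles = build_profiles(presence, genomes)
print(profiles["P1"])            # presence/absence vector for P1
print(linked_pairs(profiles))    # pairs with identical profiles
```

Proteins present in every genome (like P4 here) carry little information on their own, so a practical version would likely filter or weight such uninformative profiles; that refinement is left out of this sketch.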

Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach; the markers were simultaneously identified and scored in a genome-wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome-wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19,703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long-term effective population size was estimated to range between 132,000 and 1,320,000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource, consisting of 82,425 loci and 376,918 associated SNPs, provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
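
The abstract does not state how effective population size was derived from nucleotide diversity. A common approach, assumed here, is the neutral expectation π = 4·Ne·μ for a diploid population, rearranged to Ne = π / (4μ). With the reported average π of 0.00529 and assumed mutation rates of 1e-8 and 1e-9 per site per generation (values chosen for illustration, not taken from the source), this sketch gives estimates close to the reported range.

```python
def effective_population_size(pi, mu):
    """Long-term Ne under the neutral expectation pi = 4 * Ne * mu (diploid).

    pi : average nucleotide diversity per site
    mu : mutation rate per site per generation (an assumption, see above)
    """
    return pi / (4.0 * mu)


average_pi = 0.00529                      # value reported in the abstract
assumed_mutation_rates = (1e-8, 1e-9)     # hypothetical bounds, not from the source

for mu in assumed_mutation_rates:
    ne = effective_population_size(average_pi, mu)
    print(f"mu = {mu:.0e}: Ne is roughly {ne:,.0f}")
```

Because Ne scales inversely with μ, a tenfold uncertainty in the mutation rate translates directly into a tenfold range in the estimate.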

A key feature of a gene's function is the variety of protein isoforms it encodes in a population. However, the genetic diversity in bovine whole genome databases tends to be underrepresented because these databases contain an abundance of sequence from the most influential sires. Our first aim was ...

Introduction: Functional genomic screens apply knowledge gained from the sequencing of the human genome toward rapid methods of identifying genes involved in cellular function based on a specific phenotype. This approach has been made possible through the use of advances in both molecular biology and automation. The utility of this approach has been further enhanced through the application of image-based high content screening, an automated microscopy and quantitative image analysis platform. These approaches can significantly enhance acquisition of novel targets for drug discovery. Areas covered: Both the utility and potential issues associated with functional genomic screening approaches are discussed, along with examples that illustrate both. The considerations for high content screening applied to functional genomics are also presented. Expert opinion: Functional genomic and high content screening are extremely useful in the identification of new drug targets. However, the technical, experimental, and computational parameters have an enormous influence on the results. Thus, although new targets are identified, caution should be applied toward interpretation of screening data in isolation. Genomic screens should be viewed as an integral component of a target identification campaign that requires both the acquisition of orthogonal data and a rigorous validation strategy. PMID:22860749

Functional genomics (FG) screens, using RNAi or CRISPR technology, have become a standard tool for systematic, genome-wide loss-of-function studies for therapeutic target discovery. As in many large-scale assays, however, off-target effects, variable reagent potency and experimental noise must be accounted for to appropriately control for false positives.

On the basis of shotgun subclone libraries used in the sequencing of the Drosophila melanogaster genome, a minimal tiling path of subclones across much of the genome was determined. About 320,000 shotgun clones for chromosomes X(12-20), 2R, 2L, 3R, and 4 were available from the Berkeley Drosophila Genome Project. The clone inserts have an average length of 3.4 kb and are amenable to standard PCR amplification. The resulting tiling path covers 86.2% of chromosome X(12-20), 86.2% of chromosomal arm 2R, 79.0% of 2L, 89.6% of 3R, and 80.5% of chromosome 4. In total, the 25,135 clones represent 76.7 Mb--equivalent to about 67% of the genome--and would be suitable for producing a microarray on a single slide.
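
The abstract does not describe how the minimal tiling path was computed. One standard way to pick few clones while keeping coverage is a greedy interval cover, sketched below under the assumption that each clone is given as a (start, end) coordinate pair on one chromosome arm; the coordinates and function names are invented for illustration.

```python
def minimal_tiling_path(clones):
    """Greedy selection of clones that covers as much sequence as possible.

    clones : iterable of (start, end) half-open intervals on one arm.
    At each step, among clones that start at or before the current covered
    position, the one reaching furthest is kept. Gaps are skipped.
    """
    clones = sorted(clones)
    path = []
    covered_to = None
    i = 0
    while i < len(clones):
        if covered_to is None or clones[i][0] > covered_to:
            covered_to = clones[i][0]          # gap: restart at the next clone
        best = None
        while i < len(clones) and clones[i][0] <= covered_to:
            if best is None or clones[i][1] > best[1]:
                best = clones[i]
            i += 1
        if best[1] > covered_to:               # keep only clones that extend coverage
            path.append(best)
            covered_to = best[1]
    return path


def covered_length(path):
    """Total number of positions covered by a path sorted by start."""
    total = 0
    end = None
    for start, stop in path:
        if end is None or start > end:
            total += stop - start
            end = stop
        elif stop > end:
            total += stop - end
            end = stop
    return total


# Invented coordinates on a 25-unit region with a gap between 12 and 20.
clones = [(0, 5), (1, 3), (4, 9), (5, 7), (8, 12), (20, 25)]
path = minimal_tiling_path(clones)
print(path)
print(covered_length(path) / 25)   # fraction of the region covered by the path
```

Coverage below 100%, as in the percentages reported for each arm, arises in this sketch whenever no available clone spans a stretch of the region.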

Host infection by microbial pathogens cues global changes in microbial and host cell biology that facilitate microbial replication and disease. The complete maps of thousands of bacterial and viral genomes have recently been defined; however, the rate at which physiological or biochemical functions have been assigned to genes has greatly lagged. The National Institute of Allergy and Infectious Diseases (NIAID) addressed this gap by creating functional genomics centers dedicated to developing high-throughput approaches to assign gene function. These centers require broad-based and collaborative research programs to generate and integrate diverse data to achieve a comprehensive understanding of microbial pathogenesis. High-throughput functional genomics can lead to new therapeutics and better understanding of the next generation of emerging pathogens by rapidly defining new general mechanisms by which organisms cause disease and replicate in host tissues and by facilitating the rate at which functional data reach the scientific community. PMID:27703071

Roots are vital for the uptake of water and nutrients, and for anchorage in the soil. They are highly plastic, able to adapt developmentally and physiologically to changing environmental conditions. Understanding the molecular mechanisms behind this growth and development requires knowledge of root transcriptomics, proteomics, and metabolomics. Genomics approaches, including the recent publication of a root expression map, root proteome, and environment-specific root expression studies, are uncovering complex transcriptional and post-transcriptional networks underlying root development. The challenge is in further capitalizing on the information in these datasets to understand the fundamental principles of root growth and development. In this review, we highlight progress researchers have made toward this goal.

Macrophages play essential roles in the response to injury and infection and contribute to the development and/or homeostasis of the various tissues they reside in. Conversely, macrophages also influence the pathogenesis of metabolic, neurodegenerative, and neoplastic diseases. Mechanisms that contribute to the phenotypic diversity of macrophages in health and disease remain poorly understood. Here we review the recent application of genome-wide approaches to characterize the transcriptomes and epigenetic landscapes of tissue-resident macrophages. These studies are beginning to provide insights into how distinct tissue environments are interpreted by transcriptional regulatory elements to drive specialized programs of gene expression. PMID:28087927

The specific aims of this project were as follows: (1) to design primers to each predicted open reading frame (ORF) in M. jannaschii and M. thermoautotrophicum to allow the amplification of a unique target sequence that will represent the corresponding coding region on a complete genome chip; (2) to amplify each target sequence from M. jannaschii and M. thermoautotrophicum and verify that these PCR products are the expected DNA fragments; and (3) to establish a relational database that will track the production of target DNAs and the nucleotide sequence used to represent each ORF.

The purpose of this study was to examine cohort comparisons in levels of resources (e.g., mental health, physical functioning, economic and social resources, and cognitive functioning) for 211 community-dwelling centenarians (whose Mini-Mental Status Examination score was 23 or higher) of phases I and III of the Georgia Centenarian Study. The…

The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of this data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. Objectives are the implementation of infrastructure and data sources to capture plant genomic information into a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY. BioMOBY provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make it simple to allow broad integration, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project. PMID:18629059

Genomic islands have been shown to harbor functional traits that differentiate ecologically distinct populations of environmental bacteria. A comparative analysis of the complete genome sequences of the marine Actinobacteria Salinispora tropica and S. arenicola reveals that 75% of the species-specific genes are located in 21 genomic islands. These islands are enriched in genes associated with secondary metabolite biosynthesis providing evidence that secondary metabolism is linked to functional adaptation. Secondary metabolism accounts for 8.8% and 10.9% of the genes in the S. tropica and S. arenicola genomes, respectively, and represents the major functional category of annotated genes that differentiates the two species. Genomic islands harbor all 25 of the species-specific biosynthetic pathways, the majority of which occur in S. arenicola and may contribute to the cosmopolitan distribution of this species. Genome evolution is dominated by gene duplication and acquisition, which in the case of secondary metabolism provide immediate opportunities for the production of new bioactive products. Evidence that secondary metabolic pathways are exchanged horizontally, coupled with prior evidence for fixation among globally distributed populations, supports a functional role and suggests that the acquisition of natural product biosynthetic gene clusters represents a previously unrecognized force driving bacterial diversification. Species-specific differences observed in CRISPR (clustered regularly interspaced short palindromic repeat) sequences suggest that S. arenicola may possess a higher level of phage immunity, while a highly duplicated family of polymorphic membrane proteins provides evidence of a new mechanism of marine adaptation in Gram-positive bacteria. PMID:19474814

A genome-wide association (GWA) study of treatment outcomes (response and remission) of selective serotonin reuptake inhibitors (SSRIs) was conducted using 529 subjects with major depressive disorder. While no SNP associations reached the genome-wide level of significance, 14 SNPs of interest were identified for functional analysis. The rs11144870 SNP in the riboflavin kinase (RFK) gene on chromosome 9 was associated with 8-week treatment response (odds ratio (OR)=0.42, P=1.04 × 10⁻⁶). The rs915120 SNP in the G protein-coupled receptor kinase 5 (GRK5) gene on chromosome 10 was associated with 8-week remission (OR=0.50, P=1.15 × 10⁻⁵). Both SNPs were shown to influence transcription by a reporter gene assay and to alter nuclear protein binding using an electrophoretic mobility shift assay. This report is an example of joining functional genomics with traditional GWA study results on SSRI treatment outcomes. The goal of this analytical strategy is to provide insights into the potential relevance of biologically plausible observed associations.

The Polydnaviridae (PDV), including the Bracovirus (BV) and Ichnovirus genera, originated from the integration of unrelated viruses in the genomes of two parasitoid wasp lineages, in a remarkable example of convergent evolution. Functionally active PDVs represent the most compelling evolutionary success among endogenous viral elements (EVEs). BV evolved from the domestication by braconid wasps of a nudivirus 100 Ma. The nudivirus genome has become an EVE involved in BV particle production but is not encapsidated. Instead, BV genomes have co-opted virulence genes, used by the wasps to control the immunity and development of their hosts. Gene transfers and duplications have shaped BV genomes, now encoding hundreds of genes. Phylogenomic studies suggest that BVs contribute largely to wasp diversification and adaptation to their hosts. A genome evolution model explains how multidirectional wasp adaptation to different host species could have fostered PDV genome extension. Integrative studies linking ecological data on the wasp to genomic analyses should provide new insights into the adaptive role of particular BV genes. Forthcoming genomic advances should also indicate if the associations between endoparasitoid wasps and symbiotic viruses evolved because of their particularly intimate interactions with their hosts, or if similar domesticated EVEs could be uncovered in other parasites.

Understanding the relationship between genome organization and expression is central to understanding genome function. Closely apposed genes in a head-to-head orientation share the same upstream region and are likely to be coregulated. Here we identify the Drosophila BEAF-32 insulator as a cis regulatory element separating close head-to-head genes with different transcription regulation modes. We then compare the binding landscapes of the BEAF-32 insulator protein in four different Drosophila genomes and highlight the evolutionarily conserved presence of this protein between close adjacent genes. We find that changes in binding of BEAF-32 to sites in the genome of different Drosophila species correlate with alterations in genome organization caused by DNA rearrangements or genome size expansion. The cross-talk between BEAF-32 genomic distribution and genome organization contributes to new gene-expression profiles, which in turn translate into specific and distinct phenotypes. The results suggest a mechanism for the establishment of differences in transcription patterns during evolution. PMID:22895281

We present a high-coverage draft genome assembly of the aye-aye (Daubentonia madagascariensis), a highly unusual nocturnal primate from Madagascar. Our assembly totals ~3.0 billion bp (3.0 Gb), roughly the size of the human genome, comprising ~2.6 million scaffolds (N50 scaffold size = 13,597 bp) based on short paired-end sequencing reads. We compared the aye-aye genome sequence data with four other published primate genomes (human, chimpanzee, orangutan, and rhesus macaque) as well as with the mouse and dog genomes as nonprimate outgroups. Unexpectedly, we observed strong evidence for a relatively slow substitution rate in the aye-aye lineage compared with these and other primates. In fact, the aye-aye branch length is estimated to be ~10% shorter than that of the human lineage, which is known for its low substitution rate. This finding may be explained, in part, by the protracted aye-aye life-history pattern, including late weaning and age of first reproduction relative to other lemurs. Additionally, the availability of this draft lemur genome sequence allowed us to polarize nucleotide and protein sequence changes to the ancestral primate lineage, a critical period in primate evolution for which the relevant fossil record is sparse. Finally, we identified 293,800 high-confidence single nucleotide polymorphisms in the donor individual for our aye-aye genome sequence, a captive-born individual from two wild-born parents. The resulting heterozygosity estimate of 0.051% is the lowest of any primate studied to date, which is understandable considering the aye-aye's extensive home-range size and relatively low population densities. Yet this level of genetic diversity also suggests that conservation efforts benefiting this unusual species should be prioritized, especially in the face of the accelerating degradation and fragmentation of Madagascar's forests.
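
As a hedged aside on the N50 scaffold size quoted in the abstract above: N50 is conventionally the length L such that scaffolds of length at least L together cover at least half of the total assembly. The minimal Python sketch below illustrates that convention only. The function name `n50` and the toy scaffold lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    # Walk from the longest scaffold down, accumulating covered bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes coverage to half the assembly
        # defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, not the aye-aye data.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

For the toy lengths the total is 300 bp, so the running sum first reaches 150 bp at the second-longest scaffold. A short N50 relative to total assembly size, as reported above, is typical of a draft built from short paired-end reads.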

Questions are being asked about how enzyme function is described at the molecular level and the strengths and weaknesses of the EC system for this purpose. A new approach to describing enzyme function has been proposed that might improve our capabilities for functional inference for members of enzyme superfamilies.

Functional genomics attempts to understand the genome by perturbing the flow of information from DNA to RNA to protein, in order to learn how gene dysfunction leads to disease. CRISPR/Cas9 technology is the newest tool in the geneticist's toolbox, allowing researchers to edit DNA with unprecedented ease, speed and accuracy, and representing a novel means to perform genome-wide genetic screens to discover gene function. In this review, we first summarize the discovery and characterization of CRISPR/Cas9, and then compare it to other genome engineering technologies. We discuss its initial use in screening applications, with a focus on optimizing on-target activity and minimizing off-target effects. Finally, we comment on future challenges and opportunities afforded by this technology.

As a scaffold, SLX4/FANCP interacts with multiple proteins involved in genome integrity. Although it lacks recognizable catalytic domains, SLX4 participates in diverse genome maintenance pathways by delivering nucleases where they are needed and promoting their cooperative action to prevent genomic instability. The physiological importance of SLX4 is emphasized by the identification of causative mutations in the SLX4 gene in patients diagnosed with Fanconi anemia (FA), a rare recessive genetic disorder characterized by genomic instability and predisposition to cancers. Recent progress in understanding the functional roles of SLX4 has greatly expanded our knowledge of the repair of DNA interstrand crosslinks (ICLs), Holliday junction (HJ) resolution, telomere homeostasis and regulation of the DNA damage response induced by replication stress. Here, these diverse functions of SLX4 are reviewed in detail. PMID:24938228