Description:
Background: Discovery of precise specificity of transcription factors is an important step on the way to understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays were developed as a potentially scalable approach to tackle transcription factor binding site identification. Results: Here we present an algorithmic approach to experimental design of a microarray that allows for testing full specificity of a transcription factor binding to all possible DNA binding sites of a given length, with optimally efficient use of the array. This design is universal, works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays is easier to analyze and would result in more precise identification of binding sites. Conclusion: In this study, we present a design of a double-stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems.

Description:
The gain and loss of functional transcription-factor binding sites has been proposed as a major source of evolutionary change in cis-regulatory DNA and gene expression. We have developed an evolutionary model to study binding site turnover that uses multiple sequence alignments to assess the evolutionary constraint on individual binding sites, and to map gain and loss events along a phylogenetic tree. We apply this model to study the evolutionary dynamics of binding sites of the Drosophila melanogaster transcription factor Zeste, using genome-wide in vivo (ChIP-chip) binding data to identify functional Zeste binding sites, and the genome sequences of D. melanogaster, D. simulans, D. erecta and D. yakuba to study their evolution. We estimate that more than 5 percent of functional Zeste binding sites in D. melanogaster were gained along the D. melanogaster lineage or lost along one of the other lineages. We find that Zeste bound regions have a reduced rate of binding site loss and an increased rate of binding site gain relative to flanking sequences. Finally, we show that binding site gains and losses are asymmetrically distributed with respect to D. melanogaster, consistent with lineage-specific acquisition and loss of Zeste-responsive regulatory elements.

Description:
Physically interacting proteins or parts of proteins are expected to evolve in a coordinated manner that preserves proper interactions. Such coevolution at the amino acid-sequence level is well documented and has been used to predict interacting proteins, domains, and amino acids. Interacting proteins are also often precisely coexpressed with one another, presumably to maintain proper stoichiometry among interacting components. Here, we show that the expression levels of physically interacting proteins coevolve. We estimate average expression levels of genes from four closely related fungi of the genus Saccharomyces using the codon adaptation index and show that expression levels of interacting proteins exhibit coordinated changes in these different species. We find that this coevolution of expression is a more powerful predictor of physical interaction than is coevolution of amino acid sequence. These results demonstrate previously uncharacterized coevolution of gene expression, adding a different dimension to the study of the coevolution of interacting proteins and underscoring the importance of maintaining coexpression of interacting proteins over evolutionary time. Our results also suggest that expression coevolution can be used for computational prediction of protein-protein interactions.
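As a rough illustration of the codon adaptation index mentioned above, the sketch below computes it in its standard form: the geometric mean of per-codon relative adaptiveness values. It is not taken from the paper. The function name `codon_adaptation_index` and the toy `weights` table are hypothetical; in practice the weights would be derived from a reference set of highly expressed genes for each species.

```python
import math

def codon_adaptation_index(cds, weights):
    """Geometric mean of relative adaptiveness values over the codons of `cds`.

    `weights` maps a codon to its relative adaptiveness (0 < w <= 1).
    Codons missing from `weights` are skipped (an assumption of this sketch).
    """
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Hypothetical weights for two alanine codons only, for illustration.
toy_weights = {"GCT": 1.0, "GCC": 0.5}
cai = codon_adaptation_index("GCTGCC", toy_weights)
```

With these toy weights the result is the geometric mean of 1.0 and 0.5. Comparing such values for the same gene across species is one way the coordinated changes described in the abstract could be examined.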

Description:
Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.

Description:
Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information-theory-based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4-regulated genes.

Description:
The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage sorting. Incomplete lineage sorting will likely cause phylogenetic incongruence in many comparative genomics datasets. Methods to infer the correct species tree, the history of every base in the genome, and comparative methods that control for and/or utilize this information will be valuable advancements for the field of comparative genomics.

Description:
The relationship between genetic variation in gene expression and phenotypic variation observable in nature is not well understood. Identifying how many phenotypes are associated with differences in gene expression and how many gene-expression differences are associated with a phenotype is important to understanding the molecular basis and evolution of complex traits. Results: We compared levels of gene expression among nine natural isolates of Saccharomyces cerevisiae grown either in the presence or absence of copper sulfate. Of the nine strains, two show a reduced growth rate and two others are rust colored in the presence of copper sulfate. We identified 633 genes that show significant differences in expression among strains. Of these genes, 20 were correlated with resistance to copper sulfate and 24 were correlated with rust coloration. The function of these genes in combination with their expression pattern suggests the presence of both correlative and causative expression differences. But the majority of differentially expressed genes were not correlated with either phenotype and showed the same expression pattern both in the presence and absence of copper sulfate. To determine whether these expression differences may contribute to phenotypic variation under other environmental conditions, we examined one phenotype, freeze tolerance, predicted by the differential expression of the aquaporin gene AQY2. We found freeze tolerance is associated with the expression of AQY2. Conclusions: Gene expression differences provide substantial insight into the molecular basis of naturally occurring traits and can be used to predict environment-dependent phenotypic variation.

Description:
We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding site evolution, on which basis we compute the likelihood that putative sites are conserved and assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.

Description:
The binding sites of sequence-specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Here we analyze the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artifacts of computational motif finding algorithms. As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding ...

Description:
All organisms have elaborate mechanisms to control rates of protein production. However, protein production is also subject to stochastic fluctuations, or noise. Several recent studies in Saccharomyces cerevisiae and Escherichia coli have investigated the relationship between transcription and translation rates and stochastic fluctuations in protein levels, or more generally, how such randomness is a function of intrinsic and extrinsic factors. However, the fundamental question of whether stochasticity in protein expression is generally biologically relevant has not been addressed, and it remains unknown whether random noise in the protein production rate of most genes significantly affects the fitness of any organism. We propose that organisms should be particularly sensitive to variation in the protein levels of two classes of genes: genes whose deletion is lethal to the organism and genes that encode subunits of multiprotein complexes. Using an experimentally verified model of stochastic gene expression in S. cerevisiae, we estimate the noise in protein production for nearly every yeast gene, and confirm our prediction that the production of essential and complex-forming proteins involves lower levels of noise than does the production of most other genes. Our results support the hypothesis that noise in gene expression is a biologically important variable, is generally detrimental to organismal fitness, and is subject to natural selection.

Description:
The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.

Description:
Sequence changes in regulatory regions have often been invoked to explain phenotypic divergence among species, but molecular examples of this have been difficult to obtain. In this study we identified an anthropoid primate-specific sequence element that contributed to the regulatory evolution of the low-density lipoprotein receptor. Using a combination of close and distant species genomic sequence comparisons coupled with in vivo and in vitro studies, we found that a functional cholesterol-sensing sequence motif arose and was fixed within a pre-existing enhancer in the common ancestor of anthropoid primates. Our study demonstrates one molecular mechanism by which ancestral mammalian regulatory elements can evolve to perform new functions in the primate lineage leading to human.

Description:
Sequence changes in regulatory regions have often been invoked to explain phenotypic divergence among species, but molecular examples of this have been difficult to obtain. In this study, we identified an anthropoid primate-specific sequence element that contributed to the regulatory evolution of the LDL receptor. Using a combination of close and distant species genomic sequence comparisons coupled with in vivo and in vitro studies, we show that a functional cholesterol-sensing sequence motif arose and was fixed within a pre-existing enhancer in the common ancestor of anthropoid primates. Our study demonstrates one molecular mechanism by which ancestral mammalian regulatory elements can evolve to perform new functions in the primate lineage leading to human.

Description:
On the basis of the observation that conserved positions in transcription factor binding sites are often clustered together, we propose a simple extension to the model-based motif discovery methods. We assign position-specific prior distributions to the frequency parameters of the model, penalizing deviations from a specified conservation profile. Examples with both simulated and real data show that this extension helps discover motifs as the data become noisier or when there is a competing false motif.

Description:
We demonstrate the feasibility of generating thousands of transgenic Drosophila melanogaster lines in which the expression of an exogenous gene is reproducibly directed to distinct small subsets of cells in the adult brain. We expect the expression patterns produced by the collection of 5,000 lines that we are currently generating to encompass all neurons in the brain in a variety of intersecting patterns. Overlapping 3-kb DNA fragments from the flanking noncoding and intronic regions of genes thought to have patterned expression in the adult brain were inserted into a defined genomic location by site-specific recombination. These fragments were then assayed for their ability to function as transcriptional enhancers in conjunction with a synthetic core promoter designed to work with a wide variety of enhancer types. An analysis of 44 fragments from four genes found that >80% drive expression patterns in the brain; the observed patterns were, on average, composed of <100 cells. Our results suggest that the D. melanogaster genome contains >50,000 enhancers and that multiple enhancers drive distinct subsets of expression of a gene in each tissue and developmental stage. We expect that these lines will be valuable tools for neuroanatomy as well as for the elucidation of neuronal circuits and information flow in the fly brain.

Description:
BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.

Description:
Knowledge discovery from large and complex scientific data is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for effective data analysis and data exploration methods and tools. The combination and close integration of methods from scientific visualization, information visualization, automated data analysis, and other enabling technologies (such as efficient data management) supports knowledge discovery from multi-dimensional scientific data. This paper surveys two distinct applications in developmental biology and accelerator physics, illustrating the effectiveness of the described approach.

Description:
The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

Description:
The Berkeley Drosophila Transcription Network Project (BDTNP) has developed a suite of methods that support quantitative, computational analysis of three-dimensional (3D) gene expression patterns with cellular resolution in early Drosophila embryos, aiming at a more in-depth understanding of gene regulatory networks. We describe a new tool, called PointCloudXplore (PCX), that supports effective 3D gene expression data exploration. PCX is a visualization tool that uses the established visualization techniques of multiple views, brushing, and linking to support the analysis of high-dimensional datasets that describe many genes' expression. Each of the views in PointCloudXplore shows a different gene expression data property. Brushing is used to select and emphasize data associated with defined subsets of embryo cells within a view. Linking is used to show in additional views the expression data for a group of cells that have first been highlighted as a brush in a single view, allowing further data subset properties to be determined. In PCX, physical views of the data are linked to abstract data displays such as parallel coordinates. Physical views show the spatial relationships between different genes' expression patterns within an embryo. Abstract gene expression data displays, on the other hand, allow for an analysis of relationships between different genes directly in the gene expression space. We discuss parallel coordinates as one example of an abstract data view currently available in PCX. We have developed several extensions to standard parallel coordinates to facilitate brushing and the visualization of 3D gene expression data.

Description:
It is well established that gene expression levels in many organisms change during the aging process, and the advent of DNA microarrays has allowed genome-wide patterns of transcriptional changes associated with aging to be studied in both model organisms and various human tissues. Understanding the effects of aging on gene expression in the human brain is of particular interest, because of its relation to both normal and pathological neurodegeneration. Here we show that human cerebral cortex, human cerebellum, and chimpanzee cortex each undergo different patterns of age-related gene expression alterations. In humans, many more genes undergo consistent expression changes in the cortex than in the cerebellum; in chimpanzees, many genes change expression with age in cortex, but the pattern of changes in expression bears almost no resemblance to that of human cortex. These results demonstrate the diversity of aging patterns present within the human brain, as well as how rapidly genome-wide patterns of aging can evolve between species; they may also have implications for the oxidative free radical theory of aging, and help to improve our understanding of human neurodegenerative diseases.

Description:
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
Together these observations suggest that many of these poorly-bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more ...

Description:
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

Description:
To better understand how developmental regulatory networks are defined in the genome sequence, the Berkeley Drosophila Transcription Network Project (BDTNP) has developed a suite of methods to describe 3D gene expression data, i.e., the output of the network at cellular resolution for multiple time points. To allow researchers to explore these novel data sets we have developed PointCloudXplore (PCX). In PCX we have linked physical and information visualization views via the concept of brushing (cell selection). For each view, dedicated operations for performing selection of cells are available. In PCX, all cell selections are stored in a central management system. Cells selected in one view can in this way be highlighted in any view, allowing further cell subset properties to be determined. Complex cell queries can be defined by combining different cell selections using logical operations such as AND, OR, and NOT. Here we provide an overview of PointCloudXplore 2 (PCX2), the latest publicly available version of PCX. PCX2 has been shown to be an effective tool for visual exploration of 3D gene expression data. We discuss (i) all views available in PCX2, (ii) different strategies to perform cell selection, (iii) the basic architecture of PCX2, and (iv) the usefulness of PCX2, illustrated with selected examples.
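The idea of combining cell selections with AND, OR, and NOT can be sketched with plain sets of cell identifiers. This is only an illustration under assumed names, not PCX2's actual data model or API: `all_cells`, `brush_a`, `brush_b`, and the `sel_*` helpers are all hypothetical.

```python
# Hypothetical model: a cell selection ("brush") is a set of cell IDs.
all_cells = set(range(10))        # assumed embryo with 10 cells
brush_a = {0, 1, 2, 3, 4}         # cells selected in one view
brush_b = {3, 4, 5, 6}            # cells selected in another view

def sel_and(a, b):
    """Cells present in both selections."""
    return a & b

def sel_or(a, b):
    """Cells present in either selection."""
    return a | b

def sel_not(a, universe):
    """Cells of the embryo that are not in the selection."""
    return universe - a

# Complex query: cells in brush_a AND NOT in brush_b.
query = sel_and(brush_a, sel_not(brush_b, all_cells))
```

Because each result is itself a selection, queries of this kind can be nested and reused, which matches the abstract's description of storing all selections centrally so that any view can highlight them.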
