Genomes provide an abundance of putative binding sites for each transcription factor (TF). However, only small subsets of these potential targets are functional. TFs of the same protein family bind to target sites that are very similar but not identical. This distinction allows closely related TFs to regulate different genes and thus execute distinct functions. Because the nucleotide sequence of the core motif is often not sufficient for identifying a genomic target, we refined the description of TF binding sites by introducing a combination of DNA sequence and shape features, which consistently improved the modeling of in vitro TF−DNA binding specificities. Although additional factors affect TF binding in vivo, shape-augmented models reveal binding specificity mechanisms that are not apparent from sequence alone.

Abstract

DNA binding specificities of transcription factors (TFs) are a key component of gene regulatory processes. Underlying mechanisms that explain the highly specific binding of TFs to their genomic target sites are poorly understood. A better understanding of TF−DNA binding requires the ability to quantitatively model TF binding to accessible DNA as its basic step, before additional in vivo components can be considered. Traditionally, these models were built based on nucleotide sequence. Here, we integrated 3D DNA shape information derived with a high-throughput approach into the modeling of TF binding specificities. Using support vector regression, we trained quantitative models of TF binding specificity based on protein binding microarray (PBM) data for 68 mammalian TFs. The evaluation of our models included cross-validation on specific PBM array designs, testing across different PBM array designs, and using PBM-trained models to predict relative binding affinities derived from in vitro selection combined with deep sequencing (SELEX-seq). Our results showed that shape-augmented models compared favorably to sequence-based models. Although both k-mer and DNA shape features can encode interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space. In addition, analyzing the feature weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not revealed by the DNA sequence. As such, this work combines knowledge from structural biology and genomics, and suggests a new path toward understanding TF binding and genome function.
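The dimensionality argument above can be made concrete with a small sketch. It assumes a position-specific binary encoding of k-mers and, as an illustrative assumption not stated in the abstract, four DNA shape parameters with one value per nucleotide position. The function names are hypothetical, and the support vector regression step itself is not shown.

```python
from itertools import product

BASES = "ACGT"


def kmer_one_hot(seq, k):
    """Position-specific binary encoding: one indicator per (window, k-mer) pair."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_windows = len(seq) - k + 1
    vec = [0] * (n_windows * len(kmers))
    for pos in range(n_windows):
        # Set the indicator for the k-mer observed in this window.
        vec[pos * len(kmers) + index[seq[pos:pos + k]]] = 1
    return vec


def feature_dims(length, n_shape_params=4):
    """Feature counts for a site of `length` bp under the stated assumptions."""
    one_mer = 4 * length
    two_mer = 16 * (length - 1)
    three_mer = 64 * (length - 2)
    # Assumption: one value per shape parameter per position.
    shape = n_shape_params * length
    return {
        "1mer": one_mer,
        "1mer+2mer+3mer": one_mer + two_mer + three_mer,
        "1mer+shape": one_mer + shape,
    }


if __name__ == "__main__":
    site = "ACGTACGTAC"  # arbitrary 10-bp example, not a real binding site
    print(len(kmer_one_hot(site, 1)), len(kmer_one_hot(site, 3)))
    print(feature_dims(len(site)))
```

Under these assumptions a 10-bp site needs 40 features for the 1-mer encoding, 696 when 2-mers and 3-mers are added, and 80 when shape features are added instead. Real shape predictions may yield slightly different counts, for example when a parameter is defined between adjacent base pairs.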

1Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL) Lausanne, Switzerland

Correspondence: Pierre Gönczy, E-mail: pierre.gonczy@epfl.ch

Dear Editor:

The two gametes make different contributions to the zygote at fertilization. Although both gametes contribute genetic material, in most animal species the oocyte donates the bulk of cytoplasmic constituents and cellular organelles, including mitochondria, whereas the sperm donates two centrioles. Centrioles are microtubule-based organelles that serve as templates for the axoneme of cilia and flagella across eukaryotic evolution, and as platforms for centrosome assembly in most animal cells1. How long the two centrioles contributed by the sperm persist in the developing embryo is not known in any system. More generally, although centrioles are reputed to be stable structures, the extent to which their constituents persist over several cell cycles has been scarcely studied. Two instances of centriolar components being stable for one (α/β tubulin in mammalian cells) or two (SAS-4 in C. elegans) cell cycles have been reported2,3,4, but whether these and other centriolar components remain stable for more cell cycles is not known.

We used C. elegans as a model system to assess the persistence of centriolar components in the context of a developing organism. We utilized a marked mating experimental strategy to specifically mark paternally contributed centriolar components and assess their persistence throughout embryonic development (Figure 1A and Supplementary information, Data S1). In brief, feminized fog-2 (q71) mutant worms lacking sperm were mated with males expressing a given centriolar protein fused to GFP, and the resulting embryos were analyzed using immunofluorescence with antibodies against GFP and the pan-centriolar marker IFA. Five evolutionarily conserved proteins (SPD-2, ZYG-1, SAS-4, SAS-5 and SAS-6) are specifically required for centriole formation in C. elegans1, a process that also requires the centriolar constituents α/β-tubulin5,6. For our analysis, we selected SAS-4, which has been reported to persist for two cell cycles in C. elegans3,4, as well as the β-tubulin protein TBB-2, since centriolar α/β tubulin has been shown to persist for one cell cycle in mammalian cells2. Moreover, we analyzed the behavior of SAS-6, the founding member of a protein family critical for the onset of centriole assembly across eukaryotic evolution7,8.

About a dozen measurements of Newton's gravitational constant, G, since 1962 have yielded values that differ by far more than their reported random plus systematic errors. We find that these values for G are oscillatory in nature, with a period of , an amplitude of , and mean-value crossings in 1994 and 1997. However, we do not suggest that G is actually varying by this much, this quickly, but instead that something in the measurement process varies. Of other recently reported results, to the best of our knowledge, the only measurement with the same period and phase is the Length of Day (LOD, defined as a frequency measurement such that a positive increase in LOD values means slower Earth rotation rates and therefore longer days). The aforementioned period is also about half of a solar activity cycle, but the correlation is far less convincing. The 5.9 year periodic signal in LOD has previously been interpreted as due to fluid core motions and inner-core coupling. We report the G/LOD correlation, whose statistical significance is 0.99764 assuming no difference in phase, without claiming to have any satisfactory explanation for it. Least unlikely, perhaps, are currents in the Earth's fluid core that change both its moment of inertia (affecting LOD) and the circumstances in which the Earth-based experiments measure G. In this case, there might be correlations with terrestrial-magnetic-field measurements.
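One way to test a series of measurements for an oscillation of known period is a linear least-squares fit of a constant plus sine and cosine terms. The sketch below is illustrative only: it is not the authors' procedure, the function names are hypothetical, and the demonstration data are synthetic, dimensionless values rather than actual measurements of G. The 5.9 year period is taken from the LOD signal mentioned above.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_sinusoid(times, values, period):
    """Fit y = c + a*sin(w*t) + b*cos(w*t) with fixed period; return (mean, amplitude, phase)."""
    w = 2.0 * math.pi / period
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    c, a, b = solve_linear(xtx, xty)
    # Equivalent form: y = c + A*sin(w*t + phase), with a = A*cos(phase), b = A*sin(phase)
    return c, math.hypot(a, b), math.atan2(b, a)


if __name__ == "__main__":
    period = 5.9  # years, as quoted for the LOD signal
    w = 2.0 * math.pi / period
    # Synthetic, noise-free demo: times in years relative to 2000, arbitrary units.
    times = [-19.0, -14.7, -9.9, -5.3, -1.8, 1.5, 4.9, 7.6, 10.4, 13.8]
    values = [1.0 + 0.5 * math.sin(w * t + 0.3) for t in times]
    print(fit_sinusoid(times, values, period))
```

Fixing the period keeps the problem linear in the unknowns. Fitting the period as well would require a nonlinear search, and with real data each point would normally be weighted by its reported uncertainty.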

We use the WISE-2MASS infrared galaxy catalogue matched with Pan-STARRS1 (PS1) galaxies to search for a supervoid in the direction of the cosmic microwave background (CMB) cold spot (CS). Our imaging catalogue has median redshift z ≃ 0.14, and we obtain photometric redshifts from PS1 optical colours to create a tomographic map of the galaxy distribution. The radial profile centred on the CS shows a large low-density region, extending over tens of degrees. Motivated by previous CMB results, we test for underdensities within two angular radii, 5° and 15°. The counts in photometric redshift bins show significantly low densities at high detection significance, ≳5σ and ≳6σ, respectively, for the two fiducial radii. The line-of-sight position of the deepest region of the void is z ≃ 0.15–0.25. Our data, combined with an earlier measurement by Granett, Szapudi & Neyrinck, are consistent with a large supervoid of radius R_void = (220 ± 50) h^−1 Mpc with δ_m ≃ −0.14 ± 0.04 centred at z = 0.22 ± 0.03. Such a supervoid, constituting at least a ≃3.3σ fluctuation in a Gaussian distribution of the Λ cold dark matter model, is a plausible cause for the CS.
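To put the quoted significances on a probability scale, an n-sigma fluctuation of a Gaussian variable can be converted to a tail probability. The short sketch below assumes a one-sided tail, which is an interpretive choice not stated in the abstract, and the function name is hypothetical.

```python
import math


def one_sided_tail(n_sigma):
    """Probability that a standard normal variable exceeds n_sigma (one-sided)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (3.3, 5.0, 6.0):
        print(f"{n} sigma -> {one_sided_tail(n):.3e}")
```

Under this convention a 3.3σ fluctuation corresponds to a probability of roughly 5 in 10,000. A two-sided convention would double that value.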

The application of statistics to science is not a neutral act. Statistical tools have shaped, and were also shaped by, their objects. In the social sciences, statistical methods fundamentally changed research practice, making statistical inference its centerpiece. At the same time, textbook writers in the social sciences have transformed rivaling statistical systems into an apparently monolithic method that could be used mechanically. The idol of a universal method for scientific inference has been worshipped since the “inference revolution” of the 1950s. Because no such method has ever been found, surrogates have been created, most notably the quest for significant p values. This form of surrogate science fosters delusions and borderline cheating and has done much harm, creating, for one, a flood of irreproducible results. Proponents of the “Bayesian revolution” should be wary of chasing yet another chimera: an apparently universal inference procedure. A better path would be to promote both an understanding of the various devices in the “statistical toolbox” and informed judgment to select among these.

It has been observed that concentrated solutions of short DNA oligomers develop liquid crystal ordering as the result of a hierarchically structured supramolecular self-assembly. In mixtures of oligomers with varying degrees of complementarity, liquid crystal microdomains are formed via the selective aggregation of those oligomers that have a sufficient degree of duplexing and propensity for physical polymerization. Here we show that such domains act as fluid and permeable microreactors in which the order-stabilized molecular contacts between duplex terminals serve as physical templates for their chemical ligation. In the presence of abiotic condensing agents, liquid crystal ordering markedly enhances ligation efficacy, thereby enhancing its own phase stability. The coupling between order-templated ligation and selectivity provided by supramolecular ordering enables an autocatalytic cycle favouring the growth of DNA chains, up to biologically relevant lengths, from oligomers only a few bases long. This finding suggests a novel scenario for the abiotic origin of nucleic acids.