Steps that lead to genes being switched on revealed in atomic simulation

Researchers have modelled every atom in a key part of the process for switching on genes, revealing a whole new area for potential drug targets.

Proteins are essential for processes that sustain life. They are created in cells through a process called gene expression, which uses instructions from stretches of DNA called genes to build proteins. Sometimes genes are faulty and create proteins that contain errors, preventing the cell from functioning properly. Such faults lead to genetic diseases such as cystic fibrosis and haemophilia.

Gene expression is controlled by molecules called transcription factors, which bind to the start of a gene sequence at its ‘basal machinery’ and tell it to switch on and start creating certain proteins.

The way transcription factors bind to the basal machinery is a ‘fuzzy’ process, meaning the exact sequence of events is unknown because the steps do not exist for long enough to be captured by traditional imaging techniques.

But now, by creating a computer simulation of all of the tens of thousands of atoms involved in the process and modelling their movements in 50 million separate steps, researchers at Imperial College London have been able to determine the sequence of events that leads to genes being switched on.

DISRUPTING DETRIMENTAL GENES

The simulated process revealed ‘pockets’ in the gene basal machinery, which the transcription factors move in and out of during binding. Knowing how these structures fit together could lead to the design of molecules that interfere with or disrupt the process, potentially tackling diseases.

Lead researcher Dr Robert Weinzierl from Imperial’s Department of Life Sciences said: “For the first time, we can fill in the dynamic landscape of interaction between transcription factors and basal machinery. This is a central mechanism for gene expression – the interactions here determine whether a gene gets switched on and creates proteins.”

“Gene regulation is a completely new drug target that has previously been too challenging to explore,” added Dr Weinzierl. “This process influences biology on a really fundamental level, and could allow us to prevent the expression of detrimental genes.”

FASTER DRUG SCREENING

The researchers’ new technique predicts the movements of all the atoms, building up a picture of how the structures involved change every couple of femtoseconds (quadrillionths of a second). The results of the first trial of the technique are reported today in PLOS Computational Biology.
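
As a rough illustration of the scale involved, the figures quoted above can be combined into a nominal simulated time span. This is only a back-of-the-envelope sketch: it assumes that "a couple of femtoseconds" means a fixed 2 fs per step, which the article does not state, and the function name is made up for the example.

```python
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def nominal_simulated_time_ns(n_steps, timestep_fs):
    """Nominal time covered by a simulation of n_steps fixed-size steps."""
    return n_steps * timestep_fs / FS_PER_NS


# 50 million steps (from the article) at an ASSUMED 2 fs per step.
total_ns = nominal_simulated_time_ns(50_000_000, 2)
print(f"{total_ns:.0f} ns")  # prints "100 ns"
```

Under that assumption the run covers on the order of a hundred nanoseconds, which gives a sense of why such short-lived binding steps are hard to capture with traditional imaging.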

Dr Weinzierl has submitted a patent application for his computer-based approach to studying gene expression interactions. Using this, compounds could be screened for possible fit into the basal machinery pockets.

“With computer simulation, it becomes easy to identify candidate compounds that could target these interactions without the need to test them first in real life, cutting down the time required to sift for new drugs,” said Dr Weinzierl.

Transcriptional activation domains (ADs) are generally thought to be intrinsically unstructured, but capable of adopting limited secondary structure upon interaction with a coactivator surface. The indeterminate nature of this interface made it hitherto difficult to study structure/function relationships of such contacts. Here we used atomistic accelerated molecular dynamics (aMD) simulations to study the conformational changes of the GCN4 AD and variants thereof, either free in solution, or bound to the GAL11 coactivator surface. We show that the AD-coactivator interactions are highly dynamic while obeying distinct rules. The data provide insights into the constant and variable aspects of orientation of ADs relative to the coactivator, changes in secondary structure and energetic contributions stabilizing the various conformers at different time points. We also demonstrate that a prediction of α-helical propensity correlates directly with the experimentally measured transactivation potential of a large set of mutagenized ADs. The link between α-helical propensity and the stimulatory activity of ADs has fundamental practical and theoretical implications concerning the recruitment of ADs to coactivators.

Author Summary

The regulated transcription of eukaryotic genes is governed by gene-specific transcription factors that contain activation domains to stimulate the expression of nearby genes. Activation domains are unable to take up a defined three-dimensional conformation. Nevertheless, as we demonstrate in our study, molecular dynamics simulations reveal that the key docking point of such domains (centered around several large hydrophobic amino acid sidechains) folds into fluctuating α-helical conformations. Analysis of published data shows that this tendency of adopting such local structures correlates directly with stimulation activity. We also investigate the interaction of these structurally unstable domains with a coactivator interaction partner. Computational simulations are ideally suited for analysing the rapidly changing, “fuzzy” interactions occurring between these protein partners. We gained new insights into the competitive nature of the key hydrophobic sidechains in binding to a pocket on the coactivator surface and documented for the first time the rapidly changing movements of an activation domain during these interactions.

The last decade has seen an incredible breakthrough in technologies that allow histones, transcription factors (TFs), and RNA polymerases to be precisely mapped throughout the genome. From this research, it is clear that there is a complex interaction between the chromatin landscape and the general transcriptional machinery and that the dynamic control of this interface is central to gene regulation. However, the chromatin remodeling enzymes and general TFs cannot, on their own, recognize and stably bind to promoter or enhancer regions. Rather, they are recruited to cis regulatory regions through interaction with site-specific DNA binding TFs and/or proteins that recognize epigenetic marks such as methylated cytosines or specifically modified amino acids in histones. These “recruitment” factors are modular in structure, reflecting their ability to interact with the genome via one region of the protein and to simultaneously bind to other regulatory proteins via “effector” domains. In this chapter, we provide examples of common effector domains that can function in transcriptional regulation via their ability to (a) interact with the basal transcriptional machinery and general co-activators, (b) interact with other TFs to allow cooperative binding, and (c) directly or indirectly recruit histone and chromatin modifying enzymes.

Transcriptional activation is a stepwise process that requires (a) creating and maintaining an open chromatin structure, (b) assembly of the preinitiation complex, and (c) transition to productive elongation (Fig. 12.1). Successful completion of each of these steps involves a diverse group of proteins, some of which function in a relatively promoter-specific manner whereas others regulate large sets of genes. Recent advances in molecular and computational biology allow histone and DNA modifications, TFs, and RNA polymerases to be precisely mapped throughout the genome, relative to active or silent promoters (see [1–3] for reviews). From this research, it is becoming clear that there is a complex interaction between the chromatin landscape and the transcriptional machinery and that the dynamic relationship of this interface is central to biological control over gene expression [4]. It is now recognized that regulatory factors can exert their influence on transcriptional activation either via co-localization with other proteins that are bound at or near core promoter regions or they can be recruited to distal enhancer regions and interact with promoter-bound proteins via looping mechanisms. However, generally speaking, the chromatin remodeling enzymes and the general transcription factors involved in initiation and elongation cannot, on their own, recognize and stably bind to the promoter or enhancer regions.

Fig. 12.1 Regulation of transcription. Shown is a schematic representing the three steps needed for productive transcription, including Step 1: the creation of open chromatin, which involves interactions between DNA-bound proteins and histone modifying enzymes …

One way in which chromatin remodeling enzymes and general transcription factors are recruited to cis-regulatory regions is through interaction with site-specific DNA binding TFs (Fig. 12.2a). The three largest classes of site-specific DNA binding proteins in mammals contact the genome via conserved DNA binding domains called zinc fingers, homeodomains, and helix–loop–helix domains [5] (Chapter 3 of this volume provides a catalog of eukaryotic DNA binding domains, and Chapters 4 and 5 specifically review C2H2 zinc fingers and homeodomains). Each of these classes of site-specific DNA binding factors contains many different proteins; for example, in humans there are over 650 zinc finger proteins, ~250 homeodomain proteins, and ~80 helix–loop–helix proteins [5]. Within each class, individual TFs can bind to and regulate hundreds to thousands of different genes. Site-specific TFs are modular in their structure, reflecting their ability to bind to DNA via their DNA binding domains and simultaneously bind to other transcriptional regulatory proteins via so-called effector domains. The modular nature of site-specific TFs has been repeatedly demonstrated using in vitro and in vivo reporter assays. In these experiments, effector domains are separated from their natural DNA binding domains and then engineered to be part of a fusion protein having a heterologous DNA binding domain. Numerous studies have shown that simply bringing such effector domains to promoter regions can modulate transcription [6–8].

Another way in which chromatin remodeling enzymes and general transcription factors can be brought to the genome is via effector domains that reside in proteins that can recognize epigenomic marks. Similar to recognition of a short nucleotide motif by a DNA binding protein, other proteins can distinguish distinctively modified DNA and histone protein “motifs”. For example, methylated cytosine in the 5′-CpG-3′ dinucleotide sequence is specifically recognized by members of a family of proteins containing a conserved methyl-CpG binding domain (MBD). MBD-containing proteins, which include MeCP2, MBD1, MBD2 and MBD4, bind specifically to methyl-CpG motifs located throughout the genome [9]; see Fig. 12.2b. MBD-containing proteins function by recruiting various co-regulators to methyl-CpG sites. For example, MeCP2 simultaneously binds promoter regions containing methyl-CpG motifs and the Sin3-containing histone deacetylase complex via a transcriptional repression domain (TRD), resulting in histone deacetylation and transcriptional silencing [10, 11]. Likewise, MBD1 and MBD2 copurify with distinct cellular complexes which link DNA methylation with chromatin modification and transcriptional repression. Similarly, posttranslational modifications of the amino termini of core histones are correlated to transcriptional states and are recognized by relevant chromatin-associated proteins (Fig. 12.2c). Several different histone modifications have been identified, including acetylation, phosphorylation, and methylation, and specific protein domains have evolved to recognize several of these different modifications. For example, different methylation states of histone H3 at lysine 4 can be recognized by tudor, chromo, and plant homeodomains (PHD), by malignant brain tumor (MBT) domains, and by WD40 repeat domains (many of these domains are structurally related and are collectively referred to as the “royal family” [12], reviewed [13, 14]). 
Other examples of this family include the chromodomain of HP1, which interacts with lower (mono- and di-) methylation states of lysine 9 of histone H3 but preferentially binds to the trimethylated state [15, 16], and the tudor domain of 53BP1, which can discriminate between the di- and tri-methyl states of H4K20, preferring the dimethyl form [17, 18]. Acetylated lysine is also recognized by a specific protein module called the bromodomain [19], which is found in many chromatin-associated proteins and in nearly all known nuclear histone acetyltransferases (HATs). Of course, epigenetic marks such as DNA methylation and histone modifications are located at specific genomic regions (which can vary in different cell types), indicating that DNA methylases and histone modifying enzymes must be recruited to the genome by sequence-specific mechanisms such as site-specific TFs or RNAs. For example, KRAB-ZNFs can recruit the KAP1/SETDB1 histone methylating complex and long non-coding RNAs can recruit the PRC2 histone methylation complex [20–23].

The focus of this chapter is on the effector domains that are brought to specific sites of the genome by DNA binding proteins, methyl-CpG binding proteins, or histone binding proteins. (The interaction of TFs with chromatin more generally is discussed in Chapter 11). We provide examples of common effector domains that can function in transcriptional regulation via their ability to influence each of the steps outlined in Fig. 12.1. Specifically, we discuss effector domains that can: (a) interact with the basal transcriptional machinery and general co-activators, (b) interact with other TFs to allow cooperative binding, and (c) directly or indirectly recruit histone and chromatin modifying enzymes.

Eukaryotic transcriptional dynamics: from single molecules to cell populations

Transcriptional regulation is achieved through combinatorial interactions between regulatory elements in the human genome and a vast range of factors that modulate the recruitment and activity of RNA polymerase. Experimental approaches for studying transcription in vivo now extend from single-molecule techniques to genome-wide measurements. Parallel to these developments is the need for testable quantitative and predictive models for understanding gene regulation. These conceptual models must also provide insight into the dynamics of transcription and the variability that is observed at the single-cell level. In this Review, we discuss recent results on transcriptional regulation and also the models those results engender. We show how a non-equilibrium description informs our view of transcription by explicitly considering time- and energy-dependence at the molecular level.

Transcriptional regulation in the nucleus is the culmination of the actions of a diverse range of factors, such as transcription factors, chromatin remodellers, polymerases, helicases, topoisomerases, kinases, chaperones, proteasomes, acetyltransferases, deacetylases and methyltransferases. Determining how these molecules work in concert in the eukaryotic nucleus to regulate genes remains a central challenge in molecular biology. Dynamics lie at the heart of this mystery. Megadalton complexes assemble and disassemble on genes within seconds [1, 2]; nucleosome turnover ranges from minutes to hours [3]; and gene activity demonstrates complex temporal patterns such as oscillation and transcriptional bursting [4, 5]. Exciting new experimental advances have enabled the study of dynamic transcriptional regulation at the single-molecule [6] and genome-wide [7] levels, thus enhancing our understanding of transcriptional regulation in vivo. These approaches also necessitate new models for describing gene expression. In this Review, we discuss recent in vivo results and the quantitative models that are motivated by those results.
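
To make the idea of transcriptional bursting concrete, the sketch below simulates a two-state ("telegraph") promoter with the Gillespie stochastic simulation algorithm. This is a commonly used textbook model, not one taken from the Review itself, and every rate constant and name in it is a hypothetical value chosen only for illustration.

```python
import random


def simulate_telegraph(k_on, k_off, k_tx, k_deg, t_end, seed=0):
    """Gillespie simulation of a two-state promoter producing mRNA.

    All rates are hypothetical (per unit time); k_on and k_off must be > 0.
    Returns a list of (time, gene_on, mrna_count) tuples, one per event.
    """
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, False, 0
    trajectory = [(t, gene_on, mrna)]
    while True:
        # Propensity of each possible event in the current state.
        switch = k_off if gene_on else k_on   # promoter flips ON <-> OFF
        produce = k_tx if gene_on else 0.0    # transcription only while ON
        degrade = k_deg * mrna                # each mRNA decays independently
        total = switch + produce + degrade
        t += rng.expovariate(total)           # waiting time to the next event
        if t >= t_end:
            return trajectory
        r = rng.random() * total              # choose which event fired
        if r < switch:
            gene_on = not gene_on
        elif r < switch + produce:
            mrna += 1
        elif mrna > 0:
            mrna -= 1
        trajectory.append((t, gene_on, mrna))


# Rare, short ON periods with fast transcription give burst-like output.
traj = simulate_telegraph(k_on=0.1, k_off=1.0, k_tx=20.0, k_deg=0.5, t_end=200.0)
print(len(traj), "events; final mRNA count:", traj[-1][2])
```

With a small `k_on` and a large `k_tx`, the mRNA count tends to rise sharply during brief ON periods and decay in between, which is the qualitative signature usually described as bursting.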

Chromatin immunoprecipitation (ChIP) provides genome-wide occupancy profiles for chromatin-interacting factors at near base-pair resolution in populations of cells [8, 9]. Using this approach on a genome-wide level has generated comprehensive maps of regulation on a gene-by-gene basis [7, 8, 10]. This population approach has been complemented by single-cell imaging techniques. Almost all factors that have been studied by live-cell microscopy exhibit dwell times on chromatin on the order of seconds [11], and single-cell studies demonstrate a great variability in gene expression among cells in a population, owing in part to the stochastic nature of transcription [12]. Despite these tremendous advances in understanding the behaviour of individual factors, both methods fall short of capturing the sequence of events that is required to activate or repress a gene in vivo. Ideally, the occupancy of many factors that are coincident on a single stretch of DNA would be measured to obtain a sense of the complexes and intermediates that assemble in vivo. However, this experimental challenge is a daunting one. Current re-ChIP (also known as sequential ChIP) experiments usually look at two factors [4, 13], but it would be necessary to look at an order of magnitude more factors to begin to capture the combinatorial complexity of transcriptional regulation in metazoans [4, 14–16].

The gulf between actual mechanisms of transcriptional regulation and experimental capabilities could be bridged by using quantitative models of transcription. Decades of biochemical, structural and genetic data have spawned multiple models of transcriptional regulation, several of which we discuss below (FIG. 1). Even though these views are not mutually exclusive and boundaries between them are not clear, they reflect fundamental differences regarding the mechanisms of the underlying molecular processes. Currently, most quantitative theoretical models describe transcriptional regulation as an equilibrium thermodynamic phenomenon, an assumption that allows model building without explicitly considering the dynamics. Here we explain how this description is fundamentally inconsistent with the canonical view of gene regulation based on a sequential, ordered recruitment of factors, which is an example of a non-equilibrium model. In the context of a non-equilibrium model, the transcriptional dynamics can exhibit a form of molecular memory so that the future behaviour of the system depends on its history. We will outline this gap between the molecular biologist’s canonical view of transcription and the quantitative approaches that are often used to describe it. We argue for a non-equilibrium view of transcriptional regulation that is informed and constrained by single-cell observations. With the ability to observe single transcription factors [17] and single transcribing genes [18] in living cells, new experimental and modelling possibilities are emerging for understanding transcription dynamics in vivo.
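
One way to see why ordered recruitment sits outside an equilibrium description is to check detailed balance on a small cyclic model. The sketch below is an illustration only: it assumes a hypothetical three-state promoter cycle with made-up rate constants and applies Kolmogorov's criterion, under which a single closed cycle is at equilibrium only if the product of the rates going one way round equals the product going the other way.

```python
from math import log, prod


def cycle_affinity(forward_rates, backward_rates):
    """ln(product of forward rates / product of backward rates) around a cycle.

    Zero means detailed balance can hold; a non-zero value means the cycle
    carries a net flux and needs an energy input to sustain it.
    """
    return log(prod(forward_rates) / prod(backward_rates))


def is_equilibrium(forward_rates, backward_rates, tol=1e-9):
    """Kolmogorov's criterion for a single closed cycle of states."""
    return abs(cycle_affinity(forward_rates, backward_rates)) < tol


# Hypothetical rates: each step is reversible and the products match.
print(is_equilibrium([2.0, 3.0, 0.5], [1.0, 1.5, 2.0]))

# Hypothetical ordered recruitment: every step strongly favours one direction.
print(is_equilibrium([1.0, 1.0, 1.0], [0.01, 0.01, 0.01]))
```

In the second case the cycle is driven in one direction, so the order in which states are visited carries information about the past, in line with the "molecular memory" described above.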