CRISPR screens illuminate enhancer function

The noncoding regions around a gene that control the transcription of the protein-coding region are difficult to identify. Leveraging a CRISPR interference system (CRISPRi), Fulco et al. identified enhancer-promoter connections to map specific noncoding regions affecting gene regulation for the GATA1 and MYC loci (see the Perspective by Einstein and Yeo). Going forward, such CRISPRi-mapping can be used to evaluate promoter-enhancer screens functionally in an unbiased way.

Abstract

Gene expression in mammals is regulated by noncoding elements that can affect physiology and disease, yet the functions and target genes of most noncoding elements remain unknown. We present a high-throughput approach that uses clustered regularly interspaced short palindromic repeats (CRISPR) interference (CRISPRi) to discover regulatory elements and identify their target genes. We assess >1 megabase of sequence in the vicinity of two essential transcription factors, MYC and GATA1, and identify nine distal enhancers that control gene expression and cellular proliferation. Quantitative features of chromatin state and chromosome conformation distinguish the seven enhancers that regulate MYC from other elements that do not, suggesting a strategy for predicting enhancer–promoter connectivity. This CRISPRi-based approach can be applied to dissect transcriptional networks and interpret the contributions of noncoding genetic variation to human disease.

A fundamental goal in modern biology is to identify and characterize the noncoding regulatory elements that control gene expression in development and disease, yet we have lacked systematic approaches to do so. Studies of individual regulatory elements have revealed principles of their function, such as the ability of enhancers to recruit activating transcription factors, modify chromatin state, and physically interact with target genes (1, 2). From these insights, systematic mapping of chromatin state and chromosome conformation across cell types has been used to identify putative regulatory elements (3–6). However, these measurements do not determine which (if any) genes are regulated or assess the quantitative effects on gene expression. Indeed, the rules that connect regulatory elements with their target genes in the genome appear to be complex. Regulatory elements do not necessarily affect the closest gene, but instead may act across long distances (7, 8). It remains unclear how many regulatory elements control any given gene, or how many genes are regulated by any given element (2, 3, 8).

We developed a high-throughput approach that uses the programmable properties of clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 to characterize the regulatory functions of noncoding elements in their native contexts. We use pooled CRISPR screens in combination with CRISPR interference (CRISPRi)—which alters chromatin state at targeted loci through recruitment of a KRAB effector domain fused to catalytically dead Cas9 (dCas9) (9–12)—to simultaneously characterize the regulatory effects of up to 1 megabase (Mb) of sequence on a gene of interest (Fig. 1A) (13).

Fig. 1. (A) CRISPRi method for identifying gene regulatory elements. Cells expressing KRAB-dCas9 from a dox-inducible promoter are infected with a pool of single guide RNAs (sgRNAs) targeting every possible site across a region of interest. In a proliferation-based screen, cells expressing sgRNAs that target essential regulatory elements are depleted in the final population. (B) CRISPRi screen results in the GATA1 locus. A high CRISPRi score indicates strong depletion over the course of the screen. Red boxes: windows showing significant depletion compared to negative control sgRNAs (13). DNase I hypersensitivity, H3K27ac ChIP-seq, and histone modification annotations (ChromHMM) in K562 cells are from ENCODE (4). (C) Close-up of e-GATA1 and e-HDAC6. sgRNA track shows CRISPRi scores for each individual sgRNA in the region. White bar in GATA1 ChIP-seq track represents the GATA1 motif. (D) qPCR for GATA1 and HDAC6 mRNA in cells expressing individual sgRNAs. KRAB-dCas9 expression was activated for 24 hours before measurement. Gray bars: different sgRNAs for each target. Ctrl: negative control sgRNAs without a genomic target. Error bars: 95% confidence intervals (CI) for the mean of 3 biological replicates (13). *P < 0.05 in t test versus Ctrl.

We studied two gene loci, GATA1 and MYC, that affect proliferation of K562 erythroleukemia cells in a dose-dependent manner (fig. S1). This allowed us to search for regulatory elements that quantitatively tune GATA1 or MYC expression using a proliferation-based pooled assay (Fig. 1A). GATA1 and MYC are not located near other strongly essential genes (fig. S1); thus, proliferation defects caused by single guide RNAs (sgRNAs) targeted to sequences near these genes can be attributed to elements regulating GATA1 or MYC. We designed a library containing 98,000 sgRNAs tiling across a total of 1.29 Mb of genomic sequence around GATA1 and MYC, as well as 85 kb of control noncoding regions (13). We infected K562 cells expressing KRAB-dCas9 under a doxycycline-inducible promoter with a lentiviral sgRNA library and sequenced the representation of sgRNAs before and after growing cells in doxycycline for 14 population doublings (Fig. 1A). As expected, internal control sgRNAs targeting the promoters of known essential genes (10) were depleted (fig. S2A) and correlated across biological replicates (Pearson’s R = 0.91, fig. S2B).
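The depletion readout described above can be illustrated with a minimal sketch. It assumes the score for each sgRNA is the negative log2 fold change in read frequency between the initial and final populations, so that a high score means strong depletion; the exact scoring and normalization used in the study are described in (13) and may differ. The function name, pseudocount, and read counts below are hypothetical.

```python
import math

def crispri_score(before, after, total_before, total_after, pseudocount=1.0):
    """Depletion score for one sgRNA (assumed form: -log2 fold change).

    Positive values mean the sgRNA lost representation during the screen.
    The pseudocount avoids division by zero for sgRNAs with no reads.
    """
    freq_before = (before + pseudocount) / total_before
    freq_after = (after + pseudocount) / total_after
    return -math.log2(freq_after / freq_before)

# Hypothetical read counts (before, after) for two sgRNAs.
counts = {
    "sg_promoter": (999, 124),  # strongly depleted
    "sg_control": (499, 499),   # unchanged
}
total_before = total_after = 1_000_000  # hypothetical sequencing depth

scores = {
    name: crispri_score(b, a, total_before, total_after)
    for name, (b, a) in counts.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In this toy example the promoter-targeting sgRNA drops eightfold in frequency and the control sgRNA does not change.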

We examined the quantitative depletion of sgRNAs in a 74-kb region surrounding GATA1, which encodes a key erythroid transcription factor (Fig. 1B). Because the efficiency of different sgRNAs for CRISPRi can vary markedly (10), we used a sliding window approach, averaging the scores of 20 consecutive sgRNAs and assessing the false discovery rate (FDR) of this metric through comparison to negative control, nonessential regions (13) (fig. S3). Because the average spacing between consecutive sgRNAs was 16 base pairs (bp), the regions targeted by 20 consecutive sgRNAs spanned an average of 314 bp (fig. S3, C and D). With this approach, the window with the highest score (strongest depletion) overlapped the GATA1 transcription start site (TSS) itself (Fig. 1B and fig. S3F). In addition, we identified three distal elements that significantly affected cellular proliferation (FDR < 0.05, Fig. 1B) (13). One such element (e-GATA1) is located ~3.6 kb upstream of GATA1 and corresponds to a deoxyribonuclease I (DNase I) hypersensitive site (DHS) marked by acetylation of histone 3 at lysine-27 (H3K27ac) (Fig. 1C); notably, this element shows high sequence conservation among vertebrates, and the syntenic sequence in mouse is required for proper Gata1 expression in murine erythroid progenitor cells (14). The second distal element (e-HDAC6) corresponds to a conserved DHS located ~1.5 kb upstream of HDAC6 (Fig. 1C). The third significant element is located at a DHS near the promoter of GLOD5, which itself is not essential and only weakly expressed in K562 cells. The first two elements overlap GATA1 chromatin immunoprecipitation–sequencing (ChIP-seq) peaks and sequence motifs (Fig. 1C), consistent with known autoregulatory loops in which GATA1 activates its own expression (15). All three elements reside in close linear and spatial proximity to GATA1 (fig. S4A). Finally, multiple regions in the gene body of GATA1 scored as significantly depleted in the screen (Fig. 1B), but, because recruitment of KRAB-dCas9 to these sites may directly interfere with transcription (9), we focused on distal regulatory elements in subsequent analysis.
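The sliding window analysis can be sketched as follows. The window width of 20 consecutive sgRNAs comes from the text; the FDR estimate shown here (expected number of false calls, taken from the fraction of negative-control windows passing a threshold, divided by the number of test windows called) is a simplifying assumption, and the procedure actually used is described in (13). All scores are hypothetical.

```python
def window_means(scores, width=20):
    """Mean score of every run of `width` consecutive sgRNAs."""
    if len(scores) < width:
        return []
    running = sum(scores[:width])
    means = [running / width]
    for i in range(width, len(scores)):
        # Slide by one sgRNA: add the new score, drop the oldest one.
        running += scores[i] - scores[i - width]
        means.append(running / width)
    return means

def empirical_fdr(threshold, test_windows, control_windows):
    """Estimated FDR when calling every test window scoring >= threshold."""
    called = sum(w >= threshold for w in test_windows)
    if called == 0:
        return 0.0
    control_rate = sum(w >= threshold for w in control_windows) / len(control_windows)
    expected_false = control_rate * len(test_windows)
    return min(1.0, expected_false / called)

# Hypothetical locus: 20 strongly depleted sgRNAs flanked by inert ones.
locus_scores = [0.0] * 30 + [2.0] * 20 + [0.0] * 30
# Hypothetical negative-control region with noisy scores centered on zero.
control_scores = [0.5, -0.5] * 20

test_windows = window_means(locus_scores)
control_windows = window_means(control_scores)

best = max(range(len(test_windows)), key=lambda i: test_windows[i])
print("best window starts at sgRNA index", best)
print("FDR at threshold 1.0:", empirical_fdr(1.0, test_windows, control_windows))
```

Averaging over a window trades resolution for robustness: a single inefficient sgRNA cannot hide an element, and a single outlier cannot create one.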

To characterize these elements, we measured GATA1 expression using quantitative PCR (qPCR) in cell lines stably expressing individual sgRNAs (13). As expected, targeting KRAB-dCas9 to the GATA1 TSS reduced GATA1 expression (76% reduction, Fig. 1D). sgRNAs targeting e-GATA1 or e-HDAC6 reduced GATA1 expression by 44 and 33%, respectively (Fig. 1D), and affected the expression of genes known to be regulated by the GATA1 transcription factor (fig. S4B), confirming that these enhancers regulate GATA1. By contrast, sgRNAs targeting the HDAC6 TSS did not reduce GATA1 expression despite reducing HDAC6 expression (Fig. 1D), indicating that (i) the pooled screen accurately predicted that this region does not reduce GATA1 expression and (ii) the effects seen for the e-GATA1 and e-HDAC6 sgRNAs are not due to general effects of targeting KRAB-dCas9 to the gene neighborhood. Additionally, both e-GATA1 and e-HDAC6 can activate the expression of a plasmid-based reporter gene (fig. S4C) (13). Together, these results support the specificity of this CRISPRi-based approach and demonstrate that e-GATA1 and e-HDAC6 quantitatively control GATA1 expression in K562 cells.
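As an illustration of how a percent reduction can be derived from qPCR measurements, the sketch below uses the common delta-delta-Ct calculation. This is an assumption rather than the documented analysis: it presumes normalization to a reference gene and perfect amplification efficiency (a doubling per cycle), and the normalization actually used is given in (13). The Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_ctrl, ct_ref_ctrl):
    """Expression of the target gene relative to control cells (2^-ddCt).

    Assumes one doubling of product per PCR cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample  # normalize to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_sample - delta_ctrl)

def percent_reduction(relative):
    """Convert relative expression (1.0 = unchanged) to a percent reduction."""
    return (1.0 - relative) * 100.0

# Hypothetical Ct values: the target gene crosses threshold two cycles later
# in sgRNA-expressing cells, while the reference gene is unchanged.
rel = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_ctrl=22.0, ct_ref_ctrl=18.0,
)
reduction = percent_reduction(rel)
print(f"relative expression {rel:.2f}, reduction {reduction:.0f}%")
```

A two-cycle delay corresponds to a fourfold drop in transcript abundance, that is, a 75% reduction.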

Considering the close proximity of GATA1 to HDAC6 (Fig. 1B and fig. S4A), we tested whether this pair of enhancers also regulates HDAC6. sgRNAs targeting e-GATA1 and e-HDAC6 reduced HDAC6 expression by 42 and 22%, respectively, comparable to their effects on GATA1 (Fig. 1D). Intriguingly, inhibition of the GATA1 promoter led to an increase in HDAC6 expression (+47%, Fig. 1D), and inhibition of the HDAC6 promoter modestly activated GATA1 (+9%, Fig. 1D); this suggests that GATA1 and HDAC6 may compete for these shared enhancers, similar to observations for other pairs of neighboring genes (16, 17). Histone deacetylases are required for erythropoiesis (18), and HDAC6 has been implicated in cellular proliferation in multiple cancers (19). Thus, although HDAC6 does not score as essential in proliferation assays in K562 cells, it is possible that proliferative defects observed upon inhibition of e-GATA1 or e-HDAC6 result from the combined effects on both GATA1 and HDAC6 expression (13), and the genomic proximity of these genes may be important for coordinating their expression in vivo. These observations indicate a complex connectivity between enhancers and promoters in their native genomic contexts (fig. S4D).

We next investigated the cis-regulatory architecture of MYC, a critical transcription factor encoded within a 3-Mb topological domain that contains hundreds of putative enhancers. Several enhancers in this domain are known to regulate MYC in other cell types (13), but chromatin state varies markedly across cell types, and it is unclear which of these elements regulate MYC in a given cell type. Notably, the domain contains more than 60 genetic haplotypes associated (through genome-wide association studies) with human phenotypes, including cancer susceptibility (20).

To identify elements that regulate MYC in K562 cells, we tiled sgRNAs across ~1.2 Mb of sequence in this topological domain (Fig. 2A). A sliding window analysis identified several regions whose inhibition reproducibly reduced cellular proliferation, including a known promoter-proximal element located 2 kb upstream of the MYC TSS (fig. S5A) (21), the transcribed region of the MYC gene body (fig. S5A), and seven distal regions (labeled e1 through e7) located between 0.16 and 1.9 Mb downstream of MYC (Fig. 2A and fig. S5, B and C). We also identified two regions that significantly increased cell proliferation (r1 and r2), and thus may repress MYC expression (Fig. 2A and fig. S5, D and E) (13).

Each of the seven putative activating elements is marked by high levels of DNase I hypersensitivity (Fig. 2A), is bound by multiple transcription factors (fig. S6A), and shows patches of sequence conservation across mammals (Fig. 2B). Each enhancer frequently contacts the MYC promoter in three dimensions as assayed by Hi-C and chromatin interaction analysis with paired-end-tag sequencing (ChIA-PET) in K562 cells (Fig. 2A) (3, 6); elements e5 and e6/7 form very long-range (>1.8 Mb) loops to the MYC promoter and are located within 10 kb of CCCTC-binding factor (CTCF) ChIP-seq peaks with motifs oriented toward MYC (fig. S5, B and C), consistent with the convergent rule for CTCF-mediated chromatin loops (6). Two elements (e3 and e4) correspond to alternative TSSs for the long noncoding RNA plasmacytoma variant translocation 1 (PVT1) (Fig. 2A); knockdown experiments indicate that the mature PVT1 RNA transcript itself is likely not essential in K562 cells (fig. S1), and so e3 and e4 likely affect cellular proliferation through direct regulation of MYC (13).

We experimentally characterized these seven activating elements to test whether they regulate MYC. CRISPRi inhibition of each of these elements with individual sgRNAs led to proliferation defects in a competitive growth assay (fig. S6B) and resulted in a 9 to 62% reduction in MYC expression (Fig. 2C). The magnitude of the change in gene expression correlated with the proliferation defect, consistent with a quantitative relationship between cell growth and precise MYC expression levels (Pearson’s R = 0.92, Fig. 2D). In a plasmid-based reporter assay, each putative regulatory element led to >5-fold up-regulation of a reporter gene relative to a control sequence (fig. S6C) (13). For a subset of the elements (e2, e3, and e4), we generated clonal cell lines containing genetic deletions on one or two of the three chromosome 8 alleles (K562 cells are triploid) and measured the expression of MYC from each allele (13). For each element, we found that genetic deletions reduced MYC expression from the corresponding allele(s), confirming our CRISPRi results (fig. S7). Together, these data support the hypothesis that these seven elements, spanning 1.6 Mb of noncoding sequence, act as enhancers to control MYC expression and cellular proliferation.

In addition to e1 to e7, we characterized one noncoding element (NS1) that did not score in the screen (Fig. 2A). In K562 cells, NS1 displays strong DHS and H3K27ac occupancy, binds to multiple transcription factors (fig. S6A), and participates in a long-range chromatin loop to the MYC promoter (Fig. 2A). In a lung adenocarcinoma cell line, NS1 regulates MYC as assayed by CRISPRi inhibition with individual sgRNAs (22). Accordingly, we wondered whether NS1 regulates MYC in K562 cells despite not being detected as such in our CRISPRi screen. To explore this possibility, we targeted KRAB-dCas9 to NS1 with individual sgRNAs in K562 cells and found that CRISPRi successfully reduced H3K27ac occupancy to an extent similar to that observed when targeting other MYC enhancers (fig. S6D). Despite affecting chromatin state at NS1 in K562 cells, these sgRNAs did not substantially affect cellular proliferation or MYC expression (Fig. 2, C and D), consistent with the results from the pooled screen. These observations support the ability of the CRISPRi screening approach to distinguish elements that do and do not regulate a given gene. However, we note that some regulatory elements, such as those that act redundantly with others in the locus, may not be discoverable by this method (13).

The ability to systematically test gene regulatory elements will help to train predictive models of functional enhancer-promoter connectivity. Notably, existing annotations and catalogs of enhancer-promoter predictions performed poorly at distinguishing e1 to e7 from enhancers that do not affect MYC expression (13). For example, ENCODE annotates 185 kb of sequence in this domain as putative “strong enhancer” in K562 cells (Fig. 2A), but only 8% of this sequence, corresponding to e1 to e7, appears to regulate MYC. We sought to improve the ability to predict enhancers and connect them with genes that they regulate. When we examined chromatin state maps (including DHS, H3K27ac, and Hi-C), we found that quantitative DHS or H3K27ac signal could distinguish most of the seven MYC enhancers but ranked them in the wrong order (fig. S8A): for example, e5 shows the strongest DHS signal, yet has the weakest effect on MYC expression (Fig. 2). Accordingly, we considered a framework (fig. S8B) wherein the impact of an enhancer on gene expression is determined both by its intrinsic activity level (for which we use quantitative DHS and H3K27ac levels as a proxy) and the frequency at which the enhancer contacts its target promoter (for which we use Hi-C data as a proxy) (13). This metric correctly ranked six of the seven distal enhancers as the most important of 93 DHS elements in K562 cells (Fig. 2E) and provided a reasonable ordering of their relative effects (Spearman correlation = 0.79). This approach did not perfectly distinguish between enhancers that do and do not regulate MYC: NS1 was ranked 7th and e6 was ranked 11th. Nonetheless, quantitative measures of chromatin state and chromosome conformation are strongly predictive of enhancers that regulate MYC in K562 cells.
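The framework above, in which predicted impact depends on both enhancer activity and contact frequency, can be sketched in a few lines. The specific form used here (activity taken as the geometric mean of DHS and H3K27ac signal, multiplied by Hi-C contact frequency) is an assumption made for illustration; the metric actually used is defined in (13). Element names, signal values, and measured effects are hypothetical.

```python
import math

def predicted_impact(dhs, h3k27ac, contact):
    """Assumed metric: activity (geometric mean of DHS and H3K27ac) x contact."""
    activity = math.sqrt(dhs * h3k27ac)
    return activity * contact

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for samples without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical elements: (DHS signal, H3K27ac signal, contact with promoter).
elements = {
    "A": (100.0, 100.0, 0.010),
    "B": (400.0, 100.0, 0.002),  # strongest DHS signal, but rarely in contact
    "C": (25.0, 100.0, 0.030),
    "D": (9.0, 4.0, 0.050),
}
# Hypothetical measured reductions in target gene expression (%).
measured = {"A": 40.0, "B": 10.0, "C": 60.0, "D": 15.0}

impact = {name: predicted_impact(*signals) for name, signals in elements.items()}
ranking = sorted(impact, key=impact.get, reverse=True)
names = list(elements)
rho = spearman([impact[n] for n in names], [measured[n] for n in names])

print("predicted ranking:", ranking)
print(f"Spearman correlation with measured effects: {rho:.2f}")
```

In this toy data set, element B has the strongest DHS signal but ranks third once contact frequency is taken into account, which mirrors the observation that chromatin signal alone misorders the enhancers.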

To determine whether this approach might be applicable in other cellular contexts, we examined four MYC enhancers identified in other cell types (Fig. 3, A and B) (13). In each case, our metric ranked these known elements among the three most important in the corresponding cell type (Fig. 3B). We also identified multiple instances where elements predicted to regulate MYC in one or more cell types harbor single-nucleotide polymorphisms (SNPs) associated with human traits including cancer susceptibility and height (Fig. 3, C and D, and table S1). Additional CRISPRi-based functional mapping in other cell types and gene loci might allow the derivation of general models to predict functional enhancer–promoter connections and help to elucidate noncoding genetic variation.

In summary, CRISPRi screens can accurately identify and characterize the regulatory functions and connectivity of noncoding elements. In the MYC and GATA1 loci, CRISPRi reveals complex and nonobvious dependencies between multiple genes and enhancers, including relationships that suggest regulation of multiple genes by the same enhancer, coordinated activity of multiple enhancers to control a single gene, and competition between neighboring promoters. Thus, learning the principles and connectivity of transcriptional networks requires dissecting putative regulatory elements in their native genomic contexts.

Although we used cellular proliferation as a readout to investigate two essential genes, this CRISPRi approach can be applied to identify regulatory elements that control an arbitrary gene or phenotype of interest through alternative assays—for example, by tagging an endogenous gene locus with green fluorescent protein (GFP) and sorting cells by GFP expression (23).

A GATA box in the GATA-1 gene hematopoietic enhancer is a critical element in the network of GATA factors and sites that regulate this gene. Mol. Cell. Biol. 20, 713–723 (2000). doi:10.1128/MCB.20.2.713-723.2000; pmid:10611250

Acknowledgments: We thank T. Wang and R. Issner for technical advice and reagents; and R. Ryan, B. Bernstein, N. Sanjana, J. Wright, and F. Zhang for discussions. This work was supported by funds from the Broad Institute (E.S.L.). C.P.F. is supported by the National Defense Science and Engineering Graduate Fellowship. J.M.E. is supported by the Fannie and John Hertz Foundation. M.M. is supported by a Deutsche Forschungsgemeinschaft Research Fellowship. S.R.G. is supported by National Institute of General Medical Sciences grant T32GM007753. The Broad Institute, which E.S.L. directs, holds patents and has filed patent applications on technologies related to other aspects of CRISPR. J.M.E., C.P.F., and E.S.L. are inventors on a patent application filed by the Broad Institute related to this work (U.S. no. 62/401,149). Data presented in this paper can be found in the supplementary materials and in GEO Accession GSE87257. J.M.E. conceived the study. J.M.E., C.P.F., M.M., and S.R.G. designed experiments. C.P.F., M.M., R.A., G.M., E.M.P., M.K., and J.M.E. performed experiments. C.P.F., J.M.E., and B.C. analyzed data. C.P.F., J.M.E., and E.S.L. wrote the manuscript with input from all authors.