Dipeptide repeat peptides on the attack

Certain neurodegenerative diseases, including amyotrophic lateral sclerosis (ALS), are associated with expanded nucleotide repeats whose RNA transcripts are translated into dipeptide repeat peptides (see the Perspective by West and Gitler). Kwon et al. show that the peptides encoded by the expanded repeats in the C9orf72 gene interfere with the way cells make RNA and kill cells. These effects may account for how this genetic form of ALS causes disease. Working in Drosophila, Mizielinska et al. aimed to distinguish between the effects of repeat-containing RNAs and the dipeptide repeat peptides that they encode. The findings provide evidence that dipeptide repeat proteins can cause toxicity directly.

Abstract

Many RNA regulatory proteins controlling pre–messenger RNA splicing contain serine:arginine (SR) repeats. Here, we found that these SR domains bound hydrogel droplets composed of fibrous polymers of the low-complexity domain of heterogeneous nuclear ribonucleoprotein A2 (hnRNPA2). Hydrogel binding was reversed upon phosphorylation of the SR domain by CDC2-like kinases 1 and 2 (CLK1/2). Mutated variants of the SR domains in which serine was changed to glycine (SR-to-GR variants) also bound to hnRNPA2 hydrogels but were not affected by CLK1/2. When expressed in mammalian cells, these variants bound nucleoli. The sense and antisense transcripts of the repeat expansion in C9orf72, a gene altered in neurodegenerative disease, are translated into products that include GRn and PRn repeat polypeptides. Both peptides bound to hnRNPA2 hydrogels independent of CLK1/2 activity. When applied to cultured cells, both peptides entered cells, migrated to the nucleus, bound nucleoli, and poisoned RNA biogenesis, which caused cell death.

Among familial cases of amyotrophic lateral sclerosis (ALS) and/or frontotemporal dementia (FTD), between 25 and 40% are attributed to a repeat expansion in C9orf72, a gene named for its position as open reading frame (ORF) 72 on chromosome 9. The hexanucleotide repeat sequence GGGGCC, normally present in 2 to 23 copies, is expanded in affected patients to 700 to 1600 copies (1, 2). The C9orf72 repeat expansion is inherited in a dominant pattern, and multiple lines of evidence suggest that the repeat expansion causes disease. Two theories have been advanced to explain repeat-generated toxicity. First, in situ hybridization assays have identified nuclear dots containing either sense or antisense repeat transcripts (3–5), which has led to the idea that the nuclear-retained RNAs might themselves be toxic. More recently, equally clear evidence has shown that both the sense and antisense transcripts of the GGGGCC repeats associated with C9orf72 can be translated in an ATG-independent manner (without an ATG start codon), a process known as repeat-associated non-ATG (RAN) translation (6). Depending on reading frame, the sense transcript of the repeats can be translated into glycine:alanine (GAn), glycine:proline (GPn), or glycine:arginine (GRn) polymers. RAN translation of the antisense transcript of the GGGGCC repeats of C9orf72 leads to the production of proline:alanine (PAn), proline:glycine (PGn), or proline:arginine (PRn) polymers. These repeat-encoded polymers are expressed in disease tissue (5, 7–9). The disordered and hydrophobic nature of these polymers, at least the GAn, GPn, and PAn versions, correctly predicted that they would aggregate into distinct foci within affected cells (5, 9). This observation underlies the second theory of repeat-generated toxicity: that the polymeric aggregates resulting from RAN translation of either the sense or antisense repeats are themselves toxic.
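The reading-frame logic above can be checked with a short translation sketch. It is a minimal illustration, assuming the standard genetic code and an idealized, uninterrupted run of GGGGCC units (the count of 10 copies is arbitrary); it does not model how RAN translation initiates without an ATG codon. The antisense strand is taken as the reverse complement of the sense repeat, and each frame's output is reduced to its first dipeptide unit. Because a dipeptide polymer reads the same whichever residue is named first, the antisense GP and AP frames correspond to the PGn and PAn polymers named in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq from offset `frame` (0, 1 or 2), dropping any incomplete final codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def ran_dipeptides(unit: str = "GGGGCC", copies: int = 10) -> dict:
    """First dipeptide unit encoded in each reading frame of the sense and antisense repeat."""
    sense = unit * copies
    strands = {"sense": sense, "antisense": reverse_complement(sense)}
    return {name: [translate(seq, frame)[:2] for frame in range(3)]
            for name, seq in strands.items()}


print(ran_dipeptides())
# {'sense': ['GA', 'GP', 'GR'], 'antisense': ['GP', 'AP', 'PR']}
```

The three sense frames yield GA, GP, and GR repeats, and the three antisense frames yield GP (that is, PG), AP (that is, PA), and PR repeats, matching the six polymers listed above.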

Here, we investigated a third, distinct interpretation of the pathophysiology caused by expansion of the hexanucleotide repeats in the C9orf72 gene. We suggest that two of the six RAN translation products, GRn encoded by the sense transcript and PRn encoded by the antisense transcript, alter the flow of information from DNA to mRNA to protein in a manner that poisons both pre-mRNA splicing and the biogenesis of ribosomal RNA.

Our standard method of retrieving proteins enriched in unfolded, low-complexity (LC) sequences involves the incubation of cellular lysates with a biotinylated isoxazole (b-isox) chemical (10). When incubated on ice in aqueous buffers, the b-isox chemical crystallizes. X-ray diffraction analyses of the b-isox crystals revealed a surface undulation of peaks and valleys separated by 4.7 Å. It is hypothesized that, when exposed to cell lysates, disordered, random-coil sequences can bind to the surface troughs of b-isox crystals and thereby be converted to an extended β-strand conformation. When the crystals are retrieved by centrifugation, they selectively precipitate DNA and RNA regulatory proteins endowed with LC sequences. When these methods were used to query the distribution of nuclear proteins precipitated by b-isox microcrystals, scores of proteins annotated as being involved in the control of pre-mRNA splicing were retrieved (11).

Many splicing factors contain long repeats of the dipeptide sequence serine:arginine (SR). Given the LC nature of SR domains, we hypothesized that it was this determinant that facilitated b-isox precipitation. Focusing on an extensively studied member of the SR protein family, serine/arginine-rich splicing factor 2 (SRSF2), we appended its SR domain to green fluorescent protein (GFP) to ask whether the SR domain might be sufficient to mediate b-isox precipitation. GFP is a well-folded protein that, alone, is not precipitated by b-isox crystals (10). When fused to the SR domain of SRSF2, GFP was precipitated efficiently by b-isox crystals (fig. S1, A and B).

When incubated at high concentrations, the LC domains of certain RNA regulatory proteins, including FUS, EWS, TAF15, and hnRNPA2, polymerize into amyloid-like fibers. In a time- and concentration-dependent manner, these fibers adopt a hydrogel-like state (10). No evidence of polymerization or hydrogel formation was observed upon incubation of the GFP fusion protein containing the SR domain of SRSF2 (designated GFP:SRSF2). We then asked whether the fusion protein might be bound and retained by hydrogel droplets formed from polymers of the LC domain of hnRNPA2 (10). Indeed, GFP:SRSF2 bound avidly to hydrogel droplets formed from the LC domain of hnRNPA2 (Fig. 1A).

Hydrogel droplets composed of mCherry fused to the LC domain of hnRNPA2 were incubated with solutions of GFP fused to the SR domains of either SRSF2 (A) or SRSF2G1/G2 (B). Both GFP fusion proteins bound well to the mCherry:hnRNPA2 hydrogels, as revealed by GFP signal trapped at the periphery of the hydrogel droplets (22). After overnight incubation with either CLK1 or CLK2 in the presence of ATP, the prebound GFP-fused SR domain of SRSF2 was released from the mCherry:hnRNPA2 hydrogels [third and fifth panels of (A)]. GFP fused to the SR domain of SRSF2G1/G2 was resistant to CLK1/2-mediated release from hydrogels [third and fifth panels of (B)].

The SR domains of splicing factors can be phosphorylated (12–14). Two related protein kinases, CDC2-like kinase 1 (CLK1) and CDC2-like kinase 2 (CLK2), phosphorylate serine residues within SR domains (fig. S1C) (15–18). To ask whether phosphorylation of SR domains might affect their binding to hydrogel droplets formed from the LC domain of hnRNPA2, we prebound the GFP:SRSF2 fusion protein to the droplets and then exposed them to ATP alone, CLK1/2 enzymes alone, or a mixture of ATP and enzymes. Release of the GFP:SRSF2 test protein was observed in a time-, enzyme-, and ATP-dependent manner (Fig. 1A).

The CLK1/2 protein kinases themselves contain SR domains, presumably to help guide these enzymes to the proper subnuclear locations where they serve to regulate the activities of SR domain–containing splicing factors (17). Hydrogel droplets were coexposed to GFP:SRSF2 along with a derivative of CLK2 containing an SR domain. In this case, exposure to ATP alone facilitated release of GFP:SRSF2, presumably because of activation of the CLK2 enzyme held by its SR domain in proximity to the GFP:SRSF2 test protein (fig. S1D).

The SRSF2 splicing factor contains two SR domains, one located between residues 117 and 169 of the polypeptide and another located between residues 177 and 221 (fig. S2). Out of 21 serine residues within the former SR domain, 16 were mutated to glycine, which led to a variant designated SRSF2G1. Likewise, 14 out of 17 serine residues within the latter SR domain were mutated to glycine, which led to the SRSF2G2 variant. These two mutants were recombined to produce the SRSF2G1/G2 variant (fig. S2). The altered SR domain of the SRSF2G1/G2 variant was fused to GFP (GFP:SRSF2G1/G2), expressed in bacteria, purified, and exposed to hnRNPA2 hydrogel droplets. Like the native SR domain, the GFP:SRSF2G1/G2 variant bound to the hydrogel droplets. By contrast, when the bound hydrogel droplets were exposed to ATP and either of the CLK1/2 enzymes, no GFP was released (Fig. 1B).

Binding of native and serine-to-glycine variants of SRSF2 to nuclear puncta

SR domain–containing pre-mRNA splicing factors localize to various puncta in the nucleus of eukaryotic cells (19). In interphase nuclei, SR-containing proteins are found in puncta variously termed “interchromatin granule clusters” or nuclear speckles. These puncta are roughly 1 to 3 μm in diameter and are composed of smaller granules connected by a thin fibril (20). Hypophosphorylated SR domains associate with the periphery of nucleoli in a region termed nucleolar organizing region (NOR)–associated patches (NAPs) (21). Knowing that the SR domain of the SRSF2G1/G2 mutant binds to hnRNPA2 hydrogels in a manner immune to CLK1/2-mediated release, we transfected cultured cells with GFP-tagged versions of the four SRSF2 variants (the native protein and the three mutants: SRSF2G1, SRSF2G2, and SRSF2G1/G2). Unlike the native SRSF2 protein, which distributed to nuclear speckles, the other three proteins associated with nucleoli (Fig. 2A). When cotransfected with an expression vector encoding CLK1 enzyme, partial release from nucleoli was observed for the SRSF2G1 and SRSF2G2 mutants, yet no release was observed for the SRSF2G1/G2 mutant (Fig. 2B). Thus, it appears that, by changing serine residues to glycine in the three mutants of SRSF2, we created mimics of the hypophosphorylated state of SR proteins. Because the SRSF2G1 and SRSF2G2 proteins can only be partially phosphorylated by CLK1, and because the SRSF2G1/G2 variant cannot be phosphorylated, we reason that these proteins become trapped in nucleoli at an early stage of the pathway of nuclear speckle formation and pre-mRNA splicing.

Fig. 2. Native or S-to-G mutated variants of SRSF2 localize to different nuclear puncta.

GFP fusion proteins linked to either the native, full-length SRSF2 or the SRSF2G1, SRSF2G2, or SRSF2G1/G2 mutants were transfected into U2OS cells in the absence (A) or presence (B) of a coexpressed mCherry:CLK1 fusion protein. The native SRSF2 protein localized to nuclear speckles and was dispersed into the nucleoplasm in the presence of cotransfected mCherry:CLK1. The SRSF2G1 and SRSF2G2 mutants localized to nucleoli, as deduced by costaining with antibodies specific to the nucleolar marker fibrillarin. The SRSF2G1 mutant was partially redistributed from nucleoli to the cytoplasm in the presence of mCherry:CLK1. The SRSF2G2 mutant was partially redistributed from nucleoli to the nucleoplasm in the presence of mCherry:CLK1. Coexpression of mCherry:CLK1 had no effect on the nucleolar localization of the SRSF2G1/G2 mutant.

The sense and antisense transcripts of the GGGGCC repeat expansions associated with familial forms of ALS and FTD can be translated in an ATG-independent manner (5, 7, 9). Depending on reading frame, the sense repeat transcript encodes GAn, GPn, or GRn polymers. Likewise, the antisense transcripts of the repeats encode PAn, PGn, and PRn polymers. We focused on the GRn translation product of the sense repeat transcript and the PRn translation product of the antisense repeat transcript for three reasons. First, these polymers are considerably more hydrophilic than the GAn, GPn, PAn, and PGn polymers and are less likely to aggregate. Second, by virtue of their abundant arginine residues, the GRn and PRn polymers might be self-programmed to return to the nucleus after cytoplasmic translation, because nuclear localization signals tend to be enriched in basic amino acids. Third, these polymers are reminiscent of the SRSF2G1, SRSF2G2, and SRSF2G1/G2 variants, which bound to hnRNPA2 hydrogel droplets independent of the effects of the CLK1 enzyme (Fig. 1) and associated tightly with nucleoli in living cells (Fig. 2).
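The hydrophilicity argument can be made roughly quantitative with a grand average of hydropathy (GRAVY) calculation. This is a sketch using the Kyte–Doolittle hydropathy scale, which is our choice of metric for illustration rather than one used in this work; it also counts arginine and lysine as a crude proxy for positive charge, ignoring histidine and terminal groups. The polymers are modeled as 20 dipeptide repeats, matching the constructs described next, without epitope tags or fusion partners.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def basic_residues(peptide: str) -> int:
    """Count Arg and Lys residues as a crude proxy for positive charge at neutral pH."""
    return sum(peptide.count(aa) for aa in "RK")


for unit in ["GA", "GP", "PA", "GR", "PR", "SR"]:
    polymer = unit * 20
    print(f"{unit}20: GRAVY = {gravy(polymer):+.2f}, Arg/Lys = {basic_residues(polymer)}")
```

On this scale GR20 and PR20 score about -2.45 and -3.05, the most hydrophilic of the set, with 20 arginines each, whereas GA20 and PA20 score about +0.70 and +0.10. GP20 (equivalently PG20) comes out mildly hydrophilic (about -1.0), so the scale supports ranking GRn and PRn as the most hydrophilic rather than a clean split between the two groups.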

GFP derivatives were prepared that contained 20 repeats of the dipeptide sequence SR, GR, or PR (22). After expression in bacterial cells and purification, each fusion protein was incubated with hnRNPA2 hydrogels. Unlike GFP itself, which did not bind to any of the hydrogels used, the GFP:SR20, GFP:GR20, and GFP:PR20 fusion proteins bound avidly to hnRNPA2 hydrogel droplets. When protein-bound hydrogels were exposed to CLK1 or CLK2 in the presence of ATP, GFP:SR20 was liberated, but GFP:GR20 and GFP:PR20 were not (Fig. 3). We interpret these results in the same way as the observations made with GFP:SRSF2 and its serine-to-glycine variants (Figs. 1 and 2): CLK1/2-mediated phosphorylation of the serine residues in the GFP:SR20 fusion protein facilitates its release from hnRNPA2 hydrogel droplets, whereas the GRn and PRn polymers, which have no serine residues, cannot be phosphorylated and are therefore not released upon exposure to CLK1/2 and ATP.

Recombinant fusion proteins linking GFP to 20 repeats of the SR, GR, or PR dipeptides (GFP:SR20, GFP:GR20, or GFP:PR20) were applied to slide chambers containing mCherry:hnRNPA2 hydrogel droplets. After overnight incubation at 4°C, all three proteins were trapped at the periphery of the hydrogel droplets (top). When incubated with reaction mixtures containing either the CLK1 or CLK2 protein kinase, prebound GFP:SR20 was released from the hydrogels in an ATP-dependent manner. GFP:GR20 and GFP:PR20 prebound to mCherry:hnRNPA2 hydrogel droplets were immune to release by CLK1 or CLK2, even in the presence of ATP.

The GRn and PRn RAN translation products of C9orf72 penetrate cells, migrate to the nucleus, bind nucleoli, and kill cells

Polymeric versions of the GRn and PRn translation products of C9orf72 were synthesized that contained 20 dipeptide repeats terminated by an epitope tag (22). The synthetic peptides were solubilized in aqueous buffer and applied to cultured U2OS cells (a human osteosarcoma cell line) for 30 min at 10 μM. The cells were then fixed and stained with antibodies capable of recognizing the hemagglutinin (HA) epitope tag. Both the GR20 and PR20 polymers entered cells, migrated to the nucleus, and bound to nucleoli (Fig. 4A). The morphology of U2OS cells was altered after prolonged exposure to the GR20 and PR20 translation products of C9orf72 hexanucleotide repeats. Alteration in cell morphology was more pronounced for the PR20 peptide than for GR20. Within 24 hours of exposure to 10 μM of PR20, U2OS cells began to display a spindlelike phenotype. Upon exposure to 30 μM of the PR20 peptide for 24 hours, almost all cells were detached from the culture substrate and dead (fig. S3A). Similar effects on cell morphology and viability were observed for cultured human astrocytes (fig. S3B).

(A) Peptides containing 20 repeats of GR or PR (GR20 or PR20, respectively) were synthesized to contain an HA epitope tag and applied to cultured U2OS cancer cells (left) or human astrocytes (right). Cells were fixed and stained with either the HA-specific antibody (green signal) or an antibody to the nucleolar protein fibrillarin (red signal). Both GR20 and PR20 synthetic peptides associated prominently with nucleoli. Measurements of U2OS cell viability revealed toxicity in response to both PR20 (B) and GR20 (C) synthetic peptides. Cell viability was measured 72 or 12 hours after initial treatment with PR20 or GR20, respectively. In the case of the GR20 peptide, the medium was replaced every 2 hours to replenish the peptide. The PR20 and GR20 synthetic peptides killed U2OS cells with IC50 values of 5.9 and 8.4 μM, respectively.

The stability of the GR20 and PR20 peptides was analyzed by immunoblotting. After the administration of a single dose of each peptide, U2OS cells were incubated for the indicated time periods. After retrieval of culture medium, cells were then washed with phosphate-buffered saline, lysed, and deposited onto nitrocellulose dot blots that were probed with antiserum specific to the HA epitope. These measurements gave evidence of a relatively short half-life for the GR20 peptide (20 to 30 min), but a much longer half-life for the PR20 peptide (72 hours) (fig. S4, A and B).
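The practical consequence of these two half-lives can be illustrated with a first-order decay calculation. This sketch assumes simple exponential (first-order) loss of peptide, which the dot-blot measurements alone do not establish, and uses 25 min as a midpoint of the reported 20-to-30-min range for GR20; the function name is illustrative.

```python
def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of a single dose left after `elapsed` time, assuming first-order decay.
    Both arguments must use the same time unit."""
    return 0.5 ** (elapsed / half_life)


# Two hours after a single dose (all times in minutes):
gr20 = fraction_remaining(120, half_life=25)        # GR20, t1/2 ~20-30 min
pr20 = fraction_remaining(120, half_life=72 * 60)   # PR20, t1/2 ~72 h
print(f"GR20: {gr20:.1%} remaining; PR20: {pr20:.1%} remaining")
```

Under these assumptions, roughly 4% of a GR20 dose would remain after 2 hours versus roughly 98% of a PR20 dose, which is consistent with GR20 toxicity requiring fresh peptide every 2 hours.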

Cell viability was then measured for cultures exposed to varying concentrations of the PR20 peptide (22). A median inhibitory concentration (IC50) of 5.9 μM was observed for the PR20 peptide (Fig. 4B). Similar cellular toxicity was observed for the GR20 peptide, but only when the peptide was added every 2 hours (Fig. 4C). Cell death in response to the PR20 peptide was also time-dependent. After administration of 10 μM of the PR20 peptide, the half-maximal effect on cell viability was observed roughly 36 hours later (fig. S4E). When cells were exposed to a 30 μM dose of the peptide, 50% cell death was observed at 6 hours (fig. S4F).
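An IC50 summarizes a dose–response curve as the concentration at which viability falls to half of its untreated value. The fitting procedure is not described in the main text (methods are referred to (22)), so the following is only a sketch, assuming a Hill model with viability normalized to 1 in untreated cells and 0 at complete killing. Under that model, logit(viability) is a straight line in log(concentration), so interpolating between the two doses that bracket 50% viability recovers the IC50 exactly. The function names, dose series, and Hill coefficient are illustrative assumptions, not measurements from this study.

```python
import math


def hill_viability(conc_uM: float, ic50_uM: float, hill: float = 1.0) -> float:
    """Normalized viability (1 = untreated, 0 = all cells dead) under a Hill model."""
    return 1.0 / (1.0 + (conc_uM / ic50_uM) ** hill)


def estimate_ic50(concs_uM, viabilities) -> float:
    """Interpolate logit(viability) linearly in log(concentration) between the two
    doses that bracket 50% viability. Viabilities must lie strictly between 0 and 1."""
    points = sorted(zip(concs_uM, viabilities))
    for (c1, v1), (c2, v2) in zip(points, points[1:]):
        if v1 == 0.5:
            return c1
        if v1 > 0.5 >= v2:
            y1, y2 = math.log(v1 / (1 - v1)), math.log(v2 / (1 - v2))
            x1, x2 = math.log(c1), math.log(c2)
            # Solve the straight line through (x1, y1) and (x2, y2) for y = 0.
            return math.exp(x1 - y1 * (x2 - x1) / (y2 - y1))
    raise ValueError("the doses do not bracket 50% viability")


# Illustrative only: synthetic viabilities generated from the model itself.
doses = [1, 3, 10, 30]
simulated = [hill_viability(c, ic50_uM=5.9, hill=1.5) for c in doses]
print(round(estimate_ic50(doses, simulated), 2))  # 5.9
```

With real, noisy viability data a least-squares fit of the full Hill curve would usually be preferred; the interpolation is shown only because it makes the definition of IC50 explicit.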

Exposure of cultured cells to the GR20 and PR20 translation products of C9orf72 impairs both pre-mRNA splicing and the biogenesis of ribosomal RNA

Having observed that the GR20 and PR20 translation products of the C9orf72 hexanucleotide repeats bound nucleoli and killed cultured cells, we wondered whether these effects might result from alterations in RNA biogenesis. To this end, cultured human astrocytes were exposed to the synthetic PR20 peptide for 6 hours and used to prepare RNA for deep sequencing. Computational analysis of the RNA-sequencing (RNA-seq) data predicted alterations in the splicing of a variety of cellular mRNAs (22). Predicted changes in pre-mRNA splicing were validated with strategically designed polymerase chain reaction (PCR) primers (22), and PCR products consistent with the predicted alterations were subjected to DNA sequencing (fig. S5A). In all cases, the predicted changes in pre-mRNA splicing were confirmed, with effects ranging from modest, for the nascent polypeptide-associated complex subunit alpha (NACA) and RAN guanosine triphosphatase (GTPase) mRNAs, to severe, for the pentraxin-related protein PTX3 and growth arrest and DNA damage-inducible GADD45A mRNAs. Administration of the PR20 peptide caused skipping of exon 2 in the mRNA encoding the RAN GTPase, which is predicted to remove the first 88 residues of the protein (Fig. 5A and fig. S5B). PR20 administration also caused skipping of exon 2 in the mRNA encoding PTX3, which predicted an in-frame deletion of 135 amino acids (Fig. 5B and fig. S5C), and caused the mRNA encoding NACA to contain a different 5′ untranslated region (5′ UTR) (Fig. 5C). Finally, PR20 administration caused the mRNA encoding GADD45A to retain the full intronic sequences on both sides of exon 2 in the mature transcript, which altered the ORF in a manner expected to inactivate the GADD45A protein if translated from the aberrantly spliced mRNA (Fig. 5D).
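Whether an exon skip or an intron retention preserves the downstream reading frame depends on whether the change in coding length is a multiple of three. The sketch below encodes that rule with a hypothetical helper; a length change divisible by three can still introduce a premature stop codon, which this check does not detect. As a worked example, the 135-residue in-frame deletion predicted for PTX3 corresponds to the loss of 135 × 3 = 405 coding nucleotides.

```python
def frame_consequence(delta_nt: int) -> str:
    """Classify a change in coding length (negative for an exon skip, positive for
    retained intron sequence) by its effect on the downstream reading frame.
    Note: an in-frame insertion may still contain a premature stop codon."""
    if delta_nt % 3 != 0:
        return "frameshift"
    change = "removed" if delta_nt < 0 else "added"
    return f"in-frame: {abs(delta_nt) // 3} codons {change}"


print(frame_consequence(-135 * 3))  # a PTX3-like skip of 405 nt
print(frame_consequence(-100))      # a skip that is not a multiple of three
```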

Computational analysis of the RNA-seq data further revealed changes in the abundance of a subset of cellular RNAs after administration of the PR20 peptide (table S1). A large fraction of the altered RNAs encoded ribosomal proteins (Fig. 5E) or were small nucleolar RNAs (snoRNAs); in both cases, PR20 administration increased RNA abundance. Having observed that both the GR20 and PR20 peptides bound to nucleoli, and having observed changes in the abundance of snoRNAs and of mRNAs encoding ribosomal proteins, we investigated the central task of nucleoli: the synthesis of mature ribosomal RNA (rRNA). Nine PCR primer pairs were designed to interrogate the synthesis and processing of rRNA (22). Three monitored the levels of the mature 18S, 5.8S, and 28S rRNAs. The other six pairs were designed to monitor the 45S rRNA precursor, probing: (i) the initial 5′ end of the precursor, which is eliminated along the pathway of rRNA maturation; (ii) the precursor junction at the 5′ end of 18S rRNA; (iii) the precursor junction at the 3′ end of 18S rRNA; (iv) the precursor junction at the 5′ end of 5.8S rRNA; (v) the precursor junction at the 3′ end of 5.8S rRNA; and (vi) the precursor junction at the 5′ end of 28S rRNA (see Fig. 5F). RNA was prepared from human astrocytes exposed for 12 hours to vehicle alone or to 10 μM or 30 μM of the PR20 peptide. Slight reductions in 28S rRNA were observed in samples from cells treated with 30 μM of the PR20 peptide. To our surprise, the level of 5.8S rRNA was reduced by 70% under these conditions (Fig. 5F).

Evidence of impediments in rRNA production was confirmed with the junctional PCR probes. The quantitative PCR (qPCR) primers specific for the 5′ external transcribed spacer at the front end of the rRNA precursor revealed a 20% elevation of the precursor in cells exposed to the lower, 10 μM concentration of the PR20 peptide. At this concentration, the first junctional probe also revealed an elevation in immature rRNA, the second revealed normal levels of rRNA precursor, the third revealed roughly 20% attenuation of the precursor, the probe monitoring the 3′ junction of 5.8S rRNA revealed 40% attenuation, and the probe monitoring the 5′ junction of 28S rRNA revealed normal precursor levels. Cells exposed to 30 μM of the PR20 peptide showed reductions in the 45S rRNA precursor consistent with a similar 5′-to-3′ polarity of impediment; indeed, the qPCR primer pair monitoring processing at the 3′ terminus of 5.8S rRNA revealed a 70% drop. Irrespective of whether these effects result from altered transcription of rRNA genes, altered processing of the 45S rRNA precursor, or both, these assays provide evidence of nucleolar dysfunction in cells treated with the PR20 RAN translation product of the C9orf72 hexanucleotide repeats.
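The percentage changes above are relative abundances measured by qPCR. The quantification scheme is deferred to (22); a common choice is the comparative Ct (2^-ΔΔCt) method, sketched here under the assumption of perfect (doubling-per-cycle) amplification efficiency and normalization to a reference transcript. Under that assumption, a 70% drop (a fold change of 0.3) corresponds to the target amplifying about 1.74 cycles later, after normalization, than in control samples. The function names are illustrative.

```python
import math


def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative abundance of a target in treated vs control samples by the 2^-ddCt method,
    assuming doubling-per-cycle amplification for both target and reference."""
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    return 2 ** -(dct_treated - dct_control)


def ddct_for_fold_change(fold: float) -> float:
    """Cycle shift (ddCt) that would produce a given fold change under the same assumption."""
    return -math.log2(fold)


print(round(ddct_for_fold_change(0.30), 2))  # 70% reduction -> 1.74 cycles
print(round(ddct_for_fold_change(1.20), 2))  # 20% elevation -> -0.26 cycles
```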

The GRn and PRn synthetic peptides alter splicing of the EAAT2 transcript in a pattern identical to that observed in ALS patients

Having observed global alterations in pre-mRNA splicing in cells exposed to the PR20 synthetic peptide, we asked whether similar alterations had been reported in studies of patient-derived tissues. Splicing of the transcript encoding the glutamate transporter excitatory amino acid transporter 2 (EAAT2) is altered in ALS patients. Two hallmarks of the altered pattern are skipping of exon 9 and inclusion of 1008 nucleotides of intronic sequence downstream of the splice donor site of exon 7 (23).

To ask whether a similar pattern of derangement might result from exposure of cells to the PR20 synthetic peptide, we incubated cultured human astrocytes with 10 or 15 μM of the polymer for 36 hours, a time point corresponding to roughly 50% reduction in cell viability (fig. S4E). RNA was then extracted and subjected to PCR analysis to resolve the architecture of the EAAT2 transcript (22). PCR products diagnostic of both the exon 9–skipped form of EAAT2 mRNA and the 1008-nucleotide intronic extension beyond the splice donor site of exon 7 appeared in human astrocytes in a concentration-dependent manner as a function of exposure to the PR20 polymer (Fig. 6). Both PCR products were sequenced and found to replicate precisely the pattern of aberrant splicing first described in ALS patients (fig. S6A).

It is possible that generalized cell toxicity commonly causes an idiosyncratic pattern of aberrant EAAT2 mRNA splicing, which could explain why cells poisoned by the PR20 peptide replicate the pattern of improper splicing observed in patient samples. To test this hypothesis, human astrocytes were exposed individually to four other toxins: doxorubicin, taxol, staurosporine, and cytochalasin D. RNA was prepared from cells exhibiting clear evidence of toxicity and analyzed by PCR for the patterns of aberrant EAAT2 pre-mRNA splicing observed both in astrocytes treated with the PR20 peptide and in brain samples from patients carrying the GGGGCC repeat expansion in the C9orf72 gene. None of the four toxins gave evidence of aberrant splicing of the EAAT2 mRNA (fig. S6B).

Discussion

Here we report several findings that may be relevant both to the basic science of gene expression and to the pathophysiology of a specific type of neurodegenerative disease. First, alternative splicing factors containing SR domains interact with fibrous polymers of LC domains in a manner that is reversible by phosphorylation. When appended to a GFP reporter, these SR domains bind to hydrogel droplets formed from polymeric fibers of the LC domain of hnRNPA2. This binding is reversed upon phosphorylation of serine residues in the SR domains by either of two CDC2-like kinases, CLK1 and CLK2, that are known to phosphorylate SR domains in living cells (16, 17). Mutation of the serine residues in the SR domains of the SRSF2 alternative splicing factor to glycine produced repetitive GRn sequences that retain binding to hnRNPA2 hydrogels but are not affected by CLK1/CLK2-mediated phosphorylation.

Moving from test tubes to cells, we found that variants of SRSF2 in which serine residues of one or both SR domains were changed to glycine associated with nucleoli. Cotransfection of an expression vector encoding the CLK1 kinase failed to liberate the SRSF2G1/G2 variant from nucleoli. Together, these data give evidence of a pathway in which SR domain–containing splicing factors first enter a nucleolar compartment in a hypophosphorylated state (21) and then migrate to nuclear speckles as a function of phosphorylation by the CLK1/2 family of protein kinases.

We do not know the identity of the nucleolar target of hypophosphorylated SR domains. Many aspects of the behavior of SR domains in cells can be mimicked by their attachment to hydrogel droplets composed of polymeric fibers of the LC domain of hnRNPA2, including that the interaction can be readily reversed by phosphorylation of serine residues by the CLK1/2 protein kinase enzymes. We speculate that the nucleolar target of hypophosphorylated SR domains will also represent a polymeric fiber not unlike the hnRNPA2 fibers described herein.

Two of the RAN translation products of the hexanucleotide repeats associated with disease variants of the C9orf72 gene behaved as cytotoxins that impeded pre-mRNA splicing and the biogenesis of ribosomal RNA. The relevant peptides are polymers of one of two dipeptide sequences, GRn or PRn. Their high density of arginine residues, which favors solubility, might also facilitate nuclear import and cell penetration. Repetitive arginine residues might account for nuclear entry by mimicking the positive charge prototypical of nuclear localization signals (24, 25). Likewise, arginine-rich peptides, such as the HIV TAT peptide, readily penetrate cells (26). Here, both the GR20 and PR20 peptides entered cells, migrated to the nucleus, and associated with the periphery of nucleoli. We hypothesize that the binding of GRn and PRn polymers to nuclear puncta thought to represent an early stage in the complex process of pre-mRNA splicing (21) may clog the pathway. This concept of peptide-induced toxicity differs from that of earlier studies reporting cytoplasmic aggregates in HeLa cells expressing a fusion protein linking GFP to five repeats of the GR dipeptide (7), and immunostaining of ALS disease tissue showing cytoplasmic aggregates of the GR (9) and PR (5) RAN translation products. Our concept of nucleolar binding by the PRn and GRn RAN translation products, with consequent impediments to RNA biogenesis, and earlier concepts of aggregate-mediated toxicity generated by any or all of the five RAN translation products of the sense or antisense transcripts of the C9orf72 repeats are not necessarily mutually exclusive.

We offer three reasons to believe that the toxicities driven by the GRn and PRn RAN translation products may account for the nerve cell dysfunction observed in patients carrying repeat expansions in the C9orf72 gene. First, very specific alterations in splicing of the EAAT2 mRNA, including skipping of exon 9 and inclusion of 1008 nucleotides of intronic sequence distal to the splice donor site of exon 7, have been described in brain tissue from C9orf72 patients. Administration of the PR20 peptide to human astrocytes derived from normal subjects led to the same changes in EAAT2 pre-mRNA splicing. Second, RNA-seq studies of cells exposed to the PR20 peptide revealed changes in the expression of snoRNAs known to be important for maturation of rRNA. These data correctly predicted that peptide-treated cells would suffer deficits in rRNA maturation. Third, recent studies of brain tissue from patients carrying repeat expansions in the C9orf72 gene have given evidence of nucleolar disorder, including impediments in the processing of the 45S ribosomal RNA precursor (27). Thus, we conclude that administration of the GR20 and PR20 peptides to normal human astrocytes leads to pathophysiological deficits that mimic those observed in disease tissue.

In ALS and FTD patients carrying a repeat expansion in the C9orf72 gene, nerve cell degeneration typically begins only after 40 or more years of age (28). Why would RAN translation products act with such delayed kinetics? Perhaps RAN translation of the hexanucleotide repeats takes place at very low levels in presymptomatic decades. Stochastically, the heavy burden of RNA biogenesis demanded of neurons may eventually lead to sufficient expression of the GRn and/or PRn peptides to begin to mildly impede nucleolar function and pre-mRNA splicing. This impediment might favor the generation of damaged ribosomes that could themselves favor RAN translation over normal protein synthesis. Alternatively, improperly spliced mRNAs might affect events such as the control of nuclear import and/or export regulated by the RAN GTPase, perhaps favoring export of sense or antisense transcripts of the expanded hexanucleotide repeats in the C9orf72 gene for cytoplasmic translation. If either or both of these alterations slightly favored production of the GRn and/or PRn peptides, the process could eventually snowball to the point of nucleolar catastrophe and nerve cell death.

We close with two final considerations. First, we do not know what pathway of cell death results from the toxic activities of the GRn or PRn polymers. If the death pathway were messy, spilling out cellular contents, it is possible that the GRn and PRn polymers of a dead neuron could be taken up by neighboring cells just as we have observed for cultured cells and so facilitate the pathological spread of toxicity. Second, we offer the idea that what has been witnessed in this work could reflect the failed birth of a gene. Whereas the repeat expansion in C9orf72 can be considered to have generated a new protein-coding gene, it is ultimately toxic to the organism. Could it be that the scores of LC sequences associated with DNA and RNA regulatory proteins in eukaryotic cells evolved in this same manner?

Acknowledgments: We thank M. Brown for suggesting to S.L.M. that the RAN translation products of the hexanucleotide repeats expanded in the C9orf72 gene might be involved in RNA biogenesis as understood in our solid-state conceptualization of information transfer from gene to message to protein. We also thank B. Tu, D. Nijhawan, T. Han, L. Avery, and M. Rosen for stimulating discussion, and J. Steitz, C. Emerson, B. Alberts, and A. Horwich for helpful comments on the composition of our manuscript. This work was supported by unrestricted endowment funds provided to S.L.M. by an anonymous donor.