Abstract

The covalent attachment of ubiquitin to proteins regulates numerous processes in eukaryotic cells. Here we report the identification of 753 unique lysine ubiquitylation sites on 471 proteins using higher-energy collisional dissociation (HCD) on the LTQ Orbitrap Velos. In total, 5756 putative ubiquitin substrates were identified. Lysine residues targeted by the ubiquitin-ligase system show no unique sequence feature. Surface-accessible lysine residues located in ordered secondary-structure regions and surrounded by smaller and positively charged amino acids are preferred sites of ubiquitylation. Lysine ubiquitylation shows promiscuity at the site level, as evidenced by the low evolutionary conservation of ubiquitylation sites across eukaryotic species. Among lysine modifications, a significant overlap (20%) between ubiquitylation and acetylation at the site level points to extensive competitive crosstalk between these modifications. This site-specific crosstalk is not prevalent among cell cycle ubiquitylations. Between SUMOylation and ubiquitylation, the preferred interaction is through mixed-chain conjugation. Overall, these data provide novel insights into the site-specific selection and regulatory function of lysine ubiquitylation.

The proteins in a eukaryotic cell are subject to a large variety of post-translational modifications (PTMs), which greatly extend the diversity of the proteome and play critical roles in regulating cellular functions (1). Among the estimated 200 different PTMs, phosphorylation, methylation, acetylation, and ubiquitylation are some of the most widespread and best characterized (2). Ubiquitylation refers to the covalent attachment of ubiquitin, a 76-residue polypeptide that is highly conserved among eukaryotes, via an isopeptide bond to the ε-amino group of lysines in proteins. The attachment of one or more ubiquitin moieties is a central regulatory mechanism in eukaryotic cells and controls numerous cellular processes, including protein degradation, signal transduction, DNA repair, and cell division, as well as the stability, function, and intracellular localization of a wide variety of proteins (3).

Formation of covalent ubiquitin-protein conjugates requires three enzymatic steps. First, ubiquitin is activated by a ubiquitin-activating enzyme (E1), which forms a thioester bond with the ubiquitin C terminus, and is then transferred from E1 to a ubiquitin-conjugating enzyme (E2). With the help of a ubiquitin ligase (E3), the thioester-linked ubiquitin is then transferred from the E2 to a lysine residue in the target protein (4). Protein ubiquitylation can be highly dynamic and reversible, as evidenced by the estimated 600 potential E3 ubiquitin ligases and some 80–90 deubiquitylating enzymes encoded by the human genome (5–7). These numbers illustrate the widespread use of substrate-specific ubiquitylation as an important regulatory principle in cell biology. This is further supported by the growing number of reports implicating defects in ubiquitin-dependent signaling pathways in multiple human diseases (8–10). Despite great biological and clinical interest, knowledge of specific lysine ubiquitylation sites is still very limited.

Liquid chromatography coupled to high-resolution mass spectrometry (LC-MS) has emerged as the key technology for large-scale identification of various PTMs such as phosphorylation (11, 12), acetylation (13), and N-glycosylation (14). These studies have identified thousands of site-specific PTMs across a variety of eukaryotic species and provided a detailed functional understanding of the investigated PTMs. Although ubiquitylation was one of the first protein modifications to be described, a similar large-scale repository of ubiquitylation sites is currently not available. This reflects general difficulties in studying lysine ubiquitylation. Ubiquitin is a very large modification (∼8 kDa), and ubiquitylated proteins are predominantly present at substoichiometric levels compared with their unmodified counterparts. Enrichment methods are therefore required both to detect low-abundance ubiquitylated proteins in complex mixtures and to identify the modified amino acid. For ubiquitylation studies, purification of ubiquitylated proteins followed by proteolysis and mass spectrometric detection is currently the analytical method of choice. An advantage of this strategy is that tryptic digestion cleaves the ubiquitin modification just as it cleaves other proteins, leaving a small di-glycine signature remnant on the modified lysine (15). Although enrichment of modified proteins aids in the detection of ubiquitylation sites, purification of intact modified proteins produces a complex peptide mixture containing only a low percentage of ubiquitylated peptides. As a result, the analytical capabilities for lysine ubiquitylation studies rely largely on the sensitivity and dynamic range of the mass spectrometric methods employed.
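The di-glycine remnant can be made concrete with a small mass calculation. The sketch below is illustrative only and is not the analysis pipeline used here: it assumes standard monoisotopic residue masses and models the remnant as the two C-terminal glycines (Gly75-Gly76) that remain on the substrate lysine after trypsin cleaves ubiquitin after Arg74. The helper names `peptide_mass` and `mz` are hypothetical.

```python
# Monoisotopic residue masses (Da) of the unmodified standard amino acids
# (cysteine alkylation is deliberately not included in this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free N and C termini
PROTON = 1.007276   # proton mass, used for [M + zH]^z+ ions

# Trypsin cuts ubiquitin after Arg74, so Gly75-Gly76 stay attached to the
# substrate lysine through the isopeptide bond. Note that Gly-Gly has the same
# elemental composition as an Asn residue (C4H6N2O2).
GG_REMNANT = 2 * RESIDUE_MASS["G"]  # about 114.0429 Da


def peptide_mass(sequence: str, gg_sites: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying `gg_sites` di-glycine remnants."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + gg_sites * GG_REMNANT


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion for a given neutral mass."""
    return (neutral_mass + charge * PROTON) / charge
```

For example, the ubiquitin peptide LIFAGKQLEDGR (residues 43–54, containing K48) is normally cut after K48, but when K48 carries a remnant trypsin typically skips it, so `mz(peptide_mass("LIFAGKQLEDGR", gg_sites=1), 2)` gives the expected doubly charged precursor of a K48-linkage signature peptide.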

Among large-scale studies of ubiquitylation sites utilizing a protein enrichment strategy, the largest reported 110 lysine sites in yeast (15), and other studies reported up to 100 sites in human cells (16–19). Recently, Xu et al. proposed a strategy for enriching ubiquitylated peptides using a di-glycine-specific antibody, identifying a total of 374 ubiquitylation sites (20). These numbers are still far below the number of ubiquitylation events one would predict from the sheer number of components in the ubiquitin system alone, and clearly underscore the analytical difficulties in studying protein ubiquitylation. We therefore reasoned that a new generation of high-resolution mass spectrometers (LTQ Orbitrap Velos), which allows detection of peptide fragment ions at low parts-per-million mass accuracy and high sensitivity using higher-energy collisional dissociation (HCD) (21), would make an improved view of global lysine ubiquitylation possible. Our analysis achieves very high confidence and covers a sizeable part of the human ubiquitylome, greatly expanding the number of known in vivo ubiquitylation sites and ubiquitin substrates.

Mass Spectrometric Analysis

All MS experiments were performed on a nanoscale high-performance liquid chromatography (HPLC) system (EASY-nLC from Proxeon Biosystems, Copenhagen, Denmark) connected to a hybrid LTQ-Orbitrap Velos (Thermo Fisher Scientific) equipped with a nanoelectrospray source (Proxeon Biosystems). Each peptide sample was loaded by the autosampler and separated on a 15-cm analytical column (75 μm inner diameter) packed in-house with 3-μm C18 beads (Reprosil Pur-AQ, Dr. Maisch) using a 2-h gradient from 5% to 40% acetonitrile in 0.5% acetic acid. The effluent from the HPLC was directly electrosprayed into the mass spectrometer.

Identification of Peptides and Proteins by MaxQuant

All raw data analysis was performed with the MaxQuant software suite (www.maxquant.org) as described (23), using Mascot (www.matrixscience.com) as the database search engine for peptide identification. Data were searched with Mascot against a concatenated target/decoy (forward and reversed) version of the IPI human database, version 3.37.
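A concatenated target/decoy search lets the false discovery rate be estimated from hits to the reversed sequences. The sketch below shows only the generic idea under stated assumptions: decoys are simple sequence reversals, the FDR above a score cutoff is estimated as decoys/targets, and `DECOY_PREFIX`, `PSM`, and `fdr_score_cutoff` are hypothetical names. MaxQuant's own statistics are more elaborate and are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby
from typing import Optional

DECOY_PREFIX = "REV__"  # hypothetical label marking reversed (decoy) entries


def concatenated_target_decoy(proteins: dict[str, str]) -> dict[str, str]:
    """Return the forward (target) sequences plus one reversed (decoy) copy of each."""
    database = dict(proteins)
    for accession, sequence in proteins.items():
        database[DECOY_PREFIX + accession] = sequence[::-1]
    return database


@dataclass
class PSM:
    """A peptide-spectrum match with a search-engine score (higher is better)."""
    peptide: str
    score: float
    is_decoy: bool


def fdr_score_cutoff(psms: list[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff at which decoys/targets above the cutoff is <= max_fdr.

    PSMs with identical scores are counted together, so accepting every PSM with
    score >= cutoff never admits decoys that were left out of the estimate.
    Returns None if no cutoff satisfies the FDR limit.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff: Optional[float] = None
    for score, group in groupby(ranked, key=lambda p: p.score):
        for psm in group:
            if psm.is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

Because reversed sequences are unlikely to be genuine peptides, every decoy hit above a cutoff is taken as an estimate of the number of false target hits above that same cutoff.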

RESULTS

Mapping of Lysine Ubiquitylation Sites

To identify and characterize ubiquitylation sites in human proteins, we generated a human U2OS osteosarcoma cell line stably expressing Strep-tagged ubiquitin. This approach allowed for rapid and efficient single-step purification of ubiquitylated proteins from cell extracts by means of the Strep-tag affinity purification method. We verified biochemically that ectopically expressed Strep-tagged ubiquitin was efficiently incorporated into cellular proteins in this cell line (Fig. 1A). Moreover, because Strep-ubiquitin was expressed at a substantially lower level than endogenous ubiquitin, its expression did not alter the steady-state ubiquitylation of proteins (Fig. 1B, Supplemental Figs. S1A and S1B). Like endogenous ubiquitin, the ectopically expressed Strep-HA-ubiquitin was predominantly localized in the nucleus (Fig. 1C). Finally, ectopic Strep-tagged ubiquitin recapitulated a range of biological phenotypes associated with endogenous ubiquitin, such as incorporation of polyubiquitin chains on RIP1 protein in response to TNF-α treatment (Fig. 1D and Supplemental Fig. S1A).

Fig. 1. Characterization of a stable cell line expressing ectopic Strep-HA-ubiquitin. A, Whole-cell extracts (WCE) of U2OS cells stably expressing Strep-HA-tagged ubiquitin were subjected to Strep-Tactin Sepharose pull down and analyzed by immunoblotting with HA antibody. B, Lysates from U2OS/Strep-HA-ubiquitin cells were subjected to immunoblotting with the indicated antibodies. C, U2OS/Strep-HA-ubiquitin cells grown on coverslips were fixed and immunostained with HA antibody. Scale bar, 10 μm. D, U2OS cells or U2OS/Strep-HA-ubiquitin cells were mock-treated or incubated in the presence of TNF-α and harvested 1 h later. RIP1 ubiquitylation was analyzed by immunoblotting of Strep-Tactin Sepharose pull downs from cell extracts with RIP1 antibody. E, U2OS/Strep-HA-ubiquitin cells were processed as in (A), resolved by SDS-PAGE and Coomassie-stained. The lane containing Strep-protein complexes was divided into 20 slices and processed for mass spectrometric analysis of site-specific ubiquitylation.

The enriched fractions of ubiquitylated proteins were resolved by SDS-PAGE, Coomassie-stained, and in-gel digested (Fig. 1E). We recently reported that iodoacetamide, a cysteine-alkylating reagent commonly used in mass spectrometric sample preparation, can produce an artifact that mimics the di-glycine tag used for site-specific identification of protein ubiquitylation (24). Consequently, iodoacetamide was replaced with chloroacetamide to avoid such in vitro artifacts. Trypsin-digested peptides were separated and detected by LC-MS using the recently introduced LTQ Orbitrap Velos. This instrument delivers improved HCD performance, with fragment ion spectra acquired at high sensitivity and high mass accuracy (25), allowing for increased confidence in peptide identifications (26). The overall peptide mass accuracy was in the sub-ppm range (average absolute mass accuracy = 0.385 ppm, Supplemental Fig. 2A), and HCD fragmentation achieved similarly high accuracy on all fragment ions (average absolute HCD fragment ion mass accuracy = 2.83 ppm, calculated from 4,698,861 HCD fragment ions, Supplemental Fig. 2B). Using this combined strategy, we identified 333 lysine ubiquitylation sites belonging to 217 proteins and a total of 3,612 ubiquitin substrates in a single analysis of U2OS cells.
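The ppm figures above follow the usual relative mass-error definition. The sketch below, with hypothetical helper names, computes it and also illustrates why the iodoacetamide artifact cannot be removed by mass accuracy alone. It assumes the artifact corresponds to double alkylation of a lysine: two carbamidomethyl groups (C2H3NO each) have exactly the elemental composition of a Gly-Gly remnant, so changing the alkylating reagent, rather than tightening mass tolerances, is what avoids it.

```python
from __future__ import annotations

GLYCINE_RESIDUE = 57.021464  # C2H3NO, monoisotopic mass (Da)
CARBAMIDOMETHYL = 57.021464  # C2H3NO added per iodoacetamide alkylation (Da)


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def mean_absolute_ppm(pairs: list[tuple[float, float]]) -> float:
    """Average absolute ppm error over (observed, theoretical) m/z pairs."""
    return sum(abs(ppm_error(obs, theo)) for obs, theo in pairs) / len(pairs)


# A genuine di-glycine remnant versus two carbamidomethyl groups on one lysine.
gg_remnant = 2 * GLYCINE_RESIDUE
double_carbamidomethyl = 2 * CARBAMIDOMETHYL

# Same elemental composition (C4H6N2O2), so on an arbitrary 1000 Da peptide the
# two forms differ by 0 ppm and no mass accuracy can tell them apart.
artifact_error = ppm_error(1000.0 + double_carbamidomethyl, 1000.0 + gg_remnant)
```

A reading of 500.001 against a theoretical 500.000 corresponds to 2 ppm, the same order as the average fragment ion accuracy reported above.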

To assess the depth of ubiquitylation coverage, we performed a biological U2OS replicate experiment along with a similar analysis of whole-cell extracts of a HEK293T cell line (data not shown). In total, 66 liquid chromatography (LC)-tandem MS (MS/MS) runs were acquired, and the resulting raw data were processed with the MaxQuant software (23). Altogether, we identified 753 ubiquitylation sites on 471 proteins (Fig. 2A) at an overall false discovery rate of less than 1% for peptides and proteins. The exact localization of ubiquitylation sites was determined from the lysine di-glycine tag, with an average localization probability of 0.99 for all identified peptides (Supplemental Table 1), greatly assisted by the increased fragmentation efficiency of HCD (21). Although NEDD8 leaves the same di-glycine tag on lysines after tryptic digestion as ubiquitin, no known NEDD8 modification sites on cullin proteins were identified, illustrating the ubiquitin specificity of our enrichment. No N-terminal substrate ubiquitylation was identified (27). In total, we identified 5756 putative ubiquitylated proteins (ubiquitin substrates) with 99% certainty (Supplemental Table 2). Unambiguous identification requires only a few peptides per protein; nevertheless, we achieved an average sequence coverage of 21.5% for the identified ubiquitin substrates and of 55.8% for proteins with an identified site-specific ubiquitylation.
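Sequence coverage, as quoted above, is the percentage of a protein's residues covered by at least one identified peptide. A minimal sketch of that calculation, assuming peptides are mapped to the protein by exact string matching (`sequence_coverage` is a hypothetical helper, not a MaxQuant function):

```python
from __future__ import annotations


def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of protein residues covered by at least one identified peptide.

    Peptides are matched as exact substrings and every occurrence counts; the
    isobaric I/L ambiguity of mass spectrometry is ignored in this sketch.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)
```

Overlapping peptides are counted only once per residue, which is why coverage saturates at 100% regardless of how many peptides map to a region.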

Fig. 2. A, Overlap of ubiquitylation sites from the three analyzed experiments. Experiments one and two were biological U2OS replicates, whereas experiment three was an enrichment analysis of HEK293T cells. B, Overlap of identified ubiquitin substrates from the same three experiments. C, Normalized abundance of poly-ubiquitin chain linkages on ubiquitin measured by MS; the data follow previous results reported in the literature. D, Lysates from U2OS cells co-transfected with the indicated combinations of Myc-ubiquitin and FLAG-Cbx1 constructs were subjected to FLAG immunoprecipitation followed by immunoblotting with Myc antibody to visualize Cbx1 ubiquitylation.

Only 1.4% of all identified peptides were ubiquitylated (753 ubiquitylated peptides out of 54,846 peptides, Supplemental Table 3). Despite this low proportion of ubiquitylated peptides, which arises from the analytical constraints of enriching ubiquitylated proteins (see above), we observed a 40%–55% overlap in identified ubiquitylation sites among experiments. This overlap is significantly larger than the overlap with previously published ubiquitylation studies (Supplemental Fig. 3) and reflects a high degree of reproducibility for a proteomic peptide-level study (28).

Because the protein enrichment constraint mainly affects the identification of ubiquitylation sites, a larger overlap (75%–80%) was observed for the 5,756 ubiquitin substrates identified in our combined pull-down approach (Fig. 2B). Collectively, this demonstrates that our experimental methodology is accurate and reproducible, and that ubiquitylation is a widespread PTM with an expected in vivo prevalence similar to that of phosphorylation and acetylation.

Analysis of Lysine Ubiquitylation Sites

The most prominent function of ubiquitin is to label proteins for proteasomal degradation. The minimal ubiquitin signal necessary for this process is a chain of four ubiquitin molecules linked through K48 (29). However, other linkages among the seven possible lysine linkages within ubiquitin, such as K11 (30), can also target proteins for degradation. Hence, to estimate the extent to which the identified ubiquitylation sites target proteins for proteasomal degradation, we assessed our data for ubiquitylation of ubiquitin itself.

We found all seven previously reported lysine residues in ubiquitin to be ubiquitylated (15). Quantifying these ubiquitin linkages is not straightforward, because different peptides have different ionization properties in the electrospray process, which affects their detection propensity (31, 32). By normalizing the intensities of ubiquitylated peptides to those of their unmodified counterparts (see Supplementary methods), we estimated the relative abundance of all seven linkages and found K48, K11, and K63 to be the most abundant, followed by K6, K27, K29, and K33 (Fig. 2C), in agreement with previous observations in the literature (33).
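The exact normalization is given in the Supplementary methods and is not reproduced here. The sketch below assumes one simple reading of the text: each linkage's GG-modified peptide intensity is divided by the intensity of its unmodified counterpart to offset peptide-specific ionization efficiency, and the resulting ratios are scaled to sum to one. The function names and the example intensities are hypothetical.

```python
from __future__ import annotations


def relative_linkage_abundance(
    intensities: dict[str, tuple[float, float]],
) -> dict[str, float]:
    """Relative abundance of each ubiquitin linkage.

    `intensities` maps a linkage (e.g. "K48") to a pair
    (intensity of GG-modified peptide, intensity of unmodified counterpart).
    Linkages without a measurable unmodified counterpart are skipped.
    """
    ratios = {
        linkage: modified / unmodified
        for linkage, (modified, unmodified) in intensities.items()
        if unmodified > 0
    }
    total = sum(ratios.values())
    if total == 0:
        return {linkage: 0.0 for linkage in ratios}
    return {linkage: ratio / total for linkage, ratio in ratios.items()}


def rank_linkages(abundance: dict[str, float]) -> list[str]:
    """Linkages ordered from most to least abundant."""
    return sorted(abundance, key=abundance.get, reverse=True)
```

With made-up inputs `{"K48": (30.0, 10.0), "K63": (10.0, 10.0)}`, the ratios are 3 and 1, giving relative abundances of 0.75 and 0.25.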

Because mono- and polyubiquitylation cannot be distinguished by the mass spectrometric analysis employed, our measurement reflects the total pool of ubiquitin-ubiquitin linkages. Moreover, because K48 poly-linkages tend to form longer ubiquitin chains, the identified sites are unlikely to represent only ribosomal translation products. This is further supported by the many known sites in our dataset that are involved in various signaling modules, such as those on FANCD2 (34), FANCI (35), and PCNA (36). However, because the vast majority of identified sites are novel, no firm conclusion can be drawn.

Among other ubiquitin-like (Ubl) family members, which share the three-dimensional structure of ubiquitin and also conjugate to target proteins, we find ubiquitylation of SUMO1, SUMO2, SUMO3, and ATG12 (Supplemental Fig. 4A). Previous reports showed that SUMO2 is ubiquitylated in vivo at K11, K32, and K41 (37). We confirm those in vivo sites and additionally find K5, K7, and K21 to be ubiquitylated. We also report, for the first time, lysine ubiquitylation of SUMO1 (K25 and K37) and ATG12 (K136).

Based on Gene Ontology (GO) annotations, the largest number of ubiquitylation sites is localized in the cytoplasm (29% of total sites). However, sites with exclusively nuclear annotation were almost equally represented (Supplemental Fig. 2D). Known nuclear ubiquitylations include sites on histone H2A and H2B variants, two chromatin proteins discovered to be ubiquitylated more than 30 years ago (38). Ubiquitylated H2B is associated with transcriptionally active chromatin and is required for methylation of histone H3 on K4 and K79, whereas ubiquitylated H2A is associated with transcriptional silencing (39). Besides these known sites, we report seven novel ubiquitylation sites on various histone H2A isoforms and three on H2B. Moreover, we identify three novel sites on histone H1.2 (K46, K64, and K75) and four on histone H4 (K32, K60, K78, and K80). These novel histone H4 ubiquitylation sites fit well with previous in vitro experiments demonstrating that recombinant histone H4 is modified with exactly four ubiquitylations (40, 41). Until now, the exact localization of these ubiquitylation sites was not known.

Other chromatin proteins known to be ubiquitylated, but for which the exact lysine ubiquitylation sites are not known, are CBX1 and CBX3 (42). CBX1, CBX3, and CBX5 (also known as heterochromatin protein 1 (HP1) -β, -γ, and -α, respectively) are considered “gatekeepers” of histone H3 methyl-K9-mediated gene silencing (43). In the histone code, H3K9 methylation is regarded as an epigenetic mark for gene silencing, to which HP1 binds in order to establish and maintain higher-order chromatin structure (44). It was recently proposed that HP1 proteins are modified in a similar fashion to histone proteins, and that these PTMs define a regulatory “subcode” involved in gene silencing (42). We report one ubiquitylation site on CBX1 (K181) and one on CBX3 (K154). To demonstrate in vivo ubiquitylation of CBX1, we generated a protein variant lacking the last six C-terminal residues and hence the identified ubiquitylation site (referred to as CBX1ΔC). CBX1ΔC displayed a strongly reduced propensity to undergo ubiquitylation (Fig. 2D), suggesting that ubiquitylation plays a broader role in the histone code than currently believed.

Additional validation of in vivo lysine ubiquitylation sites was performed for RPA14, RPA70, USP5, Rad18, MCM7, and DDB1 (Supplemental Fig. 5). Together, these results illustrate the improved sensitivity and extent of our dataset and confirm that the mapped ubiquitylation sites are of in vivo origin.

Lysine Selection by the Ubiquitin System

The selection of target lysine residues by the ubiquitin system is commonly believed to depend on lysine accessibility rather than on the primary sequence context (45–47), because the ubiquitin system has to transfer an entire protein to the modification site rather than a small chemical group (e.g. a phosphate or acetyl group). However, this accessibility preference has never been experimentally validated on a large scale.

We analyzed the local sequence context of the identified ubiquitylation sites by extracting a ±6-residue sequence window surrounding every ubiquitylation site (48). Overall, no significant sequence recognition motif was detected among the identified ubiquitylation sites (Fig. 3A), which could reflect that the investigated ubiquitylation sites are the outcome of numerous active enzymes. However, comparing the derived ubiquitylation sequence context with that of an equally sized modification system, such as phosphorylation (Supplemental Fig. 4B), makes it evident that the selection of lysine ubiquitylation sites cannot be explained by sequence content or unique ligase selectivity. A previous sequence study suggested KXL, LXXXK, AXXXXK, and KXXXXXG as potential ubiquitin recognition motifs (20). However, these motifs were based on fewer identified ubiquitylation sites, low-accuracy ion trap instrumentation, and antibody-based enrichment. They are therefore most likely the outcome of subtle epitope preferences of the antibody used, or artifacts of poorer statistics and/or MS data quality. Our observation that no recognition motif is present among the identified ubiquitylation sites is supported by a previous experimental investigation of the dihydrofolate reductase degradation signal (49). Although only a few proteins were investigated, no dependence on unique amino acid sequences in the vicinity of the ubiquitylated residue was observed.
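The window extraction described above, and the per-position residue counts on which motif analyses are built, can be sketched as follows. This is a minimal illustration, assuming 1-based site positions and `_` padding beyond protein termini; `sequence_window` and `position_counts` are hypothetical helpers, and the statistics of the motif tool cited as (48) are not reproduced.

```python
from __future__ import annotations

from collections import Counter


def sequence_window(protein: str, site: int, flank: int = 6, pad: str = "_") -> str:
    """Return the (2 * flank + 1)-residue window centred on a modified lysine.

    `site` is the 1-based position of the lysine; positions beyond the protein
    termini are filled with `pad`.
    """
    index = site - 1
    if not 0 <= index < len(protein) or protein[index] != "K":
        raise ValueError(f"position {site} is not a lysine")
    left = protein[max(0, index - flank):index].rjust(flank, pad)
    right = protein[index + 1:index + 1 + flank].ljust(flank, pad)
    return left + protein[index] + right


def position_counts(windows: list[str], pad: str = "_") -> list[Counter]:
    """Amino acid counts at each window position, ignoring padding."""
    if not windows:
        return []
    counts = [Counter() for _ in range(len(windows[0]))]
    for window in windows:
        for position, residue in enumerate(window):
            if residue != pad:
                counts[position][residue] += 1
    return counts
```

Comparing these counts with the same counts for windows around all lysines is what would reveal a motif; a flat profile is consistent with the absence of one.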

The same work found that two closely spaced lysine residues acted in concert to affect the in vivo half-life of dihydrofolate reductase (49). We therefore investigated whether such a preference exists on a global scale. To do so, we performed an amino acid property analysis, grouping the five amino acids N- and C-terminal to every ubiquitylation site by physicochemical property. In agreement with previous observations, a significant preference for positively charged residues (lysine and arginine [KR]) upstream of the ubiquitylation site was observed (Fig. 3B, blue bars). No such preference was observed downstream of ubiquitylation sites.
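The property analysis can be sketched as class frequencies on either side of the central lysine, compared between ubiquitylated lysines and all lysines. This is an illustration under assumptions: the property groups below simply mirror the residue classes discussed in the text, windows have the lysine at the centre with `_` padding, "upstream" means N-terminal, and no significance test is included. All names are hypothetical.

```python
from __future__ import annotations

# Hypothetical property groups mirroring the residue classes discussed in the text.
PROPERTY_CLASSES = {
    "positive": set("KR"),
    "small": set("AG"),
    "cysteine": set("C"),
}


def class_frequencies(windows: list[str], flank: int = 5,
                      side: str = "upstream") -> dict[str, float]:
    """Fraction of residues in each property class on one side of the central lysine.

    Windows have length 2 * flank + 1 with the lysine in the middle; "_" marks
    padding beyond a protein terminus and is ignored.
    """
    region = slice(0, flank) if side == "upstream" else slice(flank + 1, 2 * flank + 1)
    counts = {name: 0 for name in PROPERTY_CLASSES}
    total = 0
    for window in windows:
        for residue in window[region]:
            if residue == "_":
                continue
            total += 1
            for name, members in PROPERTY_CLASSES.items():
                if residue in members:
                    counts[name] += 1
    return {name: (count / total if total else 0.0) for name, count in counts.items()}


def class_enrichment(site_windows: list[str], background_windows: list[str],
                     flank: int = 5, side: str = "upstream") -> dict[str, float]:
    """Observed/expected ratio per class: ubiquitylated lysines vs all lysines."""
    observed = class_frequencies(site_windows, flank, side)
    expected = class_frequencies(background_windows, flank, side)
    return {
        name: observed[name] / expected[name] if expected[name] else float("nan")
        for name in observed
    }
```

A ratio above 1 for the "positive" class upstream, with no such excess downstream, would correspond to the pattern reported above.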

Besides charged amino acids, a significant preference for alanine and glycine residues [AG] surrounding ubiquitylation sites indicates a preference of the ubiquitin system for sequence accessibility (Fig. 3B, green bars). Cysteine residues are depleted downstream of ubiquitylation sites (yellow bars). This could be explained by improved accessibility through fewer cysteine bridges close to the ubiquitylation site, or by the avoidance of errors in the thioester-dependent transfer of ubiquitin from E2 enzymes to target lysines. That cysteine is excluded because of its specific biological function is supported by the absence of significant changes in methionine occurrence (data not shown). For a more detailed investigation of the accessibility preference of the ubiquitin system, we compared the secondary structure of ubiquitylated lysines with that of all lysine residues. As shown in Fig. 3C, ubiquitylated lysines have a significantly different secondary-structure preference from the average lysine residue: they are found more frequently in ordered helical regions (p < 1.65E-6) and less frequently in unstructured coil regions (p < 4.07E-6). In addition, we evaluated the identified lysine ubiquitylation sites for solvent accessibility using the NetSurfP tool (50). We found that 91.3% of all identified lysine ubiquitylation sites are exposed on the protein surface, compared with 88.0% of the average lysine residue (p < 2.3E-3, Fig. 3D). Although the difference in surface exposure is subtle, its statistical significance, together with the helical preference, indicates that ubiquitin substrates are targeted differently from those of sequence motif-dependent modifications such as phosphorylation, which mainly occurs in unstructured regions of proteins (51). Moreover, because phosphorylation is sequence motif dependent, phosphorylation sites are highly conserved across species.

In contrast, protein ubiquitylation has been shown, in a few selected proteins, to occur on a variety of lysine residues with low selectivity, or even on lysine residues introduced at nonnative positions (52). This suggests that lysine ubiquitylation is more promiscuous than other PTMs and, consequently, that ubiquitylated lysine residues should exhibit low evolutionary conservation. Consistent with this prediction, a phylogenetic analysis revealed that only 6% of all identified ubiquitylation sites were conserved in more than 50% of all eukaryotic species (Fig. 4A, blue bars), only marginally more than for the average lysine residue (3%; Fig. 4A, red bars). This is in striking contrast to the high evolutionary conservation of the ubiquitin system itself across eukaryotic species. Thus, ubiquitylation shows promiscuity at the site level, and E3 ligases target lysine residues for ubiquitylation based on their accessibility rather than on a sequence recognition motif. This preference has been observed before, but never validated in an unbiased in vivo system in which all lysine sites are available. In addition, our data confirm that the lysine ubiquitylation step itself is not a substrate recognition determinant within the ubiquitin system. Taken together, substrate recognition in the ubiquitin process relies on the specificity of E2 and E3 enzymes (53), either through the substrate recognition specificity of the E3 or through specificity gained from combinatorial interactions between E2 and E3 (54).
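The conservation statistic (the share of sites whose lysine is conserved in more than 50% of eukaryotic species) can be sketched once orthologous positions are known. The phylogenetic procedure itself (ortholog set, alignment method) is not described here, so the input format below is an assumption: for each human lysine, a mapping from species to the aligned residue, with "-" for a gap or missing ortholog. Function names are hypothetical.

```python
from __future__ import annotations


def lysine_conservation(aligned: dict[str, str]) -> float:
    """Fraction of species whose residue aligned to the human lysine is also a lysine.

    `aligned` maps a species name to the aligned residue ("-" for a gap or no
    ortholog); this input format is an assumption of the sketch.
    """
    if not aligned:
        return 0.0
    return sum(residue == "K" for residue in aligned.values()) / len(aligned)


def fraction_broadly_conserved(sites: list[dict[str, str]],
                               threshold: float = 0.5) -> float:
    """Share of sites whose lysine is conserved in more than `threshold` of species."""
    if not sites:
        return 0.0
    return sum(lysine_conservation(site) > threshold for site in sites) / len(sites)
```

Running the same function on all lysines of the same proteins gives the background against which the ubiquitylated lysines are compared.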

Fig. 4. A, Conservation analysis of average lysines and ubiquitylated lysines across eukaryotic species. Ubiquitylated lysines show slightly higher conservation than average lysines (6% and 3%, respectively, of lysines conserved in more than 50% of eukaryotic species). B, Distribution of N-terminal amino acids of (i) the 471 proteins identified as lysine ubiquitylated (blue bars), (ii) all 5,600 putative ubiquitylated proteins identified in the entire experiment (red bars), and (iii) all proteins in the human database (green bars). A significant enrichment for N-terminal serine and alanine residues is observed among ubiquitylated proteins.

It is well established that E3 ligases can catalyze both the initial ubiquitin-substrate ligation and subsequent ubiquitin-ubiquitin ligation. One example is UBR1, a RING E3 ligase that recognizes and binds proteins bearing specific destabilizing N-terminal residues according to the “N-end rule pathway” (49, 55, 56). To investigate the global extent of such a degradation signal in our dataset, we compared the N-terminal amino acids of all 471 proteins for which a lysine ubiquitylation site was identified with all N-terminal residues in the human protein database. No enrichment of destabilizing amino acids was found among these 471 ubiquitylated proteins, except for a significant enrichment of N-terminal alanine and serine residues (Fig. 4B, blue bars). To further validate this preference, we compared the N-terminal amino acids of all 5,756 identified ubiquitin substrates (Supplemental Table 3) with the human database and found the same enrichment for alanine and serine, whereas other destabilizing amino acids were significantly depleted (Fig. 4B and Supplemental Fig. 6). Notably, the observation that the identified substrates follow the same statistical trend as the proteins with lysine ubiquitylation sites further supports that our Strep enrichment strategy mainly targets putative ubiquitin substrates.
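The N-terminal comparison amounts to residue frequencies in a protein subset relative to the whole database. A minimal sketch with hypothetical function names; the option to count the residue after a leading methionine is an assumption about how mature N-termini might be defined and is not taken from the text.

```python
from __future__ import annotations

from collections import Counter


def n_terminal_counts(sequences: list[str], skip_initiator_met: bool = False) -> Counter:
    """Count the first residue of each sequence.

    With skip_initiator_met=True, the residue after a leading Met is counted
    instead (an assumption of this sketch, not the documented procedure).
    """
    counts: Counter = Counter()
    for sequence in sequences:
        if skip_initiator_met and sequence.startswith("M") and len(sequence) > 1:
            sequence = sequence[1:]
        if sequence:
            counts[sequence[0]] += 1
    return counts


def n_terminal_enrichment(subset: list[str], database: list[str],
                          skip_initiator_met: bool = False) -> dict[str, float]:
    """Frequency of each N-terminal residue in `subset` relative to `database`."""
    observed = n_terminal_counts(subset, skip_initiator_met)
    expected = n_terminal_counts(database, skip_initiator_met)
    n_observed, n_expected = sum(observed.values()), sum(expected.values())
    return {
        residue: (observed[residue] / n_observed) / (expected[residue] / n_expected)
        for residue in observed
        if expected[residue]
    }
```

Values above 1 mark residues over-represented among ubiquitylated proteins; a statistical test would still be needed to call an enrichment significant.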

N-terminal alanine and serine residues are regarded as type-3 destabilizing residues within the N-end rule pathway, but are at the same time the residues most frequently N-acetylated (57). A recent study showed that N-terminal acetylation of proteins creates specific degradation signals (58); thus, N-terminal acetylation seems to play a larger role in the global protein degradation signal than the N-degron signaling of the N-end rule pathway.

Modification Crosstalk on Lysine Residues

The positively charged side chains of lysine residues are targets for a range of PTMs (59). Because these PTMs are mutually exclusive on a given lysine, they create great potential for cross-regulation. Accumulating evidence suggests that cross-talk among these lysine modifications plays an important role in the control of vital cellular functions. The most apparent effect is the inhibition of proteasome-mediated protein degradation through acetylation-based protection of lysines (60–62).

Ubiquitylation and Acetylation Cross-Talk

Investigations of site-specific cross-talk between ubiquitylation and acetylation have primarily been performed in the context of epigenetic analysis. Comparing our ubiquitylation data with the recently published lysine acetylome of Choudhary et al. (13) provides a means of estimating the extent of competitive cross-talk between lysine ubiquitylation and lysine acetylation on a global scale. We generated a 21-amino-acid sequence window surrounding every identified ubiquitylation site and compared these with the corresponding windows around acetylation sites. The window width was chosen empirically to contain sufficient sequence information for unique site identification. Using this approach, we identified 152 lysine residues that are both ubiquitylated and acetylated. This corresponds to 20.2% of all identified lysine ubiquitylation sites (Fig. 5A) and is significantly more than expected at random (p < 2.2E-16). Prompted by these results and by the large number of ubiquitylation sites in the nucleus (Supplemental Fig. 2D), we performed a similar cross-talk analysis on nuclear ubiquitylation sites, based on their GO cellular compartment annotation. Of 183 identified nuclear ubiquitylation sites, 45 (24.6%, p < 8.29E-15) were also acetylated at the same lysine residue (listed in Fig. 6). The observed overlap between lysine ubiquitylation and acetylation is statistically more pronounced than the corresponding ubiquitylation overlap among same-cell lysates (Fig. 5A). Hence, as the number of identified ubiquitylation sites increases, the site-specific ubiquitylation/acetylation overlap is expected to increase accordingly.
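Matching sites by sequence window rather than by protein accession keeps the comparison independent of database versions and identifiers, which is one plausible motivation for the window approach. The sketch below assumes each site is given as a (protein sequence, 1-based lysine position) pair, uses `_` padding at termini, and omits the significance test; the helper names are hypothetical. Identical windows from repeated sequences collapse into a single entry.

```python
from __future__ import annotations


def site_window(protein: str, site: int, flank: int = 10, pad: str = "_") -> str:
    """21-residue (flank=10) window around a 1-based lysine position, padded at termini."""
    index = site - 1
    if not 0 <= index < len(protein) or protein[index] != "K":
        raise ValueError(f"position {site} is not a lysine")
    left = protein[max(0, index - flank):index].rjust(flank, pad)
    right = protein[index + 1:index + 1 + flank].ljust(flank, pad)
    return left + "K" + right


def shared_sites(ub_sites: list[tuple[str, int]], ac_sites: list[tuple[str, int]],
                 flank: int = 10) -> set[str]:
    """Windows present in both the ubiquitylation and the acetylation dataset."""
    ub = {site_window(sequence, position, flank) for sequence, position in ub_sites}
    ac = {site_window(sequence, position, flank) for sequence, position in ac_sites}
    return ub & ac
```

Dividing the number of shared windows by the number of ubiquitylation windows gives an overlap fraction of the kind reported above.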

Fig. 6. Table of 45 nuclear lysine residues identified as both ubiquitylated and acetylated. Proteins to which the modified sites belong are listed by gene name. The localization probability of the identified ubiquitylation site, as well as the charge state and measured mass (m/z) of the modified peptide, are listed.

To evaluate this cross-talk in more detail, we investigated all ubiquitylated proteins in our dataset annotated as cell cycle proteins (according to their GO biological process annotation, Supplementary methods). The dependence of the cell cycle on protein degradation is reflected in the short half-life (fast turnover) of proteins regulating cell cycle progression (63), and cell cycle progression crucially depends on the ubiquitin-mediated degradation of key regulatory proteins (64–66). A short in vivo half-life of cell cycle regulators provides a way to generate spatial gradients and allows rapid adjustment of protein concentration through changes in the rate of protein synthesis. Moreover, the discovery of ubiquitin-independent mechanisms for rapid degradation of key cell cycle regulators further emphasizes the need for fast and efficient protein degradation within the cell cycle (67, 68). Taken together, this indicates that cell cycle ubiquitylations primarily target proteins for degradation. Hence, if acetylation-based protection of lysine residues affects protein degradation, ubiquitylation/acetylation cross-talk among cell cycle regulating proteins should be minimal.

In total, we identify 41 ubiquitylated lysine sites on proteins annotated with the GO biological process cell cycle (Fig. 5B, fully red circles). Of these ubiquitylation sites, five have been reported previously (red and yellow bicolored circles), and only two are known to be acetylated (K248 on PCNA and K50 on YWHAE). Compared with the large overlap within the nucleus, the cell cycle ubiquitylation/acetylation overlap is remarkably reduced (Fig. 5A). This observation is especially striking considering that the cell cycle pathway is the largest interaction network of the acetylome (13). Overall, these data support the view that acetylation-based protection of lysines plays an important and widespread role in protein degradation within the cell.

Site-specific Interactions Between Ubiquitylation and SUMOylation

Interaction between ubiquitylation and SUMOylation is well known and occurs either through sequential poly-conjugation at the same lysine residue (69) or through antagonistic competition for the same acceptor lysine (70, 71). Currently, fewer than 100 SUMOylation sites have been reported in the literature, so a comprehensive comparison between SUMOylation and ubiquitylation sites is not statistically viable. Instead, to estimate the preferred mode of interaction between these two modifications, we first examined identified ubiquitylation sites residing in the SUMO2/3 consensus sequence ψKxE (where ψ is Val, Ile, Leu, Met, Phe, Pro, or Cys, and x is any amino acid) (72). Only 2% of all identified lysine ubiquitylation sites were located in this motif, suggesting that antagonistic cross-talk between ubiquitylation and SUMO2 conjugation at consensus sites is infrequent. In contrast, all known lysine SUMOylation sites on SUMO2 are identified as ubiquitylation targets in our dataset, strongly indicating that signaling interaction between SUMOylation and ubiquitylation preferentially occurs through mixed-chain conjugation. The limited site-specific, competitive cross-talk may reflect the widely differing biological functions of SUMOylation and ubiquitylation (73).
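Locating ubiquitylation sites within the ψKxE consensus is a simple pattern match. The Python sketch below builds a regular expression from the residue definitions given above; the function names and the toy sequence are hypothetical, and this is not the authors' implementation.

```python
import re
from typing import Iterable, List

# psi = V, I, L, M, F, P or C; x = any residue; the lysine is captured as group 1.
SUMO_CONSENSUS = re.compile(r"[VILMFPC](K).E")


def consensus_lysines(sequence: str) -> List[int]:
    """Return 1-based positions of lysines that sit inside a psi-K-x-E motif."""
    # m.start(1) is the 0-based index of the captured K; add 1 for 1-based numbering.
    return [m.start(1) + 1 for m in SUMO_CONSENSUS.finditer(sequence.upper())]


def fraction_in_motif(sequence: str, ub_sites: Iterable[int]) -> float:
    """Fraction of the given 1-based ubiquitylation sites located in the motif."""
    sites = list(ub_sites)
    if not sites:
        raise ValueError("no ubiquitylation sites given")
    motif_lysines = set(consensus_lysines(sequence))
    return sum(1 for s in sites if s in motif_lysines) / len(sites)


toy_sequence = "MVKQEAAKAEIKTE"  # hypothetical sequence for illustration
print(consensus_lysines(toy_sequence))
print(fraction_in_motif(toy_sequence, [3, 8]))
```

In the toy sequence, K3 (in VKQE) and K12 (in IKTE) match, while K8 (in AKAE) does not, because Ala is not a ψ residue. Because ψ can never be K or E, two ψKxE motifs cannot overlap, so the non-overlapping scan of `re.finditer` misses nothing.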

The observed mixed-chain conjugation between SUMOylation and ubiquitylation resembles the known poly-ubiquitylation of ubiquitin (vide supra). We therefore estimated the relative abundance of the identified SUMO2 ubiquitylation sites, as we did for ubiquitin (Fig. 5D). K11 is the most abundant ubiquitin linkage on SUMO2, followed by K7, K42 and K33. These results agree with previous observations that K11 is the major internal SUMO2 acceptor site for SUMO poly-conjugation (74). Moreover, K11 is the only one of these residues found to be acetylated, underscoring the importance of this particular lysine. Because our abundance measurement reflects the total pool of ubiquitin linkages to SUMO2, it indicates that mixed-chain conjugation between SUMO2 and ubiquitin, particularly through K11, is widespread. Hence, the biological importance of mixed-chain interaction between SUMOylation and ubiquitylation appears to be greater than currently anticipated.
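The text does not specify how relative linkage abundance was computed. One common approach, assumed here purely for illustration, is to normalize the intensity of each site-specific ubiquitylated peptide to the total over all sites and then rank the results. The intensities below are arbitrary placeholders, not measurements from this study, and `relative_abundance` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def relative_abundance(intensities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Normalize per-site intensities to fractions of the total, sorted descending."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    fractions = {site: value / total for site, value in intensities.items()}
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)


# Placeholder intensities (arbitrary units), NOT measured values from this study.
example = {"K7": 2.0, "K11": 5.0, "K33": 1.0, "K42": 1.5}
for site, fraction in relative_abundance(example):
    print(f"{site}: {fraction:.0%}")
```

Normalizing to the total pool makes abundances comparable across sites within one protein. It does not correct for differences in ionization efficiency between peptides, so such rankings are approximate.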

DISCUSSION

Although lysine ubiquitylation has been studied extensively over the past 30 years, only a limited number of ubiquitylation sites have been identified so far, and many specific questions about the ubiquitin system remain unanswered. Here we describe a large-scale identification of lysine ubiquitylation sites and demonstrate that our approach identifies in vivo ubiquitylation sites. In total, we identify 753 sites belonging to 471 proteins, a twofold extension of the known lysine ubiquitylation sites in human proteins compared with available, validated databases. Thus, our data contain valuable information for the broad scientific field related to ubiquitin and ubiquitin-like proteins.

As the exact process by which ubiquitin is conjugated to lysine residues remains elusive, it is vital to understand every step in the ubiquitylation process. This includes sequence-specific information on ubiquitylated lysine residues and their surrounding amino acids, for which large-scale mass spectrometric analysis is an ideal method. Our data address a long-standing question of how the ubiquitin system globally determines which specific lysine residues are targeted for ubiquitylation when all lysine residues are available. We find that lysine ubiquitylation sites are selected based on surface accessibility rather than on unique sequence features. It follows that prediction of ubiquitylation sites based solely on the sequence motif surrounding a lysine is not feasible. Moreover, our data reveal that predicting lysine ubiquitylation from the conservation of lysine residues across eukaryotic species is not straightforward. Most ubiquitin acceptor lysines are currently identified through site-directed mutagenesis guided by either sequence location or site conservation predictions. Although mutagenesis is a powerful tool to biologically validate ubiquitylation sites, the method is very laborious and cannot distinguish the loss of a genuine ubiquitylation site from a loss of ubiquitylation caused by mutation-induced changes in the substrate. Consequently, experimental approaches based on mass spectrometric analysis appear to be the most suitable option for large-scale identification of lysine ubiquitylation sites.
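Surface accessibility is often quantified as relative solvent accessibility (RSA): the solvent-accessible surface area (SASA) of a residue divided by a reference maximum for that residue type. The sketch below flags exposed lysines from per-residue SASA values. The reference maximum for lysine, the 0.25 exposure cutoff, the function names and the example values are all assumptions for illustration, not parameters from this study.

```python
from typing import Dict, List

# Assumed reference maximum SASA for lysine, in square angstroms.
# Published scales differ; substitute the scale matching your SASA calculation.
LYS_MAX_SASA = 236.0


def relative_accessibility(sasa: float, max_sasa: float = LYS_MAX_SASA) -> float:
    """Relative solvent accessibility, capped at 1.0."""
    return min(sasa / max_sasa, 1.0)


def exposed_lysines(sasa_by_position: Dict[int, float], cutoff: float = 0.25) -> List[int]:
    """Return sorted lysine positions whose RSA is above the (assumed) cutoff."""
    return sorted(
        pos for pos, sasa in sasa_by_position.items()
        if relative_accessibility(sasa) > cutoff
    )


# Hypothetical per-lysine SASA values (position -> square angstroms).
example_sasa = {12: 150.0, 48: 20.0, 97: 80.0}
print(exposed_lysines(example_sasa))
```

Such a filter narrows the set of candidate lysines, but, consistent with the text, accessibility alone does not predict which accessible lysines are actually ubiquitylated.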

As lysine residues are targets of several PTMs besides ubiquitylation, our data provide an unparalleled opportunity for site-specific assessment of cross-talk among lysine modifications. Our dataset reveals a large, site-specific overlap between lysine ubiquitylation and acetylation. Lysine acetylation is known to form dynamic regulatory programs through cross-talk with a variety of PTMs, and many acetylated proteins are key components of various signaling pathways. Although scientific interest in this competitive cross-talk with lysine ubiquitylation is steadily increasing, investigations into its cellular extent have so far been hampered by the lack of large ubiquitylation repositories. Our data reveal that 20% of all identified ubiquitylation sites are also known to be acetylated, illustrating that competitive cross-talk between these two PTMs is extensive. The biological importance of such cross-talk is highlighted by the observation that ubiquitylation/acetylation cross-talk is virtually absent among cell cycle proteins, even though 25% of all nuclear ubiquitylation sites are known to be acetylated. Because fast and efficient degradation of key regulators is of the utmost importance during unidirectional cell cycle progression, ubiquitin-dependent and ubiquitin-independent degradation must operate in a timely, consistent and rapid fashion. Hence, inhibition of protein degradation by acetylation-based protection of lysine residues appears to be disfavored during cell cycle progression.

In addition to acetylation, we find strong interaction between ubiquitylation and SUMOylation. We identify several novel ubiquitylation sites located on various ubiquitin-like proteins, and site-specific analysis reveals that interaction between SUMOylation and ubiquitylation preferentially occurs through mixed-chain conjugation. Although the exact role of this conjugation is unclear, previous work has shown that mixed-chain conjugation does not target SUMO proteins for proteasomal destruction (69). The large number of ubiquitylation sites identified on SUMO proteins suggests that mixed-chain conjugation is more abundant than previously assumed.

In conclusion, our data provide extensive insight into human lysine ubiquitylation sites, along with novel information regarding the nature of lysine ubiquitylation. We demonstrate that lysine ubiquitylation is an extensive PTM displaying considerable interplay with other modifications to form codified intermolecular signaling programs that are crucial for cell function. The large resource of high-confidence lysine ubiquitylation sites identified here will be made available through post-translational modification databases such as Phosida (75). The MS data are deposited at Tranche (www.proteomecommons.org; Tranche folder “Lysine Ubiquitylation study”).

Acknowledgments

We thank members of the NNF-CPR for fruitful discussions and careful reading of the manuscript.

Footnotes

* The work carried out in this study was supported in part by the Novo Nordisk Foundation Center for Protein Research and by the Lundbeck Foundation.