Protein Glycosylation

Glycosylation, the attachment of sugar moieties to proteins, is a post-translational modification (PTM) that provides greater proteomic diversity than other PTMs. Glycosylation is critical for a wide range of biological processes, including cell attachment to the extracellular matrix and protein-ligand interactions in the cell. This PTM is characterized by various glycosidic linkages, including N-, O- and C-linked glycosylation, glypiation (GPI anchor attachment) and phosphoglycosylation. Glycoproteins can be detected, purified and analyzed by different strategies, including glycan staining and visualization, glycan crosslinking to agarose or magnetic resin for labeling or purification, or proteomic analysis by mass spectrometry, respectively.

Introduction

Glycosylation is a critical function of the biosynthetic-secretory pathway in the endoplasmic reticulum (ER) and Golgi apparatus. Approximately half of all proteins typically expressed in a cell undergo this modification, which entails the covalent addition of sugar moieties to specific amino acids. Most soluble and membrane-bound proteins expressed in the endoplasmic reticulum are glycosylated to some extent, including secreted proteins, surface receptors and ligands and organelle-resident proteins. Additionally, some proteins that are trafficked from the Golgi to the cytoplasm are also glycosylated. Lipids and proteoglycans can also be glycosylated, significantly increasing the number of substrates for this type of modification.

Scope

Protein glycosylation has multiple functions in the cell. In the ER, glycosylation is used to monitor the status of protein folding, acting as a quality control mechanism to ensure that only properly folded proteins are trafficked to the Golgi. Sugar moieties on soluble proteins can be bound by specific receptors in the trans Golgi network to facilitate their delivery to the correct destination. These sugars can also act as ligands for receptors on the cell surface to mediate cell attachment or stimulate signal transduction pathways (1). Because they can be very large and bulky, oligosaccharides can affect protein-protein interactions by either facilitating or preventing proteins from binding to cognate interaction domains. Because they are hydrophilic, they can also alter the solubility of a protein (2).

Distribution

Glycosylated proteins (glycoproteins) are found in almost all living organisms that have been studied, including eukaryotes, eubacteria and archaea (3,4). Eukaryotes have the greatest range of organisms that express glycoproteins, from single-celled to complex multicellular organisms.

Glycoprotein diversity

Glycosylation increases the diversity of the proteome to a level unmatched by any other post-translational modification. The cell is able to facilitate this diversity, because almost every aspect of glycosylation can be modified, including:

Glycosidic linkage – the site of glycan (oligosaccharide) binding

Glycan composition – the types of sugars that are linked to a particular protein

Glycan structure – branched or unbranched chains

Glycan length – short- or long-chain oligosaccharides

Glycosylation is thought to be the most complex post-translational modification because of the large number of enzymatic steps involved (5). The molecular events of glycosylation include linking monosaccharides together, transferring sugars from one substrate to another and trimming sugars from the glycan structure. Unlike other cell processes such as transcription or translation, glycosylation is non-templated, and thus not all of these steps necessarily occur during every glycosylation event. Instead of using templates, cells rely on a host of enzymes that add sugars to, or remove them from, one molecule after another to generate the diverse glycoproteins seen in a given cell. While it may seem chaotic because of all of the enzymes involved, the different mechanisms of glycosylation are highly ordered, step-wise reactions in which individual enzyme activity is dependent upon the completion of the previous enzymatic reaction. Because enzyme activity varies by cell type and intracellular compartment, cells can synthesize glycoproteins that differ from those of other cells in glycan structure (5).

Enzymes that transfer mono- or oligosaccharides from donor molecules to growing oligosaccharide chains or proteins are called glycosyltransferases (Gtfs). Each Gtf has specificity for linking a particular sugar from a donor (sugar nucleotide or dolichol) to a substrate and acts independently of other Gtfs. These enzymes are broad in scope, as glycosidic bonds have been detected on almost every protein functional group, and glycosylation has been shown to incorporate most of the commonly occurring monosaccharides to some extent (6).

Glycosidases catalyze the hydrolysis of glycosidic bonds to remove sugars from proteins. These enzymes are critical for glycan processing in the ER and Golgi, and each enzyme shows specificity for removing a particular sugar (e.g., mannosidase).

Types of glycosylation

Glycopeptide bonds can be categorized into specific groups based on the nature of the sugar-peptide bond and the oligosaccharide attached, including N-, O- and C-linked glycosylation, glypiation and phosphoglycosylation. Because N- and O-glycosylation and glypiation are the most commonly detected types of glycosylation, more emphasis in this article will be placed on these modifications.

Proteins are not restricted to a particular type of glycosylation. Indeed, proteins are often glycosylated at multiple sites with different glycosidic linkages, which depends on multiple factors including those described below.

1. Enzyme availability

Glycosylation is controlled by moving proteins to areas with different enzyme concentrations; the cell sequesters enzymes into specific compartments to regulate their activity. For example, after a protein is N-glycosylated in the ER, glycan processing occurs in a step-wise fashion by trafficking proteins to distinct Golgi cisternae that contain high concentrations of specific Gtfs and glycosidases.

2. Amino acid sequence

Besides the requirement for the right amino acid (e.g., Asn for N-linked; Ser/Thr for O-linked), many enzymes have consensus sequences or motifs that enable formation of the glycosidic bond (6).

3. Protein conformation (availability)

As proteins are synthesized, they begin to fold into their nascent secondary structure, which can make specific amino acids inaccessible for glycosidic binding. Thus, the target amino acids must be conformationally accessible for glycosylation to occur.

The influence that glycoproteins have on biological processes and disease states continues to expand, spurring the development of detection and analytical strategies with increasing sensitivity and throughput to better understand their diverse structures and biochemistry. Because glycoproteins are a combination of a protein and oligosaccharides, they are more complex to analyze than non-glycosylated proteins. Additionally, the vast diversity in glycan structure and composition adds another level of complexity to glycoprotein analysis.

Glycan staining or labeling

Because of their structure, glycan sugar moieties are not reactive to staining or labeling molecules. To overcome this problem, sugar groups can be chemically restructured with periodic acid, which oxidizes vicinal hydroxyls on sugars (especially sialic acid) to aldehydes or ketones that are then reactive to multiple dyes. The periodic acid-Schiff (PAS) stain uses this reaction to detect and quantify glycoproteins in various biological samples. Periodic acid can also be used to make sugars reactive towards crosslinkers, which can be covalently bound to labeling molecules (e.g., biotin) or immobilized support (e.g., streptavidin) for detection or purification.

Lectins can be used to detect and analyze glycoprotein function (52). These glycan-binding proteins have high specificity for distinct sugar moieties. As described previously, lectins facilitate protein folding in the ER, but they are also critical for cell-cell and pathogen-cell attachment. While anti-glycan antibodies can also bind sugar moieties, lectins are used more often because they are less expensive, better characterized and more stable than antibodies (52). Like antibodies, lectins can be conjugated to probes such as horseradish peroxidase, fluorophores and biotin and immobilized to solid support including streptavidin and Thermo Scientific™ NeutrAvidin™ protein. Some of the common uses of lectins include:

Many lectins are commercially available for use in these applications. The most popular lectin is concanavalin A (ConA) from the jack bean, but other lectins, including jacalin, wheat germ agglutinin (WGA) and lentil lectin (LCA), are also widely available in different commercial kits.

Glycoproteins are distinct from other post-translationally modified proteins, because they are a combination of a protein portion and a glycan portion, either of which can comprise a significant proportion of the molecular weight of the molecule. Thus, glycoproteins can be analyzed as a whole or as individual components. Glycoproteomics is the global analysis of glycosylated proteins and integrates glycoprotein enrichment and proteomic analysis for the systematic identification and quantitation of glycoproteins in complex systems. This subset of proteomics differs from glycomics, which is restricted to all glycans in a system (i.e., the glycome) (53).

As with other proteomic analyses, glycoprotein identification and quantitation are performed using mass spectrometry. The basic pipeline for glycoproteomic analysis includes (53):

Glycoprotein or glycopeptide enrichment

Multidimensional separation by liquid chromatography (LC)

Tandem mass spectrometry

Data analysis via bioinformatics

This approach can be performed before or after enzymatic cleavage of glycans via endoglycosidase H (endo H) or peptide-N4-(N-acetyl-beta-glucosaminyl)asparagine amidase (PNGase), depending on the type of experiment. Quantitative comparative glycoproteome analysis can be performed by differential labeling with stable isotope labeling by amino acids in cell culture (SILAC) reagents. Additionally, absolute quantitation by selected reaction monitoring (SRM) can be performed on targeted glycoproteins using isotopically labeled, "heavy" reference peptides.
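The arithmetic behind these two quantitation modes can be sketched briefly. The example below is a minimal illustration, not a vendor workflow: it assumes that the light and heavy forms of a peptide ionize equally and that peak area scales linearly with amount. The function names and all peak areas are hypothetical.

```python
def silac_ratio(heavy_area: float, light_area: float) -> float:
    """Relative abundance of a peptide between two SILAC-labeled samples.

    Assumes the heavy and light forms co-elute and ionize equally,
    so the ratio of peak areas reflects the ratio of amounts.
    """
    return heavy_area / light_area


def srm_absolute_amount(endogenous_area: float,
                        heavy_reference_area: float,
                        spiked_fmol: float) -> float:
    """Absolute amount (fmol) of an endogenous peptide measured by SRM.

    A known amount of isotopically labeled ("heavy") reference peptide is
    spiked into the sample; the endogenous amount is the area ratio
    multiplied by the spiked amount (linear response assumed).
    """
    return endogenous_area / heavy_reference_area * spiked_fmol


# Hypothetical peak areas, for illustration only.
print(silac_ratio(heavy_area=3.0e5, light_area=1.5e5))
print(srm_absolute_amount(endogenous_area=2.0e6,
                          heavy_reference_area=4.0e6,
                          spiked_fmol=50.0))
```

In practice several transitions per peptide are usually monitored and combined, which this sketch leaves out.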

Glycoproteomic analysis. Glycoproteins are first digested into glycopeptides and either analyzed directly by liquid chromatography and tandem mass spectrometry (LC-MS/MS) or first deglycosylated and then enriched for glycans or peptides prior to analysis.

N-Glycosylation

Glycosylation is often characterized as a post-translational modification. While this is true with other types of glycosylation, N-glycosylation often occurs co-translationally, in that the glycan is attached to the nascent protein as it is being translated and transported into the ER. The "N" in the name of this type of glycosylation denotes that the glycans are covalently bound to the carboxamido nitrogen on asparagine (Asn or N) residues.

Because the ER is the site of translation and processing of most membrane-bound and secreted proteins, it is not surprising that most of these are N-linked glycoproteins. Besides being the most common type of glycosylation (90% of glycoproteins are N-glycosylated), N-linked glycoproteins also have large and often extensively branched glycans that undergo multiple processing steps after being bound to proteins.

N-glycosylation is conserved across eukaryotes and archaea, and a considerable number of the enzymes and processes involved are also conserved across the different species (7). N-glycosylation can be broken down into separate events, as follows:

Precursor glycan assembly

Attachment

Trimming

Maturation

As described previously, different enzymes are required for each step during glycosylation, which facilitates diversity in the glycans that are generated (8,9). N-glycosylation initially occurs identically for all proteins, however, and the diversity does not manifest until the subsequent trimming and glycan maturation (7).

Precursor glycan assembly

Oligosaccharides attached via N-glycosidic linkages are derived from a 14-sugar precursor molecule composed of N-acetylglucosamine (GlcNAc), mannose (Man) and glucose (Glc). These sugars are added consecutively onto dolichol, a polyisoprenoid lipid carrier embedded in the ER membrane (8,10). The first 7 sugars are donated from sugar nucleotides (UDP- and GDP-sugars) in the cytoplasm and bound to dolichol via a pyrophosphate linkage (-PP-). After the Man5GlcNAc2-PP-dolichol intermediate is completed, the entire complex is flipped into the lumen of the ER, after which the final 7 sugars are donated from Man- and Glc-P-dolichol molecules to make the Glc3Man9GlcNAc2-PP-dolichol precursor glycan.
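The sugar bookkeeping in this paragraph can be checked with a few lines of code. The sketch below simply adds the composition of the cytosolic intermediate to the sugars added in the ER lumen; the 4 Man and 3 Glc split of the final 7 sugars is inferred from the two formulas given above, and the variable and function names are illustrative.

```python
from collections import Counter

# Compositions taken from the formulas in the text.
CYTOSOLIC_INTERMEDIATE = Counter({"GlcNAc": 2, "Man": 5})  # Man5GlcNAc2
# Inferred difference between the precursor and the intermediate.
LUMENAL_ADDITIONS = Counter({"Man": 4, "Glc": 3})


def precursor_glycan() -> Counter:
    """Combine both assembly phases into the full precursor composition."""
    return CYTOSOLIC_INTERMEDIATE + LUMENAL_ADDITIONS


def format_glycan(composition: Counter) -> str:
    """Write a composition in the Glc/Man/GlcNAc order used in the text."""
    order = ["Glc", "Man", "GlcNAc"]
    return "".join(f"{sugar}{composition[sugar]}"
                   for sugar in order if composition[sugar])


precursor = precursor_glycan()
print(format_glycan(precursor), sum(precursor.values()))
# prints: Glc3Man9GlcNAc2 14
```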

Glycan attachment

Glycan assembly and attachment. Precursor glycan synthesis begins on the cytosolic face of the endoplasmic reticulum (ER) and is completed after the structure is flipped into the ER lumen. Oligosaccharyltransferase (OSTase) then transfers the precursor glycan to the Asn residue on the nascent protein.

One aspect to note is that not all Asn residues within the predicted consensus sequence (Asn-X-Ser/Thr, where X is any amino acid except proline) are glycosylated. N- to C-terminal protein synthesis results in transport of the growing polypeptide into the ER in the same orientation, and protein folding occurs soon after the polypeptide enters the ER. Therefore, as folding progresses, OSTase is less able to access the consensus sequence for glycan transfer. Indeed, more N-terminal Asn residues are glycosylated than C-terminal Asn residues.
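Candidate N-glycosylation sites can be located by scanning a protein sequence for the consensus motif. The sketch below assumes the commonly cited sequon Asn-X-Ser/Thr, with X being any residue except proline; the example sequence is made up. As the paragraph above stresses, a match marks only a potential site, because accessibility to OSTase decides whether the glycan is actually transferred.

```python
import re

# Lookahead pattern so that overlapping sequons (e.g. "NNTS") are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence: NVS matches, NPT is excluded (X is proline),
# and NNTS contains two overlapping sequons.
example = "MKNVSAANPTLLNNTSQ"
print(find_sequons(example))
# prints: [3, 13, 14]
```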

Glycan trimming in the ER

Oligosaccharides are trimmed in both the ER and Golgi by glycosidases via hydrolysis. Glycan trimming in the ER, though, serves a different purpose than trimming in the Golgi.

In the ER, sugar hydrolysis is used both to monitor protein folding and to indicate when proteins should be degraded. Glucosidases I and II remove 2 terminal Glc from the precursor glycan, after which calnexin and calreticulin, which are membrane-bound and soluble (respectively) sugar-binding lectins, bind to the nascent glycoprotein via the remaining Glc and act as chaperones to help the protein fold properly. The final Glc is soon hydrolyzed by glucosidase II, releasing the glycoprotein from the chaperone. Non-native-folded proteins are recognized by UDP-glucose glycoprotein glucosyltransferase, which transfers a Glc to the glycoprotein, and the protein again is bound to the lectin chaperones to facilitate proper protein folding (12). This cycle of Glc addition and removal continues until the protein is correctly folded, at which time it is not reglucosylated, and the glycoprotein is trafficked to the Golgi for further processing (13). The glycan structure for all properly folded glycoproteins that proceed to the Golgi is Man9GlcNAc2 in higher eukaryotes.

An ER-resident mannosidase (ERManI) plays a key role in identifying proteins that are unable to fold properly. Proteins that lose 3-4 mannose residues in the ER via ERManI activity are transported out of the ER, deglycosylated by N-glycanase (which removes the entire glycan en bloc) and delivered to ER-associated degradation (ERAD) (14,15,16,17). It is thought that ERManI acts as a timer of sorts, because it has a slow rate of mannose hydrolysis that allows nascent proteins multiple rounds of reglucosylation to attempt to fold properly before mannose residues are removed and the protein is targeted for degradation (18).
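The folding cycle and its timer can be summarized as a small state trace. The sketch below is a deliberately simplified model, not a kinetic simulation: the parameter `folds_on_cycle` (the round on which the protein reaches its native fold) is hypothetical, and the fixed cap `max_cycles` stands in for the slow mannose trimming by ERManI, which in the cell is a rate rather than a round count.

```python
def er_quality_control(folds_on_cycle: int, max_cycles: int = 10) -> list[str]:
    """Trace the Glc trimming and re-addition cycle for one glycoprotein."""
    glc = 3  # the precursor glycan carries three glucose residues
    trace = []
    glc -= 2
    trace.append(f"glucosidases I and II trim 2 Glc (Glc left: {glc})")
    for cycle in range(1, max_cycles + 1):
        trace.append(f"cycle {cycle}: calnexin/calreticulin bind the glycoprotein")
        glc -= 1
        trace.append(f"cycle {cycle}: glucosidase II removes the last Glc "
                     f"(Glc left: {glc})")
        if cycle >= folds_on_cycle:
            trace.append("native fold reached: trafficked to the Golgi")
            return trace
        glc += 1
        trace.append(f"cycle {cycle}: misfolded, glucosyltransferase re-adds 1 Glc")
    # Cap reached: modeled as ERManI having trimmed enough mannose residues.
    trace.append("still misfolded: mannose trimmed by ERManI, routed to ERAD")
    return trace


for step in er_quality_control(folds_on_cycle=2):
    print(step)
```

Calling `er_quality_control(folds_on_cycle=5, max_cycles=3)` ends in the ERAD branch instead, which mirrors a protein that fails to fold before the timer runs out.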

Glycan maturation in the Golgi

To this point during glycosylation, all N-linked glycoproteins have the same precursor glycan structure. Glycan processing in the Golgi apparatus combines both trimming and adding sugars to diversify the glycans on individual glycoproteins. As with precursor glycan biosynthesis, this maturation pathway to generate diverse oligosaccharides is highly ordered, such that each step is dependent upon the previous step. To this end, the Golgi segregates specific enzymes into different cisternae to facilitate this step-wise process.

Golgi enzyme compartmentalization. Enzymes that mediate glycan processing in the Golgi apparatus are segregated into distinct cisternae to ensure that glycosylation occurs in a step-wise fashion.

The final glycan structures can be broadly separated into three groups:

Complex oligosaccharides—contain multiple sugar types

High-mannose oligosaccharides—multiple mannose residues

Hybrid—branches of both high mannose and complex oligosaccharides

Glycans destined to be complex oligosaccharides are trimmed by Golgi mannosidase I and II and glycosylated by GlcNAc transferase, resulting in a common core region (7,12). The core then becomes the substrate for multiple Gtfs that consecutively transfer sugar moieties from sugar nucleotides to build variable-length and -branched oligosaccharide chains of GlcNAc, galactose (Gal), N-acetylneuraminic acid (NANA or sialic acid) and fucose. Any glycoproteins that progress through this processing from the common core stage become resistant to glycan removal by endoglycosidase H (endo H), which is used experimentally to determine if glycoproteins contain high-mannose or complex oligosaccharides.

Unlike complex oligosaccharides, high-mannose oligosaccharides do not carry other sugar moieties, although some of the Man residues are often trimmed by Golgi mannosidase I. Whether a glycan is processed into a complex oligosaccharide rather than remaining a high-mannose oligosaccharide depends on the accessibility of the glycan to the processing enzymes, which can be hindered by the glycoprotein conformation. Some glycoproteins have hybrid oligosaccharides, comprising a combination of complex and high-mannose glycans.

Glycan maturation. After initial trimming in the ER, the glycoprotein is trafficked to the Golgi, where Golgi mannosidase I removes multiple mannose sugars. Glycans that do not undergo further glycosylation are called high-mannose oligosaccharides. Further sugar addition and removal yields a common core oligosaccharide onto which multiple Gtfs add different sugars to generate the highly variable complex oligosaccharides. Glycan maturation beyond the common core provides endo H insensitivity. Glycans can be high-mannose, complex or a combination of both (i.e., hybrid oligosaccharide).

O-Glycosylation

While N-glycosylation is the most common glycosidic linkage, O-glycoproteins also play a key role in cell biology (19). This type of glycosylation is essential in the biosynthesis of mucins, a family of heavily O-glycosylated, high-molecular weight proteins that form mucus secretions. O-glycosylation is also critical for the formation of proteoglycan core proteins that are used to make extracellular matrix components. Additionally, antibodies are often heavily O-glycosylated.

O-glycosylation occurs post-translationally on serine and threonine side chains in the Golgi apparatus. N-glycosylation does not preclude O-glycosylation, which commonly occurs on glycoproteins that were N-glycosylated in the ER. Besides the different linkage, O-glycosylation also differs in the method of glycosylation. While a precursor glycan is transferred en bloc to Asn via N-glycosylation, sugars are added one at a time to serine or threonine residues. O-glycosylation can also occur on hydroxylysine and hydroxyproline, oxidized forms of lysine and proline, respectively, that are found in collagen (19). Additionally, O-linked glycans usually have much simpler oligosaccharide structures than N-linked glycans.

The O-glycosidic mechanism is not as complex as that of N-glycosylation. Proteins trafficked into the Golgi are most often O-glycosylated by N-acetylgalactosamine (GalNAc) transferase, which transfers a single GalNAc residue to the β-OH group of serine or threonine. To date, there is no known consensus sequence for this enzyme, although structural motifs have been characterized. Some proteins are O-glycosylated with GlcNAc, fucose, xylose, galactose or mannose, depending on the cell and species (6,20). As with N-glycosylation, sugar nucleotides are used as monosaccharide donors for O-glycosylation. Following this first sugar, a highly variable number of sugars (from only a few to greater than 10) are consecutively added to the growing glycan chain. O-glycosylation can also occur in the cytosol and nucleus to regulate gene expression or signal transduction through other Gtfs (21).

Glypiation

The covalent attachment of a glycosylphosphatidylinositol (GPI) anchor is a common post-translational modification that localizes proteins to cell membranes. This special kind of glycosylation is widely detected on surface glycoproteins in eukaryotes and some archaea (22).

GPI anchors consist of a:

Phosphoethanolamine linker that binds to the C-terminus of target proteins

Glycan core structure

Phospholipid tail that anchors the structure in the membrane

Both the lipid moiety of the tail and the sugar residues in the glycan core have considerable variation (23,24,25,26,27,28), demonstrating vast functional diversity that includes signal transduction, cell adhesion and immune recognition (29). GPI anchors can also be cleaved by enzymes such as phospholipase C to regulate the localization of proteins that are anchored at the plasma membrane.

Mechanism

Similar to the precursor glycan used for N-glycosylation, GPI anchor biosynthesis begins on the cytoplasmic leaflet of the ER and is completed on the luminal side. During this process, 3-4 Man and various other sugars (e.g., GlcNAc, Gal) are built onto a phosphatidylinositol (PI) molecule embedded in the membrane using sugars donated from sugar nucleotides and dolichol-P-mannose outside and inside the ER, respectively. Additionally, 2-3 phosphoethanolamine (EtN-P) linker residues are donated from phosphatidylethanolamine in the ER lumen to facilitate binding of the anchor to proteins (30,31,32,33,34).

Proteins destined to be glypiated have 2 signal sequences:

An N-terminal signal sequence that directs co-translational transport into the ER

A C-terminal signal sequence that is recognized by a GPI transamidase (GPIT) (29)

GPIT does not have a consensus sequence but instead recognizes a C-terminal sequence motif that enables it to covalently attach a GPI anchor to an amino acid in the sequence. This C-terminal sequence is embedded in the ER membrane immediately after translation, and the protein is then cleaved from the sequence and attached to a preformed GPI anchor (35,36,37).

C-Glycosylation

C-mannosylation represents a different approach to glycosylation, because the reaction forms carbon-carbon bonds rather than carbon-nitrogen or carbon-oxygen bonds. C-mannosyltransferase (c-Mtf) links C1 of mannose to C2 of the indole ring of tryptophan (38). The enzyme recognizes the specific sequence Trp-X-X-Trp and transfers a mannose residue from dolichol-P-Man to the first Trp in the sequence (39,40,41).
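The recognition sequence described above lends itself to a simple motif scan. The sketch below reports the first Trp of every Trp-X-X-Trp match, treating X as any residue; the example sequence is invented, and a match indicates only a candidate site, not confirmed C-mannosylation.

```python
import re

# Lookahead pattern so that overlapping W-x-x-W motifs are all reported.
C_MANNOSYLATION_MOTIF = re.compile(r"(?=W[A-Z]{2}W)")


def find_c_mannosylation_sites(sequence: str) -> list[int]:
    """Return 1-based positions of the first Trp in each Trp-X-X-Trp motif."""
    return [match.start() + 1
            for match in C_MANNOSYLATION_MOTIF.finditer(sequence.upper())]


# Hypothetical sequence with two motifs (WSEW and WAAW).
example = "AWSEWSDKWAAW"
print(find_c_mannosylation_sites(example))
# prints: [2, 9]
```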

C-mannosylation has been detected in multiple cell lines (42) and rat liver microsomes (40). Specific proteins that are C-glycosylated include Trp2 in RNAse (43), the erythropoietin receptor and IL-12B (5). The biological function of C-glycosylation is unknown, but current research focuses on the synthesis of C-glycosylated molecules by plants, insects and bacteria for drug discovery, because they are resistant to metabolic hydrolysis.

Phosphoglycosylation

This type of post-translational modification is limited to parasites (e.g., Leishmania and Trypanosoma) and slime molds (e.g., Dictyostelium) and is characterized by the linking of glycans to serine or threonine via phosphodiester bonds (44). In some parasitic species, such as Leishmania, phosphoglycosylation is the most abundant post-translational modification (45,46) and is used to make proteophosphoglycans (PPGs), which are critical for protection against host complement (47) and promote parasite aggregation in the host (48). Similar to N-glycosylation, phosphoglycosylation occurs by transfer of a prefabricated phosphoglycan from a membrane-bound molecule via a phosphoglycosyltransferase (PTase), although the exact structure and enzyme vary by species (44).

Post-glycosylation modifications

Besides multiple types of glycosylation occurring on the same protein, glycans can be further modified to increase the diversity of glycoproteins in a given proteome. These modifications include:

Sulfation at Man and GlcNAc residues in the production of glycosaminoglycans (GAGs), which are components of proteoglycans in the extracellular matrix

Ermonval M. et al. (2001) N-glycan structure of a short-lived variant of ribophorin I expressed in the Madia214 glycosylation-defective cell line reveals the role of a mannosidase that is not ER mannosidase I in the process of glycoprotein degradation. Glycobiology. 11, 565-76.

Vainauskas S. and Menon A. K. (2006) Ethanolamine phosphate linked to the first mannose residue of glycosylphosphatidylinositol (GPI) lipids is a major feature of the GPI structure that is recognized by human GPI transamidase. J Biol Chem. 281, 38358-64.