Introduction

The vast majority of the molecules participating in Reactome pathways are proteins. Surprisingly, there is no universal authoritative source of names for proteins and no agreed vocabulary that encompasses cleaved peptide fragments or post-translationally modified forms. Reactome frequently represents a protein in many forms: as the initial translated form, as fragments following processing or cleavage, or as a peptide that has a post-translational modification (PTM). To improve naming consistency and avoid ambiguity, we have developed a systematic nomenclature for naming peptides. We also have a simple set of rules for naming mRNA molecules, genes and small molecules.

Process

The majority of peptide names have been generated by a scripted process. New peptide instances are named manually and verified when they are first made visible as part of a Reactome quarterly update. Some peptides are exempt from the naming process, either to prevent name duplications or because the peptide represents a modification or state that is not currently included in the naming process. See the Exemptions section below for more details.

Explanation Of Systematic Names

Gene symbol core

Reactome peptide names use HGNC gene symbols as the 'core' of the name. We obtain these indirectly from UniProt via the Reactome referenceEntity.

Peptide coordinates suffix

Reactome peptides refer to UniProt. Unless otherwise indicated, the peptide sequence we represent is that given by UniProt's 'Chain' feature, part of an annotation group called Molecular Features. This represents the 'default' peptide. When the peptide represented in Reactome is identical to the peptide represented by the UniProt Chain, the name used in Reactome is the gene symbol. If the UniProt record has no Chain feature or more than one Chain feature, or if the start and end coordinates of the Reactome peptide do not agree with the UniProt Chain, the start and end coordinates of the peptide are added in brackets as a suffix to the gene symbol. Unknown coordinates are represented as '?' symbols.

For example, the caspase-9 precursor, with peptide coordinates start:1 end:416, is named CASP9. The large and small subunits of caspase-9 are named CASP9(1-315) and CASP9(316-416) respectively.

An N-terminal fragment of aggrecan, where the exact cleavage position is unknown, would be named ACAN(17-?).

Note that Reactome peptide coordinates always refer to the UniProt peptide, even when the literature convention is to number a cleaved fragment following the removal of a signal peptide or initiating methionine. This combination of gene symbol and coordinates is usually sufficient to generate a unique name, but it can fail if a peptide is cleaved at multiple unknown locations. When this is the case, peptides are named manually, following the systematic naming as closely as possible.
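The coordinate-suffix rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the script Reactome uses; the function name coordinate_name, its arguments, and the placeholder Chain coordinates used for ACAN are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# (start, end) peptide coordinates; None stands for an unknown position.
Coords = Tuple[Optional[int], Optional[int]]


def coordinate_name(gene_symbol: str, peptide: Coords,
                    uniprot_chains: Sequence[Tuple[int, int]]) -> str:
    """Return the gene symbol, adding a coordinate suffix when the rule requires it."""
    start, end = peptide
    # The bare gene symbol is used only when UniProt has exactly one Chain
    # feature and the peptide coordinates agree with it.
    if len(uniprot_chains) == 1 and tuple(uniprot_chains[0]) == (start, end):
        return gene_symbol

    def fmt(pos: Optional[int]) -> str:
        # Unknown coordinates are written as '?'.
        return "?" if pos is None else str(pos)

    return f"{gene_symbol}({fmt(start)}-{fmt(end)})"


# Chain coordinates for CASP9 follow the example in the text.
print(coordinate_name("CASP9", (1, 416), [(1, 416)]))    # CASP9
print(coordinate_name("CASP9", (1, 315), [(1, 416)]))    # CASP9(1-315)
print(coordinate_name("CASP9", (316, 416), [(1, 416)]))  # CASP9(316-416)

# Placeholder Chain coordinates, for illustration only (not taken from UniProt).
ACAN_CHAIN_PLACEHOLDER = [(17, 2000)]
print(coordinate_name("ACAN", (17, None), ACAN_CHAIN_PLACEHOLDER))  # ACAN(17-?)
```

A record with no Chain feature, or with several, always receives the suffix in this sketch because the single-Chain test fails.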

Reactome annotation identifies the coordinate positions of PTMs when these are known but, for brevity, most PTM prefixes do not include the modified peptide coordinate. The exceptions are di- and tri-lysine methylation, lysine acetylation, ubiquitination and phosphorylation. For these PTM types the coordinates are necessary to avoid name duplications.

PTM prefixes for phosphorylation include, when known, the coordinate and a residue letter to indicate the residue that is phosphorylated. Phosphorylations are ordered by peptide coordinate.

If there are more than 4 occurrences of any PTM type (or, in the case of phosphorylation, of any subtype), the coordinates are not included; instead, the prefix code is preceded by the number of occurrences and 'x'.

The Reactome database represents PTMs as modifiedResidue annotations. These use PSI-MOD terms as their primary external reference. PSI-MOD terms can be searched online and are cross-referenced to the RESID database. The PTM prefixes used in Reactome are given in a lookup table (see below). Some infrequently used PTM types are not represented in the table.

Examples of phosphorylation prefixes:

p-Y139-DAPP1 is DAPP1 phosphorylated on tyrosine-139

p-Y150,S343,T346-WASF2 is WASF2 phosphorylated on tyrosine-150, serine-343 and threonine-346. Note that the phosphorylations are ordered by coordinate.

p-Y55,S112,S121,Y227-SPRY2 is SPRY2 phosphorylated on tyrosine-55, serine-112, serine-121 and tyrosine-227. Note that the ordering is by coordinate; phosphorylations are not grouped by subtype.

p-Y-GAB2 is GAB2 phosphorylated on a tyrosine, but the coordinate position of this tyrosine is unknown.

p-GLI3 is GLI3 phosphorylated but both the subtype and position are unknown.

p-7Y-KIT is KIT phosphorylated on seven tyrosines. The coordinates are omitted from the name as there are more than 4 tyrosine phosphorylations.
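The phosphorylation examples above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than Reactome's own script: the function name phospho_prefix and the (residue, coordinate) input format are invented for this example, the collapsed form follows the p-7Y-KIT example (count followed by the residue letter), and placing collapsed subtypes after the individually listed sites is a choice made here, since the text does not cover that mixed case.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (residue letter, coordinate); None marks an unknown subtype or position.
Site = Tuple[Optional[str], Optional[int]]


def phospho_prefix(sites: List[Site]) -> str:
    """Build a phosphorylation prefix such as 'p-Y150,S343,T346-'."""
    if not sites:
        return ""
    counts = Counter(res for res, _ in sites if res is not None)
    # Subtypes with more than 4 occurrences lose their coordinates.
    collapsed = {res for res, n in counts.items() if n > 4}

    # Remaining sites are ordered by coordinate, not grouped by subtype;
    # sites with an unknown coordinate sort last.
    listed = [s for s in sites if s[0] not in collapsed]
    listed.sort(key=lambda s: (s[1] is None, s[1] or 0))

    tokens = [f"{res or ''}{'' if pos is None else pos}" for res, pos in listed]
    # Assumption: collapsed subtypes follow the listed sites.
    tokens += [f"{counts[res]}{res}" for res in sorted(collapsed)]
    tokens = [t for t in tokens if t]  # drop fully unknown sites

    if not tokens:
        return "p-"
    return "p-" + ",".join(tokens) + "-"


print(phospho_prefix([("Y", 139)]) + "DAPP1")
# p-Y139-DAPP1
print(phospho_prefix([("S", 112), ("Y", 227), ("Y", 55), ("S", 121)]) + "SPRY2")
# p-Y55,S112,S121,Y227-SPRY2
print(phospho_prefix([("Y", None)]) + "GAB2")
# p-Y-GAB2
print(phospho_prefix([(None, None)]) + "GLI3")
# p-GLI3
# Seven tyrosines at placeholder coordinates (illustrative, not real KIT sites).
print(phospho_prefix([("Y", 10 * i) for i in range(1, 8)]) + "KIT")
# p-7Y-KIT
```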

Ubiquitination commences with the attachment of ubiquitin to a lysine residue, often followed by the addition of multiple ubiquitin peptides, which can be cross-linked at several positions in the ubiquitin protein.

K63polyUb-13,57-p-Y200-XYZ1 is XYZ1 with K63 cross-linked polyubiquitin attached to residues 13 and 57 and a phosphorylation on Y200.

When phosphorylation and other PTMs occur in combination, the phosphorylations are detailed last in the prefix:

2xPalmC-MyrG-p-S1177-NOS3(2-1203) is NOS3 peptide fragment 2-1203 with two palmitoylated cysteines, one myristoylated glycine and a phosphorylation on serine-1177.
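The ordering shown in these combined names (other PTM prefixes first, phosphorylation last, then the gene symbol with any coordinate suffix) can be sketched as below. The function assemble_name and its arguments are hypothetical names for this illustration. The text does not state how non-phosphorylation prefixes are ordered among themselves, so the sketch simply keeps the order supplied by the caller.

```python
from typing import Sequence


def assemble_name(core: str, other_ptm_prefixes: Sequence[str] = (),
                  phospho: str = "") -> str:
    """Join PTM prefixes and the gene-symbol core with hyphens.

    core               -- gene symbol plus optional coordinates, e.g. 'NOS3(2-1203)'
    other_ptm_prefixes -- non-phosphorylation prefixes, e.g. ['2xPalmC', 'MyrG']
    phospho            -- phosphorylation prefix without trailing hyphen, e.g. 'p-S1177'
    """
    parts = list(other_ptm_prefixes)  # caller's order is preserved (assumption)
    if phospho:
        parts.append(phospho)         # phosphorylations are detailed last
    parts.append(core)
    return "-".join(parts)


print(assemble_name("NOS3(2-1203)", ["2xPalmC", "MyrG"], "p-S1177"))
# 2xPalmC-MyrG-p-S1177-NOS3(2-1203)
print(assemble_name("XYZ1", ["K63polyUb-13,57"], "p-Y200"))
# K63polyUb-13,57-p-Y200-XYZ1
```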

Exemptions

A small number of Reactome peptides do not currently follow the systematic naming described above.

Note that referenceEntity is a Reactome term describing a key external reference, from which our internal molecular records are derived. For most proteins this is UniProt.

Exemptions are made when:

The peptide has a universally understood common name. In these cases the systematic name will be retained as an alias name.

The peptide has the word 'mutant' in its name, indicating that the peptide has a disease-associated mutation.

The peptide has an annotation in the Disease field, again indicating that it is an abnormal peptide associated with a disease process.

The referenceEntity is a referenceIsoform with variantIdentifier > 1. This avoids applying coordinates for the canonical peptide to an isoform.

The peptide has a modification that is not a simple modifiedResidue instance. This applies to peptides with unusual modifiedResidue types such as GroupModifiedResidues and internal peptide crosslinks.

The peptide name contains the word 'active', which is used in Reactome to indicate a peptide that has an active conformation, but has a peptide chain that is identical to an inactive precursor.
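The exemption criteria listed above can be expressed as a simple check. This is a minimal sketch, not Reactome's implementation: the PeptideRecord class and all of its field names are hypothetical stand-ins for the corresponding database attributes, and matching 'active' and 'mutant' as whole words (so that 'inactive' does not trigger the rule) is an assumption about how the text should be read.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeptideRecord:
    """Hypothetical, simplified view of a Reactome peptide record."""
    name: str
    has_common_name: bool = False
    disease_annotations: List[str] = field(default_factory=list)
    is_reference_isoform: bool = False
    variant_identifier: Optional[int] = None
    # Class names of attached modifications, e.g. 'ModifiedResidue'.
    modification_types: List[str] = field(default_factory=list)


def exemption_reasons(p: PeptideRecord) -> List[str]:
    """Return the reasons a peptide is exempt; an empty list means not exempt."""
    reasons = []
    if p.has_common_name:
        reasons.append("universally understood common name")
    if re.search(r"\bmutant\b", p.name, re.IGNORECASE):
        reasons.append("name contains 'mutant'")
    if p.disease_annotations:
        reasons.append("annotation in the Disease field")
    if (p.is_reference_isoform and p.variant_identifier is not None
            and p.variant_identifier > 1):
        reasons.append("referenceIsoform with variantIdentifier > 1")
    if any(t != "ModifiedResidue" for t in p.modification_types):
        reasons.append("modification is not a simple modifiedResidue")
    # Whole-word match, so 'inactive' is not treated as 'active' (assumption).
    if re.search(r"\bactive\b", p.name, re.IGNORECASE):
        reasons.append("name contains 'active'")
    return reasons


print(exemption_reasons(PeptideRecord(name="CASP9")))
# []
print(exemption_reasons(PeptideRecord(name="XYZ1 mutant")))
# ["name contains 'mutant'"]
```

Returning the list of reasons rather than a single boolean makes it easier to see which rule caused a peptide to be skipped by a naming script.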