User's manual

The IARC TP53 database contains exclusively TP53 mutations associated with human cancers that have been identified by
sequencing and published in the peer-reviewed literature or compiled in mutation data repositories.

If you don't find the answer to your question on this page or on the FAQ page, do not hesitate to contact us.

Datasets and their annotations

Gene variations

geneVariationIARC TP53 Database, R17.txt

Column head

Description

MUT_ID

Unique identifier of each gene variation reported in the database.
This identifier is used in all datasets (somatic, polymorphisms, germline).

hg18_Chr17_coordinates

Chromosome coordinate of mutation: start position based on hg18 human genome assembly.

hg19_Chr17_coordinates

Chromosome coordinate of mutation: start position based on hg19 human genome assembly.

hg38_Chr17_coordinates

Chromosome coordinate of mutation: start position based on hg38 human genome assembly.

ExonIntron

Location of the mutation in the introns or exons in TP53 gene.
Terms occurring in this column are "1-intron" to "11-intron" and "2-exon"
to "11-exon". An "i" or "e" in front means that the mutation is located within the indicated intron
or exon with no information on the precise location.

Genomic_nt

Nucleotide numbering based on the Genbank
NC_000017.10 reference sequence for TP53 gene.

Codon_number

For mutations in exons, codon number
at which the mutation is located (1-393). If a mutation spans more
than one codon (e.g. tandem mutation or deletion of several
bases), only the first (5') codon is entered. For mutations in
introns, 0 is entered.

Description

For substitutions, nucleotide
change read from the coding strand by convention. For
deletions and insertions, the number of bases deleted (del) or
inserted (ins) is given. For more complex mutation events, a full
description is given as indicated in the original publication.
Note that the annotation system was modified on February 7th, 2007
to fix errors that were previously generated in the downloaded files.

g_description

Mutation nomenclature according to HGVS standards and using the GenBank NC_000017.10 (hg19 assembly) genomic sequence as reference.

c_description

Mutation nomenclature according to HGVS standards and using the NM_000546.4 coding sequence as reference.

ProtDescription

Mutation description at the protein level as recommended by HGVS and using the Uniprot reference sequence P04637.

Splice_site

Annotation on the position of the mutation within conserved nucleotides of p53 consensus, cryptic or alternative splice sites:
consensus SD or SA = the mutation is located at conserved dinucleotides involved in p53 consensus splice sites
(SD for splice donor site, SA for splice acceptor site) producing the full-length p53 protein (TA isoform);
cryptic SD or SA = the mutation is located at conserved dinucleotides involved in splice sites (gt or ag)
that have been observed experimentally in p53 sequences carrying mutated consensus splice sites;
alternative SD or SA = conserved dinucleotides involved in splice sites (gt or ag)
responsible for producing p53 isoforms beta and gamma;
alternative = mutated nucleotides are in the "cassette" sequence responsible for producing the p53 delta isoform;
no = the position is outside the above-mentioned nucleotides.
Information on splice site can be found here.

CpG_site

Yes or No indicates whether the position of the mutation falls within a CpG
sequence. To see the position of all CpG sites in the TP53 coding
sequence, click here.
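As an illustration of what "falls within a CpG sequence" means, the following minimal sketch checks whether a 1-based position in a sequence is the C or the G of a CpG dinucleotide. The function name `is_cpg_site` and the toy sequence are assumptions made for this example; they are not part of the database, and this is not necessarily how the CpG_site column was computed.

```python
def is_cpg_site(seq: str, pos: int) -> bool:
    """Return True if the 1-based position `pos` lies within a CpG dinucleotide.

    `seq` is read 5'->3'. A position is within a CpG if it is a C followed
    by a G, or a G preceded by a C.
    """
    s = seq.upper()
    i = pos - 1  # convert to a 0-based index
    if not 0 <= i < len(s):
        raise ValueError("position outside sequence")
    c_of_cpg = s[i] == "C" and i + 1 < len(s) and s[i + 1] == "G"
    g_of_cpg = s[i] == "G" and i > 0 and s[i - 1] == "C"
    return c_of_cpg or g_of_cpg


toy = "ATCGGA"  # made-up sequence for illustration, not TP53
for position in (3, 4, 5):
    print(position, "Yes" if is_cpg_site(toy, position) else "No")
```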

Sequence context

Trinucleotide sequence context of variants. The 5' base and 3' base of the start position of the variant are indicated on the left and right respectively of the mutated base.
This context is provided on the coding strand of the gene sequence and is based on hg19 TP53 sequence.
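The trinucleotide context can be pictured with a short sketch: given a coding-strand sequence and the 1-based start position of a variant, it returns the 5' base, the mutated base and the 3' base. The function name `trinucleotide_context` and the toy sequence are hypothetical and used only for illustration.

```python
def trinucleotide_context(seq: str, pos: int) -> str:
    """Return the 5' neighbor, the base at `pos` (1-based) and the 3' neighbor."""
    s = seq.upper()
    i = pos - 1  # convert to a 0-based index
    if i < 1 or i + 1 >= len(s):
        raise ValueError("position needs a neighboring base on both sides")
    return s[i - 1 : i + 2]


toy = "ATCGGA"  # made-up sequence for illustration, not TP53
print(trinucleotide_context(toy, 3))  # context of the C at position 3
```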

Type

Nature of the mutation. The terms
occurring in this column are "A:T>C:G" (A to C or T
to G base change), "A:T>G:C" (A to G or T to C base
change), "A:T>T:A" (A to T or T to A base change),
"G:C>A:T" (G to A or C to T base change at non CpG
sites), "G:C>A:T at CpG" (G to A or C to T base
change at CpG sites),
"G:C>C:G" (G to C or C to G base change),
"G:C>T:A" (G to T or C to A base change),
"tandem" (two consecutive base changes), "ins"
(insertion), "del" (deletion) and "complex"
(complex changes).
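For single base substitutions, the mapping between a base change and the terms above can be sketched as a lookup table. The helper `substitution_type` is a hypothetical name introduced for this example; it only covers single base substitutions and does not handle "tandem", "ins", "del" or "complex" events.

```python
# Each base change (wild-type, mutant) maps to the term used in the Type column.
SUBSTITUTION_TYPES = {
    ("A", "C"): "A:T>C:G", ("T", "G"): "A:T>C:G",
    ("A", "G"): "A:T>G:C", ("T", "C"): "A:T>G:C",
    ("A", "T"): "A:T>T:A", ("T", "A"): "A:T>T:A",
    ("G", "A"): "G:C>A:T", ("C", "T"): "G:C>A:T",
    ("G", "C"): "G:C>C:G", ("C", "G"): "G:C>C:G",
    ("G", "T"): "G:C>T:A", ("C", "A"): "G:C>T:A",
}


def substitution_type(wt: str, mut: str, at_cpg: bool = False) -> str:
    """Return the Type term for a single base substitution."""
    key = (wt.upper(), mut.upper())
    if key not in SUBSTITUTION_TYPES:
        raise ValueError(f"not a single base substitution: {wt}>{mut}")
    label = SUBSTITUTION_TYPES[key]
    # Only G:C>A:T changes are split by CpG status in the list of terms.
    if label == "G:C>A:T" and at_cpg:
        label += " at CpG"
    return label


print(substitution_type("C", "T", at_cpg=True))
print(substitution_type("G", "T"))
```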

WT_nucleotide

Wild-type base at the position of the mutation.

Mutant_nucleotide

Mutant base, described on the coding strand.

Mut_rate

Substitution rates were calculated for all single base substitutions in the coding sequence of p53 according to the
dinucleotide substitution rates derived from human-mouse aligned sequences of chromosomes 21 and 10
(Lunter and Hein 2004).
The mutation probabilities for a given single nucleotide substitution are calculated by averaging the dinucleotide substitution
rates at that position for the forward and reverse strands.

WT_codon

For mutations in exons, normal base sequence of the codon in which the mutation occurred.

Mutated amino acid encoded at the
codon in which the mutation occurred (three-letter or one-letter amino acid
abbreviation). The chain terminating mutations due to single base
substitutions are designated by "stop".
Check AA letter code and
Genetic code

Mut_rateAA

Mutation rate of amino-acid substitution calculated by summing up the nucleotide substitution
rates. This value is only valid for amino-acid substitutions
resulting from single nucleotide substitutions.

Effect

Effect of the mutation. The terms
occurring in this column are: missense (change of one amino-acid), nonsense (stop codon), FS
(frameshift), silent (no change in the protein sequence),
splice (mutations located in the first two or last two conserved nucleotides of an intron, which are thus predicted to alter splicing,
or mutations that have been shown experimentally to alter splicing), other (in-frame deletions or insertions, mutations
in introns).

Polymorphism

Polymorphic status of the gene variation. Validated: reported in ESP5400, 1000G, or other population databases (dbSNP138 data) at MAF>0.001; No: not reported or reported at MAF<0.001 in ESP5400, 1000G, or other population databases (dbSNP138 data); Na: not applicable.

SNPlink

Link to NCBI SNP database.

Residue function

Known function of the wild-type residue. When the function is not known but the structure
is known, the solvent accessibility (SA) of the residue is indicated by the terms buried, exposed or partially exposed
(SA calculated with Naccess and 1TSR (chain B) structure of p53:
<20 = buried, >=20 and <50 = partially exposed, >=50 = exposed).
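The three accessibility terms follow directly from the thresholds given above. A minimal sketch, assuming the SA value is available as a number (the function name `accessibility_class` is hypothetical):

```python
def accessibility_class(sa: float) -> str:
    """Map a solvent accessibility value to the term used in the database."""
    if sa < 20:
        return "buried"
    if sa < 50:
        return "partially exposed"
    return "exposed"


for value in (5.0, 20.0, 49.9, 50.0):
    print(value, accessibility_class(value))
```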

Solvent accessibility of the wild-type residue as calculated with
Naccess
and the 1TSR (chain B) structure of p53.

GV

Grantham variation. GV is a measure of the amount of observed biochemical
variation at a particular position in a multiple sequence alignment.
GV was calculated with an
alignment containing 9 sequences of p53 from fish to placental mammals with
the A-GVGD program. More details here.

GD

Grantham deviation. GD is a measure of the deviation of the mutated residue from the different types of residues observed
at that position in a multiple sequence alignment. It is derived from GV and the Grantham matrix. More details
here.

AGVGD class

Prediction of functional impact based on protein sequence conservation,
taking into account GV and GD (see above). Mutations are classified as "neutral",
"deleterious" or "unclassified". More details here.

SIFT class

Functional classification based on SIFT program
using the same sequence alignment as for AGVGD class (see details here) and program default settings.
Missense mutations are classified as "deleterious" or "neutral".

Functional classification based on the overall transcriptional activity (TA) on 8 different promoters as measured by Kato et al.
For each mutant, the median of the 8 promoter-specific activities (expressed as percent of the wild-type protein) is calculated and missense mutations are classified as "non-functional" if the median is <=20,
"partially functional" if the median is >20 and <=75, "functional" if the median is >75 and <=140, and "supertrans" if the median is >140.

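The transactivation-based classification described above (median of the 8 promoter-specific activities, then fixed cut-offs) can be sketched as follows. The function name `ta_class` and the example activity values are assumptions made for illustration; they are not taken from the Kato et al. dataset.

```python
from statistics import median


def ta_class(activities):
    """Classify a missense mutant from 8 promoter-specific activities.

    Activities are expressed as percent of the wild-type protein.
    """
    if len(activities) != 8:
        raise ValueError("expected 8 promoter-specific activities")
    m = median(activities)
    if m <= 20:
        return "non-functional"
    if m <= 75:
        return "partially functional"
    if m <= 140:
        return "functional"
    return "supertrans"


# Made-up activity values for a hypothetical mutant.
print(ta_class([2, 5, 8, 10, 12, 15, 30, 60]))
```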
Structure/Function class

Functional predictions derived from a computer model that takes into account
the 3D structure of WT and mutant proteins and is trained on the transactivation dataset from
Kato et al. Mutations are classified as "functional" or "non-functional". More details
here.

DNE class

Dominant-negative (DN) effect on transactivation by wild-type p53.
The classification is established for mutants for which DN activity data on more than
2 p53-response elements are available. Data are based on WAF1 and RGC promoters in various studies
(these promoters were the most frequently used in different studies to assess DNE status), and on
two large systematic studies (Dearth et al., which includes 76 mutants, and
Monti et al., which includes 104 mutants).
Mutants were classified as "Yes" if they had dominant-negative activity on both WAF1 and
RGC promoters, or on all promoters in the large studies, "Moderate" if they had dominant-negative
activity on some but not all promoters, and "No" if they had no dominant-negative
activity on both WAF1 and RGC promoters, or none of the promoters in the large studies.
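The Yes / Moderate / No rule can be summarized in a few lines. This is a simplified sketch that assumes the dominant-negative result for each promoter has already been reduced to True or False; the function name `dne_class` and the input layout are hypothetical, and the sketch does not check how many response elements were tested.

```python
from typing import Mapping


def dne_class(dn_by_promoter: Mapping[str, bool]) -> str:
    """Return "Yes", "Moderate" or "No" from per-promoter DN results."""
    results = list(dn_by_promoter.values())
    if not results:
        raise ValueError("no promoter data available")
    if all(results):
        return "Yes"  # dominant-negative on every promoter tested
    if any(results):
        return "Moderate"  # dominant-negative on some but not all promoters
    return "No"  # no dominant-negative activity on any promoter


print(dne_class({"WAF1": True, "RGC": False}))
```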

SwissProt identification number with link to the variant page of the SwissProt database.

Somatic_count

Number of occurrences in the somatic dataset (number of tumors reported to carry this somatic mutation). Total count is 29893 in R18.

Germline_count

Number of occurrences in the germline dataset (number of tumors in confirmed carriers of this germline mutation). Total count is 1644 in R18.

CellLine_count

Number of occurrences in the cell-line dataset (number of cell-lines reported to carry this mutation). Total count is 2765 in R18.

Predicted effect on splicing:

Column head

Description

Site Type

Indicates whether the predicted splice site is an acceptor site or a donor site.

p53 Site

Indicates whether the predicted splice site corresponds to a canonical p53 splice site.

WT score

Fit score of the predicted splice site for the non-mutated sequence (scores are specific to each prediction tool).

MUT score

Fit score of the predicted splice site for the mutated sequence (scores are specific to each prediction tool).

Variation

Predicted effect of the mutation on the predicted splice site.

Source

Prediction tools used.

Predicted effect on p53 protein isoforms:

The predictions provided are based on whether the mutation falls within the specific isoform. For a description of p53 isoforms, see here.

Column head

Description

TAp53alpha

Indicates whether the mutation falls within the canonical isoform coding for the full-length p53 protein.

TAp53beta....deltap53alpha

Indicates whether the mutation falls within the specified isoform.

Somatic mutations found in human tumor samples

somaticMutationDataIARC TP53 Database, R17.txt

This dataset contains TP53 somatic mutations identified in human tumor samples (including metastasis and cell-lines).
It includes data on the type and position of mutations, detailed information on the tumor in which the mutations have been found,
and on various characteristics of the patients in which the tumor developed.

Each row in the downloaded tab-delimited text file represents a single mutation reported in a tumor sample with an arbitrarily assigned unique identification number.
A unique identification number is also attributed to the tumor sample and to the patient. Table content is as follows:

Column head

Description

The first set of columns describe the mutation.

Mutation_ID

Unique identification number for a Sample/Mutation association.
Tandem mutations (two adjacent base substitutions) are considered as one mutation event;
therefore tandem mutations have only one identification number and are a single record.

The second set of columns are assigned to the description of the organ site, tissue and type of
lesion in which the mutation has been identified. The descriptions given in the publication are translated into the standards
of the International Classification of Diseases for Oncology (ICD-O 3rd Edition, World Health Organization, Geneva, 2000)
and SNOMED.
For information on tumor classification, grading and staging, check out
ICD-O training at SEER,
Cancer
Information at NCI and Oncologychannel.com.

Sample_Name

A sample name is assigned as follows: first 3 letters of the first author's name, year of publication (2 digit), followed by the ID number indicated in the publication.
The same name or number can occur several times as in some samples more than one mutation has been reported.

Sample_ID

Unique sample identification number. This number allows the automatic retrieval of samples with
multiple mutations.
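Because each row of the tab-delimited file is one Sample/Mutation association, samples with multiple mutations can be retrieved by grouping rows on Sample_ID. The sketch below uses only the Python standard library and a small made-up table; the rows and the function name `samples_with_multiple_mutations` are illustrative assumptions, and a real download contains many more columns.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; not real database records.
SAMPLE_TEXT = (
    "Mutation_ID\tSample_ID\tSample_Name\n"
    "1\t10\tsample_a\n"
    "2\t10\tsample_a\n"
    "3\t11\tsample_b\n"
)


def samples_with_multiple_mutations(handle):
    """Group Mutation_ID values by Sample_ID and keep samples with more than one."""
    by_sample = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_sample[row["Sample_ID"]].append(row["Mutation_ID"])
    return {sid: muts for sid, muts in by_sample.items() if len(muts) > 1}


multi = samples_with_multiple_mutations(io.StringIO(SAMPLE_TEXT))
print(multi)
```

To run the same grouping on a downloaded file, the file can be opened in text mode and passed to the function in place of the `io.StringIO` object.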

p53 immunostaining graded as ‘positive’, ‘negative’ or ‘+/-’. ND stands for not done.

Add_Info

Any relevant additional information is entered here.

The third set of columns are assigned to the description of the patient origin and life-style.
They contain heterogeneous notes, usually comments emphasized by authors reporting the mutations.
It should be noted that this information is generally qualitative. No quantitative information on exposure to risk
factors is included in the database. This information does not presuppose that a formal, causal link has been established between
such factors and the mutation described. Moreover, for most exogenous risk factors, individual exposure has not been
monitored. This information is given solely to (i) permit the retrieval of mutations found in patients
belonging to defined groups or having specific risk factors, and (ii) facilitate access to the corresponding publications. For
detailed comparison between exposure groups, users are invited to perform their own analysis based on the information given in the
original publication.

Individual_ID

Unique identification number for an individual included in the database. It is automatically assigned
by the database system.

Information on the presence or absence of cancers in the family of the patient.

Tobacco

Information on the smoking status of the patient. Terms occurring in this column are 'smoker' (with
qualitative amount in brackets), 'non-smoker', 'passive-smoker' and 'chewer'.

Alcohol

Information on the drinking status of the patient. Terms occurring in this column are ‘drinker’
(with qualitative amount in brackets), and 'non-drinker'.

Exposure

Risk factors to which the patient has been exposed, such as aflatoxins, radon, thorotrast, etc.

Infectious_agent

Pathogen (virus or bacteria) detected in the patient.

Ref_ID

Unique identification number for the reference in which the mutation is described.

PubMed

PubMed reference number provided by NCBI.

Exclude_analysis

Studies that we recommend to exclude from any analysis because of dubious quality.
Such studies are identified based on the following criteria: they report several samples with
multiple mutations, and/or a high proportion of rare variants or variants classified as functional.

somaticMutationReference-IARC TP53 Database, R17.txt

This file lists the publications in which the mutations are described
and gives the method used to detect them. Each row (record) represents a
citation with an arbitrarily assigned unique identification number
(Ref_ID). See standardized annotations for the
description of the column content.

Prevalence of TP53 somatic mutations by tumor site

prevalenceSomaticIARC TP53 Database, R17.txt

This dataset contains information on the proportion of tumors that carry a somatic TP53
mutation extracted from publications contained in the Somatic dataset, and in additional publications that do not give a
detailed description of the mutations (many studies do not provide detailed information on each mutation detected
but rather report their results in the form of summary tables, preventing their inclusion in the somaticMutation dataset), or publications reporting negative results (no mutation found, thus not included in the somaticMutation dataset).

For each study, the total number of tumors or tissue samples analyzed and the number of these samples found to contain
a mutation are provided. The reference paper, method of mutation detection and country of origin of the patients are also indicated.
When the same research team published several papers that describe the same set of samples, data from the most recent or more complete paper are used.

Studies that we recommend to exclude from any analysis because of data quality issues.
Studies are labeled as 'exclude' if: they report several samples with multiple mutations
in patients with no specific genetic background or exposure to mutagen;
they report more variants that are classified as functional or partially functional (based on TA class) than variants classified as non-functional;
mutations are not precisely described and cannot be fully annotated in the database;
several mutations in the series are reported with errors (such as position and base that do not fit, or report of neutral polymorphisms as somatic mutations).

Prevalence of the R249S TP53 mutations in liver cancer

This dataset contains data on the prevalence of the c.747G>T (p.R249S) mutation in liver cancers.
It includes studies that have screened at least exon 7 of TP53 by sequencing, and studies that have searched for this specific mutation by RFLP.
The presence of this mutation in hepatocellular carcinomas has been linked to exposure to aflatoxins and HBV, and may thus constitute a
biomarker of exposure.

The file R249SprevalenceLiver-IARC_TP53_Database_R18.zip contains a Disclaimer,
and a tab-delimited text file, TP53_R249S_R18.txt.

Number of tumor samples analyzed for the c.747G>T (p.R249S) TP53 mutation.

Count_R249S

Number of tumor samples containing the c.747G>T (p.R249S) TP53 mutation.

Remark

Any relevant information.

Method

Comment on method if different from sequencing.

Prognostic value of TP53 somatic mutations

prognosisSomatic-IARC TP53 Database,R17

This dataset includes information on all studies that have analyzed the
relationship between p53 mutations and cancer prognosis. For each study,
the patient cohort, study settings and a summary of the results are
described. When the same research team published several papers with
increasing number of patients, the most recent paper with the largest
dataset is used.

Many of these studies do not provide detailed information on each
mutation detected but rather report their results in the form of summary
tables. Such publications have been included in the prognosis dataset but not in the somaticMutation dataset.
For some of them, the mutations have been published in a previous paper and can be retrieved with the
Cross_Ref_ID study identifier (see below).

Papers that we recommend to exclude from analysis because of dubious data quality (report several samples with multiple mutations, and/or a high proportion of rare variants or variants classified as neutral or functional).

Germline mutations in LFS/LFL families

germlineMutationData-IARC TP53 Database,R17

The dataset of germline mutations contains information on families in which at least one family member
has been identified as a carrier of a germline mutation in the TP53 gene. Criteria for inclusion are the following:
a) individuals carrying a sequenced TP53 germline mutation, affected or not by a cancer, b) individuals affected by a cancer and
belonging to a family in which at least one family member has been identified as a carrier of a germline mutation in the TP53 gene.

Each row (record) in the downloaded file represents a tumor found in an individual having a TP53
germline mutation. The file contains the following information:

Column head

Description

Family_ID

Unique family identification number.

Family_Code

Name or number given in the original
publication or an arbitrarily-assigned name, usually the 3 first
letters of the first author's name and the publication date.

Family classification:
LFS = strict clinical definition of Li-Fraumeni syndrome (defined by Li and Fraumeni as a proband with sarcoma
at <45 years with a first-degree relative with cancer at <45 years
and another first- or second-degree relative with cancer at <45 years or
sarcoma at any age);
LFL = Li-Fraumeni-like, for the extended clinical definitions of Li-Fraumeni syndrome, which include:
Birch definition: proband with any childhood cancer or sarcoma, brain tumor or
adrenocortical carcinoma at <45 years, with one first- or second-degree
relative with sarcoma, breast cancer, brain tumor,
leukemia, or adrenocortical carcinoma at any age, plus one first-
or second-degree relative in the same lineage with any cancer
diagnosed under age 60;
Eeles definition E1: two different tumors
which are part of extended LFS in first- or second-degree relatives
at any age (sarcoma, breast cancer, brain tumor, leukemia,
adrenocortical tumor, melanoma, prostate cancer, pancreatic
cancer);
Eeles definition E2: sarcoma at any age in the proband
with two of the following (two of the tumors may be in the same
individual): breast cancer at <50 years and/or brain tumor,
leukemia, adrenocortical tumor, melanoma, prostate cancer,
pancreatic cancer at <60 years or sarcoma at any age;
FH = family history of cancer which does not
fulfil LFS or any of the LFL definitions (Birch, Eeles E1 or E2);
No FH or No = no family history of cancer;
? = unknown.

Unique identification number for an individual included in the database.
It is automatically assigned by the database system.

Individual_code

Code or number given in the original publication or an arbitrarily assigned code,
usually the family code followed by the position of the individual in the family tree.

FamilyCase

Family case in the pedigree, such as proband (index case), mother, father, etc.

FamilyCase_group

Degree of relationship to the proband.

Sex

Gender of the individual.

Germline_carrier

TP53 mutation status of the individual:
confirmed = the individual has been tested for the presence of the mutation and the mutation has been found;
obligatory = the individual has not been tested for the presence of the mutation but must be a carrier based on
the mutation status of the other individuals in the pedigree;
50%prob. = there is a 50% chance that the individual is a mutation carrier;
negative = the individual has been tested for the presence of the mutation and the mutation has not been found;
NA = the individual has not been tested for the presence of the mutation.

Mode_of_inheritance

Parent from which the mutation has
been inherited. P=paternal, M=maternal, M&P=maternal and paternal,
de novo= new mutation that has not been inherited, na=not known.

Dead

Living status of the individual at
time of follow-up. 0=alive; 1=dead

Unaffected

Disease status of the individual at time of follow-up. 0 = affected by cancer; 1 = not affected by
cancer.

Reference number indicating the
publication in which the mutation is described. This number
corresponds to the Ref_ID number in the GermlineRefR17 file.

germlineMutationReference-IARC TP53 Database,R17

Each row represents a reference identified by a unique identification
number (Ref_ID). See standard annotations
for a description of the column content.

germlinePrevalenceIARC TP53 Database, R17

This dataset includes studies reporting TP53 germline mutation screening in large cohorts of patients selected based on family history or not (sporadic cancer cases).
Each row represents the result of the analysis of TP53 germline mutation status in a selected cohort.

Column head

Description

Diagnosis

Tumor site or clinical description of the selected cohort.

Cohort

Detailed criteria for patient selection.

Cases analyzed

Number of patients included in the mutation screen.

Cases mutated

Number of patients found to carry a TP53 mutation.
Details on mutations can be found in the dataset of germline mutations when the information was provided, but many studies
do not provide a detailed list of mutations.

Mutation prevalence

Percent of mutated cases.
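The percentage relates the two count columns described above. A minimal sketch, assuming both counts are available as integers (the function name `mutation_prevalence` and the example numbers are hypothetical):

```python
def mutation_prevalence(cases_analyzed: int, cases_mutated: int) -> float:
    """Return the percent of mutated cases among the cases analyzed."""
    if cases_analyzed <= 0:
        raise ValueError("cases_analyzed must be positive")
    if not 0 <= cases_mutated <= cases_analyzed:
        raise ValueError("cases_mutated must be between 0 and cases_analyzed")
    return 100.0 * cases_mutated / cases_analyzed


# Made-up cohort: 7 carriers among 200 patients screened.
print(mutation_prevalence(200, 7))
```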

Remark

Any further information on the cohort or method.

PubMed

PubMed ID with link to the NCBI database.

Functional activities of missense mutations

Data on the biological properties of p53 mutant proteins in functional assays performed in yeast or human cells are provided in two datasets.

functionalAssessment-IARC TP53 Database, R17

In this dataset, data were extracted from publications that report functional assessment of p53 mutant proteins
in human or yeast cells, assessed either by transfection and overexpression of mutant proteins,
or by assessment of endogenous mutants. Comparison between mutants requires caution since
functional assays differ from one study to the other, in particular with respect to
the expression vector (which influences the level of expression of the mutant protein),
the p53-responsive elements (generic consensus sequence versus gene-specific
response elements from WAF1, BAX or PIG3), and the recipient cells that have been used.

The functional properties of mutant proteins that are included in this dataset are:

The functional results have been organised in 5 columns for (1) conserved
wild-type properties, (2) complete or partial loss of wild-type properties, (3) dominant-negative effects, (4) gain
of function and (5) temperature sensitivity. The cell system is indicated in two columns and a detailed reference to the
published report is given.

Functional property of mutant that is similar to the activity of the wild-type protein.

- Activities of mutant proteins in human or yeast cells:
DNAb = DNA binding capacity tested by gel-shift or ChIP assay;
TA = transactivation of a reporter gene under the control of a p53-response element (indicated in brackets, see list here);
TR = transrepression of a reporter gene under the control of a gene-specific response element (name of gene indicated in brackets);
TETR = capacity to form tetramers;
x binding = interaction with protein x;
drug sensitivity = conserved capacity to mediate the cytotoxic effect of a drug (specific drug used is indicated, see List of abbreviations).

- Biological effect after over-expression in mouse or rat embryonic fibroblasts:
TRANSF- = ability to counteract the transformation of primary cells induced by the co-transfection of ras or another
transforming oncogene, such as HPV E7;

"super" indicates that the activity of mutant protein is higher than the one of wtp53 (on transactivation, induction of apoptosis, DNA binding or growth suppression).

Loss of Function

WTp53 functional property that is lost by the mutant protein.
Same annotations as in the previous column, with "partial" indicating that the loss of function is partial (residual activity).

Dominant negative activity

Inhibition of the wild-type protein by
mutant proteins in transactivation or cell growth assays.
- Yes = the mutant protein counteracts the activity of the wild-type
protein when the two proteins are co-expressed in human or yeast
cells (the p53-response element or cell growth assay performed is indicated in brackets);
- No = the mutant protein does not counteract the effects of the
wild-type protein.

"moderate" indicates that the mutant protein has a partial inhibiting effect on
the wild-type protein.

Gain of Function

Functional properties displayed by the
mutant but not by the wild-type protein.

- Biological effect after over-expression in mouse or rat embryonic fibroblasts:
TRANSF+ = ability to cooperate with ras or another transforming oncogene, such as HPV E7, in the transformation of
primary cells.

"moderate" indicates that the mutant protein has a partial effect on the activity studied;
"no" indicates that the mutant protein has no effect on the activity studied.

Temperature sensitivity

Sensitivity of the mutant to temperature changes in transactivation assays
(the p53-RE is indicated in brackets) and in other experimental assays (specified in brackets).
Yes = the activity of the mutant protein is affected by the temperature at which the test is performed;
mut_H = the protein is inactive (mutant) at higher temperatures;
mut_L = the protein is inactive (mutant) at lower temperatures;
No = the activity of the mutant protein is NOT affected by the temperature at which the test is performed.

Note that functional tests are performed at different temperatures in yeast (30°C) and human (37°C) cells.
Detailed annotation rules are available here.

Temp_ref

Temperature at which experiments have been performed or which has been used as reference for temperature sensitivity assays.

Cell assay

Human = the activity of the mutant
protein has been tested in human cells.
Yeast = the activity of the mutant protein has been tested in yeast.

cellLines

Name of cell-line(s) that have been
used for testing mutant activities. "(endo)" indicates that activities have been tested on endogenous mutants.

Assay design

Indicates if the assay has been performed with or without wtp53 as control, or if activity has been tested on endogenous mutant.

Method

Details on type of experimental assay that was performed to assess function.

The list of polymorphisms shown in the table includes gene variations that have been validated by frequency in unaffected human populations
and reported in scientific publications or in the SNP databases dbSNP, 1000G, and ESP6500.

Link to the p53Knowledgebase that provides information on population frequencies and association with disease for the selected SNP.

PubMed

Link to PubMed of the publication that first described the SNP (where applicable).

Mouse models with engineered TP53

The dataset contains mouse models with engineered p53 that are compiled in the
caMOD database or reported in the scientific literature.
Data curated at caMOD were courteously provided by the caMOD team. A direct link to the caMOD web site is available for each model for a detailed
description of model genetics and phenotypes. Data reported in the literature but not compiled in caMOD are curated at IARC and a link to PubMed
abstract is provided. For a detailed description of model genetics and phenotypes, please refer to caMOD and/or the original publication.

Mouse ModelsIARC TP53 Database, R17

Column head

Description

Model descriptor

Model name as indicated in caMOD or original publication.

Affected organs

List of organs affected or targeted by transgene.

AA change in human

Amino-acid substitution. Note that amino-acids are numbered according to the human sequence.

caMOD link

Model ID from caMOD database with direct link to caMOD database.

PubMed

PMID with link to PubMed abstract that reported the data.

Experimentally induced mutations

This dataset contains a list of mutations in the human TP53 gene obtained from
mutagenicity assays in the Hupki mouse model (MEF cells treated with the indicated carcinogenic agent) or in
a yeast assay. See original papers for detailed methods.

Nature of the sample from which the
mutation has been identified: cell-line, surgery (surgical or
autopsy specimen, including fresh samples and archival, pathology
specimen), biopsy, xenograft, body fluid (blood, saliva,
urine...).

Tumor_Origin

Origin of the tumor sample. Terms
occurring in this column are: primary, secondary (second primary
tumor in the same patient), metastasis (with the localisation of
the metastasis in brackets), recurrent (tumor recurrence).

Topography

Site of the tumor defined by organ or
group of organs, according to the ICD-O nomenclature (examples:
"colon", "brain", "bronchus and
lung"). Note that some tumors are annotated
"Head&Neck,NOS" or "Colorectum,NOS"
because no detail is given in the original publication (NOS= not
otherwise specified).
For the database search tool, a short name is used in place of the
ICD-O name (example: "Lung" for "bronchus and
lung"). See a
numerical list of topographies.For metastasis, the topography corresponds to the primary
site of the tumor and the site of metastasis is indicated in
brackets in the tumor_origin field.

Precise identification of anatomic
site, organ or tissue. The description given in the publication is
translated to ICD-O nomenclature.

Morphology

Tumor type, including morphology
and/or histologic type. The terminology used is based on ICD-O (2nd
and 3rd editions) and SNOMED classifications. Terms have been
added, such as 'normal tissue' or 'na'. See
alphabetical list of morphologies.

Papers that we recommend excluding from analysis because of dubious data quality (they report several samples with multiple mutations and/or a high proportion of rare variants or variants classified as neutral or functional).

WGS_WXS

Whole genome or whole exome sequencing study.

Mutation detection method

Column head

Description

Tissue_processing

Indicates if the sample analysed was
fresh, fixed or frozen.

Start_material

Indicates if DNA or RNA was screened for mutations.

Prescreening/Method

Prescreening method used to select samples to be sequenced:
‘SSCP’ for single-strand conformation polymorphism, ‘DGE’ for
denaturing gel electrophoresis, ‘FASAY’ for yeast assay,
‘none’ if no prescreening was done, etc.

Material_sequenced

Indicates if the DNA or RNA was cloned or not (direct) before sequencing.

Exon2-11

Exons that have been screened for mutation. In the downloaded file,
"-1" or "TRUE" indicate that the exon has been screened and "0" or "FALSE" indicate that it has not been screened.
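Because the screening flags can appear in two encodings ("-1"/"0" or "TRUE"/"FALSE"), it may help to normalize them before analysis. The following is a minimal sketch, not part of the database tools; the function names and the column names "Exon2" to "Exon5" are assumptions made for illustration.

```python
def parse_screened_flag(raw):
    """Map an exon screening flag from a downloaded file to a bool.

    "-1" and "TRUE" mean screened; "0" and "FALSE" mean not screened.
    Anything else (for example an empty cell) returns None.
    """
    value = str(raw).strip().upper()
    if value in {"-1", "TRUE"}:
        return True
    if value in {"0", "FALSE"}:
        return False
    return None


def screened_exons(row, exon_columns):
    """Return the names of the exon columns flagged as screened in one row."""
    return [col for col in exon_columns if parse_screened_flag(row.get(col)) is True]


# Hypothetical row; the column names are assumed for illustration only.
example_row = {"Exon2": "0", "Exon3": "FALSE", "Exon4": "-1", "Exon5": "TRUE"}
print(screened_exons(example_row, ["Exon2", "Exon3", "Exon4", "Exon5"]))
```

Returning None for unrecognized values keeps missing or unexpected cells distinct from exons that were explicitly not screened.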

Gene variations.
This option allows the functional and structural analysis of all possible single nucleotide substitutions in TP53
exonic sequences (including those that have never been reported in cancer). In addition, all other types of mutations that have
been reported in human samples and validated polymorphisms are included in this dataset. Functional and structural
annotations and frequency statistics for these gene variations can be retrieved with this search option.
Each dataset entry corresponds to a unique gene variation.

Somatic mutations.
This option allows the retrieval and analysis of TP53 mutations reported as somatic events in tumor samples and cell-lines.
Each dataset entry corresponds to a mutation identified in a human sample.

Somatic mutation prevalence.
This option allows the analysis of the prevalence of TP53 somatic mutations by cancer type and population groups.
Each dataset entry corresponds to the prevalence of TP53 mutation for a specific type of cancer in a defined human population.

Germline mutations.
This option allows the retrieval and analysis of TP53 mutations reported as germline events in human individuals.
Each dataset entry corresponds to a tumor identified in an individual carrier of a TP53 germline mutation.
The searchable dataset only includes cancer-affected individuals who are confirmed or obligate carriers of a TP53 mutation
(data on non-affected carriers or non-confirmed carriers can be retrieved by downloading the full dataset with the 'data downloads' option).

Cell-lines.
This option allows the retrieval and analysis of TP53 mutations reported in human cell-lines.
Each dataset entry corresponds to a mutation identified in a cell-line.

Mouse models.
This option allows the display or download of the description of mouse models with engineered p53 that are compiled in
the caMOD database or reported in the scientific literature. Links to caMOD database are available for further details on the model phenotypes.

Mutation distribution graphs

Mutation type. Proportion of mutations classified by their nature (base change, insertions, deletions, etc.):
number of mutations of each class divided by the total number of mutations selected (% is shown).

Codon distribution. Proportion of exonic point mutations at each codon position:
number of mutations at each codon position divided by the total number of exonic mutations selected (% is shown).

Exon/intron distribution. Proportion of mutations in each exon/intron:
number of mutations within each exon/intron divided by the total number of mutations selected (% is shown).

3D JMOL graph. Residues within the central domain of the p53 protein (codons 96 to 289) are highlighted according to the proportion of exonic mutations at this position (start site of mutation) among all selected mutations:
number of mutations at each codon position divided by the total number of exonic mutations selected. The most frequently mutated residues are colored red, the least frequently mutated yellow, and intermediate ones orange.

Mutation effect. Proportion of mutations classified according to their predicted effect on protein sequence (missense, nonsense, frameshift ins/del, …):
number of mutations of each class divided by the total number of mutations selected (% is shown).

Point mutation. Proportion of single amino-acid substitutions classified according to their predicted effect on protein sequence (missense, nonsense, silent):
number of mutations of each class divided by the total number of point mutations selected (% is shown).

Point mutation dot-plot. Each dot represents a specific point mutation, colored according to its predicted effect on protein sequence (missense in blue, nonsense in red and silent in green);
the X axis shows the proportion of the specific mutation in the selected dataset (% of total point mutations in the selected dataset); the Y axis shows the predicted mutation rate for the particular point mutation (see mutation annotations).

SIFT. Proportion of missense mutations classified according to their predicted deleterious/neutral effect based on SIFT algorithm using this MSA:
number of mutations of each class divided by the total number of missense mutations selected (% is shown).

SIFT dot-plot. Each dot represents a specific point mutation, colored according to its predicted deleterious/neutral effect based on the SIFT algorithm;
the X axis shows the proportion of the specific mutation in the selected dataset (% of total point mutations in the selected dataset); the Y axis shows the predicted mutation rate for the particular point mutation (see mutation annotations).

Transactivation. Proportion of missense mutations classified according to their experimentally measured transactivation activities (based on FASAY):
number of mutations of each class divided by the total number of missense mutations selected (% is shown).

Transactivation dot-plot. Each dot represents a specific point mutation, colored according to its experimentally measured transactivation activity;
the X axis shows the proportion of the specific mutation in the selected dataset (% of total point mutations in the selected dataset); the Y axis shows the predicted mutation rate for the particular point mutation (see mutation annotations).
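The proportions shown in the mutation distribution graphs all follow the same pattern: the number of mutations in each class divided by the total number of mutations in the relevant selection. The sketch below illustrates that calculation under the assumption that the selection has already been filtered to the appropriate denominator (all mutations, exonic mutations, point mutations or missense mutations, depending on the graph). The function name and the example labels are hypothetical and do not come from the database tools.

```python
from collections import Counter


def class_percentages(labels):
    """Return {class: percent of the selection} for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # An empty selection has no defined proportions.
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical selection of five mutations labelled by predicted effect.
selected_effects = ["missense", "missense", "missense", "nonsense", "silent"]
print(class_percentages(selected_effects))
```

The same function applies to codon positions or exon/intron labels: only the list of labels passed in changes.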

Tumor distribution graphs

Germline data. Distribution of tumor sites associated with the selected mutations;
number of tumors classified by tumor site divided by total number of tumors observed in individual carriers of the selected mutations (% is shown).

Somatic data. Distribution of tumor sites associated with the selected mutations;
number of mutations classified by tumor site divided by total number of mutations observed in tumors carrying the selected mutations (% is shown).

Gene variation data, somatic graph. Proportion of the selected mutations among all mutations reported in the database by tumor site;
number of selected mutations classified by tumor site divided by total number of mutations in the database for each tumor site (% is shown).

Gene variation data, germline graph. Distribution of tumor sites associated with the selected mutations;
number of tumors classified by tumor site divided by total number of tumors observed in individual carriers of the selected mutations (% is shown).

Mutation prevalence. Proportion of mutated samples by cancer site (topography graph), cancer type (morphology graph), or by country of origin of the patients (country graph);
number of mutated samples divided by total number of samples analyzed (% is shown).
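Unlike the distribution graphs, the mutation prevalence graph and the gene variation somatic graph use a separate denominator for each group (site, morphology or country). A minimal sketch of this per-group ratio is given below; the function name and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_by_group(numerators, denominators):
    """Return {group: 100 * numerator / denominator}, skipping empty denominators."""
    result = {}
    for group, total in denominators.items():
        if total > 0:
            # Groups absent from the numerators count as zero.
            result[group] = 100.0 * numerators.get(group, 0) / total
    return result


# Hypothetical counts: mutated samples and samples analyzed, per cancer site.
mutated = {"Lung": 40, "Colon": 45}
analyzed = {"Lung": 100, "Colon": 90, "Brain": 0}
print(percent_by_group(mutated, analyzed))
```

For the gene variation somatic graph, the numerators would be the selected mutations per tumor site and the denominators all mutations in the database for that site. Groups with no samples analyzed are omitted rather than reported as 0%.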