Where do I find the description of the annotations used in the database?

Mutations screened from RNA

All mutations are annotated in the database at the genomic level.
For mutations identified by RNA screening, these annotations may therefore not be accurate.
For example, a mutation described at the RNA level as a deletion of exon 5 may in
fact be a point mutation located in a splice site at the genomic level
(causing skipping of the exon). Note that this concerns only a small
fraction of the data included in the database. You can exclude studies that
screened RNA by using the 'Start material' control.

Description of deletions and insertions

The exact locations of deletions, insertions and complex variations are often
poorly described in original reports (often reported at the codon level but not at the
genomic level). Because we annotate mutations at the genomic level, annotations for these
mutations are therefore not precise. For example, if a deletion is described as a
deletion of one nucleotide at codon 158, it is entered in the database as a deletion of the
first nucleotide of codon 158, whereas the nucleotide actually deleted may in fact be the
second or third one. This information is thus only reliable at the codon level.
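Why the codon level is still reliable follows from simple arithmetic on coding (c.) positions: the three bases c.3k-2, c.3k-1 and c.3k all belong to codon k. The minimal Python sketch below assumes HGVS-style coding numbering (c.1 is the A of the ATG start codon, the same notation as c.637C>T used elsewhere in this FAQ); the function names are hypothetical helpers, not part of the database or its tools.

```python
def codon_of(c_pos: int) -> tuple[int, int]:
    """Return (codon number, base 1-3 within that codon) for coding position c.N."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def first_base_of_codon(codon: int) -> int:
    """Coding position of the first base of a codon (where such deletions are entered)."""
    return 3 * (codon - 1) + 1


# A one-nucleotide deletion "at codon 158" is entered at the first base of the codon,
# but any of the three bases would map to the same codon.
start = first_base_of_codon(158)
for pos in range(start, start + 3):
    codon, base = codon_of(pos)
    print(f"c.{pos}del -> codon {codon}, base {base}")
```

All three candidate positions (c.472 to c.474) report codon 158, which is why the annotation can be trusted at the codon level but not at the exact-nucleotide level.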

Data analysis

How can I perform custom analyses?

The analyses available through the web-based tools cannot cover all user needs.
To perform other types of analyses, all datasets can be downloaded in full,
and annotation details are provided in the User's manual.
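As an illustration of a custom analysis on a downloaded dataset, the sketch below counts mutation types per tumor site using only the Python standard library. It is a minimal sketch under stated assumptions: the column names "tumor_site" and "mutation_type" are HYPOTHETICAL placeholders (the real column names are documented in the User's manual), and the in-memory toy rows only stand in for a downloaded file.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy stand-in for a downloaded file. Column names are HYPOTHETICAL;
# replace them with the names given in the User's manual.
TOY_CSV = """tumor_site,mutation_type
lung,G>T
lung,G>T
lung,C>T
colorectum,C>T
"""


def spectrum_by_site(handle, site_col="tumor_site", type_col="mutation_type", delimiter=","):
    """Count mutation types per tumor site from a delimited text file."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(handle, delimiter=delimiter):
        counts[row[site_col]][row[type_col]] += 1
    return counts


spectra = spectrum_by_site(io.StringIO(TOY_CSV))
for site, counter in sorted(spectra.items()):
    print(site, dict(counter))
```

For a real file, open it with `open(path, newline="")` instead of `io.StringIO`, and pass `delimiter="\t"` if the download is tab-separated.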

How to retrieve the frequency and residual activity of a specific p53 mutation?

Select the "Gene variation" option; select a specific mutation
with the available criteria and click on "Go".

Where can I find reports on database analyses?

Several analyses of different datasets of the IARC TP53
database have been published. PubMed links are provided here.

How to retrieve the type of somatic mutations found in a specific type of cancer?

Select the 'Somatic mutation' option and select a specific type of
tumor with the available criteria (tumor types are described using the ICD-O 3rd edition
Topography and Morphology classification systems). The 'Mutation distribution' button leads to a
results page with different types of graphs showing the mutation spectra for the selected tumor type.

How are graphs calculated?

Several types of graphs are drawn from the selected data. See the User's manual for details on graph calculations.

How to retrieve the TP53 status of a specific cell-line?

Select the Cell lines option; enter the name, ATCC_ID or characteristics
of a specific cell-line with the available criteria; click on the "Go" button.

How to retrieve the summary of annotations contained in the database for a specific mutation?

Select the Gene variation option; select a specific mutation
with available criteria and click on "Go". On the result page, click on 'More features'.

How to retrieve the functional properties and biological activities of specific mutants?

Select the Gene variation option; select a specific mutation
with available criteria and click on "Go". On the result panel, click on 'More features'.

How to retrieve the type of tumors in which a specific somatic mutation occurs?

Select the Gene variation option; select a specific mutation with
available criteria and click on "Go". On the result page, click on 'Tumor distribution'.

How to retrieve the type of tumors associated with a specific germline mutation?

Select the Germline mutations option; select a specific mutation with available criteria and click on "Tumor distribution".

How to retrieve the frequency of somatically mutated samples (mutation prevalence) for (a) specific type(s) of cancer?

Select the 'Somatic mutation' option; select a specific type of cancer
(and optionally a population and mutation detection method) and click on the "Go" button.

Database contents

Why are some mutations present in the somatic dataset not retrieved with the 'Gene variation' search option?

There may be two reasons for this: (1) The 'Gene variation' option only retrieves gene variations that are
fully described, while some mutations in the somatic dataset are not fully described.
(2) Somatic mutations may be reported in individuals with different SNP status. If a mutation is
close to a SNP, its impact on the protein sequence may depend on the SNP status.
For example, the mutation c.637C>T, on the first base of codon 213, results in a p.R213X change in the protein
sequence if the SNP on the third base of the codon is an A (CGA>TGA), but in a
p.R213W change if that base is a G (CGG>TGG). Mutations that cannot be described relative to the reference
sequence are included in the somatic dataset, but not in the Gene variation dataset.
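The codon 213 example can be checked with a short Python sketch that applies c.637C>T (base 1 of codon 213) to both SNP variants of the codon. To stay self-contained, it uses a deliberately partial codon table holding only the four codons needed here; a real analysis would use the full standard genetic code. The helper name `apply_substitution` is hypothetical.

```python
# Partial codon table: only the codons needed for this example ("*" = stop).
CODONS = {"CGA": "R", "CGG": "R", "TGA": "*", "TGG": "W"}


def apply_substitution(codon: str, base_in_codon: int, ref: str, alt: str) -> str:
    """Return the codon after replacing `ref` with `alt` at base 1-3."""
    i = base_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"expected {ref} at base {base_in_codon} of {codon}")
    return codon[:i] + alt + codon[i + 1:]


# c.637C>T hits base 1 of codon 213; base 3 carries the SNP (A or G).
for snp_base in ("A", "G"):
    before = "CG" + snp_base
    after = apply_substitution(before, 1, "C", "T")
    aa_after = CODONS[after]
    label = "X" if aa_after == "*" else aa_after
    print(f"SNP {snp_base}: {before}>{after} = p.{CODONS[before]}213{label}")
```

The same nucleotide change yields a nonsense mutation with one SNP allele and a missense mutation with the other, so it cannot be assigned a single protein description from the reference sequence alone.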

Data in the prevalence dataset are "independent" from data included
in the somatic dataset. Numbers differ for the following reasons:

Because we retrieve data from papers, and many papers only report the total number of samples
analyzed and the total number of samples mutated (without describing the mutations in detail),
these mutations cannot be included in the somatic dataset. Thus, numbers
in the prevalence table do not match numbers in the somatic dataset (mutation spectrum).

Numbers by histology may also differ. For example, a paper may contain mutation details for lung
adenocarcinoma (ADC), squamous cell carcinoma (SCC) and large cell carcinoma (LCC), which are all
non-small cell lung cancers, but not the total number of samples analyzed for each histology.
In this case, mutations corresponding to ADC, SCC and LCC are entered in the somatic dataset,
but in the prevalence table the prevalence is indicated only for non-small cell
carcinoma (the group that includes all 3 tumor types).

Cell-lines are not included in the prevalence count.

Samples with more than one mutation are counted once in the prevalence table,
while all of their mutations are entered in the somatic dataset.
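The counting rules above (cell lines excluded, each mutated sample counted once) can be sketched in Python as follows. This is an illustrative model only, not the database's own code: the `Sample` record, its fields and the toy samples are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    sample_id: str
    mutations: list[str] = field(default_factory=list)  # empty if no mutation found
    is_cell_line: bool = False


def prevalence(samples):
    """Return (mutated, analyzed, fraction): cell lines are excluded and each
    mutated sample counts once, however many mutations it carries."""
    tumors = [s for s in samples if not s.is_cell_line]
    mutated = sum(1 for s in tumors if s.mutations)
    analyzed = len(tumors)
    return mutated, analyzed, (mutated / analyzed if analyzed else 0.0)


def tumor_mutation_count(samples):
    """One entry per mutation in tumor samples, as in a mutation-level table."""
    return sum(len(s.mutations) for s in samples if not s.is_cell_line)


# Hypothetical toy data: T1 carries two mutations, CL1 is a cell line.
samples = [
    Sample("T1", ["c.637C>T", "c.743G>A"]),
    Sample("T2", ["c.524G>A"]),
    Sample("T3"),
    Sample("T4"),
    Sample("CL1", ["c.818G>A"], is_cell_line=True),
]
print(prevalence(samples), tumor_mutation_count(samples))
```

Here the prevalence is 2 mutated samples out of 4 analyzed, while 3 mutations would be listed at the mutation level, which is one reason the two sets of numbers differ.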

The prevalence may be missing for some papers that describe mutations included in
the somatic dataset. The prevalence dataset was added to the database in 2001 (the database
started in 1994), and not all papers have been reviewed. The non-reviewed papers correspond
mainly to publications that describe fewer than 10 mutations (about 400 papers).
For some papers (about 100), the prevalence could not be retrieved from the information
provided in the publication.