PICNIC

PICNIC (Predicting Integral Copy Numbers In Cancer) is an algorithm designed to identify copy number segments and
genotypes in cancer using a SNP6 'cel' file as input.

All PICNIC code has been made available under a BSD license and shall continue to be developed under this agreement.
This code requires Matlab. To use the algorithm without Matlab, use picnic_gui_full. To just normalize a .CEL file, use
picnic_gui_short.

PICNIC has now been updated to handle primary tissue samples, which contain contaminating normal cells, as well as
cell lines.

CGP Software License

This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License.

Any redistribution or derivation in whole or in part including any substantial portion of this code must include this copyright and permission notice.

Algorithm description

The Affymetrix SNP6 array contains over 906,600 SNPs together with 946,000 copy number probes, interrogating over
1.85 million loci in a single experiment. Each SNP is represented by six or eight features: three or four for allele
A and the same number for allele B. The features for each allele are technical replicates. Of the non-polymorphic
copy number probes, 202,000 target known copy number variations and the remaining 744,000 are evenly distributed
across the genome. Each of these loci is represented by a single feature. A more detailed description of the array
design can be obtained from the Affymetrix website.
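
The probe counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is written in
Python rather than Matlab, uses only the figures given in this section (which the text itself gives as approximate),
and all variable names are illustrative.

```python
# Probe counts as quoted in this README; the text states them as approximate figures.
snp_probes = 906_600
cn_probes_known_cnv = 202_000    # copy number probes targeting known CNVs
cn_probes_genome_wide = 744_000  # copy number probes spread evenly across the genome

cn_probes = cn_probes_known_cnv + cn_probes_genome_wide
total_loci = snp_probes + cn_probes
known_cnv_share = cn_probes_known_cnv / cn_probes

print(f"copy number probes: {cn_probes:,}")
print(f"total loci:         {total_loci:,}")
print(f"share of copy number probes on known CNVs: {known_cnv_share:.1%}")
```

The two copy number probe groups sum to the quoted 946,000, and adding the SNPs gives the "over 1.85 million loci"
figure.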

PICNIC (Predicting Integral Copy Numbers In Cancer) was written specifically for use with Affymetrix SNP6 data. The
algorithm provides a more refined analysis than has previously been applied to the Affymetrix 10K data. This
includes improved normalisation of the data together with determination of the underlying copy number for each
segment by genome-wide analysis of allele ratio and signal strength data. The data are then rescaled to the
predicted underlying integer copy number, plotted, and segmented (it should be noted that rescaling the raw data to
the underlying absolute copy number can affect the spread of the data points).

Analysis of the data in this way also allows for assignment of a genotype to each SNP. Because such genotypes are
based on the ratio for each allele they can be more complex than the traditional AA, BB, AB assignment; potentially
including such genotypes as AAB etc. Regions of loss of heterozygosity (LOH) can also be determined.
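
As an illustration of how such genotypes arise, the sketch below enumerates the genotypes a SNP could show in a
segment with a given total and minor copy number. It is written in Python rather than Matlab and is not taken from
the PICNIC code. The function name possible_genotypes is hypothetical, and the sketch assumes each parental
chromosome carries either allele A or allele B at the SNP.

```python
def possible_genotypes(total_cn, minor_cn):
    """List the genotypes a SNP can show in a segment with the given copy numbers.

    Assumption: the segment holds (total_cn - minor_cn) copies of one parental
    chromosome and minor_cn copies of the other, and each parental chromosome
    carries either allele A or allele B at the SNP.
    """
    major_cn = total_cn - minor_cn
    if minor_cn < 0 or minor_cn > major_cn:
        raise ValueError("minor_cn must be between 0 and total_cn / 2")

    genotypes = set()
    for major_allele in "AB":
        for minor_allele in "AB":
            alleles = major_allele * major_cn + minor_allele * minor_cn
            genotypes.add("".join(sorted(alleles)))
    genotypes.discard("")  # nothing to genotype when total_cn is 0
    return sorted(genotypes)


print(possible_genotypes(2, 1))  # ['AA', 'AB', 'BB']
print(possible_genotypes(3, 1))  # ['AAA', 'AAB', 'ABB', 'BBB']
print(possible_genotypes(2, 0))  # ['AA', 'BB']
```

A normal diploid segment (total 2, minor 1) yields the traditional AA, AB and BB calls, while a single-copy gain
(total 3, minor 1) yields genotypes such as AAB.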

Three plots are available for SNP6 data from the CGH Viewer webpage:

Absolute copy number: This plot shows the normalised data (grey dots) for each genomic locus on the array
together with segmentation information. The normalised data is rescaled to the underlying copy number with dark blue
lines indicating total copy number for each genomic region and light blue giving the predicted copy number of the
minor allele. Minor allele values of zero are indicative of loss of heterozygosity (LOH).

Probability: This plot shows the probability of a change in state for copy number, heterozygosity or both.

Genotype intensity: This plot shows the ratio of the two allelic intensities for SNPs on the array. Balanced
heterozygotes give a ratio value of 0.5, while homozygous calls give values of ~0.8 (AA) and ~0.2 (BB). Skewed
allele ratios can result in up to four bands on the genotype intensity plot. The data is again segmented with black
lines indicating regions of heterozygosity and red lines indicating regions of homozygosity (loss of heterozygosity,
LOH).
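
The ratio values quoted for the genotype intensity plot can be reproduced with a toy model. The Python sketch below
is an assumption made for illustration, not PICNIC's normalisation: it takes the ratio to be A / (A + B) and adds a
hypothetical background term to each allele's signal, with its value chosen only so that a diploid AA call lands at
0.8. The function name allele_ratio and its parameters are likewise hypothetical.

```python
def allele_ratio(n_a, n_b, background=2 / 3):
    """Toy model of the genotype intensity ratio A / (A + B).

    n_a, n_b:   copies of allele A and allele B at the SNP.
    background: hypothetical non-specific signal added to each allele, in
                copy-number units. The default is chosen only so that a
                diploid AA call gives 0.8; it is not a PICNIC parameter.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    signal_a = n_a + background
    signal_b = n_b + background
    return signal_a / (signal_a + signal_b)


for label, n_a, n_b in [("AA", 2, 0), ("AB", 1, 1), ("BB", 0, 2), ("none", 0, 0)]:
    print(f"{label:>4}: {allele_ratio(n_a, n_b):.2f}")
```

Under this assumption a homozygous call cannot reach 1 or 0 because the absent allele still contributes background
signal, and a locus with no copies of either allele gives 0.5.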

For example, the following plot represents Chromosome 9 of sample CMK. The information for four didactic segments
labeled A, B, C and D is described below.

Segment A: 0 - 5 Mb, Total copy number (Dark blue) 4, Minor copy number (Light blue) 2. That is, each parental allele
has been duplicated. The state change probability plot indicates the end of the segment. There are three black lines
in the genotype intensity plot. SNPs with points near these lines have genotypes AAAA, AABB and BBBB, from the top
of the plot to the bottom.

Segment B: 5 - 21 Mb, Total copy number 2, Minor copy number 0. That is, one parental allele has been lost (LOH)
and the other has been duplicated. The genotype intensities have two lines, corresponding to genotypes AA or BB.

Segment C: 21 - 23 Mb, Total copy number 0, Minor copy number 0. That is, both parental alleles have been lost
resulting in a homozygous deletion. The genotype intensity has a single line at 0.5, resulting from equal signal
intensity from both alleles due to background hybridisation.

Segment D: 24 - 27 Mb, Total copy number 6, Minor copy number 2. That is, one parental allele is present in two
copies and the other in four, giving a total copy number of six. The genotype intensities have four lines,
corresponding to genotypes AAAAAA, AAAABB, AABBBB or BBBBBB.
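
The interpretation of segments A to D follows a simple rule on the total and minor copy numbers, sketched below in
Python. The Segment class and the describe function are hypothetical names introduced for illustration; the
coordinates and copy numbers are the ones listed above, and the wording of the labels is a choice made here rather
than PICNIC output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start_mb: int
    end_mb: int
    total_cn: int
    minor_cn: int


def describe(segment):
    """Turn a (total, minor) copy number pair into a short description."""
    major_cn = segment.total_cn - segment.minor_cn
    if segment.total_cn == 0:
        return "homozygous deletion"
    if segment.minor_cn == 0:
        return f"LOH, retained allele in {major_cn} copies"
    if major_cn == segment.minor_cn:
        return f"heterozygous, balanced ({major_cn}+{segment.minor_cn})"
    return f"heterozygous, imbalanced ({major_cn}+{segment.minor_cn})"


# Chromosome 9 of sample CMK, as listed in the segment descriptions.
CHR9_CMK = [
    Segment("A", 0, 5, total_cn=4, minor_cn=2),
    Segment("B", 5, 21, total_cn=2, minor_cn=0),
    Segment("C", 21, 23, total_cn=0, minor_cn=0),
    Segment("D", 24, 27, total_cn=6, minor_cn=2),
]

for seg in CHR9_CMK:
    print(f"Segment {seg.name} ({seg.start_mb}-{seg.end_mb} Mb): {describe(seg)}")
```

The check for a total copy number of zero comes first because a homozygous deletion also has a minor copy number of
zero and would otherwise be reported as LOH.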