Analysis Programmes

CONTINLL

The original version of CONTIN implemented the ridge regression
algorithm of Provencher & Glockner, 1981. The latest version incorporates
the locally linearised model (Van Stokkum et al.) for selecting basis set
proteins from the reference database.

SELCON

Selcon was designed by Sreerama & Woody, 1993, and
incorporates the self-consistent method together with the SVD algorithm
to assign protein secondary structure. The programme reports results from
several stages of the analysis. The first stage makes an initial
guess at the fractional composition; its result corresponds
to the Hennessey & Johnson method using SVD. In the second stage, the
SVD calculations are iterated until a convergent solution is produced (equivalent
to the original self-consistent method). The third stage selects a number
of likely solutions from the basis set calculations by constraining
the summed fractional contents to equal one and each individual fraction
to be greater than -0.05. The fourth stage applies a further constraint,
the helix limit theorem, from which a range for helix content is determined
and the results are screened. The range is taken from the solution obtained
with the Hennessey & Johnson method.
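
The screening on summed and individual fractions can be illustrated with a short sketch. The function name, the data layout (one list of fractions per candidate solution) and the tolerance allowed around a sum of one are assumptions made for illustration; the description above states only that the fractions should sum to one and that each should be greater than -0.05.

```python
def passes_sum_and_fraction_rules(fractions, sum_tolerance=0.05, min_fraction=-0.05):
    """Return True if a candidate solution satisfies both constraints.

    sum_tolerance is a hypothetical allowance around 1.0; the programme's
    actual tolerance is not given in the text above.
    """
    total = sum(fractions)
    sum_ok = abs(total - 1.0) <= sum_tolerance
    fractions_ok = all(f > min_fraction for f in fractions)
    return sum_ok and fractions_ok


# Invented candidate solutions (fractions of four secondary structure classes).
candidates = [
    [0.40, 0.25, 0.15, 0.20],   # sums to 1.00, every fraction above -0.05
    [0.70, 0.30, 0.20, -0.02],  # sums to 1.18, so the sum constraint fails
    [0.60, 0.30, 0.20, -0.10],  # sums to 1.00, but one fraction is below -0.05
]

selected = [c for c in candidates if passes_sum_and_fraction_rules(c)]
```

Only the first candidate survives both constraints in this example.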

Average run time: <1 min
Graphical output is produced
Choice between 7 reference datasets

CDSSTR

This programme is a modification of the original Varslc
written by WC Johnson. It implements the variable selection method by performing
all possible calculations using a fixed number of proteins from the reference
set. The algorithm recognises proteins possessing characteristics not reflected
by the test protein, or proteins not reflecting the characteristics of the
test protein, and removes them from the basis set. The SVD algorithm assigns
secondary structure.

This method probably produces the most accurate analysis
results, but can take up to 15 minutes to run because of the sheer volume of
calculations. It will, however, produce results for proteins that other
methods fail to analyse.
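
The run time follows from the combinatorics of trying every subset of a fixed size. As a rough illustration only, the sketch below counts the subsets for a hypothetical reference set of 20 proteins with 17 kept in each calculation; both figures and the protein names are invented and are not the programme's actual settings.

```python
from itertools import combinations
from math import comb

# Hypothetical reference set; the names and the set size are placeholders.
reference_set = [f"protein_{i}" for i in range(1, 21)]
basis_size = 17  # hypothetical fixed number of proteins used per calculation

# Number of distinct basis sets of that size.
n_calculations = comb(len(reference_set), basis_size)

# The subsets themselves can be generated lazily, one calculation at a time.
first_basis = next(combinations(reference_set, basis_size))
```

With these invented figures there are 1140 distinct basis sets, each needing its own calculation, which is why exhaustive selection is slow.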

Average run time: ~5 min
Graphical output
7 reference datasets

VARSLC

The original implementation of the variable selection
method. The programme is flexible in that the user may configure input
data files to specify the number of proteins to be selected from the reference
set, the number of proteins to be eliminated at a time from the reference
set, and the total number of calculations tried before selecting solutions.
The constraints applied can also be configured. For example, results are
selected if their RMSD, sum of squares error, individual fractional content
and summed total content are within certain limits.

To incorporate some of this flexibility into the website,
several configuration files have been set up. The first follows the guidelines
set out in the readme.txt that comes with the programme. The recommendation is
~500 iterations with a basis set of 5-7 proteins, removing 1-2 per iteration.

Details of the settings files:

Choice      RMSD max  Individual fraction min  Total sum of fractions  No. proteins removed  No. basis proteins  No. calculations
Default     0.55      -0.15                    0.95 - 1.14             1                     6                   300
Settings 1  0.55      -0.15                    0.95 - 1.30             1                     20                  528
Settings 2  0.55      -0.20                    0.95 - 1.40             1                     30                  700
Settings 3  2.55      -0.20                    0.95 - 1.20             1                     6                   900

The first settings file (Settings 1) reflects the recommended values
for accurate protein analysis. The default settings exist for quicker
analyses where only 300 calculations are performed. When testing this programme
with various different CD data files, it was found that in the majority
of cases results are overlooked because the total fraction of secondary
structures is significantly greater than 1. Settings file
2 therefore exists for cases where the default and settings 1 have not produced valid
results, and where it is useful to look at the kind of values resulting
from the analysis as a rough guide. Settings 3 is an extension of settings
1 with 900 calculations and a high maximum RMSD value. If no results are
obtained with any of the settings files, CDSSTR can be used instead: it applies the same
method but with no restrictions on the number of calculations.
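
The way the settings files act as progressively looser acceptance limits can be sketched as follows. The limits are copied from the settings details above; the function names, the dictionary layout, the inclusive comparisons and the order in which the settings are tried are assumptions for illustration and do not reproduce the programme's own input file format. The sum of squares error limit is left out because no value is given for it.

```python
# Acceptance limits taken from the settings details above.
SETTINGS = {
    "Default":    {"rmsd_max": 0.55, "fraction_min": -0.15, "sum_range": (0.95, 1.14)},
    "Settings 1": {"rmsd_max": 0.55, "fraction_min": -0.15, "sum_range": (0.95, 1.30)},
    "Settings 2": {"rmsd_max": 0.55, "fraction_min": -0.20, "sum_range": (0.95, 1.40)},
    "Settings 3": {"rmsd_max": 2.55, "fraction_min": -0.20, "sum_range": (0.95, 1.20)},
}


def is_valid(fractions, rmsd, limits):
    """Check one result against one set of limits (inclusive bounds assumed)."""
    low, high = limits["sum_range"]
    return (
        rmsd <= limits["rmsd_max"]
        and all(f >= limits["fraction_min"] for f in fractions)
        and low <= sum(fractions) <= high
    )


def first_accepting_settings(fractions, rmsd,
                             order=("Default", "Settings 1", "Settings 2", "Settings 3")):
    """Return the name of the first settings choice that accepts the result, or None."""
    for name in order:
        if is_valid(fractions, rmsd, SETTINGS[name]):
            return name
    return None


# Invented result whose fractions sum to about 1.25: too high for the default
# limits, but within the wider range allowed by Settings 1.
choice = first_accepting_settings([0.55, 0.30, 0.20, 0.20], rmsd=0.30)
```

A result that none of the four choices accepts returns None in this sketch, which corresponds to the case where CDSSTR is suggested instead.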

Only one reference database comes with the
programme, containing 33 reference proteins. This programme does not produce reconstructed spectral data
and therefore no graphical output exists.

K2D

K2D is one of a few neural network programmes. The neural
network operates via an input layer with interconnecting neurons to the
output layer. The output layer (secondary structure) is calculated as a
function of the input layer (CD data) by assigning weightings to each
neuron. The weightings are determined in a training phase: they start from
random values, the network is fed large volumes of CD and structural data (equivalent
to reference proteins), and the weightings are adjusted in an iterative
process until an accurate secondary structure profile is obtained.
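
The training idea described above (random starting weightings, then iterative adjustment against known examples) can be shown with a deliberately tiny sketch. This is a generic single-layer linear network trained with the delta rule on invented numbers; it is not K2D's actual architecture or training data, and every name and value in it is hypothetical.

```python
import random


def predict(weights, inputs):
    """Each output is a weighted sum of the inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def mean_squared_error(weights, samples):
    errors = [
        (out - target) ** 2
        for inputs, targets in samples
        for out, target in zip(predict(weights, inputs), targets)
    ]
    return sum(errors) / len(errors)


def train(samples, n_inputs, n_outputs, rate=0.1, epochs=500, seed=0):
    rng = random.Random(seed)
    # Weightings start from random values.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    initial_error = mean_squared_error(weights, samples)
    # Iteratively nudge each weighting to reduce the output error.
    for _ in range(epochs):
        for inputs, targets in samples:
            outputs = predict(weights, inputs)
            for j, (out, target) in enumerate(zip(outputs, targets)):
                for i, x in enumerate(inputs):
                    weights[j][i] -= rate * (out - target) * x
    return weights, initial_error


# Invented training pairs: (toy "CD data", toy "secondary structure fractions").
samples = [
    ([1.0, 0.0, 0.0], [0.80, 0.10]),
    ([0.0, 1.0, 0.0], [0.10, 0.60]),
    ([0.0, 0.0, 1.0], [0.30, 0.30]),
    ([0.5, 0.5, 0.0], [0.45, 0.35]),
]

weights, initial_error = train(samples, n_inputs=3, n_outputs=2)
final_error = mean_squared_error(weights, samples)
```

After training, the error on the toy examples is far smaller than with the random starting weightings, which is the behaviour the training phase relies on.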

In K2D the weights file is fixed and therefore there is
no choice of reference dataset. Accuracy is calculated by , and results
for beta sheet and mixed proteins tend to be far less accurate than for
helical proteins, although when compared with other methods (Greenfield
1996) these results are an improvement.