Obtaining an account for the server

We have an on-line form to sign up for an account. It asks for your academic or non-profit
contact details. Users from other sectors who wish to have access should send an email to
cdweb@mail.cryst.bbk.ac.uk.

File Location

The file location is the path name to the CD data file on your local computer.
This string is checked for errors and if the server cannot locate the file
then the analysis will be terminated and an error message generated. It
is advisable to use the browse button as it will specify the correct file
location automatically.

The files that you upload should be in text format, for example raw text
(.txt). DichroWeb will not accept non-text file formats such as .exe, .gif,
.jpg, .doc or .ppt.

Also, because some users use the comma as a decimal separator,
DichroWeb does not interpret comma-separated value (.csv) files.

File Format

The select options in the file format field are derived
from the file formats output from different CD spectroscopy machines. Mainly,
the formats differ in the size of the header and the column layout of the
data.

The 60 DS format may be obtained, even in later versions of the software,
by choosing the "export to 60 DS format" option in the instrument data browser
window, from the "export data set" pulldown.

It has been reported that the dichroism data can appear in either column 2
or column 4 for the BP format. Please check which format your BP file is in
and select BP (data in col. 2) or BP (data in col. 4) accordingly.

If your data exists in some other format please edit it
to match one of the above file formats or use the FREE format
option which requires two columns, wavelength and CD data respectively.
The data may begin with either high or low wavelength. If the format has
been incorrectly chosen an error message will be generated stating that
the file uploaded was not suitable for analysis.
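As a rough illustration of the FREE format, the following Python sketch reads two columns (wavelength, then CD data) and notes whether the data begin at the high or the low wavelength end. The function name `read_free_format` and the assumption that columns are separated by whitespace are made up for this example; it is not DichroWeb's own parser.

```python
def read_free_format(text):
    """Parse two whitespace-separated columns: wavelength, then CD value.

    Hypothetical helper for illustration only; not DichroWeb's parser.
    """
    points = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(fields)}"
            )
        wavelength, cd_value = (float(field) for field in fields)
        points.append((wavelength, cd_value))
    return points


sample = """260 0.12
259 0.15
258 0.21
"""
points = read_free_format(sample)
# The data may begin with either the high or the low wavelength.
descending = points[0][0] > points[-1][0]
print(points[0][0], points[-1][0], descending)
```

A line with more or fewer than two columns raises an error here, which loosely mirrors the "not suitable for analysis" message described above.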

Input Units

Circular dichroism can be measured in several ways. Within
the literature there are several conflicting measures and definitions.
Most of these have been accommodated in the select box, but for clarity,
the conversion equations used are detailed below:

Delta Epsilon (Δε)

The per-residue molar absorption units of circular
dichroism, measured in M⁻¹ cm⁻¹.
Δε is sometimes referred to as molar circular dichroism. Data peaks are usually in the range 0 - 10.

All of the analysis programmes accept these input
units except K2D. So if your data is in Δε
then no conversions are required.

Mean Residue Ellipticity (MRE) [θ]

Mean residue ellipticity is the most commonly reported unit and is measured in
degrees cm² dmol⁻¹ residue⁻¹.
Data peaks are usually in the tens of thousands, and the relationship between [θ]
and Δε is shown below:

Δε = [θ] / 3298
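The relationship between [θ] and Δε can be sketched in a few lines of Python. The function and constant names are illustrative only; the factor 3298 is the one quoted in this section.

```python
MRE_PER_DELTA_EPSILON = 3298  # conversion factor quoted in this section


def mre_to_delta_epsilon(mre):
    """Convert mean residue ellipticity [θ] to Δε."""
    return mre / MRE_PER_DELTA_EPSILON


# A peak of -32,980 in MRE units corresponds to a Δε of about -10.
print(round(mre_to_delta_epsilon(-32980.0), 3))
```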

Theta Machine Units (θ)

Machine units are the raw instrument readings in millidegrees, usually between 1 and 100.
They reflect the difference in absorption between left- and right-handed circularly
polarised light and need to be corrected to account for the amount of protein used in
the sample. To convert from machine units to Δε, the equation below is applied.

Note: on selecting this option you will be asked to specify the mean residue
weight (MRW) of the protein in amu, where MRW = protein molecular weight (in atomic
mass units/daltons) / number of residues, the path length (P) in cm and the protein
concentration (CONC) in mg/ml.

Δε = θ * (0.1 * MRW) / ((P * CONC) * 3298)
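A minimal Python sketch of the machine-unit conversion follows, using the quantities named above (θ in millidegrees, MRW, P in cm, CONC in mg/ml). The function and parameter names are assumptions made for this example, and the input check is an addition for safety rather than documented DichroWeb behaviour.

```python
def millidegrees_to_delta_epsilon(theta_mdeg, mrw, path_cm, conc_mg_ml):
    """Convert machine units (millidegrees) to Δε.

    Follows Δε = θ * (0.1 * MRW) / ((P * CONC) * 3298).
    """
    if path_cm <= 0 or conc_mg_ml <= 0:
        raise ValueError("path length and concentration must be positive")
    mean_residue_ellipticity = theta_mdeg * (0.1 * mrw) / (path_cm * conc_mg_ml)
    return mean_residue_ellipticity / 3298


# Example values (invented): 10 mdeg, MRW 110 amu, 0.1 cm cell, 0.1 mg/ml.
delta_epsilon = millidegrees_to_delta_epsilon(10.0, 110.0, 0.1, 0.1)
print(round(delta_epsilon, 2))
```

Computing the intermediate mean residue ellipticity first keeps the link to the Δε = [θ] / 3298 relationship visible.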

DRS-yy units

Often, CD data are recorded as particularly large measurements, and in order to achieve
accurate values after unit conversion it may be necessary to multiply
the machine values. These units are commonly used at Daresbury with the
yy file format. The data are usually in the range 0.001 - 0.01.

DRS-yy units are Theta machine units multiplied by a factor of 100. Therefore,
the relationship with Δε is as follows:

Δε = (θ * 100) * (0.1 * MRW) / ((P * CONC) * 3298)

DRS units

These are standard Daresbury units
(machine units that have been divided by a factor of 10,000).
The relationship with Δε is shown below:

Δε = (θ / 10,000) * (0.1 * MRW) / ((P * CONC) * 3298)

Molar Ellipticity (θ)m

Molar ellipticity is a little-used unit which
has the dimensions degrees decilitres mol⁻¹ decimetre⁻¹.
DichroWeb does not accept data in units of (θ)m,
but such data may be converted to units of Δε
by using the following formula, where Nr represents the number of amino acids in the protein:

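Δε = (θ)m / (Nr * 3298)

Molar ellipticity is expressed per mole of protein, so dividing by Nr gives the per-residue value.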

If you have data in units of (θ)m, please convert
the values to units of Δε and then submit to DichroWeb.

Initial Wavelength

The initial wavelength should correspond to the first
wavelength that appears in your
data file (i.e. towards the top). This could be either the numerically highest or lowest.
If in doubt, open up your data file in a text editor and take a look.

Final Wavelength

The final wavelength should correspond to the last
wavelength that appears in your
data file (i.e. towards the bottom). This could be either the numerically highest or lowest.
If in doubt, open up your data file in a text editor and take a look.

Wavelength Step

CD spectrophotometers can be set to record data at various wavelength intervals. All
of the DichroWeb-supported analysis programmes accept data at 1 nm intervals only, so all
other data points will be discarded. DichroWeb performs no smoothing of the data; if you believe
that smoothing is required, you must perform this yourself beforehand.
If the wrong wavelength step is specified, the server will detect this and return an error
message stating that your file is unsuitable for analysis.
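The two checks described above can be sketched in Python: confirming that the declared wavelength step matches the file, and keeping only the 1 nm points. This is an illustration under stated assumptions, not DichroWeb's code: the function names are hypothetical, and it is assumed that the 1 nm grid falls on whole-number wavelengths.

```python
def has_constant_step(points, step, tolerance=1e-6):
    """Return True if every pair of neighbouring wavelengths differs by `step` nm."""
    wavelengths = [wavelength for wavelength, _ in points]
    return all(
        abs(abs(later - earlier) - step) <= tolerance
        for earlier, later in zip(wavelengths, wavelengths[1:])
    )


def keep_whole_nm_points(points, tolerance=1e-6):
    """Keep only points on whole-nanometre wavelengths (assumed 1 nm grid)."""
    return [
        (wavelength, value)
        for wavelength, value in points
        if abs(wavelength - round(wavelength)) <= tolerance
    ]


# Invented data recorded every 0.5 nm, starting from the high-wavelength end.
points = [(260.0, 0.10), (259.5, 0.12), (259.0, 0.15), (258.5, 0.19), (258.0, 0.24)]
print(has_constant_step(points, 0.5))  # declared step matches the file
print(has_constant_step(points, 1.0))  # wrongly declared step is detected
print(keep_whole_nm_points(points))    # only the 1 nm points remain
```

Using the absolute difference between neighbours lets the same check work whether the file begins at the high or the low wavelength.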

Lowest Datapoint

Sometimes part of a data set may be collected under conditions which are less than optimal.
In these cases, it is desirable to remove the block of unreliable data points from the dataset
and avoid trying to use them in any analysis. The "lowest wavelength datapoint" box allows for
this without the need to edit the input file which is being submitted to DichroWeb. Just enter
the wavelength of the last data point which is of good quality and DichroWeb will ensure that
any data below that value cannot be submitted in an analysis. The suspect data is always taken
as being the wavelengths below the entered value as the low wavelength data is generally
the problematic area of a CD spectrum.
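The effect of the "lowest wavelength datapoint" box can be illustrated with a short Python sketch that drops every point below the entered wavelength. The function name is hypothetical and the data are invented for the example.

```python
def truncate_below(points, lowest_wavelength):
    """Drop all (wavelength, value) points below `lowest_wavelength` (nm)."""
    return [
        (wavelength, value)
        for wavelength, value in points
        if wavelength >= lowest_wavelength
    ]


points = [(194.0, 3.1), (193.0, 3.4), (192.0, 3.9), (191.0, 9.7), (190.0, -4.2)]
# Suppose the last good-quality point is at 192 nm.
reliable = truncate_below(points, 192.0)
print(reliable)
```

The point at the entered wavelength itself is kept, since it is described above as the last data point of good quality.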

Why would data be unreliable?
With a conventional radiation source (such as a Xenon lamp), the intensity of the emitted signal
drops significantly towards the lowest wavelengths in its range. The lower intensities can still be
collected and utilised, but in order to compensate for the loss of signal strength, the detector (typically
a photomultiplier unit) has to increase its sensitivity and consequently requires an increased
high tension voltage. There is a maximum high tension voltage at which a photomultiplier unit can
accurately record transmitted radiation, and when this is approached, the readings become unreliable.
Data collected when the high tension voltage is abnormally high should not be used in the analysis, and
the "lowest wavelength datapoint" box provides a convenient method for truncating a dataset for this purpose.
If, after applying this cut-off criterion, your data do not extend to sufficiently low wavelengths to
enable the various databases and methods to be used for the analyses, then it is suggested that you
re-collect the data under changed conditions, i.e. using shorter pathlengths, lower concentrations of
buffers/additives or different buffers/additives. As a good practice guideline, the high tension voltage
should not exceed 550 mV at 190 nm for the sample, or 500 mV at any wavelength for the baseline.

Analysis Programmes

CONTINLL

The original version of CONTIN implemented the ridge regression
algorithm of Provencher & Glockner (1982). The latest version incorporates
the locally linearised model (Van Stokkum et al.) in selecting basis set
proteins from the reference database.

SELCON3

Selcon was designed by Sreerama & Woody (1993) and
incorporates the self-consistent method together with the SVD algorithm
to assign protein secondary structure. The programme reports results from
several stages of the analysis. The first stage assigns an initial
guess at the fractional composition; this result corresponds
to the Hennessey & Johnson method using SVD. In the second stage, the
SVD calculations are iterated until a convergent solution is produced (equivalent
to the original self-consistent method). The third stage selects a number
of likely solutions from the calculations of the basis set by constraining
the summed fractional contents to equal one and each individual fraction
to be greater than -0.05. The fourth stage applies a further constraint:
the helix limit theorem, from which a range for helix content is determined
and the results screened. The range is taken from the solution using the
Hennessey & Johnson method.
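The third-stage selection rules can be illustrated with a small Python filter. This is only a sketch of the two constraints named above, not the programme's implementation: the function name is hypothetical, and the tolerance allowed on the sum (0.05) is an assumption, since the text states only that the fractions should sum to one.

```python
def passes_sum_and_fraction_rules(fractions, sum_tolerance=0.05, min_fraction=-0.05):
    """Check one candidate solution against the two third-stage constraints.

    `sum_tolerance` is an assumed value; `min_fraction` is taken from the text.
    """
    total = sum(fractions)
    sums_to_one = abs(total - 1.0) <= sum_tolerance
    fractions_ok = all(fraction > min_fraction for fraction in fractions)
    return sums_to_one and fractions_ok


# Invented candidate solutions (e.g. helix, sheet, turn, other).
candidates = [
    [0.45, 0.30, 0.10, 0.15],   # sums to one, no strongly negative fraction
    [0.70, 0.40, -0.10, 0.00],  # sums to one, but one fraction is below -0.05
    [0.60, 0.30, 0.30, 0.10],   # total is well above one
]
selected = [c for c in candidates if passes_sum_and_fraction_rules(c)]
print(selected)
```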

Average run time: <1 min
Graphical output is produced
Choice between 7 reference datasets

CDSSTR

This programme is a modification of the original Varslc
written by WC Johnson. It implements the variable selection method by performing
all possible calculations using a fixed number of proteins from the reference
set. The algorithm recognises reference proteins that possess characteristics not
reflected by the test protein, or that do not reflect the characteristics of the
test protein, and removes them from the basis set. The SVD algorithm assigns
secondary structure.

This method probably produces the most accurate analysis
results, but can take up to 15 minutes to run due to the sheer volume of
calculations. It will however produce results where other methods fail
to analyse proteins.

Average run time: ~5 min
Graphical output is produced
Choice between 7 reference datasets

VARSLC

The original implementation of the variable selection
method. The programme is flexible in that the user may configure input
data files to specify the number of proteins to be selected from the reference
set, the number of proteins to be eliminated at a time from the reference
set, and the total number of calculations tried before selecting solutions.
The constraints applied can also be configured; for example, results are
selected if their RMSD, sum of squares error, individual fractional content
and summed total content are within certain limits.

To incorporate some of this flexibility into the website,
several configuration files have been set up. The first follows the guidelines
set out in the readme.txt that comes with the programme, which recommends
~500 iterations with a basis set of 5-7 proteins, removing 1-2 per iteration.

Details of the settings files:

Choice      RMSD max  Individual fraction min  Total sum of fractions  No. proteins removed  No. basis proteins  No. calculations
Default     0.55      -0.15                    0.95 - 1.14             1                     6                   300
Settings 1  0.55      -0.15                    0.95 - 1.30             1                     20                  528
Settings 2  0.55      -0.20                    0.95 - 1.40             1                     30                  700
Settings 3  2.55      -0.20                    0.95 - 1.20             1                     6                   900
The second settings file reflects the recommended values
for accurate protein analysis. The default settings exist for quicker
analyses where only 300 calculations are performed. When testing this programme
with various CD data files, it was found that in the majority
of cases results are overlooked because the total fraction of secondary
structures is significantly greater than 1. Settings file
2 therefore exists for cases where the default and settings 1 have not produced valid
results and it is useful to look at the kind of values resulting
from the analysis as a rough guide. Settings 3 is an extension of settings
1 with 900 calculations and a high maximum RMSD value. If no results are
obtained with any of the settings files, then CDSSTR uses the same
method but with no restrictions on the number of calculations.
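To show how the limits in the settings files differ in practice, the Python sketch below encodes the RMSD, individual fraction and total-sum limits from the table of settings files and tests one candidate solution against each in turn. The dictionary layout and function names are invented for this example, and treating each limit as inclusive is an assumption; how VARSLC itself applies them is not described here.

```python
# Limits copied from the table of settings files.
SETTINGS = {
    "Default":    {"rmsd_max": 0.55, "fraction_min": -0.15, "sum_range": (0.95, 1.14)},
    "Settings 1": {"rmsd_max": 0.55, "fraction_min": -0.15, "sum_range": (0.95, 1.30)},
    "Settings 2": {"rmsd_max": 0.55, "fraction_min": -0.20, "sum_range": (0.95, 1.40)},
    "Settings 3": {"rmsd_max": 2.55, "fraction_min": -0.20, "sum_range": (0.95, 1.20)},
}


def is_accepted(fractions, rmsd, limits):
    """Return True if a solution lies within all three limits (assumed inclusive)."""
    low, high = limits["sum_range"]
    return (
        rmsd <= limits["rmsd_max"]
        and all(fraction >= limits["fraction_min"] for fraction in fractions)
        and low <= sum(fractions) <= high
    )


def accepting_choices(fractions, rmsd):
    """List the settings choices under which the solution would be kept."""
    return [
        name for name, limits in SETTINGS.items()
        if is_accepted(fractions, rmsd, limits)
    ]


# Invented solution whose fractions total 1.25, i.e. noticeably more than 1.
print(accepting_choices([0.55, 0.35, 0.20, 0.15], rmsd=0.30))
```

A solution whose fractions total more than 1.14 is rejected by the default limits but can still be kept under the wider total-sum ranges, which is the situation the paragraph above describes.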

There is only one reference database that comes with the
programme containing 33 reference proteins. This programme doesn't produce reconstructed spectra data
and therefore no graphical output exists.

K2D

K2D is one of a few neural network programmes. The neural
network operates via an input layer with interconnecting neurons to the
output layer. The output layer (secondary structure) is calculated as a
function of the input layer (CD data) by assigning weightings to each
neuron. The weightings are initially assigned random values. In a training phase,
the network is fed large volumes of CD and structural data (equivalent
to reference proteins) and the weightings are adjusted in an iterative
process until an accurate secondary structure profile is obtained.

In K2D the weights file is fixed and therefore there is
no choice of reference dataset. Accuracy is calculated by , and results
for beta sheet and mixed proteins tend to be far less accurate than for
helical proteins, although when compared with other methods (Greenfield
1996) these results are an improvement.

Reference Set

All of the programmes except K2D rely upon reference datasets of proteins, from which a set of basis spectra will be selected for the analysis. CONTIN, SELCON3 and CDSSTR offer a choice of reference database, which should be chosen in accordance with the wavelength range of the input data. It should also be noted that the choice of reference dataset affects the analysis results, particularly if there is mixed or high beta sheet content. The reference set that best represents the characteristics of the protein of interest is likely to give the most accurate result.

Optional Scaling Factor

The scaling factor allows the user to modify the experimental data by small amounts
in order to compensate for errors in the intensity of the spectra and so,
it is hoped, improve the fit. Some spectrometers may have
incorrect intensity calibration, and where this is known a scaling factor may
be applied to compensate for such errors.

The scaling factor is applied to all data points, and has a default value of 1.0,
meaning no scaling. It would be highly unusual to require a large scaling factor
and typical scaling values would be in the range 0.95 - 1.05. Scaling factors
outside the range 0.5 - 1.5 are not feasible and will be ignored
by DichroWeb.
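The behaviour described above can be sketched in Python as follows. The function name is hypothetical, and treating an ignored factor as "no scaling" (1.0) is an assumption about what being ignored means in practice.

```python
def apply_scaling(values, factor=1.0):
    """Multiply every data point by `factor`; the default of 1.0 means no scaling."""
    if not 0.5 <= factor <= 1.5:
        # Assumption: an out-of-range factor is ignored, i.e. treated as 1.0.
        factor = 1.0
    return [value * factor for value in values]


data = [1.0, -2.0, 4.0]
print(apply_scaling(data, 1.05))  # a typical, small correction
print(apply_scaling(data, 2.0))   # outside 0.5 - 1.5, so the data are unchanged
```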

WARNING

Scaling factors should only be applied to data where there is a known reason for doing so.
It is possible to improve the NRMSD of an analysis by tweaking the scaling factor randomly,
but this does not necessarily mean that the structure assignment is improved. Scaling factors
should be used with caution.