For locating sequencing vector, the program uses a dynamic programming
algorithm with two percentage-match cutoffs: one for the 5' end
and another for the 3' end. Both searches include the poor-quality data
at the ends of the readings.
This mode writes the SL and SR records in experiment files.
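The idea of a dynamic programming search with a percentage-match cutoff can be illustrated with a textbook Smith-Waterman local alignment. This is a minimal sketch under stated assumptions, not vector_clip's own algorithm: the scoring values (MATCH, MISMATCH, GAP), the min_cols guard against very short chance matches, and the function names are all hypothetical. It handles the 5' end only; the 3' search would be the same idea with its own cutoff.

```python
from typing import Optional, Tuple

# Hypothetical scoring scheme; vector_clip's real scores are not given here.
MATCH, MISMATCH, GAP = 1, -1, -2


def local_align(vector: str, reading: str) -> Tuple[int, int, int, int]:
    """Smith-Waterman local alignment of vector against reading.

    Returns (identities, aligned_columns, start, end), where start/end are
    0-based, end-exclusive coordinates of the aligned region in reading.
    """
    n, m = len(vector), len(reading)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if vector[i - 1] == reading[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell, counting identical columns.
    i, j, ident, cols = bi, bj, 0, 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = MATCH if vector[i - 1] == reading[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + s:
            if s == MATCH:
                ident += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1  # gap in the reading
        else:
            j -= 1  # gap in the vector
        cols += 1
    return ident, cols, j, bj


def clip_5prime(vector: str, reading: str, cutoff_pct: float,
                min_cols: int) -> Optional[int]:
    """Return the reading position where the 5' vector match ends, or None."""
    ident, cols, _start, end = local_align(vector, reading)
    if cols < min_cols:  # too short to trust: tiny matches score 100%
        return None
    return end if 100.0 * ident / cols >= cutoff_pct else None
```

For example, if a reading starts with the last 12 bases of a 16-base vector segment, clip_5prime returns 12, the 0-based index of the first non-vector base.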

If the user selects the vector_primer file mode of vector_clip, the
program searches each reading for
all of the forward and reverse sequence segments in the vector_primer
file and notes
the one that matches best. If this match is above the user-defined
threshold, the experiment file is modified accordingly and the
reading is searched again for the corresponding sequence from the other side
of the cloning site (to look for a match at the reading's 3' end).
Again, if the user-defined threshold is reached, the experiment file is
modified accordingly.
This mode writes the SL and SR records in experiment files.
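The two-stage vector_primer search can be sketched as follows. This is an illustration of the control flow only, not vector_clip's code, and several parts are assumptions: matches are scored with a simple ungapped percentage identity, each vector_primer entry is modelled as a hypothetical (forward, reverse) pair of strings, and the partner segment from the other side of the cloning site is assumed to appear at the reading's 3' end as its reverse complement. The returned coordinates are the sketch's own, not the SL/SR record format.

```python
from typing import List, Optional, Tuple

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return s.translate(COMP)[::-1]


def best_ungapped(segment: str, reading: str) -> Tuple[float, int]:
    """Best percentage identity of segment placed (without gaps) fully
    inside reading. Returns (percent, offset); offset is -1 if none."""
    best, where = 0.0, -1
    size = len(segment)
    for off in range(len(reading) - size + 1):
        same = sum(x == y for x, y in zip(segment, reading[off:off + size]))
        pct = 100.0 * same / size
        if pct > best:
            best, where = pct, off
    return best, where


def vector_primer_clip(reading: str, entries: List[Tuple[str, str]],
                       threshold: float) -> Tuple[Optional[int], Optional[int]]:
    """Return (left, right): first insert base and start of 3' vector."""
    # Stage 1: note the single best match over every forward/reverse segment.
    best_pct, best_off, best_seg, partner = 0.0, -1, None, None
    for fwd, rev in entries:
        for seg, other in ((fwd, rev), (rev, fwd)):
            pct, off = best_ungapped(seg, reading)
            if pct > best_pct:
                best_pct, best_off, best_seg, partner = pct, off, seg, other
    if best_seg is None or best_pct < threshold:
        return None, None
    left = best_off + len(best_seg)
    # Stage 2: look for the segment from the other side of the cloning
    # site downstream of the 5' match (assumed reverse-complemented).
    pct2, off2 = best_ungapped(revcomp(partner), reading[left:])
    right = left + off2 if off2 >= 0 and pct2 >= threshold else None
    return left, right
```

Checking each 5' candidate against the same threshold before searching for its partner mirrors the order described above: the 3' search only happens once a 5' match has been accepted.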

For locating cloning vector, two algorithms are available, both of which
use hashing.
The original method requires a "Word
length" (word_length), the "Number of diagonals to combine" (num_diags) and
a "Cutoff score" (diagonal_score). The word length is the minimum number
of consecutive bases that will count as a match. The algorithm treats the
problem like a dot matrix comparison. First it finds all matches of length
word_length; then it locates the diagonal with the highest normalised
score. Next it adds in the scores of the adjacent diagonals (num_diags sets
how many diagonals are combined). If
the combined score is at least "diagonal_score", the experiment file is
updated to indicate the location of the vector sequence. The score
represents the proportion of a diagonal that contains matching words, so
the maximum score for any diagonal is 1.0.
This mode writes the CS records in experiment files.
If the whole reading is cloning vector,
this mode writes a PS record containing "all cloning vector".
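A minimal sketch of the original diagonal method follows. Hashing words of word_length bases, picking the best normalised diagonal, combining neighbouring diagonals and comparing against diagonal_score all come from the description above. The details are assumptions made for illustration: a diagonal's score is taken as its word hits divided by the number of word positions on it (so it cannot exceed 1.0), num_diags is read as a window of diagonals centred on the best one, and ties are broken in favour of more hits. The result is a pair of reading coordinates, not a CS record.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def diagonal_length(d: int, n: int, m: int) -> int:
    """Cells on diagonal d = reading_pos - vector_pos of an n x m matrix."""
    return min(n, m - d) - max(0, -d)


def original_method(vector: str, reading: str, word_length: int,
                    num_diags: int, diagonal_score: float
                    ) -> Optional[Tuple[int, int]]:
    n, m = len(vector), len(reading)
    # Hash every word of the vector by its sequence.
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(n - word_length + 1):
        index[vector[i:i + word_length]].append(i)
    # Count word hits per diagonal and remember the reading span they cover.
    hits: Dict[int, int] = defaultdict(int)
    span: Dict[int, Tuple[int, int]] = {}
    for j in range(m - word_length + 1):
        for i in index.get(reading[j:j + word_length], ()):
            d = j - i
            hits[d] += 1
            lo, hi = span.get(d, (j, j + word_length))
            span[d] = (min(lo, j), max(hi, j + word_length))
    if not hits:
        return None

    def score(d: int) -> float:
        # Proportion of word positions on the diagonal that are hits.
        words = diagonal_length(d, n, m) - word_length + 1
        return hits.get(d, 0) / words if words > 0 else 0.0

    best = max(hits, key=lambda d: (score(d), hits[d]))
    half = num_diags // 2
    # The combined score may exceed 1.0 when several diagonals are summed.
    combined = sum(score(d) for d in range(best - half, best + half + 1))
    return span[best] if combined >= diagonal_score else None
```

Note that under this normalisation a very short diagonal with a single chance hit also scores 1.0, which is the length dependence that the probability-based method is meant to address.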

A newer method also hashes words of "word_length" consecutive bases and
accumulates the hits for each diagonal, but instead of using a score cutoff
it decides whether there is
a match using a probability threshold "P" supplied by the user.
For each diagonal length, vector_clip calculates "E", the score that would be
expected for probability "P", and compares it with the observed score "O".
If O>E for any diagonal, a match is declared, and its strength is expressed
as 100(O-E)/E. This
method is an attempt to overcome the problem that, even though the
scores on diagonals are normalised to lie in the range 0.0 to 1.0, the scores
are still a function of the diagonal length. The probability P therefore allows
vector_clip to use a different cutoff score for each length of diagonal.
Tests have shown that the probability-based algorithm is much more
reliable than the older one.
By default the program still
uses the old algorithm; the probability-based one is switched on by
the user specifying a probability cutoff (option -P). We strongly
recommend using the probability-based method; for our data we have
found that a probability of 0.000001 gives good results.
This mode writes the CS records in experiment files.
If the whole reading is cloning vector,
this mode writes a PS record containing "all cloning vector".
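One way to turn a probability P into a length-dependent expected score E is sketched below. The model is an assumption, not the documented vector_clip formula: word hits on a diagonal are treated as Binomial(n_words, 0.25^word_length), which presumes random sequence with equal base frequencies and independent word positions (overlapping words are not really independent), and E is taken as the smallest hit count whose upper-tail probability is at most P. Raw hit counts are compared rather than normalised scores; dividing O and E by the same number of word positions changes neither the test O>E nor the value of 100(O-E)/E.

```python
from collections import defaultdict
from math import comb
from typing import Dict, List, Optional, Tuple


def expected_score(diag_len: int, word_length: int, p_cut: float) -> int:
    """Smallest hit count E with P(X >= E) <= p_cut, X ~ Binomial(n_words, q)."""
    n_words = diag_len - word_length + 1
    q = 0.25 ** word_length  # chance that one word position matches
    tail = 1.0  # P(X >= 0)
    for k in range(n_words + 1):
        if tail <= p_cut:
            return k
        tail -= comb(n_words, k) * q ** k * (1 - q) ** (n_words - k)
    return n_words + 1  # more hits than positions: never exceeded


def probability_method(vector: str, reading: str, word_length: int,
                       p_cut: float) -> Optional[Tuple[float, int]]:
    """Return (100(O-E)/E, diagonal) for the strongest match, or None."""
    n, m = len(vector), len(reading)
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(n - word_length + 1):
        index[vector[i:i + word_length]].append(i)
    hits: Dict[int, int] = defaultdict(int)
    for j in range(m - word_length + 1):
        for i in index.get(reading[j:j + word_length], ()):
            hits[j - i] += 1
    cache: Dict[int, int] = {}  # E depends only on diagonal length
    best = None
    for d, observed in hits.items():
        diag_len = min(n, m - d) - max(0, -d)
        if diag_len not in cache:
            cache[diag_len] = expected_score(diag_len, word_length, p_cut)
        expected = cache[diag_len]
        if observed > expected:
            pct = 100.0 * (observed - expected) / expected
            if best is None or pct > best[0]:
                best = (pct, d)
    return best
```

Under this model, with word_length 8 and P = 0.000001, a 30-base diagonal needs more than 2 word hits while a 1000-base diagonal needs more than 3, so the cutoff grows with diagonal length as the text describes.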

The search for "vector rearrangements" uses a simple algorithm that
looks only for exact matches of at least "minimum match" characters. Any
reading that contains a string of at least this length that exactly matches
a segment of the vector sequence is classed as a "vector rearrangement",
and its name is not written to the file of passed file names.
If a match is found, this mode writes a PS record containing
"vector rearrangement" in the experiment file.
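The rearrangement test reduces to asking whether the reading and the vector share any exact word of length "minimum match", because every longer exact match contains such a word. The sketch below is a minimal illustration: the function name is hypothetical, both sequences are assumed to be plain upper-case strings, and only the vector strand as given is checked, which is an assumption of the sketch rather than something stated above.

```python
def is_rearrangement(vector: str, reading: str, minimum_match: int) -> bool:
    """True if reading shares an exact substring of length >= minimum_match
    with vector. Hashing every minimum_match-length word of the vector is
    enough, since any longer match contains one of those words."""
    if minimum_match <= 0 or len(reading) < minimum_match:
        return False
    words = {vector[i:i + minimum_match]
             for i in range(len(vector) - minimum_match + 1)}
    return any(reading[j:j + minimum_match] in words
               for j in range(len(reading) - minimum_match + 1))
```

A reading flagged this way would be left out of the file of passed file names.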

This page is maintained by
James Bonfield.
Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/vector_clip_3.html