In article <C7IG5p.CKH at hkuxb.hku.hk> billyli at hkuxa.hku.hk (Billy Li) writes:
>>Hi,
> Sorry for a dumb question. Does anyone know how to precisely define
>a tandem repeat (possibly with a good reference)? Biologists have
>published sequences claiming a segment is a tandem repeat. How do they
>detect them and is there any algorithm or software that can detect
>tandem repeats? How do tandem repeats differ from slippage, since
>both types of repeats occur together?
> Sorry for so many questions and I look forward to hearing your responses.
> Please e-mail if possible.
>>--
>Billy Li, email: billyli at hkuxa.hku.hk
>Department of Statistics,
>University of Hong Kong.
>Tel: (852) 8591920 Fax: (852) 8589041
There are several approaches to detecting tandem repeats:
1) Needleman/Wunsch/Sellers-type algorithms. These algorithms usually
approximate an exhaustive set of comparisons of a sequence with
itself, in all possible alignments. There are many variations on this
scheme.
eg. GATGATGAT ---> slide top sequence along bottom sequence
    GATGATGAT
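
The sliding comparison can be sketched in Python. This is only a minimal, ungapped illustration and not a Needleman/Wunsch/Sellers implementation: it allows no insertions or deletions, and the function name self_match_by_shift is made up for this example.

```python
def self_match_by_shift(seq):
    """Slide a copy of seq along itself and count matching positions.

    Returns a dict mapping each shift (1 .. len(seq) - 1) to the number
    of positions i where seq[i] == seq[i + shift]. No gaps are allowed.
    """
    n = len(seq)
    counts = {}
    for shift in range(1, n):
        counts[shift] = sum(
            1 for i in range(n - shift) if seq[i] == seq[i + shift]
        )
    return counts


if __name__ == "__main__":
    for shift, matches in self_match_by_shift("GATGATGAT").items():
        # Every overlapping position matches at shifts that are
        # multiples of the repeat length.
        print(shift, matches, "of", len("GATGATGAT") - shift)
```

For GATGATGAT, every overlapping position matches at shifts 3 and 6 and none match at the other shifts, which is the signature of a repeat unit of length 3.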
2) Data dictionaries. When all possible subsequences are sorted in
lexical order, tandemly repeated sequences appear near each other in the
dictionary.
eg.
ATG
ATGA
ATGAT
ATGATGA
ATGATGAT
GAT
GATA
etc...
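
A rough Python sketch of the dictionary idea follows. It simplifies the scheme to subsequences of one fixed length k (an assumption made to keep the example short), and the names kmer_dictionary and tandem_candidates are hypothetical. A subsequence is reported when the same k letters start again exactly k positions later.

```python
from collections import defaultdict


def kmer_dictionary(seq, k):
    """Map each k-letter subsequence to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def tandem_candidates(seq, k):
    """Return (subsequence, start) pairs where the same subsequence
    begins again exactly k positions downstream, in lexical order."""
    hits = []
    for kmer, starts in sorted(kmer_dictionary(seq, k).items()):
        start_set = set(starts)
        for s in starts:
            if s + k in start_set:
                hits.append((kmer, s))
    return hits


if __name__ == "__main__":
    for kmer, start in tandem_candidates("CGTATCATGATGATGATACG", 3):
        print(kmer, start)
```

Running it for several values of k, rather than only k = 3, would be closer to sorting all possible subsequences.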
3) Dot-Matrix similarity searches. In this approach, the sequence is
written on both X & Y axes of a matrix. Where subsequences match
above some threshold, a dot or some other character is printed at the
corresponding X,Y coordinate in the matrix.
           10        20
   CGTATCATGATGATGATACG
  C         .         .         .         .
  G A       .         .         .         .
  T  A      .         .         .         .
  A   A     .         .         .         .
  T    A    .         .         .         .
  C     A   .         .         .         .
  A      A  .         .         .         .
  T       A .A  A     .         .         .
  G        A. A  A    .         .         .
10A.........A..A..A........................
  T       A .A  A     .         .         .
  G        A. A  A    .         .         .
  A         A  A  A   .         .         .
  T       A .A  A     .         .         .
  G        A. A  A    .         .         .
  A         A  A  A   .         .         .
  T         .      A  .         .         .
  A         .       A .         .         .
  C         .        A.         .         .
20G........................................
The main diagonal of A's indicates that the sequence matches itself at
all positions. Tandem repeats appear as shorter diagonals symmetrically
arrayed about the main diagonal. The beauty of this approach is the
fact that the superb pattern recognition abilities of the human brain
are exploited. In my opinion, this method is far better than 1 or 2
at finding tandem repeats.
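
For anyone who wants to experiment, here is a minimal Python sketch of a dot-matrix self-comparison. The window size and threshold are assumptions chosen for illustration, not settings taken from any particular program, and dot_matrix is a made-up name. A cell is marked with an A when at least min_matches of the window positions centred on it agree.

```python
def dot_matrix(seq, window=3, min_matches=3):
    """Compare seq with itself and return one string per row.

    Cell (i, j) is 'A' when at least min_matches of the `window`
    positions centred on i and j are identical, otherwise a space.
    Window positions that fall off either end count as mismatches.
    """
    n = len(seq)
    half = window // 2
    rows = []
    for i in range(n):
        row = []
        for j in range(n):
            matches = 0
            for d in range(-half, half + 1):
                a, b = i + d, j + d
                if 0 <= a < n and 0 <= b < n and seq[a] == seq[b]:
                    matches += 1
            row.append("A" if matches >= min_matches else " ")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    sequence = "CGTATCATGATGATGATACG"
    print("  " + sequence)
    for base, row in zip(sequence, dot_matrix(sequence)):
        print(base + " " + row)
```

A larger window with a threshold below the window size tolerates mismatches, which helps with imperfect repeats at the cost of more background noise.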
I don't immediately have references to programs specifically designed
for searching for tandem repeats, but any similarity search program
can be used for this purpose.
An explanation of dot-matrix searches can be found in:
Fristensky, B. (1986) Nucleic Acids Res. 14:597-610.
which is available by anonymous FTP to the directory psgendb at
ccu.umanitoba.ca.
===============================================================================
Brian Fristensky |
Department of Plant Science | A question is like a knife that slices
University of Manitoba | through the stage backdrop and gives us
Winnipeg, MB R3T 2N2 CANADA | a look at what lies hidden behind.
frist at ccu.umanitoba.ca |
Office phone: 204-474-6085 | Milan Kundera, THE UNBEARABLE LIGHTNESS
FAX: 204-261-5732 | OF BEING
===============================================================================