* Lerat E. [http://www.nature.com/hdy/journal/vaop/ncurrent/full/hdy2009165a.html Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs] (Nov 2009)

Keep in mind that the resulting libraries should be further screened for gene families. There are borderline cases in which a genome may contain thousands of modified copies of a gene, ranging from seemingly functional copies through pseudogenes and gene fragments to single exons (e.g. the Speer family in rodents).
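
One possible way to run such a screen is to compare the library against a set of known gene or protein sequences with BLAST and set aside the library entries that produce strong hits for manual inspection. The sketch below only parses the result; it assumes the standard 12-column tabular BLAST output, and the function name <code>flag_gene_like</code>, the thresholds and the sample rows are hypothetical, chosen for illustration only.

```python
def flag_gene_like(blast_lines, max_evalue=1e-10, min_aln_len=50):
    """Return IDs of library sequences with a strong hit to a known gene/protein.

    blast_lines: iterable of tab-separated rows in the standard 12-column
    tabular BLAST layout (qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore).
    The default thresholds are arbitrary illustrative values.
    """
    flagged = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # skip malformed rows
        query_id = cols[0]
        aln_len = int(cols[3])
        evalue = float(cols[10])
        if evalue <= max_evalue and aln_len >= min_aln_len:
            flagged.add(query_id)
    return flagged


# Made-up rows with hypothetical IDs and values
example = [
    "rep_001\tprotA\t85.0\t120\t18\t0\t1\t360\t10\t129\t1e-40\t150",
    "rep_002\tprotB\t40.0\t30\t18\t0\t1\t90\t5\t34\t0.5\t25",
]
candidates = flag_gene_like(example)
print(sorted(candidates))
```

Flagged entries are candidates for review rather than automatic removal: if the comparison database itself contains transposable element proteins, genuine repeats will be flagged as well.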

==Consensus Based==

Consensus-based methods require at least a draft of the genome or multiple genomic sequences.

<!-- Since at least some next-generation sequence assemblers (e.g. Newbler for 454 data) reject highly over-represented sequences during assembly, the repeat content estimated from such an assembly will be biased.

From 10M+ Sanger reads with an average length of 800+ bp, they constructed two libraries:

A) from 1% of the reads, using 17-mers with a depth of at least 10 ("especially enriched in the CR1 class of retrotransposons")

B) from 10% of the reads, using 17-mers with a depth in the range of 10 to 100.

"""
- to improve the quality of the assembly, only reads that have at least 100 high-depth 17-mers were considered

- ReAS was run on each of the libraries separately

- after retaining only the assembled repeats of length larger than 500 nucleotides and a minimal average depth value (as provided by the program) of 10, the two libraries contained 949 and 25,110 repeats each, respectively.

- These sequences were then pulled together.

- initial ReAS assembly appears to be fragmented and there are many redundant sequences, the final version of the library was produced by running ReAS' join_fragments.pl and rmRedundance.pl scripts.

- final library contained 3909 reconstructed repetitive elements with an average length of about 1500 nt.
"""
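
The read-selection step quoted above (keep only reads that carry enough high-depth 17-mers) can be illustrated with a small sketch. This is not ReAS itself, only a toy in-memory version of the idea: the function names are hypothetical, depth is counted on the forward strand only, and a real data set of millions of reads would need a dedicated k-mer counter.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (upper-cased), skipping k-mers that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def select_repeat_reads(reads, k=17, min_depth=10, min_high_depth_kmers=100):
    """Keep reads containing at least min_high_depth_kmers k-mers whose depth
    (number of occurrences across all reads) is at least min_depth.

    reads: list of read sequences as strings (it is traversed twice).
    Simplification: reverse complements are not merged with forward k-mers.
    """
    depth = Counter()
    for read in reads:
        depth.update(kmers(read, k))

    selected = []
    for read in reads:
        n_high = sum(1 for km in kmers(read, k) if depth[km] >= min_depth)
        if n_high >= min_high_depth_kmers:
            selected.append(read)
    return selected


# Toy data with tiny thresholds so the effect is visible
reads = ["ACGTACGTAC"] * 3 + ["TTTTGGGGCC"]
picked = select_repeat_reads(reads, k=4, min_depth=3, min_high_depth_kmers=5)
print(len(picked))
```

In the toy data the three identical reads share all of their 4-mers and are kept, while the unique read has no 4-mer reaching the depth threshold and is dropped.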

DustMasker

WindowMasker

Others

seg from wu-blast

ribosomal RNA detection

While not "repetitive" in a strict sense, ribosomal sequences need to be detected and preferably masked before, e.g., de novo gene prediction or EST mapping. GenBank contains "protein coding genes" predicted from ribosomal sequences, not only as a result of de novo prediction but also of "prediction with a strong ESTs/RNA-Seq support".
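
Once ribosomal regions have been located, by whatever search tool is used, masking them is a simple coordinate operation. The sketch below assumes the hits are given as 0-based, end-exclusive (start, end) pairs on a single sequence; the function name <code>mask_intervals</code> and the example coordinates are hypothetical.

```python
def mask_intervals(seq, intervals, soft=True):
    """Mask the given regions of seq.

    intervals: iterable of (start, end) pairs, 0-based, end-exclusive.
    soft=True lower-cases the masked bases; soft=False replaces them with N.
    Coordinates outside the sequence are clipped to its bounds.
    """
    chars = list(seq)
    for start, end in intervals:
        start = max(0, start)
        end = min(len(chars), end)
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


genome = "ACGTACGTACGT"
hits = [(2, 5), (8, 10)]  # made-up rRNA hit coordinates
print(mask_intervals(genome, hits))              # soft-masked
print(mask_intervals(genome, hits, soft=False))  # hard-masked
```

Soft masking keeps the sequence available to tools that can ignore lower-case bases, whereas hard masking removes it from consideration entirely; which one is appropriate depends on the downstream gene predictor or mapper.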