Keep in mind that the resulting libraries should be further screened for gene families. There are borderline cases where a genome may contain thousands of modified copies of a gene, ranging from seemingly functional copies through pseudogenes and gene fragments to single exons (e.g. the Speer family in rodents).

Consensus Based

This approach requires at least a draft of the genome or multiple genomic sequences.

From 10M+ Sanger sequences of average length 800+ bp they constructed two libraries:

A) 1% of reads, 17-mers with a depth of at least 10 ("especially enriched in the CR1 class of retrotransposons")

B) 10% of reads, 17-mers with a depth in the range of 10 to 100.
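The depth thresholds for the two libraries can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not the k-mer counter ReAS itself uses: the function names `count_kmers` and `high_depth_kmers` are hypothetical, only the forward strand is counted (real tools usually also count reverse complements), and an in-memory `Counter` would not scale to 10M+ reads.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def high_depth_kmers(counts, min_depth=10, max_depth=None):
    """Return the set of k-mers whose depth lies in [min_depth, max_depth].

    max_depth=None means no upper bound (library A style);
    max_depth=100 reproduces the 10 to 100 range (library B style).
    """
    return {
        kmer
        for kmer, depth in counts.items()
        if depth >= min_depth and (max_depth is None or depth <= max_depth)
    }
```

Under these assumptions, library A would correspond to `high_depth_kmers(counts, min_depth=10)` and library B to `high_depth_kmers(counts, min_depth=10, max_depth=100)`, each computed on its own sample of reads.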

"""
- to improve the quality of the assembly, only reads that have at least 100 high-depth 17-mers were considered

- ReAS was run on each of the libraries separately

- after retaining only the assembled repeats of length larger than 500 nucleotides and a minimal average depth value (as provided by the program) of 10, the two libraries contained 949 and 25,110 repeats each, respectively.

- These sequences were then pulled together.

- initial ReAS assembly appears to be fragmented and there are many redundant sequences, the final version of the library was produced by running ReAS' join_fragments.pl and rmRedundance.pl scripts.

- final library contained 3909 reconstructed repetitive elements with an average length of about 1500 nt.
"""
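The two filtering steps quoted above (keeping reads with at least 100 high-depth 17-mers, then keeping assembled repeats longer than 500 nt with an average depth of at least 10) can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the high-depth k-mer set is assumed to be precomputed, and the `(name, sequence, average_depth)` tuple layout is invented here rather than taken from the ReAS output format.

```python
def count_high_depth_hits(read, kmer_set, k=17):
    """Count the k-mer positions in a read that fall in the high-depth set."""
    seq = read.upper()
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in kmer_set)


def select_reads(reads, kmer_set, k=17, min_hits=100):
    """Keep only reads containing at least min_hits high-depth k-mers."""
    return [r for r in reads if count_high_depth_hits(r, kmer_set, k) >= min_hits]


def filter_repeats(repeats, min_length=500, min_depth=10):
    """Keep assembled repeats longer than min_length with depth >= min_depth.

    repeats: iterable of (name, sequence, average_depth) tuples
    (a hypothetical layout, not the actual ReAS output format).
    """
    return [
        (name, seq, depth)
        for name, seq, depth in repeats
        if len(seq) > min_length and depth >= min_depth
    ]
```

The length test is strict ("larger than 500"), while the depth test is inclusive ("minimal average depth value of 10"), following the wording of the quoted steps.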