A Genome Assembly Hub for Repeat Elements

Sequences

The UCSC Repeat Browser relies on a combination of Dfam 3.1 consensus sequences and automatically built consensuses.

For a few element types (e.g. recent LINE-1s), Dfam does not provide a single consensus; instead it provides consensuses and models for smaller subparts, which RepeatMasker assembles into full-length elements. For these elements, we generated Repeat Browser consensus sequences by aligning the 50 longest instances in the RepeatMasker output. For a few other elements (e.g. MLT endogenous retroviruses), RepeatMasker uses a single consensus, in combination with other information such as flanking LTRs, to produce multiple repeat classifications. In these cases, we use the single sequence and have the browser map all data from the different classifications to that sequence.
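The selection step described above (picking the longest instances of one repeat type from RepeatMasker output) can be sketched as follows. This is only an illustrative sketch, not the Repeat Browser's actual pipeline: it assumes the standard whitespace-delimited RepeatMasker .out column layout, measures length as the genomic span of each annotation line, and does not join fragments that share a RepeatMasker ID. The function name longest_instances is hypothetical. The subsequent alignment step would be done with a separate multiple-alignment tool and is not shown.

```python
import heapq

def longest_instances(out_lines, repeat_name, n=50):
    """Return up to n longest annotations of repeat_name, longest first.

    Assumes RepeatMasker .out columns: score, div, del, ins, query sequence,
    begin, end, (left), strand, repeat name, class/family, ...
    Each result is (length, chrom, start, end, strand).
    """
    hits = []
    for line in out_lines:
        fields = line.split()
        # Data rows start with an integer score; this skips headers and blanks.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        if fields[9] != repeat_name:
            continue
        chrom, start, end = fields[4], int(fields[5]), int(fields[6])
        # Coordinates in .out files are 1-based and inclusive.
        hits.append((end - start + 1, chrom, start, end, fields[8]))
    return heapq.nlargest(n, hits)
```

Calling longest_instances(open("hg38.fa.out"), "L1HS") would return the candidate set to extract and align; the file name here is an assumption for illustration.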

We also provide two custom sequences: HERVH-full and HERVK-full. These two consensus versions of young human endogenous retroviruses have been shown to produce viral particles when expressed from a plasmid in cultured cells.

Collectively, these sequences are known as hg38reps and are available for download here: hg38reps.fa
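As a quick sanity check after downloading, the sequence names and lengths in the file can be listed with a small reader such as the one below. This is a minimal sketch that assumes hg38reps.fa is an ordinary FASTA file whose sequence name is the first whitespace-delimited token of each header line; the function name read_fasta_lengths is hypothetical.

```python
def read_fasta_lengths(lines):
    """Map each FASTA record name to its total sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first token of the header as the record name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths
```

For example, passing an open file handle for hg38reps.fa to read_fasta_lengths yields a dictionary that can be checked for entries such as HERVH-full and HERVK-full.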

Below is a table of Repeat Browser sequences, their corresponding Dfam and RepeatMasker names, and the manner in which they are mapped. Basic statistics on coverage and chaining are also given.