Genome has been sequenced and assembled into contigs, but contigs have
not been assembled into pseudomolecules.

Metadata

Data about data. Metadata for genome assemblies includes information about
the plant that was sequenced, methods used, data providers and contributors,
downloads, and more.

Reference genome

Genome has been completely sequenced and assembled into pseudomolecules.

See the right-hand side bar for more terms and explanations.

About genome assemblies

Contig: a set of overlapping DNA segments that together represent a consensus
region of DNA. In bottom-up sequencing projects, a contig refers to overlapping
sequence data (reads); in top-down sequencing projects, a contig refers to the
overlapping BAC (Bacterial Artificial Chromosome) clones that form a physical map of
the genome, which is used to guide sequencing and assembly.

Gaps: Gaps are regions of unknown sequence. They are usually represented as
a run of Ns. Gaps can be of known or unknown length. A gap of known length is usually
filled with one N per missing base: for instance, if the gap is known to be 150 bp
long, then 150 Ns span the gap. Such gaps often represent repeat elements that could
not be sequenced through. Gaps of unknown size are usually represented by a fixed
number of Ns; GenBank prefers 100. Gaps occur in scaffolds at the joins between
contigs, and in pseudomolecules at the joins between scaffolds.
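
The gap conventions above can be checked directly on a sequence. The following is a
minimal sketch, not a standard tool: the function names and the toy scaffold are
made up for illustration, and it assumes that a run of exactly 100 Ns marks a gap of
unknown size, which is only a heuristic (a measured gap can also be 100 bp long).

```python
import re

# Assumption: a run of exactly 100 Ns is the placeholder for a gap of unknown size.
UNKNOWN_GAP_LENGTH = 100


def find_gaps(sequence):
    """Return one (start, end, length) tuple per run of Ns.

    Coordinates are 0-based and end-exclusive, like Python slices.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        gaps.append((match.start(), match.end(), match.end() - match.start()))
    return gaps


def may_be_unknown_length(gap):
    """Flag gaps whose length equals the fixed placeholder size."""
    return gap[2] == UNKNOWN_GAP_LENGTH


# Hypothetical toy scaffold: three short contigs, one 150 bp gap, one 100 N gap.
scaffold = "ACGT" * 5 + "N" * 150 + "GGCC" * 5 + "N" * 100 + "TTAA" * 5

for gap in find_gaps(scaffold):
    label = "possibly unknown size" if may_be_unknown_length(gap) else "sized gap"
    print(gap, label)
```

The length check cannot tell a placeholder from a real 100 bp gap, so the flag is
best read as a prompt to look at the assembly's own gap annotation.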

Genome assembly: The process of assembling contigs into larger pieces called
scaffolds. One efficient way of assembling scaffolds is to use an optical map, which
records the positions of restriction sites along the genome; scaffolds are anchored
by matching their unique patterns of restriction site locations to the map.
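
To illustrate what a restriction site pattern looks like, here is a minimal sketch of
an in silico digest. It assumes the EcoRI recognition site GAATTC purely as an
example; the function names and the toy sequence are hypothetical. Real optical map
alignment must also tolerate measurement error and missing or extra sites, which this
sketch ignores.

```python
def restriction_sites(sequence, site="GAATTC"):
    """Return the 0-based start position of every occurrence of the site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence, site="GAATTC"):
    """Distances between consecutive sites: the pattern compared with the map."""
    positions = restriction_sites(sequence, site)
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical toy scaffold containing three EcoRI sites.
toy_scaffold = "AA" + "GAATTC" + "T" * 4 + "GAATTC" + "C" * 7 + "GAATTC" + "A"

print(restriction_sites(toy_scaffold))
print(fragment_lengths(toy_scaffold))
```

A scaffold whose list of fragment lengths matches one stretch of the optical map, and
no other, can be anchored at that location.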

Genome sequencing: Short reads are sequenced, and their overlapping
sequences are then assembled into longer pieces called contigs. Long-read platforms
such as PacBio can now produce reads that are more than 10 kb long.
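
The idea of joining reads by their overlapping sequence can be sketched with two toy
reads. This is an illustration only: the function names are hypothetical, the overlap
must match exactly, and real assemblers handle sequencing errors, both strands, and
millions of reads with far more elaborate methods.

```python
def longest_overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads into one contig if they overlap; return None otherwise."""
    overlap = longest_overlap(left, right, min_overlap)
    if overlap == 0:
        return None
    # Keep all of `left`, then append the part of `right` past the overlap.
    return left + right[overlap:]


# Hypothetical reads that share the four bases TGCA.
print(merge_reads("ACGTTGCA", "TGCAGGTT"))
print(merge_reads("AAAA", "CCCC"))
```

The `min_overlap` threshold is there because very short overlaps occur by chance and
would join reads that do not belong together.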

Pseudomolecule: Chromosome sequence that is made up of assembled scaffolds.

Pseudomolecule assembly: The process of ordering and orienting scaffolds to create
whole chromosomes. This can be done using physical map data that corresponds
to genetic map data for your genome of interest (i.e. SNPs that are associated with
genetic markers). Alternatively, it can be done using syntenic data between your
assembly and a reference genome (also called reference-guided assembly). The
reference-guided approach carries a risk: any inversions or assembly errors unique to
the reference genome can be incorrectly propagated into your assembly. However, it is
sometimes the best method if no genetic map for your genome is available.
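
The map-based approach can be sketched as follows. This is a simplified illustration
under stated assumptions, not an established pipeline: every scaffold is assumed to
carry at least one marker with a known position on both the scaffold and the genetic
map, scaffolds are joined with 100 Ns as gaps of unknown size, and all names and data
are hypothetical.

```python
GAP = "N" * 100  # placeholder for a gap of unknown size (see the Gaps entry)

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def needs_flip(markers):
    """markers: list of (scaffold_position, map_position) pairs.

    Flip the scaffold if the map position falls between its first and last
    marker. A scaffold with a single marker cannot be oriented and is kept as is.
    """
    ordered = sorted(markers)
    return ordered[0][1] > ordered[-1][1]


def build_pseudomolecule(scaffolds):
    """scaffolds: dict of name -> (sequence, markers). Returns (sequence, layout)."""

    def mean_map_position(item):
        markers = item[1][1]
        return sum(map_pos for _, map_pos in markers) / len(markers)

    pieces, layout = [], []
    # Order scaffolds along the chromosome by their average map position.
    for name, (sequence, markers) in sorted(scaffolds.items(), key=mean_map_position):
        flip = needs_flip(markers)
        pieces.append(reverse_complement(sequence) if flip else sequence)
        layout.append((name, "-" if flip else "+"))
    return GAP.join(pieces), layout


# Hypothetical scaffolds: scaf_B lies later on the map and is reversed.
toy = {
    "scaf_B": ("GGGTTT", [(0, 30.0), (5, 20.0)]),
    "scaf_A": ("AACC", [(0, 1.0), (3, 5.0)]),
}
sequence, layout = build_pseudomolecule(toy)
print(layout)
```

Comparing only the first and last marker keeps the sketch short; with real data, a
correlation across all markers on a scaffold would be more robust to misplaced
markers.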

Scaffold: Sequence that is made up of assembled contigs. These scaffolds can
span many megabases.