Triticum dicoccoides Assembly and Gene Annotation

About Triticum dicoccoides

Emmer wheat, or hulled wheat, is a type of awned wheat. Emmer is a tetraploid (2n = 4x = 28 chromosomes). The domesticated types are Triticum turgidum subsp. dicoccum and Triticum turgidum conv. durum. The wild plant is called Triticum turgidum subsp. dicoccoides. The principal difference between the wild and domesticated forms is that the ripened seed head of the wild plant shatters and scatters its seed onto the ground, whereas the seed head of domesticated emmer remains intact, making the grain easier for humans to harvest. Along with einkorn wheat, emmer was one of the first crops domesticated in the Near East. It was widely cultivated in the ancient world but is now a relict crop in mountainous regions of Europe and Asia. Emmer is considered a type of farro, especially in Italy.

Assembly

The wild emmer wheat (WEW) accession “Zavitan” was chosen for this genome assembly to take advantage of the genetic data already collected for this line by the WEWseq Consortium. The WEW reference genome, constructed by whole-genome shotgun (WGS) sequencing of libraries with a range of insert sizes, produced contigs with an N50 of 57,378 base pairs (bp) and scaffolds with an N50 of 6,955,166 bp. The scaffolds were validated with genetic data and combined with three-dimensional (3D) chromosome conformation capture sequencing (Hi-C) data, enabling the construction of chromosome-scale assemblies (pseudomolecules). The resulting 10.5-Gb genome assembly consists of 14 pseudomolecule sequences representing the 14 chromosomes of WEW (10.1 Gb) and one group of unassigned scaffolds (0.4 Gb). The gaps between scaffolds, estimated to represent ~1.5 Gb of the genome, most likely correspond to regions that are technically difficult to sequence or assemble.
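N50, used above for both contigs and scaffolds, is the length L such that sequences of length L or longer together contain at least half of the total assembled length. The minimal Python sketch below computes it from a list of sequence lengths; the function name `n50` and the example lengths are illustrative only and are not taken from the WEWseq pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    # Sort from longest to shortest so we accumulate the biggest pieces first.
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Integer comparison avoids floating-point division:
        # stop once the cumulative length reaches half of the total.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be a non-empty collection of positive integers")


# Hypothetical toy assembly: total 290 bp, half is 145 bp.
# 100 bp alone is not enough; 100 + 80 = 180 bp crosses half, so N50 = 80.
print(n50([100, 80, 50, 30, 20, 10]))
```

Applied to contig lengths this gives the contig N50, and applied to scaffold lengths it gives the scaffold N50; a higher value indicates a more contiguous assembly.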

Annotation

Gene annotation was carried out by the WEWseq Consortium. RNA sequencing reads generated from 20 different combinations of WEW tissues and developmental stages were used to annotate protein-coding genes in the WEW assembly (13). In total, 65,012 high-confidence (HC) gene models were identified, and validation with the BUSCO gene set indicated that the assembly captured 98.4% of the WEW gene complement. A further 45,532 low-confidence (LC) gene models were also identified; these are shown on a separate track.
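The gene counts that follow distinguish coding genes, which contain an open reading frame (ORF), from pseudogenes, whose ORF is disrupted by a frameshift or premature stop codon. As a simplified illustration of what an ORF is, the Python sketch below finds the longest ATG-to-stop stretch in the three forward reading frames of a DNA string. This is not the method used by the WEWseq annotation; it ignores the reverse strand, introns and alternative start codons, and the name `longest_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return the longest ATG...stop ORF on the forward strand, or "" if none."""
    seq = seq.upper()
    best = ""
    # Scan each of the three forward reading frames codon by codon.
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # First start codon in this frame opens a candidate ORF.
                start = i
            elif start is not None and codon in STOP_CODONS:
                # An in-frame stop codon closes the ORF (stop included).
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))
```

Inserting or deleting a single base inside such an ORF shifts the downstream codons out of frame, which is why a frameshift can destroy an ORF and turn a former coding gene into a pseudogene.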

Gene counts

Coding genes (genes or transcripts that contain an open reading frame, ORF)

62,569

Non-coding genes

4,731

Small non-coding genes

4,513

Long non-coding genes

218

Pseudogenes (genes that have homology to known protein-coding genes but contain a frameshift and/or stop codon(s) that disrupt the ORF; thought to have arisen through duplication followed by loss of function)

75

Gene transcripts (the operational units of genes: in a genomic context, each transcript consists of one or more exons, with adjoining exons separated by introns that are transcribed and then spliced out; transcripts may or may not encode a protein)