Abstract

With the emergence and subsequent advancement of next-generation sequencing technology, detailed structural and functional characterization of genomes is readily attainable. Here, we sampled the Sorghum bicolor methylome by shallow sequencing of HSO3- (bisulfite)-treated DNA and used these data to identify methylation patterns associated with high-confidence gene models. We trained a classifier to predict functional gene models based on expression levels, methylation profiles, and sequence conservation. We expanded the transcriptome atlas by sequencing RNA from meristematic tissues, florets, and embryos, and used this information to develop a more complete annotation of the sorghum transcriptome. Our gene annotations modify 60% of the gene models in Sbi1.4 (version 1.4 of the sorghum gene annotations). The updated models most often have extended untranslated region (UTR) annotations (18,105), but some show longer protein-coding regions (5,096) or previously unannotated alternative transcripts (6,493). A phylogenetic analysis suggests that 800 genes are missing from the Sbi1.4 annotation and that 400 gene models are split. The new annotations resolve 50% of the split gene models and include 30% of the conserved genes missing from Sbi1.4. Using our classifier, we identified a large set of 34,276 novel, potentially functional transcribed regions. These transcribed regions include protein-coding genes, non-coding RNAs, and other classes of gene products.