Friday, April 4, 2014

With the release of the GRCh38 reference assembly, we are highlighting areas where improvements to the reference have been made.

The chromosome 9 peri-centromeric region changed significantly in GRCh38. Assembly-assembly alignments between GRCh37 and GRCh38 reveal some of these differences. As shown below, some sequences that were on the q-arm in GRCh37 are now on the p-arm in GRCh38. Why were these and other changes made?

Peri-centromeric regions of chr. 9 in GRCh37 (top) and GRCh38 (bottom). Blue horizontal bar: chromosome sequence. Blue/green fragments: individual clone and WGS components in the assembly tiling path. Purple bars: assembly-assembly alignments. The p- and q-arms, as well as the location of the centromere and the adjacent heterochromatin gaps, are marked. Note: in GRCh38, the centromere gap was replaced with sequence. The vertical bars through the alignments highlight sequence from the q-arm of GRCh37 chr. 9 that is now found on the p-arm of GRCh38.

In the GRCh37 release, the region was highly fragmented, with little evidence supporting the order and orientation of the contigs placed within it. The optical map information was consistent with a tiling path problem: the map data suggested that several contigs were misplaced and that the path did not represent a valid chromosome structure.

Several independent data sets allowed us to alter the tile path with confidence, and the GRCh38 release now provides a near-complete representation of the chromosome 9 short arm.

Admixture mapping data provided by GRC collaborator Giulio Genovese confirmed the localisation of clones to chromosome 9 and, in several instances, their placement on the long or short arm. Strand sequencing (Strand-seq) data from GRC collaborators Mark Hills and Peter Lansdorp identified contigs in the GRCh37 reference assembly that were in incorrect orientations.

By aligning these contigs to optical map data from three cell lines, we confirmed the results of the other analyses and placed clone contigs in the correct order, joining them into longer contigs.

Although the heterochromatic region of chr. 9 remains underrepresented in GRCh38, the long arm has also been improved: several contigs that localize to the peri-centromeric region are now ordered, providing a better representation of the chromosome.