Wednesday, January 9, 2013

Genes encoding proteins of the immune system
are constantly evolving in response to selective pressures from pathogens. This
rapid host-pathogen co-evolution has led to large families of genes that are
highly polymorphic and are often a result of gene duplication and
diversification. In GRCh37, the current reference assembly, some chromosome regions encompassing such genes are composed of components from several different genomic libraries. The lack of a single
haplotype and the excess allelic variation at such regions hinder haplotype inference using traditional linkage disequilibrium-based methodology. In addition, given the polymorphic nature of these genes, paralogs may be missing from the reference assembly. The CHORI-17 BAC library, derived from a hydatidiform mole, is an excellent resource for resolving loci such as these, as
it is composed of germline material without any allelic variation.
We sequenced clones from CHORI-17 to create a single haplotype across two of
these loci: the leukocyte receptor complex (LRC) and the immunoglobulin heavy
chain locus (IGH). These new paths have now been released as fix patches in GRCh37.p11.

The LRC on chromosome 19q13.4 spans approximately 1 Mbp and
contains many genes related to immune response including the LILR
(Leukocyte Immunoglobulin-like Receptor) and KIR (Killer
Immunoglobulin-like Receptor) gene families (Fig. 1). The products of these genes
interact with HLA molecules, making them important components of the innate
immune response. The GRC previously released eight novel patches providing partial representation of the LRC region for eight different haplotypes. We have now released a fix patch (KB021647.1) for this region that provides full representation of the CHORI-17 haplotype. In GRCh38, this patch will be incorporated into the reference chromosome, replacing the GRCh37 mixed haplotype. The CHORI-17 haplotype harbors the common 6.8 kbp LILRA3
deletion, which has been associated with multiple autoimmune disorders such as
psoriasis and multiple sclerosis. In addition, the KIR haplotype is the A01
haplotype, which contains the 22 bp frameshift deletion variant of the 2DS4
gene that inactivates the protein.

Fig. 1. Top: Alignment of GRCh37 chr. 19 to the LRC region fix patch. Bottom: Alignment of the fix patch and 8 LRC region novel patches to GRCh37 chr. 19. The blue bars represent the tiling paths of chr. 19 (NC_000019.9) and the fix patch (KB021647.1). The region of the fix patch composed of CHORI-17 clones is highlighted in orange. Genes annotated on the chromosome are shown in green. The gray tracks below represent the alignments: the thin horizontal lines indicate gaps, while the small vertical red bars indicate mismatches. The red arrows show the location of the LILRA3 deletion in the CHORI-17 haplotype.

The 1 Mbp IGH locus on chromosome 14q32.33 contains genes
that encode the heavy chains of immunoglobulin molecules that interact with
antigen epitopes (Fig. 2). This locus is even more complicated than the LRC given that
the IGH genes are subject to somatic rearrangements, and attempts to reconcile
the organization of the locus using B-lymphocyte-derived material have proven difficult. The GRC has now released a fix patch (KB021645.1) that provides a single haplotype representation for the majority of this locus, covering the gene segments that encode the IG variable domain. The CHORI-17 haplotype adds 101 kbp of previously uncharacterized sequence, including functional IGH variable genes and four large germline copy number
variants (Watson and Steinberg, in review).

Fig. 2. Top: Alignment of GRCh37 chr. 14 to the IGH region fix patch. Bottom: Alignment of the fix patch to GRCh37 chr. 14. The blue and gray bars represent the tiling paths of chr. 14 (NC_000014.8) and the fix patch (KB021645.1). The region of the fix patch composed of CHORI-17 clones is highlighted in orange. Genes annotated on the chromosome are shown in green. The purple bars below represent the alignments: the thin regions indicate gaps, while the small vertical ticks indicate mismatches.

These two updates highlight the utility of using hydatidiform
mole BAC libraries for resolving complex, highly duplicated loci of the human
genome. Because these updates are released as fix patches to the reference sequence, researchers
can use these high-quality sequences to better characterize sequence variation
in their own disease association studies ahead of the GRCh38 genome update.