Abstract

Erythrocytosis is a rare disorder characterized by increased red cell mass and elevated hemoglobin concentration and hematocrit. Several genetic variants have been identified as causes for erythrocytosis in genes belonging to different pathways including oxygen sensing, erythropoiesis and oxygen transport. However, despite clinical investigation and screening for these mutations, the cause of disease cannot be found in a considerable number of patients, who are classified as having idiopathic erythrocytosis. In this study, we developed a targeted next-generation sequencing panel encompassing the exonic regions of 21 genes from relevant pathways (~79 Kb) and sequenced 125 patients with idiopathic erythrocytosis. The panel effectively screened 97% of coding regions of these genes, with an average coverage of 450×. It identified 51 different rare variants, all leading to alterations of protein sequence, with 57 out of 125 cases (45.6%) having at least one of these variants. Ten of these were known erythrocytosis-causing variants, which had been missed following existing diagnostic algorithms. Twenty-two were novel variants in erythrocytosis-associated genes (EGLN1, EPAS1, VHL, BPGM, JAK2, SH2B3) and in novel genes included in the panel (e.g. EPO, EGLN2, HIF3A, OS9), some with a high likelihood of functionality, for which future segregation, functional and replication studies will be useful to provide further evidence for causality. The rest were classified as polymorphisms. Overall, these results demonstrate the benefits of using a gene panel rather than existing methods in which focused genetic screening is performed depending on biochemical measurements: the gene panel improves diagnostic accuracy and provides the opportunity for discovery of novel variants.

Introduction

Erythrocytosis is a clinical condition characterized by increased red cell mass and typically elevated hemoglobin concentration and hematocrit.1 It can be congenital (e.g. genetic) or acquired and is classified as primary or secondary1 (Figure 1A). Several causal genetic mutations have been identified. Heterozygous mutations in the erythropoietin receptor (EPOR) gene cause primary congenital erythrocytosis,2,3 while JAK2 mutations are predominantly associated with primary acquired erythrocytosis, i.e. polycythemia vera.4–6 Homozygous germline mutations in VHL (e.g. Chuvash polycythemia) and heterozygous germline mutations in EGLN1 (PHD2) and EPAS1 (HIF2A) have been found in patients with secondary congenital erythrocytosis.2,7 Regarding EPAS1, somatic gain-of-function mutations have been detected in pheochromocytomas and paragangliomas in patients with congenital erythrocytosis, attributed to tissue mosaicism.8 Some patients, particularly those with polycythemia vera and some forms of genetic erythrocytosis, have an increased incidence of both arterial and venous thromboembolic events.9 Other congenital lesions include high oxygen-affinity hemoglobinopathies and 2,3-bisphosphoglycerate deficiency,10–12 caused by mutations in globin genes (HBA1, HBA2, HBB) or the BPGM gene, respectively. These genes belong to key pathways involved in the pathogenesis of erythrocytosis, e.g. the oxygen-sensing (hypoxia-inducible factor, HIF) pathway, erythropoiesis and oxygen transport (Figure 1B).

Briefly, HIF are transcription factors composed of two subunits: HIFα, which is oxygen-sensitive, and HIFβ. There are three HIFα isoforms, but HIF2α (EPAS1) is the main transcriptional regulator of erythropoietin (EPO).13,14 In normoxia, HIFα is hydroxylated by oxygen-dependent prolyl hydroxylases (encoded by EGLN1, EGLN2 and EGLN3), binds to VHL and becomes ubiquitinated and degraded. In hypoxia, hydroxylation diminishes and HIFα stabilizes and initiates the transcription of target genes, including EPO.15 Erythropoietin binds to the EPOR of erythroid progenitor cells in the bone marrow, stimulating proliferation and differentiation into red blood cells, through a JAK2-mediated signaling cascade. In red blood cells, BPGM promotes the release of oxygen to local tissues by producing 2,3-bisphosphoglycerate, which decreases the affinity of hemoglobin for oxygen.

Classification and pathogenesis of erythrocytosis. (A) Causes of erythrocytosis. Erythrocytosis can be congenital or acquired. It is classified as primary, when there is an intrinsic defect in erythropoietic cells and erythropoietin (EPO) levels are low, or secondary, when the increased red cell production is externally driven through increased EPO production and EPO levels are high or inappropriately normal. Note: in this article, the term erythrocytosis rather than polycythemia is used consistently throughout. (B) Pathways involved in the pathogenesis of erythrocytosis. (i) Hypoxia inducible factor (HIF) oxygen sensing pathway in renal EPO-producing cells. HIF are dimeric transcription factors composed of one α- and one β-subunit. In normoxia, HIFα subunits are hydroxylated by oxygen-dependent prolyl-hydroxylases (PHD) and asparaginyl hydroxylase (HIF1AN). The hydroxylated prolines (P) are recognized by VHL, which mediates the ubiquitination and proteasomal degradation of HIFα. The hydroxylated asparagine (N) compromises the interaction of HIFα with cofactors necessary for transcriptional activity (p300/CBP). In hypoxia, PHD and HIF1AN are less active, HIFα subunits stabilize and translocate into the nucleus where they interact with the HIFβ subunit and cofactors and initiate transcription of target genes, including EPO. (ii) Erythropoiesis in the bone marrow. This is triggered by the binding of EPO to the EPO receptor (EPOR) located on the surface of erythroid progenitor cells and subsequent activation of the JAK2-signaling cascade. The process is inhibited by the interaction of SH2B3 and JAK2. (iii) Hemoglobin (Hb) synthesis and oxygen transport. BPGM produces 2,3-BPG, which promotes the release of oxygen to local tissues by decreasing the affinity of deoxygenated Hb to oxygen. Alterations in the Hb chains (Hb-α and Hb-β) or BPGM could shift the Hb-oxygen dissociation curve and alter oxygen levels, which directly influence EPO production. (PV, polycythemia vera; ECYT 1–4, erythrocytosis type 1–4; Hb, hemoglobin; O2, oxygen; 2,3-BPG, 2,3-bisphosphoglycerate; RBC, red blood cells; EPO, erythropoietin; PHDs, prolyl hydroxylases). PHDs: PHD1 (EGLN2), PHD2 (EGLN1) and PHD3 (EGLN3).

Even if fully investigated (including screening for known mutations), a considerable proportion of patients (~70%) remain without an identified cause of their erythrocytosis and are described as having idiopathic erythrocytosis.3,9 About two thirds of these patients have inappropriately normal or elevated erythropoietin levels suggesting a defect in oxygen-sensing or oxygen delivery pathways. Most patients have early-onset disease and/or often a family history, suggesting a high probability of genetic etiology. Logically, further investigation of these patients should begin by fully sequencing genes in which genetic variants are already known to cause erythrocytosis as opposed to simply screening for particular known variants. As many of these are in the HIF pathway, sequencing other key genes in this pathway (in which variants have not yet been observed) and also other erythropoiesis-related genes, is likely to be fruitful in the effort to resolve functional variants.

Using traditional DNA sequencing methods, e.g. Sanger sequencing, to comprehensively sequence a large number of genes in a substantial number of patients with a relatively rare disease is time-consuming, labor-intensive and impractical. Conversely, high-throughput technology, e.g. whole-genome sequencing (WGS), has its own drawbacks: it generates huge volumes of data, is costly and requires complex bioinformatic analysis. A way forward is the development of disease-relevant, targeted, next-generation sequencing gene panels.

We developed a next-generation sequencing erythrocytosis gene panel, using an ultra-high multiplex polymerase chain reaction method (AmpliSeq, Thermo Fisher), which allows rapid high-throughput sequencing of the full length of multiple genes in multiple samples. We defined a custom-made panel of 21 candidate genes from key pathways involved in the pathogenesis of erythrocytosis, and used it to sequence 125 patients with idiopathic erythrocytosis. We also included novel candidate genes suggested by an initial WGS study, the WGS500 project,16 in which 500 samples across a diverse spectrum of clinical disorders were sequenced, including some cases of idiopathic erythrocytosis strongly suspected of having a genetic cause.

The aims of the study were: (i) to create a targeted sequencing panel, as a research tool, for the genetic investigation of erythrocytosis; (ii) to evaluate the panel’s diagnostic utility in a cohort of patients with idiopathic erythrocytosis; (iii) to search for novel variants in erythrocytosis-associated genes; and (iv) to include new candidate genes identified in WGS500 to determine whether they are mutated in additional patients.

Methods

Patients

DNA samples extracted from the blood of patients with idiopathic erythrocytosis were acquired from four separate idiopathic erythrocytosis databases (UK, Portugal, Germany and The Netherlands). Participants gave informed consent and appropriate ethical approval was obtained. The inclusion criteria were: (i) confirmed absolute erythrocytosis with a red cell mass >125% of predicted, and hemoglobin >180 g/L and hematocrit >0.52 in adult males or hemoglobin >160 g/L and hematocrit >0.48 in adult females, or hemoglobin and hematocrit levels above the 99th centile of age-appropriate reference values in children; (ii) registered as idiopathic (unidentified cause of illness), following appropriate investigation at each Center (Online Supplementary Figure S1); and (iii) early-onset disease, or cases with long-standing idiopathic erythrocytosis. Details are given in the Online Supplementary Information.

Ten samples were whole-genome sequenced as part of the WGS500 project, whereas we used our erythrocytosis gene panel to sequence 125 samples from patients with idiopathic erythrocytosis as well as ten positive controls.

Whole-genome sequencing

Samples were sequenced at a 30× depth with Illumina HiSeq2000. Details are provided in the Online Supplementary Information.

Ion Torrent sequencing and analysis

A customized panel, encompassing the coding and untranslated regions of the candidate genes (Table 1), was created using the Ion AmpliSeq Designer (Thermo Fisher), whereby 635 primer pairs generating amplicons of ~200 bp were designed. This panel covered 90.3% of the target region (78.96 Kb), with 97.4% average coverage of the coding regions. The primers, synthesized in two multiplex pools, were used with the Ion AmpliSeq Library Kit 2.0 and Ion Xpress barcode adapters (Thermo Fisher) to create libraries. Library quality and concentration were assessed using a 2100 Bioanalyzer (Agilent Technologies). Pools of eight libraries were used for template preparation, loaded onto an Ion 316 chip and sequenced on an Ion PGM instrument (500 flows).

The Torrent Suite Software (Thermo Fisher) was used for quality control and alignment of the sequencing data to the human genome (Hg19). Variants were called with the Ion Reporter Software v4.2 (Thermo Fisher), using the germline workflow for single samples and the default parameters, and annotated with ANNOVAR.17 Only variants fulfilling all of the following conditions were selected for further analysis: confidence ≥40, read depth ≥20, frequency in 1000 Genomes (1000G) ≤3% and frequency in NHLBI ESP exomes (6500si) ≤3%. Provean and the SIFT and PolyPhen2-HDIV scores and cut-offs from the ANNOVAR LJB23 database were used to assess causality of non-synonymous variants. Synonymous variants were investigated for possible splicing effects using Human Splicing Finder, NetGene2 and FSPLICE. Further details are given in the Online Supplementary Information.
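The four selection criteria above amount to a simple conjunctive filter. The following Python fragment is a minimal sketch of that logic, not the pipeline actually used: the `VariantCall` record and its field names are hypothetical and do not mirror the Ion Reporter or ANNOVAR output columns, and population frequencies are assumed to be stored as fractions, with 0.0 for variants absent from a database.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as stated in the text.
MIN_CONFIDENCE = 40
MIN_READ_DEPTH = 20
MAX_POP_FREQ = 0.03  # 3% in 1000G and in ESP6500


@dataclass
class VariantCall:
    # Hypothetical record layout, for illustration only.
    gene: str
    change: str
    confidence: float
    read_depth: int
    freq_1000g: float    # fraction; 0.0 if absent from 1000G
    freq_esp6500: float  # fraction; 0.0 if absent from ESP6500


def passes_filters(v: VariantCall) -> bool:
    """Return True only if the call meets all four selection criteria."""
    return (
        v.confidence >= MIN_CONFIDENCE
        and v.read_depth >= MIN_READ_DEPTH
        and v.freq_1000g <= MAX_POP_FREQ
        and v.freq_esp6500 <= MAX_POP_FREQ
    )


def select_variants(calls: List[VariantCall]) -> List[VariantCall]:
    """Keep the calls that pass every filter, preserving input order."""
    return [v for v in calls if passes_filters(v)]


# Invented example calls (not data from the study).
EXAMPLE_CALLS = [
    VariantCall("GENE_A", "rare_good", 60.0, 150, 0.0, 0.0),
    VariantCall("GENE_A", "low_depth", 60.0, 10, 0.0, 0.0),
    VariantCall("GENE_B", "common", 60.0, 150, 0.10, 0.0),
    VariantCall("GENE_C", "low_confidence", 30.0, 150, 0.0, 0.0),
]
```

With these invented calls, only `rare_good` survives; each of the other three fails exactly one criterion.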

Sanger validation

All relevant variants identified by Ion Torrent sequencing were confirmed by Sanger sequencing. For protocol and primer details see the Online Supplementary Information and Online Supplementary Table S1.

Results

Novel candidate genes and variants were identified by whole-genome sequencing

The whole genomes of a small number of idiopathic erythrocytosis cases strongly suspected of having a genetic cause were sequenced as part of the WGS500 project. Candidate variants were found in novel genes, not previously associated with erythrocytosis: EPO, GFI1B, KDM6A and BHLHE41. Details of the rationale and criteria used to select these genes as candidates are given in the Online Supplementary Information and Online Supplementary Table S2. On this basis, these genes were included in the next-generation sequencing gene panel along with other erythrocytosis candidate genes (Table 1).

The erythrocytosis gene panel has high performance in sequencing and variant detection

Overall, 135 samples were sequenced on the Ion Torrent using the gene panel (125 undiagnosed patients, 10 positive controls). On average, 89% of mapped reads were on target regions, which indicates a successful custom panel according to the manufacturer’s guidance. The average coverage depth of the amplicons generated was 450× (Figure 2A). Most samples (133 out of 135) had over 92% of amplicons with coverage above 20× (Figure 2B). Only two samples presented substantial failure across the panel (Figure 2B), which was related to DNA quality. Only 17 amplicons (2.6%) had an average coverage below 20× across samples, indicating generally poor amplification of these regions within the highly-multiplexed reactions (Online Supplementary Table S3). Ten of these (1.6% of all amplicons) had complete failure (coverage <20× in all samples), probably due to sequence context issues. The sequencing was, therefore, generally successful across samples, with a high percentage of the target sequence included at a good depth for germline variant calling.
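The three coverage summaries used above (the per-sample percentage of amplicons reaching 20×, the amplicons whose mean coverage is below 20×, and the amplicons failing in every sample) can be derived from a table of per-amplicon read depths. The sketch below illustrates the calculation on invented numbers; the input layout is an assumption, as is the choice to count a depth of exactly 20× as covered.

```python
from statistics import mean
from typing import Dict, List, Tuple

MIN_DEPTH = 20  # coverage threshold used in the text


def summarize_coverage(
    depth_by_amplicon: Dict[str, List[int]],
) -> Tuple[List[float], List[str], List[str]]:
    """Summarize amplicon coverage.

    depth_by_amplicon maps an amplicon name to its read depth in each
    sample (same sample order for every amplicon).

    Returns:
      per_sample  - fraction of amplicons with depth >= MIN_DEPTH, per sample
      low_mean    - amplicons whose mean depth across samples is < MIN_DEPTH
      failed      - amplicons with depth < MIN_DEPTH in every sample
    """
    n_amplicons = len(depth_by_amplicon)
    n_samples = len(next(iter(depth_by_amplicon.values())))
    per_sample = [
        sum(depths[i] >= MIN_DEPTH for depths in depth_by_amplicon.values())
        / n_amplicons
        for i in range(n_samples)
    ]
    low_mean = sorted(
        name for name, depths in depth_by_amplicon.items()
        if mean(depths) < MIN_DEPTH
    )
    failed = sorted(
        name for name, depths in depth_by_amplicon.items()
        if all(d < MIN_DEPTH for d in depths)
    )
    return per_sample, low_mean, failed


# Invented depths for four amplicons in three samples (not study data).
TOY_DEPTHS = {
    "amp1": [500, 430, 3],
    "amp2": [12, 0, 5],     # below threshold everywhere: complete failure
    "amp3": [25, 10, 8],    # low on average, but passes in one sample
    "amp4": [300, 600, 2],
}
```

In this toy table the third sample fails for every amplicon, which is the pattern expected from a poor-quality DNA sample, whereas `amp2` fails in every sample, which is the pattern expected from an amplicon-specific problem.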

Coverage of the amplicons generated by the erythrocytosis gene panel across 135 samples. (A) Each boxplot represents the distribution of the number of reads obtained for all the amplicons generated by the panel within each sample. The horizontal line across the plot shows the average coverage (450×). (B) Each dot represents the percentage of amplicons with coverage over 20× within each sample.

We compiled a list of all known erythrocytosis-associated variants from the literature,2,3 including the variants identified in the WGS study, and cross-referenced their genomic coordinates with those of the generated amplicons. With the exception of two missense variants in VHL, all the other variants were within amplicons that performed well. The two VHL missense variants – c.235C>T and c.311G>T – fall within an amplicon in exon 1 that showed complete failure and would not, therefore, be detected.
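Cross-referencing variant positions against amplicon coordinates is an interval-containment check. The sketch below shows one way to do it under stated assumptions: coordinates are 1-based and inclusive, amplicons may overlap, and a position counts as detectable if at least one amplicon that did not fail spans it. The `Amplicon` record and all coordinates are invented for illustration.

```python
from typing import List, NamedTuple


class Amplicon(NamedTuple):
    name: str
    chrom: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    failed: bool  # True if coverage was below threshold in all samples


def covering_amplicons(chrom: str, pos: int,
                       amplicons: List[Amplicon]) -> List[Amplicon]:
    """Return every amplicon whose interval contains the position."""
    return [a for a in amplicons
            if a.chrom == chrom and a.start <= pos <= a.end]


def is_detectable(chrom: str, pos: int, amplicons: List[Amplicon]) -> bool:
    """True if at least one non-failed amplicon spans the position."""
    return any(not a.failed for a in covering_amplicons(chrom, pos, amplicons))


# Toy design with invented coordinates (not the real panel).
TOY_AMPLICONS = [
    Amplicon("amp_001", "chr1", 1000, 1199, failed=False),
    Amplicon("amp_002", "chr1", 1150, 1349, failed=True),
    Amplicon("amp_003", "chr1", 1400, 1599, failed=True),
]
```

A position covered only by a failed amplicon (1300 or 1450 in the toy design) is reported as undetectable, while a position in the overlap between a failed and a working amplicon (1175) remains detectable.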

Importantly, our panel reliably detected ten known variants – in different genes and hence in different amplicons – in the positive control samples, in which mutations had previously been identified either through WGS or Sanger sequencing (Online Supplementary Table S4).

Fifty-one exonic variants were identified across 57 patients by the erythrocytosis gene panel and validated by Sanger sequencing

We identified 98 different variants across the coding regions of the genes examined, of which 19 were insertions or deletions (INDEL), 49 non-synonymous single nucleotide variations (SNV) and 30 synonymous SNV (Figure 3). None of the synonymous SNV is predicted to alter splicing according to Human Splicing Finder, NetGene2 and FSPLICE. We, therefore, focused on variants resulting in protein sequence alterations. Following Sanger sequencing, 17 of the 19 INDEL proved to be false positives and two were confirmed. All 49 non-synonymous SNV were confirmed, although for one SNV there was a single base discrepancy: Ion Torrent detected a triple base change (CAA>ATT) in exon 12 of JAK2 (chr9:5070025-5070027) but only a double change (AA>TT, chr9:5070026-5070027) was confirmed by Sanger sequencing. As a result, a total of 51 variants (49 SNV, 2 INDEL) were confirmed (Online Supplementary Table S5). Overall, 57 out of 125 cases had at least one exonic variant (45.6%); of those, 38 patients had only one exonic variant detected (30.4%), while 19 had more than one (15.2%).

Overview of the exonic variants detected with Ion Torrent sequencing among 125 patients with erythrocytosis, their validation and further classification.

To investigate whether the variants discovered are unique to erythrocytosis patients (and therefore more likely to be disease-causing), we used in silico data from the 1000G project as a control. For this, we examined the variant calls from the 1000G project after integrating both exome and low coverage data across 1041 individuals and extracted the SNV identified within the coordinates of the amplicons generated by our gene panel. We found that of the 49 non-synonymous SNV discovered, 30 were uniquely found in our erythrocytosis cohort and not in the 1000G in silico control cohort, whereas the other 19 were also found in the control cohort at similar or higher frequencies (Fisher exact test and Benjamini and Hochberg false discovery correction18) (Figure 3). Those 19 SNV (Online Supplementary Table S6) are thus unlikely to be disease-causing mutations and most likely represent polymorphisms.
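For readers unfamiliar with the statistics named above, the following is a self-contained sketch of a two-sided Fisher exact test on a 2×2 table followed by Benjamini-Hochberg adjustment. It is an illustration under assumptions, not the analysis code used in the study: in particular, comparing carriers versus non-carriers (rather than allele counts) is an assumption, and the function names are hypothetical.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))  # tolerance for float ties
    return min(1.0, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cohort_vs_control_p(case_carriers: int, n_cases: int,
                        control_carriers: int, n_controls: int) -> float:
    """Carrier-based comparison of one variant between two cohorts."""
    return fisher_exact_two_sided(
        case_carriers, n_cases - case_carriers,
        control_carriers, n_controls - control_carriers,
    )
```

Under these assumptions, `cohort_vs_control_p(1, 125, 0, 1041)` evaluates a variant seen once among 125 cases and never among 1041 controls. Per-variant p-values computed this way would then be passed together to `benjamini_hochberg`.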

Out of the 30 uniquely identified variants in our cohort of patients, ten had been previously reported in the literature as causing erythrocytosis and hence are classified here as disease-causing variants (Table 2). The remaining 20 had no previous clinical associations. No exonic variants were identified in EGLN3, HIF1AN (FIH), HBA1, HBA2, GFI1B or ZNF197.

Variants detected by the erythrocytosis gene panel, known to cause erythrocytosis.

Novel genes and variants identified by the erythrocytosis gene panel

The 22 novel variants (20 SNV and 2 INDEL) identified (Table 3) are extremely rare: nine were absent from both the dbSNP142 and Exome Aggregation Consortium (ExAC) databases, the latter containing data from 60,706 unrelated individuals; eight were reported only in ExAC at extremely low allele frequencies (≤0.0007), and only five were reported in both databases at very low allele frequencies (≤0.005).

Fourteen of these novel or very rare variants were found in known erythrocytosis-associated genes, such as VHL, EPAS1, JAK2, SH2B3 (LNK), EGLN1 and BPGM. Some of these variants have a high likelihood of causality based on the location and predicted effect of the protein coding change as well as on genetic evidence for causality, and are of particular physiological interest. For example, EPAS1 p.Y532H, a novel exon 12 mutation, is located one position downstream of residue 531, which is the prolyl hydroxylation site on HIF2α on the C-terminal oxygen-dependent degradation domain (ODD). Furthermore, it is part of a six-residue domain which is highly conserved both across all HIFα isoforms and across species and which interacts with the VHL complex.19 Thus, this mutation likely interferes with hydroxylation of HIF2α by prolyl hydroxylases and binding to the VHL complex, leading to upregulation of erythropoietin. It was found in two related patients, father and son, both of whom had idiopathic erythrocytosis with raised erythropoietin levels, and was, therefore, inherited in an autosomal dominant manner. EGLN1 p.L279P affects a conserved residue, previously reported as altered (p.L279Tfs43, a frameshift variant) in a patient with erythrocytosis.20 Structurally, this residue is located on helix 3, which interacts with both N-terminal and C-terminal ODD hydroxylation domains on HIFα;21 a proline substitution may affect protein stability and diminish ODD binding, reducing HIFα hydroxylation. The VHL p.E52X variant introduces a stop codon, predicting translation termination of the long VHL isoform (p30) while allowing translation only of the alternative form of VHL (p19) from a translation site at M54. To date, only a few variants upstream of the VHL internal start codon 54 have been described and have been associated with either pheochromocytomas (codon 25 and 38) or with von Hippel-Lindau (VHL) disease (p.E46X and p.E52K).22–24 The role of the heterozygous VHL p.E52X in producing erythrocytosis in the patient in our study is not clear and the patient will be advised to undergo investigations for the presence of VHL disease; there is evidence that erythrocytosis is seen in about 5–20% of patients with VHL disease.25 For the remaining variants, most were classified as deleterious by either SIFT, PolyPhen2 or Provean (Table 3), with a high degree of agreement between tools, but further investigations are needed to elucidate their functional impact.

Eight variants were identified in novel genes included in the panel because of their association with the oxygen-sensing pathway but in which no previous erythrocytosis-associated mutation has been reported, such as EGLN2, HIF3A and OS9 (Table 3). In addition, novel variants were also found in EPO and BHLHE41, two genes without previous genetic association with erythrocytosis which were revealed by WGS500. For EPO, the most striking variant found is a frameshift, p.P7fs, detected in a heterozygous state in one patient. Although at present it is difficult to link an apparently inactivating mutation to the generation of erythrocytosis, the variant has since been confirmed in a heterozygous state in the patient’s father who also has high hematocrit and hemoglobin levels. Two other EPO SNV were detected in other patients but these are most likely very rare polymorphisms (Table 3). Regarding BHLHE41, the novel missense variant identified (p.F149L) is classified as benign by Provean, PolyPhen2 and SIFT and is thus unlikely to be pathogenic, a notion supported by segregation analysis in the patient’s family (Table 3).

Discussion

Technical progress in next-generation sequencing, together with the increasing understanding of the biological pathways underlying the pathogenesis of erythrocytosis, provides new opportunities to advance the genetic investigation of patients with erythrocytosis.

Our approach allowed the creation of a next-generation sequencing targeted gene panel with the capacity to process a large group of samples and simultaneously examine a large number of genes across several biological pathways in a systematic and efficient manner.

Our panel exhibited high performance and reliability. It produced high quality sequencing data with good target coverage. It accurately detected variants in ten positive controls. It was excellent at reliably calling SNV, with all SNV identified subsequently validated in all samples by Sanger sequencing. Nevertheless, a few limitations are recognized and should be taken into account when considering its future applications. For example, a few amplicons – including a region on VHL exon 1 – showed complete failure across samples and thus potential variants within them would not be detected. Furthermore, there were some false positive INDEL, as previously reported by other Ion Torrent sequencing users.26–28 These could be addressed by re-designing primers covering that particular VHL genomic region, optimizing the variant calling bioinformatics work-flow and employing recently proposed strategies to increase the accuracy of INDEL detection.26,28 Another limitation of the panel – related to the nature of its technology – is that it can only identify SNV and short INDEL but not other structural variants such as large INDEL or copy number variations. Also, variant detection in genes with high sequence similarity such as HBA1 and HBA2 is challenging and caution is needed for variant calling.

Currently, the clinical consensus for investigating erythrocytosis involves: establishing the diagnosis of absolute erythrocytosis, excluding systemic causes (e.g. hypoxic lung diseases or tumors) and then proceeding to focused genetic testing based on algorithms that attempt to predict the type of mutation that might be present. The procedures employed at different Centers vary (Online Supplementary Figure S1), but as a general rule if the patient’s erythropoietin level is low, variants in genes involved in erythropoiesis (EPOR, JAK2) are screened for. If the patient’s erythropoietin level is high or normal, the P50 (partial pressure of oxygen at which 50% of hemoglobin is saturated with oxygen) is calculated and if low, hemoglobin electrophoresis is performed and/or variants in oxygen-delivery pathways (globin genes, BPGM) are screened for; if P50 is normal or not available, variants in the oxygen-sensing HIF pathway (VHL, EPAS1, EGLN1) are screened for.2,29,30
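The general rule described above is a small decision tree, sketched below in Python. This is a simplification for illustration only, not a clinical tool: the categorical inputs (`"low"`, `"normal"`, `"high"`) are an assumption, since the text gives no numeric cut-offs, the function name is hypothetical, and the globin genes are listed as HBA1, HBA2 and HBB following the Introduction. Center-specific variations (Online Supplementary Figure S1) are not modeled.

```python
from typing import List, Optional


def genes_to_screen(epo_level: str, p50: Optional[str] = None) -> List[str]:
    """Return the genes screened under the general rule in the text.

    epo_level: "low", "normal" or "high" (categories are an assumption).
    p50: "low", "normal", or None when P50 is not available.
    """
    if epo_level == "low":
        # Primary erythrocytosis: erythropoiesis genes.
        return ["EPOR", "JAK2"]
    if epo_level in ("normal", "high"):
        if p50 == "low":
            # High oxygen affinity: oxygen-delivery genes
            # (hemoglobin electrophoresis may be performed as well).
            return ["HBA1", "HBA2", "HBB", "BPGM"]
        if p50 in ("normal", None):
            # Oxygen-sensing (HIF) pathway genes.
            return ["VHL", "EPAS1", "EGLN1"]
        raise ValueError(f"P50 category not covered by the rule: {p50!r}")
    raise ValueError(f"unknown erythropoietin category: {epo_level!r}")
```

Written this way, the rule makes its blind spots explicit: each branch screens only a subset of the candidate genes, so a patient routed down the low-erythropoietin branch is never tested for EPAS1, and a patient without a P50 measurement is never tested for the globin genes.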

Using our gene panel we were able to provide definitive genetic diagnoses in nine patients whose mutations had been previously missed. For example, a variant in EPAS1, p.G537R – a well-described gain-of-function mutation found in erythrocytosis patients31,32 – was detected. This was previously missed because the patient was not screened for EPAS1 variants, as the erythropoietin level was not high enough (and investigations were thus directed to a different branch of the diagnostic algorithm). Similarly, we identified a homozygous VHL variant (p.H191D) known to cause erythrocytosis.33 Interestingly, we found four variants in the HBB gene, all relating to high-affinity hemoglobinopathies associated with erythrocytosis: HBB p.H147P (Hb York), HBB p.H144Q (Hb Little Rock), HBB p.V110M (Hb San Diego) and HBB p.E102D (Hb Potomac).34–38 These were missed previously, either because conventional screening with hemoglobin electrophoresis can miss hemoglobinopathies38 or because of difficulties in obtaining optimal fresh venous blood samples for P50 measurements in all patients. In addition, we identified a heterozygous variant in JAK2 (p.K539L) and two in SH2B3 (p.E208Q and p.E400K), all known to associate with erythrocytosis.5,39,40 The patient with JAK2 p.K539L was originally classified as having idiopathic erythrocytosis because the conventional criteria for polycythemia vera, including JAK2 p.V617F screening, were not met; this patient should now be considered as having polycythemia vera with a JAK2 exon 12 mutation. As highlighted in previous studies,5,6 the clinical picture of this subtype of polycythemia vera is indistinguishable from that of idiopathic erythrocytosis. This emphasizes that JAK2 exon 12 mutations should be actively screened for in patients with idiopathic erythrocytosis. Furthermore, the finding of SH2B3 variants highlights that this gene should also be surveyed, which is currently not done routinely. The erythrocytosis gene panel can successfully do both. Thus, we demonstrated that the panel allows reliable detection of known erythrocytosis-causing mutations, avoiding pitfalls that may occur when following existing algorithms.

In this study, four out of the 125 patients were heterozygous for VHL p.R200W. In the homozygous state, this variant causes Chuvash polycythemia.41,42 Congenital erythrocytosis also occurs in patients who are compound heterozygotes,43–45 but heterozygous carriers are usually unaffected. Nevertheless, VHL p.R200W heterozygous mutations feature significantly more frequently in erythrocytosis databases3 than in general populations,46 suggesting a causal role for this mutation. For one of the four patients here, the variant was newly identified. For the other three, previous genetic tests had also identified it. Thus, within this study we aimed to detect additional genetic changes that might explain the patients’ clinical phenotype. We did not detect any other variants within VHL, except for two single nucleotide polymorphisms in the 3′ untranslated region with high minor allelic frequencies (≥0.35 in dbSNP142). Alternatively, the co-occurrence of this heterozygous variant with another heterozygous variant in a separate gene of the same biological pathway could act in synergy to produce disease. We did not obtain conclusive evidence for this in our four patients: two did not have an additional variant and in the other two, the VHL p.R200W co-occurred with heterozygous missense variants classified as polymorphisms (Online Supplementary Table S6), i.e. with EGLN1 p.A157Q and EGLN2 p.T405M in one patient and with EPOR p.G46E and EGLN2 p.S58L in the other.

As this research panel provides full-gene sequencing instead of specific mutation screening, it allowed the detection of 22 novel variants. For some of these, there is a strong likelihood of causality, based on the location of the mutated residues on functional or regulatory domains and the expected disturbance they would cause to protein structure and function (as explained in the Results section for EGLN1 p.L279P, EPAS1 p.Y532H and VHL p.E52X), and based on genetic evidence of familial segregation (e.g. EPAS1 p.Y532H and EPO p.P7fs, which are dominantly inherited). For other variants, mostly found in known erythrocytosis-associated genes, there is strong consensus in the in silico prediction of deleterious effect, whereas for some there is less evidence of functional candidacy (Table 3). While the functional significance of newly identified variants cannot currently be confirmed – and indeed clinical causation cannot be concluded – future functional studies and screening of larger cohorts of erythrocytosis patients are needed to replicate the findings and to provide further evidence of causality.

In this study we explored some genes, not previously associated with erythrocytosis, because of their involvement in the HIF pathway or their discovery through WGS500 as potential candidates. Candidate variants were found in EGLN2 and HIF3A but not in key HIF pathway genes such as EGLN3, HIF1AN and importantly, HIF1A. This is consistent with existing literature in which variation in EPAS1, but not HIF1A, is associated with erythrocytosis. The precise WGS-identified variants in EPO, GFI1B, KDM6A and BHLHE41 were not found in this cohort of 125 cases, suggesting that larger cohorts of patients need to be sequenced before the significance of variation in these genes can be properly interpreted. However, in the case of EPO, other variants were identified suggesting that EPO should be actively surveyed as an erythrocytosis-associated candidate gene. Accrued use of the panel in further patients will provide insight into which novel genes play a role in erythrocytosis and will allow refinement of any future diagnostic panels.

One limitation of our study is the lack of DNA from a source other than blood to determine germline or somatic status. This would only be a concern for JAK2 and SH2B3, in which somatic mutations are associated with polycythemia vera and myeloproliferative diseases. When variants in JAK2 and SH2B3 are found by the panel, further studies in skin/nail DNA are probably warranted. For all other genes, variants detected in blood with the panel are most likely germline. While somatic mutations in EPAS1 can be found in tumors of patients with erythrocytosis,8 these would not be detectable in blood with our methodology.

Thus, despite the few technical limitations described, the erythrocytosis gene panel is useful in the genetic investigation of patients with erythrocytosis from a research perspective. Furthermore, following appropriate optimization and refinement, gene panel sequencing has the potential to improve the diagnostic work-up of erythrocytosis patients in clinical practice. A point to note is that the gene panel in our study was applied to a highly-selected group of patients who had undergone significant clinical and genetic “filtering” (Online Supplementary Figure S1) before inclusion in the study. Despite this, candidate variants – known causal and novel – were detected in 29% of patients. Thus, we propose that gene panel sequencing should be applied directly to “erythrocytosis cases where a genetic cause is suspected”, i.e. after clinical exclusion of acquired systemic causes and at the point where genetic testing is considered (Figure 4). This would be expected to increase the diagnostic yield and, because genetic testing would be conducted in an unbiased manner, to improve diagnostic accuracy by decreasing the number of missed diagnoses. In conclusion, we hope to have demonstrated the immediate utility of a targeted gene panel in the investigation of erythrocytosis at a time when next-generation sequencing is revolutionizing clinical medicine.

Proposed use of a gene panel in the investigation of erythrocytosis. A gene panel would make genetic testing more efficient and streamlined. It enables the simultaneous survey of the full length of 21 candidate genes, in a systematic and unbiased manner, allowing the detection of known causal variants as well as novel variants in known and novel genes.

Acknowledgments

The authors would like to thank the patients and their families who consented to this study, Melissa M. Pentony for the support provided with the management of Ion Torrent data and the Core and administration services at the Wellcome Trust Centre for Human Genetics, which are funded by the Wellcome Trust Core Award [090532/Z/09/Z]. This work was supported by the National Institute for Health Research (NIHR) Biomedical Research Centre Oxford with funding from the Department of Health’s NIHR Biomedical Research Centre’s funding scheme. The WGS500 study was funded by the Wellcome Trust Core Award (090532/Z/09/Z) and a Medical Research Council Hub grant (G0900747 91070) to Peter Donnelly (director of the Wellcome Trust Centre for Human Genetics), the NIHR Biomedical Research Centre Oxford, the UK Department of Health’s NIHR Biomedical Research Centres funding scheme and Illumina. NP is funded via a NIHR Clinical Lectureship. PJR is a member of the Ludwig Institute for Cancer Research.