Abstract

The molecular diagnosis of retinal dystrophies is difficult because of the very large number of genes implicated and is rarely helped by genotype-phenotype correlations. This prompted us to develop IROme, a custom-designed, solution-based targeted exon capture assay (SeqCap EZ Choice library, Roche NimbleGen) for 60 retinitis pigmentosa-linked genes and three candidate genes (942 exons). Pyrosequencing was performed on a Roche 454 GS Junior benchtop high-throughput sequencing platform. In total, 23 patients affected by retinitis pigmentosa were analyzed. On average, 39.6 Mb of sequence were generated and 1111 sequence variants were detected per patient, at a median coverage of 17-fold. After data filtering and sequence variant prioritization, disease-causing mutations were identified in ABCA4, CNGB1, GUCY2D, PROM1, PRPF8, PRPF31, PRPH2, RHO, RP2, and TULP1 in twelve patients (55%). Ten of these mutations had not been reported previously. Potential mutations were identified in 5 additional patients, and no molecular diagnosis could be established for only 6 patients (26%). In conclusion, targeted exon capture and next-generation sequencing are a valuable and efficient approach to identify disease-causing sequence variants in retinal dystrophies.

1. Introduction

Retinitis pigmentosa (RP) (MIM number 268000) is a genetically highly heterogeneous group of inherited retinal dystrophies [1]. Night blindness typically starts during adolescence, and patients progressively lose peripheral vision, which is mediated by rod photoreceptors. At later stages, the cone photoreceptors also become affected. Vision then constricts over time to the most central fovea, eventually resulting in complete blindness. To date, more than fifty genes have been linked to nonsyndromic RP (RetNet; http://www.sph.uth.tmc.edu/RetNet/). Inheritance can be autosomal dominant (AD), autosomal recessive (AR), or X-linked and, rarely, mitochondrial or digenic [2]. Sporadic or simplex cases account for about 30% of patients [3].

The molecular diagnosis of RP is difficult for four reasons:

(i) there is no genotype/phenotype correlation in the vast majority of patients;
(ii) patients carrying the same causative mutation show high intra- and interfamilial variability of clinical phenotypes;
(iii) different mutations in the same disease-linked gene cause highly variable clinical phenotypes, or even clinically distinct retinal degenerations; and
(iv) clinical phenotypes and disease-linked genes overlap with those of other retinal degenerations, such as early-onset Leber congenital amaurosis (LCA), congenital stationary night blindness (CSNB), cone-rod dystrophies (CRD), enhanced S-cone syndrome (ESCS), or syndromic RP in Bardet-Biedl and Usher syndromes [2].

Nevertheless, identification of RP-linked sequence variants is important for genetic counseling and patient management.

As for other Mendelian disorders, mutations in RP patients were until recently identified by linkage mapping followed by Sanger sequencing of candidate genes [4]. For molecular diagnosis, known RP mutations could be detected by arrayed primer extension (APEX) chip technology [5]. However, the genetic heterogeneity of RP inherently limited the success rate of APEX. In a cohort of 272 Spanish families affected by ARRP, causative mutations were identified in only 11% [6].

In recent years, next-generation sequencing (NGS) tools have made it possible to produce enormous volumes of sequencing data at low cost [7]. Whole genome sequencing and downstream data handling remain cost- and labor-intensive, however, which limits their use in routine mutation detection [8]. Targeted capture of the approximately 30 Mb of protein-coding regions in the human genome, the so-called exome, reduced the sequencing and data handling effort by a factor of about 100. It also allowed the identification of mutations in unrelated patients affected by the same syndrome [9]. Exome sequencing has since been widely used as a tool for Mendelian disease gene discovery [10, 11]. Targeted sequence capture was initially array-based and has become easy to use thanks to the development of in-solution capture methods [12]. Finally, benchtop high-throughput sequencers have made exome sequencing accessible to small diagnostic laboratories [13].
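The reduction factor quoted above follows from a simple ratio. This assumes a haploid human genome of roughly 3′000 Mb, a round figure not stated in the text:

```latex
\frac{\text{haploid genome size}}{\text{exome size}} \approx \frac{3\,000\ \text{Mb}}{30\ \text{Mb}} = 100
```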

These technological advances prompted us to develop IROme, a custom-designed, solution-based targeted capture assay for detecting mutations in 63 genes on a 454 GS Junior sequencing platform. For each gene, the assay covers the exons, the intron-exon boundaries, the complete 3′-untranslated region (UTR), and the 5′-UTR and potential promoter regions.

2. Material and Methods

2.1. Patients and DNA Samples

These studies were approved by the Swiss Federal Department of Health (authorization number 035.0003-48) and followed the principles of the Declaration of Helsinki. The 23 patients analyzed in this study were of Swiss, Algerian, and Tunisian origin. Blood samples were collected after informed consent. Genomic DNA was extracted from peripheral blood using a Nucleon BACC2 genomic DNA extraction kit (GE Healthcare, Glattbrugg, Switzerland). Four patients had been previously analyzed at Asper Biotech for known RP-linked mutations by APEX technology [5].

2.2. IROme Design

Exons of the targeted genes were identified in the human reference genome version hg19 (http://www.ensembl.org/) (Table 1). For each exon, 50 bp of flanking sequence were added on both the 5′ and 3′ sides, and the complete 3′-UTR of each gene was included. Potential alternative transcripts were also considered in the design. To include potential proximal promoters, an additional 1000 bp 5′ of the first exon of each gene, containing the complete 5′-UTR, were added. The resulting custom-designed SeqCap EZ Choice library (NimbleGen, Roche) was called IROme, version 1.
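The padding rules described above can be sketched as an interval computation. This is a minimal illustration, not the design pipeline actually used for IROme. It assumes exon coordinates are 0-based and half-open, that exon coordinates already include their UTR parts, and that the 1000 bp promoter window is added on top of the 50 bp flank. The gene and its coordinates are hypothetical.

```python
from dataclasses import dataclass

FLANK_BP = 50        # flanking sequence added on each side of every exon
PROMOTER_BP = 1000   # additional sequence 5' of the first exon (proximal promoter)


@dataclass(frozen=True)
class Exon:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive; exons are assumed to include their UTR parts


def capture_targets(exons, strand):
    """Return merged (chrom, start, end) capture intervals for one gene.

    strand is '+' or '-' and decides which end of the gene is 5'.
    """
    ordered = sorted(exons, key=lambda e: e.start)
    intervals = [[e.chrom, max(0, e.start - FLANK_BP), e.end + FLANK_BP]
                 for e in ordered]
    # On '+' the first exon is the leftmost one; on '-' it is the rightmost one.
    if strand == "+":
        intervals[0][1] = max(0, intervals[0][1] - PROMOTER_BP)
    else:
        intervals[-1][2] += PROMOTER_BP
    # Merge intervals that overlap or abut after padding.
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(iv) for iv in merged]


# Hypothetical three-exon gene (coordinates invented for illustration).
gene = [Exon("chr1", 5000, 5200), Exon("chr1", 5300, 5400), Exon("chr1", 9000, 9100)]
print(capture_targets(gene, "+"))  # [('chr1', 3950, 5450), ('chr1', 8950, 9150)]
```

Merging matters because closely spaced exons produce overlapping padded windows. Without merging, the total targeted size would be overestimated.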

Table 1: List of genes enriched by targeted sequence capture (IROme).

2.3. GS Junior Sequencing

The workflow for GS Junior sequencing is summarized in Figure 1. DNA concentrations were measured on a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE).

Library preparation: A total of 500 ng of gDNA was fragmented by nebulization and size-selected with Agencourt AMPure XP beads (Beckman-Coulter, Beverly, MA) to obtain fragments between 500 and 1200 bp. Adaptors provided in the GS Titanium Rapid Library Preparation Kit (Roche, Basel, Switzerland) were ligated to the fragmented DNA. The resulting library was quantified by fluorometry (QuantiFluor, Promega, Madison, WI). The library was then amplified by ligation-mediated PCR (LM-PCR) using specific 454 primers.

Target capture: 1 μg of the PCR amplification product was dried down in a SpeedVac with COT-DNA (Roche) and 454 Hybridization Enhancing Primer. The pellet was resuspended in NimbleGen hybridization buffer and hybridized to the custom-designed SeqCap EZ Choice library IROme v1 (NimbleGen, Roche) for 70 h at 47°C in a thermocycler. The captured DNA was bound to Streptavidin M-270 Beads (Invitrogen Dynal, Oslo, Norway) for 45 min at 47°C. Using a magnetic support, the beads were washed with the four NimbleGen wash buffers provided, according to the manufacturer's instructions. The captured DNA on beads was amplified by LM-PCR using the same 454 primers as before.

Enrichment control: Captured and noncaptured DNA were subjected to quantitative PCR on a LightCycler 480 II (Roche, Basel, Switzerland) to measure the relative fold enrichment of the targeted sequences. Only postcapture samples with an enrichment higher than 200-fold were processed further.

Emulsion PCR and sequencing: According to the 454 GS Junior protocol (Roche), emulsion PCR was performed at a ratio of 2 DNA molecules per bead. After PCR, the beads were collected, washed, and bound to the Enrichment Beads. The enriched DNA was then eluted and quantified with the provided bead counter. Sequencing was performed following the 454 GS Junior protocol. Briefly, 500′000 enriched DNA beads were mixed with Packing Beads. The PicoTiterPlate (PTP) was then sequentially loaded with Prelayer Beads, DNA-Packing Beads, Postlayer Beads, and PPiase Beads. Finally, the PTP was mounted in the 454 GS Junior sequencer, and the run was performed in full processing mode for shotgun sequencing.
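The 200-fold enrichment criterion can be illustrated with the usual delta-Ct calculation. This is a sketch under stated assumptions, not the exact formula used in the study. It assumes that equal amounts of captured and noncaptured DNA were assayed at several target loci and that amplification efficiency is a per-cycle factor (2.0 means perfect doubling). The Ct values are hypothetical.

```python
from statistics import mean

MIN_FOLD_ENRICHMENT = 200  # samples at or below this were not processed further


def fold_enrichment(ct_pairs, efficiency=2.0):
    """Estimate fold enrichment from qPCR Ct values at several target loci.

    ct_pairs: (Ct of noncaptured DNA, Ct of captured DNA) per locus.
    efficiency: amplification factor per cycle; 2.0 assumes perfect doubling.
    The mean delta-Ct is exponentiated, i.e. a geometric mean over loci.
    """
    delta_ct = mean([noncaptured - captured for noncaptured, captured in ct_pairs])
    return efficiency ** delta_ct


def passes_capture_qc(ct_pairs, efficiency=2.0):
    return fold_enrichment(ct_pairs, efficiency) > MIN_FOLD_ENRICHMENT


# Hypothetical Ct values for two target loci.
print(fold_enrichment([(30.0, 22.0), (31.0, 23.0)]))    # 256.0
print(passes_capture_qc([(30.0, 22.0), (31.0, 23.0)]))  # True
```

Averaging delta-Ct before exponentiating keeps a single outlier locus from dominating the estimate. A measured efficiency below 2.0 would lower the fold enrichment computed from the same Ct values.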

2.4. Data Analysis

The workflow for data analysis and validation is summarized in Figure 2. Sequencing data (.sff files) were analyzed with the Roche 454 Reference Mapper program. The gene annotation file (ref.txt) and version 131 of the single nucleotide polymorphism database (snp131.txt) were downloaded from the UCSC Golden Path database (http://www.genome.ucsc.edu/).

The sequence variants reported in the 454HCDiffs.txt file were filtered in three ways:

- known SNPs were removed (http://www.ensembl.org/Homo_sapiens/Gene/Variation_Gene/);
- variants were assessed for the predicted effect of the amino acid change (PolyPhen-2, http://genetics.bwh.harvard.edu/pph2/); and
- variants in repetitive sequences were excluded.

An additional in-house program was used to check the remaining sequence variants against reference sequences obtained from Ensembl. Sequence variants were further prioritized according to the mode of inheritance, when family information was available, and to the percentage of reads containing a given variant (threshold of 20%). To analyze coverage, scripts were written to extract the global coverage data (unique depth, column 5) and the quality of coverage at each targeted nucleotide (column 4) from the 454AlignmentInfo.tsv file. Part of the sequencing data was analyzed with Sequence Pilot version 3.5 (JSI Medicals, Kippenheim, Germany).
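A coverage-extraction script of the kind described above might look like the following sketch. The file layout is an assumption: tab-separated, with contig lines starting with '>' and data rows carrying position in column 1, quality score in column 4, and unique depth in column 5, as the text indicates. The actual Reference Mapper output should be checked against Roche's documentation. The excerpt is invented.

```python
import csv
import io
from statistics import median


def read_alignment_info(handle):
    """Yield (contig, position, quality, unique_depth) for each reference base.

    Assumed layout (tab-separated): a line starting with '>' names the contig;
    data rows hold the position in column 1, the quality score in column 4
    and the unique depth in column 5. Other lines (e.g. a header) are skipped.
    """
    contig = None
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith(">"):
            contig = row[0][1:]
            continue
        if len(row) < 5 or not row[0].isdigit():
            continue
        yield contig, int(row[0]), float(row[3]), int(row[4])


def targeted_depths(records, targets):
    """Unique depth at every targeted base; bases absent from the file count as 0.

    targets: (contig, start, end) tuples, 1-based and inclusive (assumed).
    """
    depth = {(contig, pos): d for contig, pos, _quality, d in records}
    return [depth.get((contig, pos), 0)
            for contig, start, end in targets
            for pos in range(start, end + 1)]


# Tiny invented excerpt; real files are far larger.
sample = ("Position\tReference\tConsensus\tQuality Score\tUnique Depth\n"
          ">chr1\t1\n"
          "1\tA\tA\t64\t10\n"
          "2\tC\tC\t64\t20\n"
          "3\tG\tG\t40\t30\n")
records = list(read_alignment_info(io.StringIO(sample)))
depths = targeted_depths(records, [("chr1", 2, 4)])
print(depths, median(depths))  # [20, 30, 0] 20
```

Counting absent targeted bases as zero depth is what makes uncovered exons visible in the coverage summary.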

Figure 2: Workflow of data analysis and filtering. The sff (Standard Flowgram Format) files generated by Roche 454 GS Junior sequencing were imported into either Reference Mapper or Sequence Pilot software. Coding sequence variants were selected from the 454HCDiffs.txt files, which contained all sequence variants. During filtering, coding sequence variants reported in dbSNP were removed, and missense and nonsense mutations were kept. The remaining coding sequence variants were prioritized according to known reported mutations, the mode of inheritance, the percentage of sequence reads reporting the variant (threshold of 20%), and the predicted effect on the protein (PolyPhen score).
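The filtering and prioritization steps in Figure 2 can be sketched as follows. This is an illustrative reconstruction, not the in-house program. Mode of inheritance is omitted because it requires family data. The rule of treating variants without a PolyPhen-2 score (such as nonsense changes) as maximally damaging is an assumption. Gene names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

READ_FRACTION_THRESHOLD = 0.20  # minimal fraction of reads carrying the variant


@dataclass
class Variant:
    gene: str
    consequence: str          # "missense", "nonsense", "synonymous", ...
    in_dbsnp: bool            # reported in dbSNP
    known_mutation: bool      # previously reported as disease-causing
    read_fraction: float      # fraction of reads carrying the variant
    polyphen: Optional[float] = None  # PolyPhen-2 score; None if not scored


def filter_and_prioritize(variants, threshold=READ_FRACTION_THRESHOLD):
    """Drop dbSNP entries and non-missense/nonsense changes, then rank the rest."""
    kept = [v for v in variants
            if not v.in_dbsnp
            and v.consequence in ("missense", "nonsense")
            and v.read_fraction >= threshold]

    def rank(v):
        # Unscored variants (e.g. nonsense) are treated as maximally damaging.
        score = 1.0 if v.polyphen is None else v.polyphen
        return (not v.known_mutation, -score, -v.read_fraction)

    return sorted(kept, key=rank)


variants = [
    Variant("GENE_A", "missense", False, False, 0.45, 0.95),
    Variant("GENE_B", "missense", True, False, 0.50, 0.99),   # in dbSNP
    Variant("GENE_C", "nonsense", False, False, 0.48),
    Variant("GENE_D", "missense", False, False, 0.12, 0.99),  # below 20%
    Variant("GENE_E", "synonymous", False, False, 0.50),
    Variant("GENE_F", "missense", False, True, 0.30, 0.40),   # known mutation
]
print([v.gene for v in filter_and_prioritize(variants)])  # ['GENE_F', 'GENE_C', 'GENE_A']
```

Ranking known mutations first reflects the caption's ordering. Within the remaining variants, the predicted protein effect breaks ties before read fraction does.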

3. Results and Discussion

3.1. IROme: Design and Validation of the Assay

The vast genetic heterogeneity of RP prompted us to develop a custom-designed, hybridization-based targeted exon capture assay called IROme. Enrichment targeted a total of 63 genes (942 exons), 60 of which were linked to RP, LCA, and related retinal dystrophies (Table 1). Exon ORF15 of RPGR was not included in the assay because of its repetitive sequences. Two RP- or LCA-linked genes, IDH3B and RD3, had each been reported in only a single family so far and were not included in this version of IROme. Conversely, two candidate genes linked to retinal degeneration in mice but not in humans (TUB and LPCAT1) were added to the assay. A third candidate gene located on chromosome X, CNGA2, was included because of its homology to CNGA1. The targeted regions span a total of 394′758 bp.

Of note, after the design of IROme was completed, TTC8 (BBS8/RP51), C8ORF37, and MAK were linked to RP, and KCNJ13 and NMNAT1 to LCA. These latter genes, as well as IDH3B and RD3, will be included in a future version of IROme.

Patients 1–4 had previously been investigated by APEX technology for known RP-linked mutations [5]. All nucleotides tested by APEX were correctly detected by IROme, with 98.9% accuracy of the sequence reads at nucleotides in the homozygous state (Table 2). APEX had detected a heterozygous p.USH2A-V2562A mutation in patient 2, and IROme correctly confirmed it (46.8% of the sequence reads at 47-fold coverage).
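Why 46.8% of reads at 47-fold coverage is consistent with a heterozygous call can be illustrated with an exact binomial test. This is an illustration only; the study does not state that such a test was used. It assumes an expected variant-read fraction of 0.5 at heterozygous sites, which ignores possible allele bias during capture.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def het_pvalue(variant_reads, depth, p_het=0.5):
    """Two-sided binomial p-value for observing variant_reads of depth reads
    at a heterozygous site whose expected variant-read fraction is p_het."""
    lower = binom_cdf(variant_reads, depth, p_het)
    upper = 1.0 - binom_cdf(variant_reads - 1, depth, p_het)
    return min(1.0, 2.0 * min(lower, upper))


# USH2A example from the text: 46.8% of 47 reads is about 22 variant reads.
reads = round(0.468 * 47)
print(reads, het_pvalue(reads, 47) > 0.05)  # 22 True
```

A large p-value means the observed read count is unremarkable for a heterozygous site. Read fractions near 20% at similar depth would instead give very small p-values.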

Table 2: Validation of IROme by APEX.

As an additional control, the IROme assay was tested on genomic DNA from a previously described family of Algerian origin affected by LCA or early-onset retinal degeneration [14]. The causative 6 bp in-frame duplication c.TULP1-1593_1598dupTTCGCC in exon 15 was readily detected (Table 3, patient 5).

Table 3: Synopsis of the molecular diagnoses of RP patients by IROme.

3.2. IROme: Variant Detection, Coverage, and Data Filtering

A total of 23 RP patients were analyzed by IROme (Table 3). Pyrosequencing generated an average of 39.6 Mb per patient. The long read lengths obtained are comparable to published analyses. In those analyses, the Roche 454 GS Junior generated the longest reads among the benchtop high-throughput sequencing platforms compared, the others being MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) [13].

On average, 1′111 sequence variants were found per patient (range: 736–1′826). Only a fraction of these were located in coding sequences, and fewer still changed the amino acid sequence. Across all patients, the median coverage was 17-fold, with a maximum of 112-fold in one exon of patient 16 (Figure 3). Four exons (0.3%) had no coverage: exon 1 of RP9, IMPDH1, and LPCAT1, and an alternative exon 2 of CNGA2. These exons contain GC-rich and/or repetitive sequences that impede efficient probe design and targeting [15]. Another 15 exons (1.6%) were not covered in every patient. Because these exons were not restricted to 5′ regions, the absence of coverage was attributed to technical limitations or, as observed for patient 9, to a deletion (see below).

Figure 3: Fold coverage of targeted sequences. For each patient, the unique depth data in column 5 of the 454AlignmentInfo.tsv file were used to estimate the coverage per targeted bp. Onefold coverage data corresponding to reference genome sequences used for alignment purposes, but not targeted by IROme, were removed. Coverage is represented as a cumulative percentage, that is, the percentage of targeted bp with a minimal coverage of x-fold (the x axis represents the fold coverage). The black dashed line represents the average coverage for all patients. The median coverage for all patients is 17-fold.
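The cumulative representation used in Figure 3 can be computed from per-base depths as sketched below. The sketch assumes depths have already been restricted to targeted bases, as the caption describes. The four depth values are hypothetical.

```python
from collections import Counter
from statistics import median


def cumulative_coverage(depths, max_fold):
    """Percentage of targeted bases with at least x-fold coverage, x = 0..max_fold.

    depths must already be restricted to targeted bases (non-targeted
    reference positions removed, as described in the caption).
    """
    counts = Counter(depths)
    remaining = len(depths)  # bases with depth >= current x
    curve = []
    for x in range(max_fold + 1):
        curve.append(100.0 * remaining / len(depths))
        remaining -= counts.get(x, 0)  # bases with exactly x-fold drop out next
    return curve


# Hypothetical per-base depths for four targeted bases.
depths = [0, 10, 20, 30]
curve = cumulative_coverage(depths, 30)
print(curve[0], curve[10], curve[11], curve[30], median(depths))  # 100.0 75.0 50.0 25.0 15.0
```

Counting depths once and then walking the x axis avoids rescanning all targeted bases for every fold value. This matters for a target of roughly 0.4 Mb per patient.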

For patients 20 and 21, a potential heterozygous mutation had been detected in 22.6% (53-fold coverage) and 21.3% (61-fold coverage) of the reads, respectively. However, neither sequence variant could be validated by Sanger sequencing. For further patient analyses, a more stringent threshold of up to 35% of sequence reads might be used to prioritize sequence variants. Alternatively, a dynamic threshold could be implemented, starting at high stringency and decreasing until one or two mutations are identified.
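The dynamic threshold proposed above could be sketched as a stepwise search. The starting value (50%), the floor (20%), and the 5% step are hypothetical parameters, not values from the study. The identifiers in the example are placeholders; their read fractions are taken from the text.

```python
def dynamic_threshold(candidates, expected=1, start_pct=50, floor_pct=20, step_pct=5):
    """Lower the read-percentage threshold stepwise until `expected` variants pass.

    candidates: (variant_id, read_fraction) pairs remaining after other filters.
    expected: number of distinct variants sought (e.g. 1 for a dominant
    mutation, 2 for compound heterozygous recessive mutations); assumed.
    Returns (threshold_pct, passing_ids); passing_ids may exceed `expected`
    if several variants cross the threshold at the same step.
    """
    passing = []
    for pct in range(start_pct, floor_pct - 1, -step_pct):
        passing = [vid for vid, fraction in candidates if fraction * 100 >= pct]
        if len(passing) >= expected:
            return pct, passing
    return floor_pct, passing


candidates = [("var1", 0.226), ("var2", 0.213), ("var3", 0.468)]
print(dynamic_threshold(candidates))              # (45, ['var3'])
print(dynamic_threshold(candidates, expected=2))  # (20, ['var1', 'var2', 'var3'])
```

Stopping at the highest threshold that yields the expected number of variants keeps low-fraction calls, such as the two unvalidated ones, out of the candidate list whenever a stronger candidate exists.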

In conclusion, the design of IROme resulted in coverage of over 98% of the targeted exons. The variant detection workflow could be improved by further increasing the quality of the sequencing data. Options include a benchtop sequencer less prone to homopolymer-associated insertion/deletion errors, such as MiSeq (Illumina) [13], and high-fidelity DNA polymerases [16].

3.3. IROme: Molecular Diagnosis on RP Patients

IROme analysis yielded a definite diagnosis for 55% of the RP patients, that is, 12 of 23 patients (patients 4, 5, 8, 9, 10, 11, 12, 13, 16, 17, 19, and 23). This is in line with the approximately 60% success rate reported for exome capture strategies in identifying Mendelian disease genes [4]. It also represents a 5-fold increase in mutation detection compared with the APEX assay [6].

Other targeted approaches have reported the following yields:

- A solution-based targeted exon capture assay similar to IROme identified disease-causing mutations in 11 of 17 families affected by various retinal degenerations (65%) [17].
- In a cohort of 100 RP patients, array-based targeted exon capture identified pathogenic mutations in 36 individuals (36%) [15].
- Amplicon-based approaches identified potential mutations in 24% of patients affected by retinal degenerations (5/21) [18], in 79% of ADRP patients (15/19) [19], and in 24% of LCA patients (4/17) [20].

In addition to the control (patient 5), only the p.PROM1-R373C mutation identified in patient 10 had been previously described [21], further underscoring the importance of screening RP-linked genes for the presence of new mutations.

The variant detection workflow did not immediately succeed for two patients. For patient 9, a deletion of exons 45–47 in ABCA4 was found only by analyzing the coverage data. For patient 16, a 33 bp insertion in PRPF31 was detected by Sequence Pilot but not by Reference Mapper.
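Finding a deletion from coverage data, as for patient 9, can be illustrated by comparing each exon's relative depth with that of other patients. This sketch is an assumption about how such a check could be done, not the procedure used in the study. The 0.2 ratio cut-off is hypothetical and targets homozygous or hemizygous losses; heterozygous deletions (ratio near 0.5) would need deeper, more uniform coverage. The depths are invented, and the exon numbers merely mirror the ABCA4 example.

```python
from statistics import median


def normalize(coverage):
    """Scale per-exon depths by the total depth over the exons supplied."""
    total = sum(coverage.values())
    return {exon: (d / total if total else 0.0) for exon, d in coverage.items()}


def flag_deleted_exons(patient, cohort, min_ratio=0.2):
    """Return runs of consecutive exons with markedly reduced relative depth.

    patient: {exon_number: mean unique depth} for one gene in one patient.
    cohort: list of such dicts from other patients, with the same exons.
    min_ratio: hypothetical cut-off; ~0.5 would be expected for a
    heterozygous deletion and ~0 for a homozygous or hemizygous one.
    """
    p = normalize(patient)
    refs = [normalize(c) for c in cohort]
    flagged = []
    for exon in sorted(p):
        expected = median([r[exon] for r in refs])
        if expected > 0 and p[exon] / expected < min_ratio:
            flagged.append(exon)
    runs = []
    for exon in flagged:
        if runs and exon == runs[-1][-1] + 1:
            runs[-1].append(exon)
        else:
            runs.append([exon])
    return runs


# Invented depths: exons 45-47 are uncovered in the patient only.
patient = {44: 20, 45: 0, 46: 0, 47: 0, 48: 18}
cohort = [{44: 20, 45: 19, 46: 21, 47: 20, 48: 20},
          {44: 15, 45: 16, 46: 14, 47: 15, 48: 15}]
print(flag_deleted_exons(patient, cohort))  # [[45, 46, 47]]
```

Comparing against other patients rather than using absolute depth helps separate a true deletion from exons that capture poorly in everyone, such as the GC-rich first exons mentioned earlier.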

Potential mutations were found in three patients (13%). Patient 1 inherited a heterozygous p.C2ORF71-R571delRTVVPP mutation from her healthy mother and a heterozygous p.FSCN2-P231S mutation from her healthy father. Digenic RP has so far been linked only to heterozygous PRPH2 and ROM1 mutations [2], so further analyses will be necessary to validate this molecular diagnosis. Patients 2 and 20 had two potential mutations and one potential mutation, respectively, but no family members were available to confirm the result.

Results were questionable for two additional patients. Patient 6 carried a previously reported p.RHO-R252P mutation [22]. However, no unaffected family members were available to confirm this dominant mutation. Patient 14 carried a heterozygous p.CRX-Q105X sequence variant, but his healthy mother also carried it.

Finally, no molecular diagnosis could be established for six patients (26%). In patients 18 and 21, IROme analysis found no potential mutations. In patients 7 and 15, the potential mutation did not segregate with disease in the family. In patients 3 and 22, heterozygous mutations were found in genes reported only for recessive inheritance (CLRN1, EYS).

Of note, all of these patients carry novel sequence variants in noncoding regions. To prioritize potential disease-causing variants in these regions, systematic annotation should cover not only splice sites and 5′- and 3′-UTRs but also transcription factor binding sites and regulatory elements in the potential proximal promoter regions. Promoter sequence variants could then be tested in reporter transactivation assays (e.g., luciferase reporter assays). However, this time-consuming approach cannot be implemented in a routine molecular diagnostic laboratory.

4. Conclusions

The custom-designed, solution-based targeted exon capture assay IROme efficiently detected disease-causing mutations in 55% of RP patients (12/23). Coverage of 99.7% of the targeted regions was obtained.

The first translated exon often contains GC-rich sequences in its 5′-UTR that hinder efficient capture [23]. Remarkably, IROme successfully enriched more than 95% of first exons (60/63). In comparison, a pilot study carried out in our laboratory on 25 patients using whole exome sequencing (SureSelect, Agilent) had three shortcomings. Promoter regions had no coverage, 3′-UTR coverage was highly variable, and the first translated exon of several genes was very poorly covered. For instance, the first exons of the following RP-linked genes could not be correctly analyzed: C2ORF71, CA4, CABP4, CERKL, CNGA1, FAM161A, FSCN2, GUCY2D, IMPDH1, LPCAT1, MERTK, RDH12, RP9, and RPGR (D. F. Schorderet, unpublished results). It is tempting to speculate that the additional sequences upstream of exon 1 included in IROme further enhanced the performance of the NimbleGen capture technology. This technology reportedly has more specific targeting and a higher percentage of on-target reads than competing products [23, 24].

However, the cost of whole exome sequencing has dropped dramatically, to about US$1000 per patient. Whole exome sequencing may therefore replace target enrichment and resequencing in the future, provided that a new generation of "whole exome" kits becomes commercially available that effectively covers all exons of all genes, including the first [24].

Meanwhile, custom-designed target enrichment followed by next-generation sequencing is a cost-efficient approach for the molecular diagnosis of retinal dystrophies, not least because data handling and analysis are relatively easy [25]. Finally, the median global coverage of 17-fold observed with the IROme assay indicates that additional retinal degeneration-linked genes, whether newly discovered or candidate genes, could be included.

Acknowledgments

The authors thank Etienne Bagnoud for technical support in informatics. This work was supported by Swiss National Science Foundation Grants 31003A-122269 (to P. Escher and D. F. Schorderet) and 31003A_138492 (to P. Escher).