A significant number of rare Mendelian diseases, especially those with neuronal phenotypes, remain undiagnosed or misdiagnosed. An accurate genetic diagnosis may benefit disease management or treatment. We have established multiple collaborations with researchers studying rare undiagnosed diseases, applying exome/genome sequencing to find the underlying disease genes.

Genome/exome sequencing can rapidly identify the causal genes for known diseases, or identify new candidate genes for novel syndromes. Several examples are given below:

1. IHA (known disease)

In collaboration with Dr. Gholson Lyon, we sequenced a pedigree segregating both a complex disease (ADHD) and a Mendelian disease (hemolytic anemia). Although we identified some rare variants that might predispose to ADHD, we have not yet proven causality for any of them. However, over the course of the study, one subject was discovered to have idiopathic hemolytic anemia (IHA), which was suspected to be genetic in origin. Analysis of this subject’s exome readily identified two rare non-synonymous mutations in the PKLR gene as the most likely cause of the IHA. Functional biochemical testing further confirmed the enzyme deficiency, consistent with a diagnosis of red blood cell pyruvate kinase deficiency. For more details, see http://www.discoverymedicine.com/Gholson-J-Lyon/2011/07/15/exome-sequencing-and-unrelated-findings-in-the-context-of-complex-disease-research-ethical-and-clinical-implications/.
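
The variant prioritization described above (rare, non-synonymous, two hits in one gene for a recessive disorder) can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used in the study: the `Variant` record, the consequence labels, the 1% allele-frequency cutoff and the toy variants (including the hypothetical genes `GENE_A`, `GENE_B` and `GENE_C`) are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    consequence: str  # e.g. "missense", "synonymous", "stop_gained"
    pop_af: float     # population allele frequency of the alternate allele
    genotype: str     # "0/1" heterozygous or "1/1" homozygous alternate

# Assumed set of protein-altering consequence labels (illustrative only).
NONSYNONYMOUS = {"missense", "stop_gained", "frameshift", "splice_site"}

def recessive_candidates(variants, max_af=0.01):
    """Return genes carrying at least two rare non-synonymous alleles."""
    allele_counts = defaultdict(int)
    for v in variants:
        if v.consequence not in NONSYNONYMOUS or v.pop_af >= max_af:
            continue  # drop synonymous and common variants
        allele_counts[v.gene] += 2 if v.genotype == "1/1" else 1
    return sorted(gene for gene, n in allele_counts.items() if n >= 2)

# Toy input: values are invented, not the patient's actual variants.
toy = [
    Variant("PKLR", "missense", 0.0002, "0/1"),
    Variant("PKLR", "missense", 0.0001, "0/1"),
    Variant("GENE_A", "missense", 0.0005, "0/1"),    # only one rare hit
    Variant("GENE_B", "missense", 0.2, "1/1"),       # common variant
    Variant("GENE_C", "synonymous", 0.0001, "1/1"),  # not protein-altering
]
print(recessive_candidates(toy))  # ['PKLR']
```

Two heterozygous hits in one gene are only candidates: this filter does not check whether they lie on different alleles (in trans), which must be established separately, for example by testing the parents.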

2. MPS3B (known disease)

In collaboration with Dr. Shi, a former member of the lab, we studied a case from China in which two siblings both developed idiopathic progressive cognitive decline beginning at age six and were suspected to have an undiagnosed neurological disease. Exome sequencing identified compound heterozygous mutations in NAGLU, making it the most likely candidate gene. Sanger sequencing confirmed the recessive pattern of inheritance, leading to a genetic diagnosis of Sanfilippo syndrome (mucopolysaccharidosis IIIB). Biochemical tests confirmed the complete loss of alpha-N-acetylglucosaminidase activity (the enzyme encoded by NAGLU) in blood, as well as significantly elevated dermatan sulfate and heparan sulfate in urine. Structural modeling revealed how the two variants affect the structural stability of the protein. For more details, see https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-014-0066-9.
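
Establishing compound heterozygosity hinges on showing that the two variants sit on different parental alleles. A minimal sketch of that logic, assuming both parents are genotyped at both sites (the text above does not say which relatives were tested) and encoding genotypes as alternate-allele counts; the function names are hypothetical:

```python
def transmitting_parent(child, father, mother):
    """Alt-allele counts (0/1/2) at one site; return which parent transmitted the
    alt allele to a heterozygous child, or None if it cannot be determined."""
    if child != 1:
        return None
    if father > 0 and mother == 0:
        return "father"
    if mother > 0 and father == 0:
        return "mother"
    return None  # both parents or neither (e.g. de novo) carry it: phase unresolved

def is_compound_heterozygous(site1, site2):
    """site1/site2 are (child, father, mother) tuples for two variants in one gene.
    True only if the variants were transmitted by different parents (in trans)."""
    p1 = transmitting_parent(*site1)
    p2 = transmitting_parent(*site2)
    return p1 is not None and p2 is not None and p1 != p2

print(is_compound_heterozygous((1, 1, 0), (1, 0, 1)))  # True: one from each parent
print(is_compound_heterozygous((1, 1, 0), (1, 1, 0)))  # False: both paternal (in cis)
```

A variant carried by both parents or by neither leaves the phase unresolved, and the sketch then conservatively returns False rather than guessing.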

3. Ogden syndrome (novel syndrome)

One of the early examples of identifying the gene for a novel syndrome was Ogden syndrome, a previously unreported lethal infantile disorder caused by a mutation in NAA10. Together with Dr. Lyon, we identified the disease gene by X-chromosome exon capture and next-generation sequencing. For more details, see https://www.cell.com/ajhg/fulltext/S0002-9297(11)00210-2.

4. RBCK1 deficiency syndrome (novel syndrome)

A more recent example is a novel genetic disease that we refer to as "Bookman syndrome": a pediatric-onset disease with neuromuscular and cardiac involvement and clinical features similar to glycogen storage disease type IV. Although exome sequencing failed to identify the cause of the disease for technical reasons, we applied genome and transcriptome sequencing and identified a disease-contributory mutation in RBCK1, a finding subsequently replicated by another group. The disease is now referred to as RBCK1 deficiency syndrome. This study started while I was a postdoc at CAG at CHOP, and it took several years to solve, which is typical for novel syndromes. For more details, see https://genomemedicine.biomedcentral.com/articles/10.1186/gm471.

5. TAF1 deficiency syndrome (novel syndrome)

In collaboration with Dr. Lyon, we analyzed a three-generation extended family and sequenced its members by whole-genome sequencing (WGS) on both the Illumina and Complete Genomics platforms. The two affected individuals, both in the third generation, have severe intellectual disability, autistic behaviors, ADHD, and very distinctive facial features. Family-based analysis pinpointed TAF1 as the most likely candidate gene. Interestingly, the X-linked non-synonymous TAF1 mutation arose de novo in the mother of the two affected individuals. For more details, see https://www.cell.com/ajhg/fulltext/S0002-9297(15)00450-4.
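
The segregation pattern behind this finding can be written as a simple check. A minimal sketch, assuming the sequenced grandparents are the mother's parents and that genotypes at the X-linked site are given as alternate-allele counts (0 or 1 for hemizygous males); the function is hypothetical and ignores practical issues such as parental mosaicism and sequencing depth:

```python
def fits_maternal_de_novo_x_linked(grandfather, grandmother, mother,
                                   affected_males, unaffected_males=()):
    """Genotypes are alt-allele counts at an X-chromosome site (males: 0 or 1).
    True if the variant is absent in the maternal grandparents, heterozygous in the
    mother, carried by every affected male and by no unaffected male."""
    return (
        grandfather == 0
        and grandmother == 0
        and mother == 1
        and all(g == 1 for g in affected_males)
        and all(g == 0 for g in unaffected_males)
    )

print(fits_maternal_de_novo_x_linked(0, 0, 1, [1, 1]))  # True
print(fits_maternal_de_novo_x_linked(0, 1, 1, [1, 1]))  # False: inherited from grandmother
```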

6. KBG syndrome (known disease)

KBG syndrome is a rare autosomal dominant genetic condition characterized by neurological involvement and distinct facial, hand, and skeletal features. More than 70 cases have been reported; however, KBG syndrome is likely underdiagnosed because its heterogeneous phenotypic features have not been comprehensively characterized. We describe the clinical manifestations in a male, 13 years of age at the time of the report, who exhibited epilepsy, severe developmental delay, distinct facial features, and hand anomalies but had no genetic diagnosis. Exome sequencing identified a novel de novo heterozygous single-base-pair duplication (c.6015dupA) in ANKRD11, which was validated by Sanger sequencing. This duplication is predicted to shift the reading frame, introducing a premature stop codon and causing loss of function of ANKRD11; this implicates it as contributing to the proband's symptoms and yields a molecular diagnosis of KBG syndrome. Before the molecular diagnosis, the syndrome had not been recognized in the proband because several of its key features were mild and were overlooked by clinicians, further supporting the concept of variable expressivity in many disorders. Although the proband has also been diagnosed with cerebral folate deficiency, its significance for his condition remains uncertain. For more details, see http://molecularcasestudies.cshlp.org/content/2/6/a001131.
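
Why a single-base duplication is predicted to truncate the protein can be shown with a toy example. The sketch below uses an invented 18-nucleotide coding sequence (not ANKRD11) and the standard genetic code; duplicating the A at coding position 6 stands in for c.6015dupA. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds):
    """Translate a coding sequence from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)

def duplicate_base(cds, pos):
    """Model an HGVS-style c.<pos>dup: repeat the base at 1-based position pos."""
    return cds[:pos] + cds[pos - 1] + cds[pos:]

wild_type = "ATGAAACCTGACGGGTAA"      # toy sequence, not ANKRD11
mutant = duplicate_base(wild_type, 6)  # stands in for c.6015dupA
print(translate(wild_type))  # MKPDG*
print(translate(mutant))     # MKT*
```

Every codon downstream of the extra base is read in a new frame until a stop codon (here TGA) appears, which is what truncates the toy protein. Whether the real transcript is degraded by nonsense-mediated decay is not modeled.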

7. Familial cortical myoclonic tremor with epilepsy (FCMTE) (known disease)

The locus for FCMTE had long been mapped to 8q24 in linkage studies, but the causative mutations remained unknown. Recently, expansions of intronic TTTCA and TTTTA repeat motifs within SAMD12 were found to be involved in the pathogenesis of FCMTE in Japanese pedigrees. We aimed to identify the causative mutations of FCMTE in Chinese pedigrees. We performed genetic linkage analysis with microsatellite markers in a five-generation Chinese pedigree with 55 members. We also used array comparative genomic hybridisation (array-CGH) and next-generation sequencing (NGS) technologies (whole-exome sequencing, capture-region deep sequencing and whole-genome sequencing) to search for the causative mutations in the disease locus. We then used low-coverage (~10×) long-read genome sequencing (LRS) on the PacBio Sequel and Oxford Nanopore platforms to identify the causative mutations, and validated the repeat expansions by repeat-primed PCR. Linkage analysis mapped the disease locus to 8q23.3-24.23. Array-CGH and NGS failed to identify causative mutations in this locus, whereas LRS identified the intronic TTTCA and TTTTA pentanucleotide repeat expansions in SAMD12 as the causative mutations in Chinese FCMTE pedigrees, corroborating the recently published results in Japanese pedigrees. Our study also suggests that LRS is an effective tool for the molecular diagnosis of genetic disorders, especially neurological diseases that cannot be positively diagnosed by conventional clinical microarray and NGS technologies. For more details, see https://jmg.bmj.com/content/early/2018/09/07/jmedgenet-2018-105484.long.
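
At its core, spotting a pentanucleotide repeat expansion in a long read means measuring how many uninterrupted copies of a motif it contains. A minimal exact-match sketch on a synthetic read; real long reads contain sequencing errors, so practical tools tolerate mismatches and indels, which this toy function does not:

```python
def longest_motif_run(seq, motif):
    """Number of copies in the longest uninterrupted tandem run of motif in seq
    (exact matching only)."""
    k = len(motif)
    best = 0
    for phase in range(k):  # a run can start at any offset, so scan every frame
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

# Synthetic read: 12 TTTCA copies followed by 3 TTTTA copies.
read = "GGC" + "TTTCA" * 12 + "TTTTA" * 3 + "GCA"
print(longest_motif_run(read, "TTTCA"))  # 12
print(longest_motif_run(read, "TTTTA"))  # 3
```

In practice, repeat lengths estimated across many reads would be compared with control alleles and confirmed experimentally, as was done here with repeat-primed PCR.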

8. Glycogen storage disease type Ia (GSD-Ia) (known disease)

For some individuals judged clinically to have a recessive Mendelian disease, clinical whole-exome sequencing (WES) finds only one heterozygous pathogenic variant, posing a challenge for genetic diagnosis and genetic counseling. One possible reason is the limited ability of short-read sequencing technologies to detect disease-causing structural variants (SVs). Long-read sequencing produces much longer reads (typically 1,000 bp or longer) and therefore offers a greatly improved ability to detect SVs that short-read sequencing may miss. Here we describe a case in which WES identified only one heterozygous pathogenic variant in an individual suspected to have glycogen storage disease type Ia (GSD-Ia), an autosomal recessive disease caused by bi-allelic mutations in the G6PC gene. Using Nanopore long-read whole-genome sequencing, we identified a 7.1 kb deletion covering two exons on the other allele, suggesting that complex SVs may explain a fraction of recessive-disease cases in which WES misses the second pathogenic allele. Both breakpoints of the deletion lie within Alu elements, and we designed Sanger sequencing and quantitative PCR assays based on the breakpoints for preimplantation genetic diagnosis (PGD), as the family was planning another child. Four embryos were obtained after in vitro fertilization (IVF), and an embryo without the G6PC deletion was transferred after PGD; the result was confirmed by prenatal and postnatal diagnosis and by the absence of disease symptoms after birth. In summary, we present one of the first examples of using long-read sequencing to identify causal yet complex SVs in exome-negative patients, which subsequently enabled successful personalized PGD. For more details, see https://hereditasjournal.biomedcentral.com/articles/10.1186/s41065-018-0069-1.
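
The triage step that motivated long-read sequencing here, an autosomal recessive gene with only one detected heterozygous pathogenic variant, can be sketched as a simple filter. The gene set, the `(gene, zygosity)` input format and the function name are assumptions for illustration; in practice the recessive gene list would come from a curated gene-disease resource.

```python
from collections import Counter

# Illustrative subset of autosomal recessive disease genes (an assumption).
AR_GENES = {"G6PC", "PKLR", "NAGLU"}

def single_hit_recessive_genes(pathogenic_calls, ar_genes=AR_GENES):
    """pathogenic_calls: (gene, zygosity) pairs, zygosity "het" or "hom", for variants
    already classified as pathogenic. Returns recessive genes with exactly one
    heterozygous hit, i.e. cases where a second allele may have been missed."""
    het_counts = Counter()
    homozygous = set()
    for gene, zygosity in pathogenic_calls:
        if gene not in ar_genes:
            continue
        if zygosity == "hom":
            homozygous.add(gene)
        else:
            het_counts[gene] += 1
    return sorted(g for g, n in het_counts.items() if n == 1 and g not in homozygous)

calls = [("G6PC", "het"), ("PKLR", "het"), ("PKLR", "het"), ("NAGLU", "hom"), ("BRCA2", "het")]
print(single_hit_recessive_genes(calls))  # ['G6PC']
```

Genes flagged this way are candidates for a targeted search for a hidden second allele, such as the deletion found here by long-read sequencing.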

9. Detection of balanced translocations by long-read sequencing

Structural variants (SVs) in genomes, including translocations, inversions, insertions, deletions and duplications, remain difficult to detect reliably with traditional genomic technologies. In particular, balanced translocations and inversions cannot be detected by microarrays because they do not alter chromosome copy number, and they cannot be reliably detected by short-read sequencing either, because many breakpoints lie within repetitive regions of the genome that are unmappable by short reads. However, detecting breakpoints and localizing them precisely at the nucleotide level are important for studying the genetic causes of disease in patients carrying balanced translocations or inversions. Long-read sequencing technologies, such as Oxford Nanopore Technologies (ONT), may detect these SVs in a more direct, efficient and accurate manner. In this study, we applied whole-genome long-read sequencing on the Oxford Nanopore GridION sequencer to detect the breakpoints in six carriers of balanced translocations and one carrier of an inversion, whose SVs had initially been detected by karyotyping at the chromosome level. All the balanced translocations were detected at ~10X coverage, consistent with the karyotyping results. PCR and Sanger sequencing confirmed 8 of the 14 breakpoints at single-base resolution; the remaining breakpoints could not be refined to single-base resolution because they lie in highly repetitive or pericentromeric regions, or because of possible local deletions/duplications. Our results indicate that low-coverage long-read whole-genome sequencing is an ideal tool for precisely localizing most translocation breakpoints and may provide haplotype information on breakpoint-linked SNPs; it could be widely applied in SV detection, therapeutic monitoring, assisted reproduction technology (ART) and preimplantation genetic diagnosis (PGD). For more details, see https://www.biorxiv.org/content/early/2018/09/18/419531.
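
Long reads that span a translocation junction align partly to one chromosome and partly to another, so translocation breakpoints can be located by clustering such split reads. A minimal sketch, assuming split alignments have already been extracted (for instance from the SA tags that aligners such as minimap2 write for supplementary alignments) into simple records; the record layout, window size and support threshold are hypothetical, and fixed-size bins can split a cluster that straddles a bin boundary:

```python
from collections import defaultdict
from statistics import median_low
from typing import NamedTuple

class SplitRead(NamedTuple):
    read_id: str
    chrom_a: str  # chromosome of one aligned segment
    pos_a: int    # junction-side coordinate of that segment
    chrom_b: str  # chromosome of the other (supplementary) segment
    pos_b: int

def cluster_translocation_junctions(reads, window=1000, min_support=3):
    """Group inter-chromosomal split reads whose junction coordinates fall in the same
    window pair; return (chrom1, pos1, chrom2, pos2, n_reads) for supported clusters."""
    clusters = defaultdict(list)
    for r in reads:
        if r.chrom_a == r.chrom_b:
            continue  # same-chromosome split: not a translocation candidate here
        # Order the two ends so A->B and B->A reads of one junction share a key.
        (c1, p1), (c2, p2) = sorted([(r.chrom_a, r.pos_a), (r.chrom_b, r.pos_b)])
        clusters[(c1, p1 // window, c2, p2 // window)].append((p1, p2))
    calls = []
    for (c1, _, c2, _), junctions in clusters.items():
        if len(junctions) >= min_support:
            pos1 = median_low(p for p, _ in junctions)  # rough breakpoint estimate
            pos2 = median_low(q for _, q in junctions)
            calls.append((c1, pos1, c2, pos2, len(junctions)))
    return sorted(calls)

reads = [
    SplitRead("r1", "chr1", 1000200, "chr9", 5000500),
    SplitRead("r2", "chr9", 5000510, "chr1", 1000210),
    SplitRead("r3", "chr1", 1000190, "chr9", 5000490),
    SplitRead("r4", "chr1", 2000000, "chr1", 2050000),  # intra-chromosomal
    SplitRead("r5", "chr3", 700000, "chr5", 800000),    # unsupported singleton
]
print(cluster_translocation_junctions(reads))  # [('chr1', 1000200, 'chr9', 5000500, 3)]
```

A reciprocal balanced translocation produces two junctions (one per derivative chromosome), so a carrier would typically show two such clusters. Inversions stay on one chromosome and would need a separate, orientation-aware check.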

These examples illustrate the power of genome sequencing for uncovering the genetic basis of rare undiagnosed diseases.