Genomic mutation and genetic disease

We develop and apply novel experimental and analytical methods using the latest genomic technologies to characterise
genetic variation in families and populations, and discover disease-causing variants, with a particular focus on
childhood developmental diseases.

We are using our understanding of genetic variation in populations to improve clinical diagnosis of genetic diseases.
We are also characterising the fundamental mutational mechanisms that underpin all genetic variation, and their
relative contributions to disease. We collaborate internationally with clinicians and researchers to achieve these
aims, and consider carefully the ethical challenges raised by our research.

Background

One to two percent of all children are born with a developmental disorder, such as a learning disability or a heart
defect, as a result of errors in embryogenesis and early neurodevelopment. These disorders make a major contribution
to paediatric hospital admissions and mortality. Surgical repair of congenital heart defects has decreased mortality
rates considerably, but because more patients now survive to pass faulty genes on to the next generation, the prevalence of
congenital heart disease is increasing. Accurate diagnosis is essential to inform patient management, prognosis and
medical care, and give families the opportunity for prenatal diagnosis.

Many of these disorders are genetic in origin and the increasing resolution of genomic technologies has enabled
substantial increases in diagnosis rates in recent years. Nevertheless, most children with developmental disorders do
not currently receive a genetic diagnosis.

The advent of cost-efficient genome sequencing technologies raises the prospect of discovering many currently unknown
disease genes and increasing diagnosis rates dramatically. However, our ability to discover millions of genetic
variants in every genome is running far ahead of our ability to accurately identify individual disease-causing
variants. This interpretation gap is the fundamental challenge in human genetics today.

Research

Over the past five years we have focused on characterising genomic structural variation in human populations and its
role in disease. During the next five years we are expanding our focus to all forms of genetic variation made
accessible by new sequencing technologies, and translating our better understanding into improved clinical diagnosis
rates. We find that a deep appreciation of population variation is key to interpreting the pathogenicity of variants
seen in patients.

In collaboration with UK and international clinicians, over the next five years, we are investigating the genetic
causes of developmental disorders in over 15,000 children and their families. The scale of this research enables us
to identify genetic variants that can cause a variety of different developmental disorders. In addition to a general
interest in diverse developmental disorders, our group has a specific goal to elucidate the genetic architecture of
congenital heart disease.

Many developmental disorders are caused by one or more of the 50-100 new mutations that arise in each genome every
generation. The mutation rates of the handful of different molecular processes that generate this variation can be
thought of as quantitative traits, like height, that vary among members of the population as a result of genetic and
environmental factors. We also know that mutation rates can vary dramatically on the basis of age and sex. For
example, it has been proposed that, on average, the majority (more than 80 per cent) of new mutations arise in the
sperm of the father, with only a minority (less than 20 per cent) coming from the egg of the mother. Despite the
clinical importance of achieving a fuller understanding of the factors influencing mutation rates, little is known
about how and why mutation rates vary from gamete to gamete and from individual to individual. Through novel
statistical analyses of genome sequences of parents and children we are increasing our comprehension of how these
fundamental mutation processes operate, and how they underpin the variation we see in patients in the clinic and in
the population as a whole.

Resources

CNV Project - aims to characterise structural variation in the human
genome, and integrate this knowledge into disease and population genetic studies

Collaborations

UK10K - aims to uncover the role of rare genetic variants in health and disease
by studying the genetic code of 10,000 people in fine detail

Saeed Al Turki

- PhD Student

I completed my medical degree at King Saud University in Riyadh, Saudi Arabia, and worked in Molecular Pathology at the KAMC hospital for three years as a medical resident. I joined the Wellcome Trust Sanger Institute in 2009 as a registered PhD student at the University of Cambridge.

Research

My PhD project aims to find the genetic causes of congenital heart defects (CHD) using next-generation sequencing. As a side project, I developed a suite of tools for Family-based Exome Variant Analyses (FEVA) to report and prioritise candidate variants and genes in a systematic fashion. These tools have been used in different projects including some of the Rare UK10K studies. I also developed a pipeline to call and annotate de novo mutations, which is used in part by the DDD project.

Molecular Medicine Unit, University College London Institute of Child Health, London, UK.

Ciliopathies are genetically heterogeneous disorders characterized by variable expressivity and overlaps between different disease entities. This is exemplified by the short rib-polydactyly syndromes, Jeune, Sensenbrenner, and Mainzer-Saldino chondrodysplasia syndromes. These three syndromes are frequently caused by mutations in intraflagellar transport (IFT) genes affecting the primary cilia, which play a crucial role in skeletal and chondral development. Here, we identified mutations in IFT140, an IFT complex A gene, in five Jeune asphyxiating thoracic dystrophy (JATD) and two Mainzer-Saldino syndrome (MSS) families, by screening a cohort of 66 JATD/MSS patients using whole exome sequencing and targeted resequencing of a customized ciliopathy gene panel. We also found an enrichment of rare IFT140 alleles in JATD compared with nonciliopathy diseases, implying putative modifier effects for certain alleles. IFT140 patients presented with mild chest narrowing, but all had end-stage renal failure under 13 years of age and retinal dystrophy when examined for ocular dysfunction. This is consistent with the severe cystic phenotype of Ift140 conditional knockout mice, and the higher level of Ift140 expression in kidney and retina compared with the skeleton at E15.5 in the mouse. IFT140 is therefore a major cause of cono-renal syndromes (JATD and MSS). The present study strengthens the rationale for IFT140 screening in skeletal ciliopathy spectrum patients that have kidney disease and/or retinal dystrophy.

Centre for Human Genetics, St. George's University of London, Cranmer Terrace, London SW17 0RE, UK.

The neuromuscular junction (NMJ) is a specialized synapse with a complex molecular architecture that provides for reliable transmission between the nerve terminal and muscle fiber. Using linkage analysis and whole-exome sequencing of DNA samples from subjects with distal hereditary motor neuropathy type VII, we identified a mutation in SLC5A7, which encodes the presynaptic choline transporter (CHT), a critical determinant of synaptic acetylcholine synthesis and release at the NMJ. This dominantly segregating SLC5A7 mutation truncates the encoded product just beyond the final transmembrane domain, eliminating cytosolic-C-terminus sequences known to regulate surface transporter trafficking. Choline-transport assays in both transfected cells and monocytes from affected individuals revealed significant reductions in hemicholinium-3-sensitive choline uptake, a finding consistent with a dominant-negative mode of action. The discovery of CHT dysfunction underlying motor neuropathy identifies a biological basis for this group of conditions and widens the spectrum of disorders that derive from impaired NMJ transmission. Our findings compel consideration of mutations in SLC5A7 or its functional partners in relation to unexplained motor neuronopathies.

Early Diagnosis of Werner's Syndrome Using Exome-Wide Sequencing in a Single, Atypical Patient.

Institute of Metabolic Science, University of Cambridge Metabolic Research Laboratories Cambridge, UK.

Genetic diagnosis of inherited metabolic disease is conventionally achieved through syndrome recognition and targeted gene sequencing, but many patients receive no specific diagnosis. Next-generation sequencing allied to capture of expressed sequences from genomic DNA now offers a powerful new diagnostic approach. Barriers to routine diagnostic use include cost, and the complexity of interpreting results arising from simultaneous identification of large numbers of variants. We applied exome-wide sequencing to an individual, 16-year-old daughter of consanguineous parents with a novel syndrome of short stature, severe insulin resistance, ptosis, and microcephaly. Pulldown of expressed sequences from genomic DNA followed by massively parallel sequencing was undertaken. Single nucleotide variants were called using SAMtools prior to filtering based on sequence quality and existence in control genomes and exomes. Of 485 genetic variants predicted to alter protein sequence and absent from control data, 24 were homozygous in the patient. One mutation - the p.Arg732X mutation in the WRN gene - has previously been reported in Werner's syndrome (WS). On re-evaluation of the patient several early features of WS were detected including loss of fat from the extremities and frontal hair thinning. Lymphoblastoid cells from the proband exhibited a defective decatenation checkpoint, consistent with loss of WRN activity. We have thus diagnosed WS some 15 years earlier than average, permitting aggressive prophylactic therapy and screening for WS complications, illustrating the potential of exome-wide sequencing to achieve early diagnosis and change management of rare autosomal recessive disease, even in individual patients of consanguineous parentage with apparently novel syndromes.
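The filtering logic described in the abstract above (protein-altering, absent from control data, homozygous in the patient) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, its field names and the toy data are all hypothetical, and a real analysis would start from called and quality-filtered variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not a real file format)."""
    gene: str
    protein_altering: bool  # predicted to change the protein sequence
    in_controls: bool       # observed in control genomes or exomes
    genotype: str           # "hom" or "het" in the patient


def candidate_recessive_variants(variants):
    """Keep protein-altering variants absent from controls and homozygous in the patient."""
    return [
        v for v in variants
        if v.protein_altering and not v.in_controls and v.genotype == "hom"
    ]


# Toy input, invented for illustration only.
toy_variants = [
    Variant("GENE_A", True, False, "hom"),
    Variant("GENE_B", True, True, "hom"),    # seen in controls
    Variant("GENE_C", False, False, "hom"),  # not protein-altering
    Variant("GENE_D", True, False, "het"),   # heterozygous
]
candidates = candidate_recessive_variants(toy_variants)
print([v.gene for v in candidates])
```

Restricting to homozygous variants reflects the recessive model expected with consanguineous parents; other inheritance models would need a different final filter.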

Keren Carss

- PhD Student

I obtained a Bachelor of Medical Sciences from the University of Birmingham in 2007, where I specialised in cellular and molecular biology (specifically genetics and reproduction). From 2008 to 2010 I worked as a research assistant at the Clinical Pharmacology Unit of the University of Cambridge, using sequencing to try to find the genetic cause of Familial Hyperaldosteronism Type II, which causes severe hypertension. I started my PhD at the Wellcome Trust Sanger Institute in October 2010.

Research

I use exome sequencing to identify genes that cause developmental disease, and zebrafish embryos to model those genes. Phenotypes I have studied include microcephalic osteodysplastic primordial dwarfism, congenital muscular dystrophy and prenatal structural abnormalities. I am also interested in scientific publishing, and I am an editor for the Cambridge Student Journal of Genetics, and for BlueSci magazine.

Mutations in several known or putative glycosyltransferases cause glycosylation defects in α-dystroglycan (α-DG), an integral component of the dystrophin glycoprotein complex. The hypoglycosylation reduces the ability of α-DG to bind laminin and other extracellular matrix ligands and is responsible for the pathogenesis of an inherited subset of muscular dystrophies known as the dystroglycanopathies. By exome and Sanger sequencing we identified two individuals affected by a dystroglycanopathy with mutations in β-1,3-N-acetylgalactosaminyltransferase 2 (B3GALNT2). B3GALNT2 transfers N-acetyl galactosamine (GalNAc) in a β-1,3 linkage to N-acetyl glucosamine (GlcNAc). A subsequent study of a separate cohort of individuals identified recessive mutations in four additional cases that were all affected by dystroglycanopathy with structural brain involvement. We show that functional dystroglycan glycosylation was reduced in the fibroblasts and muscle (when available) of these individuals via flow cytometry, immunoblotting, and immunocytochemistry. B3GALNT2 localized to the endoplasmic reticulum, and this localization was perturbed by some of the missense mutations identified. Moreover, knockdown of b3galnt2 in zebrafish recapitulated the human congenital muscular dystrophy phenotype with reduced motility, brain abnormalities, and disordered muscle fibers with evidence of damage to both the myosepta and the sarcolemma. Functional dystroglycan glycosylation was also reduced in the b3galnt2 knockdown zebrafish embryos. Together these results demonstrate a role for B3GALNT2 in the glycosylation of α-DG and show that B3GALNT2 mutations can cause dystroglycanopathy with muscle and brain involvement.

Background: Vomeronasal receptors (VRs), expressed in sensory neurons of the vomeronasal organ, are thought to bind pheromones and mediate innate behaviours. The mouse reference genome has over 360 functional VRs arranged in highly homologous clusters, but the vast majority are of unknown function. Differences in these receptors within and between closely related species of mice are likely to underpin a range of behavioural responses. To investigate these differences, we interrogated the VR gene repertoire from 17 inbred strains of mice using massively parallel sequencing.

Results: Approximately half of the 6222 VR genes that we investigated could be successfully resolved, and those that were unambiguously mapped resulted in an extremely accurate dataset. Collectively VRs have over twice the coding sequence variation of the genome average; but we identify striking non-random distribution of these variants within and between genes, clusters, clades and functional classes of VRs. We show that functional VR gene repertoires differ considerably between different Mus subspecies and species, suggesting these receptors may play a role in mediating behavioural adaptations. Finally, we provide evidence that widely-used, highly inbred laboratory-derived strains have a greatly reduced, but not entirely redundant capacity for differential pheromone-mediated behaviours.

Conclusions: Together our results suggest that the unusually variable VR repertoires of mice have a significant role in encoding differences in olfactory-mediated responses and behaviours. Our dataset has expanded over nine fold the known number of mouse VR alleles, and will enable mechanistic analyses into the genetics of innate behavioural differences in mice.

Further study of chromosome 7p22 to identify the molecular basis of familial hyperaldosteronism type II.

Familial hyperaldosteronism type II (FH-II) is an inherited form of hyperaldosteronism associated with hypertension in most patients. The mutations that cause FH-II are unknown, but linkage analysis has mapped them to chromosome 7p22. As FH-II is clinically indistinguishable from sporadic primary aldosteronism, a common and treatable condition, unravelling the cause of FH-II has important implications for these sporadic cases. To investigate whether FH-II is caused by large deletions or insertions, we examined the virtual karyotype of four pairs of affected and unaffected individuals using high-density bead chips. We also sequenced the coding regions of five 7p22 candidate genes that were prioritized because of their putative role in cell growth. We found no evidence of single-nucleotide polymorphism (SNP) copy number variation between pairs, and from the widest gap on the chip, chromosome 7p22 deletions or insertions exceeding ∼50 kb in these pedigrees can be excluded. We found 15 SNPs (two of which were novel), but none of them were non-synonymous and segregated with the disease in the FH-II pedigrees. We have been able to exclude large genomic deletions or insertions at 7p22 and refine the candidate gene list for this locus, but the mutations causing FH-II remain elusive.

Stephen Clayton

- Senior Bioinformatician

I studied Biochemistry at the University of York and went on to do an MRes in Bioinformatics.

Research

I am developing pipelines and tools for the DDD project.

Sarah Fowler

- Staff Scientist

My undergraduate degree is in Archaeology and Prehistory, after the completion of which I came to the Sanger Centre to work on the C. elegans genome sequencing project as a technical assistant. After 18 months, I left to do an MSc in Biomolecular Archaeology, which enabled me to later enrol on a PhD in Biomolecular Sciences at UMIST. I returned to the Wellcome Trust Sanger Institute as a Research Associate, first in Sequencing Research and Development and then in Human Genetics. In 2007 I began my current role as a Staff Scientist in the Genome Dynamics and Mutation Disease group.

Research

I currently manage and analyse my own research projects in population genetics, sequence analysis, and human variation, with particular reference to mutation dynamics. I also handle sample logistics for the team and support internal and external collaborative projects. This may involve laboratory work, sequence analysis, or scripting and use of statistical and population genetic analysis programs. In addition, I manage the human genetics laboratory, health and safety, and support the laboratory needs of the team.

References

The rate of nonallelic homologous recombination in males is highly variable, correlated between monozygotic twins and independent of age.

Nonallelic homologous recombination (NAHR) between highly similar duplicated sequences generates chromosomal deletions, duplications and inversions, which can cause diverse genetic disorders. Little is known about interindividual variation in NAHR rates and the factors that influence this. We estimated the rate of deletion at the CMT1A-REP NAHR hotspot in sperm DNA from 34 male donors, including 16 monozygotic (MZ) co-twins (8 twin pairs) aged 24 to 67 years old. The average NAHR rate was 3.5 × 10⁻⁵ with a seven-fold variation across individuals. Despite good statistical power to detect even a subtle correlation, we observed no relationship between age of unrelated individuals and the rate of NAHR in their sperm, likely reflecting the meiotic-specific origin of these events. We then estimated the heritability of deletion rate by calculating the intraclass correlation (ICC) within MZ co-twins, revealing a significant correlation between MZ co-twins (ICC = 0.784, p = 0.0039), with MZ co-twins being significantly more correlated than unrelated pairs. We showed that this heritability cannot be explained by variation in PRDM9, a known regulator of NAHR, or variation within the NAHR hotspot itself. We also did not detect any correlation between Body Mass Index (BMI), smoking status or alcohol intake and rate of NAHR. Our results suggest that other, as yet unidentified, genetic or environmental factors play a significant role in the regulation of NAHR and are responsible for the extensive variation in the population for the probability of fathering a child with a genomic disorder resulting from a pathogenic deletion.
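As an illustration of the intraclass correlation mentioned in the abstract above, the following Python sketch computes a one-way random-effects ICC for twin pairs. The abstract does not state which ICC estimator was used, so this form is an assumption, and the rates in the toy data are invented rather than taken from the study.

```python
def intraclass_correlation(pairs):
    """One-way random-effects ICC for a list of (twin_1, twin_2) measurements.

    Assumed estimator: (MS_between - MS_within) / (MS_between + (k - 1) * MS_within),
    with k = 2 measurements per pair.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    k = 2
    pair_means = [(a + b) / k for a, b in pairs]
    grand_mean = sum(pair_means) / n
    # Variation between pair means.
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    # Variation between co-twins within each pair.
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical deletion rates for four twin pairs (invented values).
toy_rates = [(2.1e-5, 2.4e-5), (5.0e-5, 4.6e-5), (1.2e-5, 1.5e-5), (3.8e-5, 3.3e-5)]
print(intraclass_correlation(toy_rates))
```

Values near 1 indicate that co-twins resemble each other far more than unrelated individuals do; values near 0 or below indicate no such resemblance.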

Common and rare variants associated with body mass index (BMI) and obesity account for <5% of the variance in BMI. We performed SNP and copy number variation (CNV) association analyses in 1,509 children with obesity at the extreme tail (>3 s.d. from the mean) of the BMI distribution and 5,380 controls. Evaluation of 29 SNPs (P < 1 × 10⁻⁵) in an additional 971 severely obese children and 1,990 controls identified 4 new loci associated with severe obesity (LEPR, PRKCH, PACS1 and RMST). A previously reported 43-kb deletion at the NEGR1 locus was significantly associated with severe obesity (P = 6.6 × 10⁻⁷). However, this signal was entirely driven by a flanking 8-kb deletion; absence of this deletion increased risk for obesity (P = 6.1 × 10⁻¹¹). We found a significant burden of rare, single CNVs in severely obese cases (P < 0.0001). Integrative gene network pathway analysis of rare deletions indicated enrichment of genes affecting G protein-coupled receptors (GPCRs) involved in the neuronal regulation of energy homeostasis.

J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.
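The two ideas in the abstract above, spotting candidate de novo mutations in a parent-offspring trio and summarising their parental origin, can be sketched in Python as follows. This is a deliberately naive illustration under stated assumptions, not the published method: genotypes are treated as error-free allele tuples, the function names are hypothetical, and real analyses need the extensive validation the abstract describes in order to separate germline events from somatic or cell-line artefacts.

```python
def is_de_novo_candidate(child, mother, father):
    """Return True when the child carries an allele seen in neither parent."""
    parental_alleles = set(mother) | set(father)
    return any(allele not in parental_alleles for allele in child)


def paternal_fraction(origins):
    """Fraction of phased de novo mutations assigned to the paternal germline."""
    phased = [o for o in origins if o in ("paternal", "maternal")]
    if not phased:
        raise ValueError("no phased mutations to summarise")
    return phased.count("paternal") / len(phased)


# Toy genotypes at one site (alleles per individual), invented for illustration.
print(is_de_novo_candidate(child=("A", "G"), mother=("A", "A"), father=("A", "A")))

# Toy parent-of-origin labels; entries other than "paternal"/"maternal" are ignored.
print(paternal_fraction(["paternal", "paternal", "maternal", "unknown", "paternal"]))
```

Mutations whose parental origin cannot be determined are excluded from the fraction rather than counted against either parent.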

Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.

Objective: To describe a considerably advanced method of array painting, which allows the rapid, ultra-high resolution mapping of translocation breakpoints such that rearrangement junction fragments can be amplified directly and sequenced.

Method: Ultra-high resolution array painting involves the hybridisation of probes generated by the amplification of small numbers of flow-sorted derivative chromosomes to oligonucleotide arrays designed to tile breakpoint regions at extremely high resolution.

Results and discussion: How ultra-high resolution array painting of four balanced translocation cases rapidly and efficiently maps breakpoints to a point where junction fragments can be amplified easily and sequenced is demonstrated. With this new development, breakpoints can be mapped using just two array experiments: the first using whole-genome array painting to tiling resolution large insert clone arrays, the second using ultra-high-resolution oligonucleotide arrays targeted to the breakpoint regions. In this way, breakpoints can be mapped and then sequenced in a few weeks.

A chromosomal rearrangement hotspot can be identified from population genetic variation and is coincident with a hotspot for allelic recombination.

Insights into the origins of structural variation and the mutational mechanisms underlying genomic disorders would be greatly improved by a genomewide map of hotspots of nonallelic homologous recombination (NAHR). Moreover, our understanding of sequence variation within the duplicated sequences that are substrates for NAHR lags far behind that of sequence variation within the single-copy portion of the genome. Perhaps the best-characterized NAHR hotspot lies within the 24-kb-long Charcot-Marie-Tooth disease type 1A (CMT1A)-repeats (REPs) that sponsor deletions and duplications that cause peripheral neuropathies. We investigated structural and sequence diversity within the CMT1A-REPs, both within and between species. We discovered a high frequency of retroelement insertions, accelerated sequence evolution after duplication, extensive paralogous gene conversion, and a greater than twofold enrichment of SNPs in humans relative to the genome average. We identified an allelic recombination hotspot underlying the known NAHR hotspot, which suggests that the two processes are intimately related. Finally, we used our data to develop a novel method for inferring the location of an NAHR hotspot from sequence variation within segmental duplications and applied it to identify a putative NAHR hotspot within the LCR22 repeats that sponsor velocardiofacial syndrome deletions. We propose that a large-scale project to map sequence variation within segmental duplications would reveal a wealth of novel chromosomal-rearrangement hotspots.

Haplotypic sequences contain significantly more information than genotypes of genetic markers and are critical for studying disease association and genome evolution. Current methods for obtaining haplotypic sequences require the physical separation of alleles before sequencing, are time consuming and are not scaleable for large surveys of genetic variation. We have developed a novel method for acquiring haplotypic sequences from long PCR products using simple, high-throughput techniques. This method applies modified shotgun sequencing protocols to sequence both alleles concurrently, with read-pair information allowing the two alleles to be separated during sequence assembly. Although the haplotypic sequences can be assembled manually from the resultant data using pre-existing sequence assembly software, we have devised a novel heuristic algorithm to automate assembly and remove human error. We validated the approach on two long PCR products amplified from the human genome and confirmed the accuracy of our sequences against full-length clones of the same alleles. This method presents a simple high-throughput means to obtain full haplotypic sequences potentially up to 20 kb in length and is suitable for surveying genetic variation even in poorly-characterized genomes as it requires no prior information on sequence variation.

Finishing the euchromatic sequence of the human genome.

International Human Genome Sequencing Consortium

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

Genome sequence of the nematode C. elegans: a platform for investigating biology.

C. elegans Sequencing Consortium

The 97-megabase genomic sequence of the nematode Caenorhabditis elegans reveals over 19,000 genes. More than 40 percent of the predicted protein products find significant matches in other organisms. There is a variety of repeated sequences, both local and dispersed. The distinctive distribution of some repeats and highly conserved genes provides evidence for a regional organization of the chromosomes.

Sebastian Gerety

- Senior Staff Scientist

Sebastian completed his PhD in Developmental Biology at the California Institute of Technology under the HHMI investigator Dr. David Anderson. While there, he studied the role of artery and vein specific genes in mouse embryonic angiogenesis, identifying a crucial role for the EphB4/EphrinB2 receptor/ligand pair in blood vessel development.

Sebastian pursued a post-doctoral fellowship funded by the Human Frontier Science Program under Dr. David Wilkinson at the National Institute for Medical Research in London. Using zebrafish embryos as a model, he studied the role of Notch modulators and novel genes in patterning the developing brain.

Research

Sebastian joined the Hurles group in 2011 as Senior Staff Scientist in charge of animal and cellular modelling. He brings his background and expertise in mouse and zebrafish genetics and embryology, applying these to establish causal links between candidate disease mutations and the phenotypes observed in affected patients. Exome sequencing findings from a number of studies, including Deciphering Developmental Disorders, UK10K, and a Congenital Heart Disease project, are translated into animal models using the resources of the Sanger Institute, through zebrafish (ZMP), mouse (MGP), and cellular approaches.

References

An inducible transgene expression system for zebrafish and chick.

We have generated an inducible system to control the timing of transgene expression in zebrafish and chick. An estrogen receptor variant (ERT2) fused to the GAL4 transcriptional activator rapidly and robustly activates transcription within 3 hours of treatment with the drug 4-hydroxy-tamoxifen (4-OHT) in tissue culture and transgenic zebrafish. We have generated a broadly expressed inducible ERT2-GAL4 zebrafish line using the ubiquitin (ubi) enhancer. In addition, use of ERT2-GAL4 in conjunction with tissue-specific enhancers enables the control of transgene expression in both space and time. This spatial restriction and the ability to sustain forced expression are important advantages over the currently used heat-shock promoters. Moreover, in contrast to currently available TET and LexA systems, which require separate constructs with their own unique recognition sequences, ERT2-GAL4 is compatible with the growing stock of UAS lines being generated in the community. We also applied the same inducible system to the chick embryo and find that it is fully functional, suggesting that this strategy is generally applicable.

During central nervous system development, neural progenitors are patterned to form discrete neurogenic and non-neurogenic zones. In the zebrafish hindbrain, neurogenesis is organised by Fgf20a emanating from neurons located at each segment centre that inhibits neuronal differentiation in adjacent progenitors. Here, we have identified a molecular mechanism that clusters fgf20a-expressing neurons in segment centres and uncovered a requirement for this positioning in the regulation of neurogenesis. Disruption of hindbrain boundary cell formation alters the organisation of fgf20a-expressing neurons, consistent with a role of chemorepulsion from boundaries. The semaphorins Sema3fb and Sema3gb, which are expressed by boundary cells, and their receptor Nrp2a are required for clustering of fgf20a-expressing neurons at segment centres. The dispersal of fgf20a-expressing neurons that occurs following the disruption of boundaries or of Sema3fb/Sema3gb signalling leads to reduced FGF target gene expression in progenitors and an increased number of differentiating neurons. Sema3 signalling from boundaries thus links hindbrain segmentation to the positioning of fgf20a-expressing neurons that regulates neurogenesis.

Morpholino artifacts provide pitfalls and reveal a novel role for pro-apoptotic genes in hindbrain boundary development.

Morpholino antisense oligonucleotides (MOs) are widely used as a tool to achieve loss of gene function, but many have off-target effects mediated by activation of Tp53 and associated apoptosis. Here, we re-examine our previous MO-based loss-of-function studies that had suggested that Wnt1 expressed at hindbrain boundaries in zebrafish promotes neurogenesis and inhibits boundary marker gene expression in the adjacent para-boundary regions. We find that Tp53 is highly activated and apoptosis is frequently induced by the MOs used in these studies. Co-knockdown of Tp53 rescues the decrease in proneural and neuronal marker expression, which is thus an off-target effect of MOs. While loss of gene expression can be attributed to cell loss through apoptotic cell death, surprisingly we find that the ectopic expression of hindbrain boundary markers is also dependent on Tp53 activity and its downstream apoptotic effectors. We examine whether this non-specific activation of hindbrain boundary gene expression provides insight into the endogenous mechanisms underlying boundary cell specification. We find that the pro-apoptotic Bcl genes puma and bax-a are required for hindbrain boundary marker expression, and that gain of function of the Bcl-caspase pathway leads to ectopic boundary marker expression. These data reveal a non-apoptotic role for pro-apoptotic genes in the regulation of gene expression at hindbrain boundaries. In light of these findings, we discuss the precautions needed in performing morpholino knockdowns and in interpreting the data derived from their use.

Lunatic fringe promotes the lateral inhibition of neurogenesis.

Previous studies have identified roles of the modulation of Notch activation by Fringe homologues in boundary formation and in regulating the differentiation of vertebrate thymocytes and Drosophila glial cells. We have investigated the role of Lunatic fringe (Lfng) expression during neurogenesis in the vertebrate neural tube. We find that in the zebrafish hindbrain, Lfng is expressed by progenitors in neurogenic regions and downregulated in cells that have initiated neuronal differentiation. Lfng is required cell autonomously in neural epithelial cells to limit the amount of neurogenesis and to maintain progenitors. By contrast, Lfng is not required for the role of Notch in interneuronal fate choice, which we show is mediated by Notch1a. The expression of Lfng does not require Notch activity, but rather is regulated downstream of proneural genes that are widely expressed by neural progenitors. These findings suggest that Lfng acts in a feedback loop downstream of proneural genes, which, by promoting Notch activation, maintains the sensitivity of progenitors to lateral inhibition and thus limits further proneural upregulation.

EphrinB2, a transmembrane ligand of EphB receptor tyrosine kinases, is specifically expressed in arteries. In ephrinB2 mutant embryos, there is a complete arrest of angiogenesis. However, ephrinB2 expression is not restricted to vascular endothelial cells, and it has been proposed that its essential function may be exerted in adjacent mesenchymal cells. We have generated mice in which ephrinB2 is specifically deleted in the endothelium and endocardium of the developing vasculature and heart. We find that such a vascular-specific deletion of ephrinB2 results in angiogenic remodeling defects identical to those seen in the conventional ephrinB2 mutants. These data indicate that ephrinB2 is required specifically in endothelial and endocardial cells for angiogenesis, and that ephrinB2 expression in perivascular mesenchyme is not sufficient to compensate for the loss of ephrinB2 in these vascular cells.

Expression of ephrinB2 identifies a stable genetic difference between arterial and venous vascular smooth muscle as well as endothelial cells, and marks subsets of microvessels at sites of adult neovascularization.

The transmembrane ligand ephrinB2 and its receptor tyrosine kinase EphB4 are molecular markers of embryonic arterial and venous endothelial cells, respectively, and are essential for angiogenesis. Here we show that expression of ephrinB2 persists in adult arteries where it extends into some of the smallest diameter microvessels, challenging the classical view that capillaries have neither arterial nor venous identity. EphrinB2 also identifies arterial microvessels in several settings of adult neovascularization, including tumor angiogenesis, contravening the dogma that tumor vessels arise exclusively from postcapillary venules. Unexpectedly, expression of ephrinB2 also defines a stable genetic difference between arterial and venous vascular smooth muscle cells. These observations argue for revisions of classical concepts of capillary identity and the topography of neovascularization. They also imply that ephrinB2 may be functionally important in neovascularization and in arterial smooth muscle, as well as in embryonic angiogenesis.

Human genome sequencing is accelerating rapidly. Multiple genome maps link this sequence to problems in biology and clinical medicine. Because each map represents a different aspect of the structure, content, and behavior of human chromosomes, these fundamental properties must be integrated with the genome to understand disease genes, cancer instability, and human evolution. Cytogenetic maps use 400-850 visible band landmarks and are the primary means for defining prenatal defects and novel cancer breakpoints, thereby providing simultaneous examination of the entire genome. Recent genetic, physical, and transcript maps use PCR-based landmarks called sequence-tagged sites (STSs). We have integrated these genome maps by anchoring the human cytogenetic to the STS-based genetic and physical maps with 1021 STS-BAC pairs at an average spacing of approximately 1 per 3 Mb. These integration points are represented by 872 unique STSs, including 642 polymorphic markers and 957 bacterial artificial chromosomes (BACs), each of which was localized on high resolution fluorescent banded chromosomes. These BACs constitute a resource that bridges map levels and provides the tools to seamlessly translate questions raised by genomic change seen at the chromosomal level into answers based at the molecular level. We show how the BACs provide molecular links for understanding human genomic duplications, meiosis, and evolution, as well as reagents for conducting genome-wide prenatal diagnosis at the molecular level and for detecting gene candidates associated with novel cancer breakpoints.

Symmetrical mutant phenotypes of the receptor EphB4 and its specific transmembrane ligand ephrin-B2 in cardiovascular development.

Ephrin-B2 is a transmembrane ligand that is specifically expressed on arteries but not veins and that is essential for cardiovascular development. However, ephrin-B2 is also expressed in nonvascular tissues and interacts with multiple EphB class receptors expressed in both endothelial and nonendothelial cell types. Thus, the identity of the relevant receptor for ephrin-B2 and the site(s) where these molecules interact to control angiogenesis were not clear. Here we show that EphB4, a specific receptor for ephrin-B2, is exclusively expressed by vascular endothelial cells in embryos and is preferentially expressed on veins. A targeted mutation in EphB4 essentially phenocopies the mutation in ephrin-B2. These data indicate that ephrin-B2-EphB4 interactions are intrinsically required in vascular endothelial cells and are consistent with the idea that they mediate bidirectional signaling essential for angiogenesis.

A physical map has been constructed of the human genome containing 15,086 sequence-tagged sites (STSs), with an average spacing of 199 kilobases. The project involved assembly of a radiation hybrid map of the human genome containing 6193 loci and incorporated a genetic linkage map of the human genome containing 5264 loci. This information was combined with the results of STS-content screening of 10,850 loci against a yeast artificial chromosome library to produce an integrated map, anchored by the radiation hybrid and genetic maps. The map provides radiation hybrid coverage of 99 percent and physical coverage of 94 percent of the human genome. The map also represents an early step in an international project to generate a transcript map of the human genome, with more than 3235 expressed sequences localized. The STSs in the map provide a scaffold for initiating large-scale sequencing of the human genome.

Marc Hitz

I studied medicine at the University of Göttingen, Germany and have been trained in paediatrics and paediatric cardiology at the University Children’s Hospital in Hannover, Germany and Sainte-Justine Hospital, Montreal, Canada. To become an academic clinician I have joined the group of Matthew Hurles at the Sanger Institute to study the genetic architecture of congenital heart disease. I will continue having a clinical commitment throughout my time in Cambridge and eventually envisage a research career alongside a clinical appointment.

Research

Heart defects are among the most common birth defects. In particular, abnormalities of the left heart account for a significant number of hospital admissions in infants and adults. The discovery of genetic variation in whole human genomes will allow me to identify the underlying genetic causes in affected children. To prove that these genomic changes are causal for the development of heart defects I will further investigate them in model organisms such as zebrafish and mouse. The understanding of the effects of these genetic variations will benefit patients and families by providing early identification of potentially harmful genetic lesions.

Left-sided congenital heart disease (CHD) encompasses a spectrum of malformations that range from bicuspid aortic valve to hypoplastic left heart syndrome. It contributes significantly to infant mortality and has serious implications in adult cardiology. Although left-sided CHD is known to be highly heritable, the underlying genetic determinants are largely unidentified. In this study, we sought to determine the impact of structural genomic variation on left-sided CHD and compared multiplex families (464 individuals with 174 affecteds (37.5%) in 59 multiplex families and 8 trios) to 1,582 well-phenotyped controls. 73 unique inherited or de novo CNVs in 54 individuals were identified in the left-sided CHD cohort. After stringent filtering, our gene inventory reveals 25 new candidates for LS-CHD pathogenesis, such as SMC1A, MFAP4, and CTHRC1, and overlaps with several known syndromic loci. Conservative estimation examining the overlap of the prioritized gene content with CNVs present only in affected individuals in our cohort implies a strong effect for unique CNVs in at least 10% of left-sided CHD cases. Enrichment testing of gene content in all identified CNVs showed a significant association with angiogenesis. In this first family-based CNV study of left-sided CHD, we found that both co-segregating and de novo events associate with disease in a complex fashion at structural genomic level. Often viewed as an anatomically circumscript disease, a subset of left-sided CHD may in fact reflect more general genetic perturbations of angiogenesis and/or vascular biology.

University of Miami Miller School of Medicine, Miami, Florida 33136, USA.

A novel double deletion in cardiac troponin T (cTnT) of two highly conserved amino acids (Asn-100 and Glu-101) was found in a restrictive cardiomyopathic (RCM) pediatric patient. Clinical evaluation revealed the presence of left atrial enlargement and marked left ventricle diastolic dysfunction. The explanted heart examined by electron microscopy revealed myofibrillar disarray and mild fibrosis. Pedigree analysis established that this mutation arose de novo. The patient tested negative for six other sarcomeric genes. The single and double recombinant cTnT mutants were generated, and their functional consequences were analyzed in porcine skinned cardiac muscle. In the adult Tn environment (cTnT3 + cardiac troponin I), the single cTnT3-ΔN100 and cTnT3-ΔE101 mutations had opposing effects on the Ca(2+) sensitivity of force development compared with WT, whereas the double deletion cTnT3-ΔN100/ΔE101 increased the Ca(2+) sensitivity by 0.19 pCa units. In addition, cTnT3-ΔN100/ΔE101 decreased the cooperativity of force development, suggesting alterations in intrafilament protein-protein interactions. In the fetal Tn environment (cTnT1 + slow skeletal troponin I), the single (cTnT1-ΔN110) and double (cTnT1-ΔN110/ΔE111) deletions did not change the Ca(2+) sensitivity compared with control. To recreate the patient's heterozygous genotype, we performed a reconstituted ATPase activity assay. Thin filaments containing 50:50 cTnT3-ΔN100/ΔE101:cTnT3-WT also increased the myofilament Ca(2+) sensitivity compared with WT. Co-sedimentation of thin filament proteins indicated that no significant changes occurred in the binding of Tn containing the RCM cTnT mutation to actin-Tm. This report reveals the protective role of Tn fetal isoforms as they rescue the increased Ca(2+) sensitivity produced by a cTnT-RCM mutation and may account for the lack of lethality during gestation.

Aims: Although ventricular septal defects (VSD) are the most common congenital heart lesion, familial clustering has been described only in rare instances. The aim of this study was to identify genetic factors and chromosomal regions contributing to VSD.

Methods and results: A unique, large kindred segregating various forms of septal pathologies (including VSD, ventricular septal aneurysms, and atrial septal defects (ASD)) was ascertained and characterized clinically and genetically. Eighteen family members in three generations could be studied, out of whom 10 are affected (2 ASD, 3 septal aneurysm, 4 VSD, and 1 tetralogy of Fallot). Parametric multipoint LOD scores reach significance on chromosome 10p15.3-10p15.2 (max. 3.29). The LOD score support interval is in a gene-poor region where deletions have been reported to associate with septal defects, but that is distinct from the DiGeorge syndrome 2 region on 10p. Multiple linkage analysis scenarios suggest that tetralogy of Fallot is a phenocopy and genetically distinct from the autosomal dominant form of septal pathologies observed in this family.

Conclusion: This study maps a rare familial form of VSD/septal aneurysms to chromosome 10p15 and extends the spectrum of the genetic heterogeneity of septal pathologies. Fine mapping, haplotype construction, and resequencing will provide a unique opportunity to study the pathogenesis of septal defects and shed light on molecular mechanisms of septal development.

Dan King

- PhD Student

I grew up on Long Island, New York, just minutes from Cold Spring Harbor Laboratory. I completed my bachelor's degree in Molecular and Cellular Biology at the University of Michigan, in Ann Arbor, Michigan and my medical degree at Wayne State University in Detroit, Michigan. During medical school, I joined a project to develop better treatment strategies for patients with glioblastoma (brain cancer). This experience led me to the HHMI Research Scholars Program at the National Institutes of Health, where I worked for two years with Leslie Biesecker on exome-based copy number variation analysis before coming to the Sanger.

Research

Working with my Ph.D. supervisor, Matt Hurles, my focus has mainly centered on the Deciphering Developmental Disorders project, a large study of rare disease patient trios. Specifically, I developed a software tool, UPDio, which can identify probands of trios with uniparental disomy events. In addition, I'm interested in large-scale mosaic events, such as large UPD and CNVs that affect only a subset of tissues in an individual, and have been actively working to detect these events from SNP and exome data. Lastly, I'm interested in using RNAseq data to assist in highlighting causative genetic variation.
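As a rough illustration of how trio genotypes can point to uniparental disomy, the sketch below classifies biallelic sites by whether the child's genotype can be explained by one parent alone but not by normal biparental inheritance. This is not UPDio's implementation; the function names, the genotype coding (alternate-allele counts 0, 1, 2) and the per-chromosome tally are assumptions made for the example, and a real tool would also need to model genotyping error and deletions.

```python
from collections import Counter

def transmissible(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def classify_site(child, father, mother):
    """Classify one biallelic site in a trio (hypothetical labels)."""
    biparental = {a + b for a in transmissible(father) for b in transmissible(mother)}
    if child in biparental:
        return "uninformative"  # ordinary Mendelian inheritance explains it
    # Genotypes reachable from a single parent: both homologues (heterodisomy)
    # or two copies of one transmitted allele (isodisomy).
    from_mother = {mother} | {2 * a for a in transmissible(mother)}
    from_father = {father} | {2 * a for a in transmissible(father)}
    maternal, paternal = child in from_mother, child in from_father
    if maternal and not paternal:
        return "maternal_only"
    if paternal and not maternal:
        return "paternal_only"
    return "mendelian_error"

def summarise(sites):
    """Tally site classes per chromosome; sites are (chrom, child, father, mother)."""
    counts = {}
    for chrom, child, father, mother in sites:
        counts.setdefault(chrom, Counter())[classify_site(child, father, mother)] += 1
    return counts
```

A chromosome showing many sites of one single-parent class and none of the other would be a candidate for follow-up; isolated sites are more likely to be genotyping errors.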

References

Inhibition of HSP27 alone or in combination with pAKT inhibition as therapeutic approaches to target SPARC-induced glioma cell survival.

Background: The current treatment regimen for glioma patients is surgery, followed by radiation therapy plus temozolomide (TMZ), followed by 6 months of adjuvant TMZ. Despite this aggressive treatment regimen, the overall survival of all surgically treated GBM patients remains dismal, and additional or different therapies are required. Depending on the cancer type, SPARC has been proposed both as a therapeutic target and as a therapeutic agent. In glioma, SPARC promotes invasion via upregulation of the p38 MAPK/MAPKAPK2/HSP27 signaling pathway, and promotes tumor cell survival by upregulating pAKT. As HSP27 and AKT interact to regulate the activity of each other, we determined whether inhibition of HSP27 was better than targeting SPARC as a therapeutic approach to inhibit both SPARC-induced glioma cell invasion and survival.

Results: Our studies found the following. 1) SPARC increases the expression of tumor cell pro-survival and pro-death protein signaling in balance, and, as a net result, tumor cell survival remains unchanged. 2) Suppressing SPARC increases tumor cell survival, indicating it is not a good therapeutic target. 3) Suppressing HSP27 decreases tumor cell survival in all gliomas, but is more effective in SPARC-expressing tumor cells due to the removal of HSP27 inhibition of SPARC-induced pro-apoptotic signaling. 4) Suppressing total AKT1/2 paradoxically enhanced tumor cell survival, indicating that AKT1 or 2 are poor therapeutic targets. 5) However, inhibiting pAKT suppresses tumor cell survival. 6) Inhibiting both HSP27 and pAKT synergistically decreases tumor cell survival. 7) There appears to be a complex feedback system between SPARC, HSP27, and AKT. 8) This interaction is likely influenced by PTEN status. With respect to chemosensitization, we found the following. 1) SPARC enhances pro-apoptotic signaling in cells exposed to TMZ. 2) Despite this enhanced signaling, SPARC protects cells against TMZ. 3) This protection can be reduced by inhibiting pAKT. 4) Combined inhibition of HSP27 and pAKT is more effective than TMZ treatment alone.

Conclusions: We conclude that inhibition of HSP27 alone, or in combination with pAKT inhibitor IV, may be an effective therapeutic approach to inhibit SPARC-induced glioma cell invasion and survival in SPARC-positive/PTEN-wildtype and SPARC-positive/PTEN-null tumors, respectively.

Raheleh Rahbari

- Postdoctoral Fellow

- Postdoctoral Research Associate, June 2012 – January 2013, University of Leicester
Investigating the role of copy number variation in autoimmune disease

- Bioinformatics, October 2011 – April 2012, University of Leicester
Data identification, import and curation for the GWAS Central database

- PhD research, October 2007 – September 2011, University of Leicester
Investigating the endogenous activity of L1 retrotransposons during human embryogenesis, by estimating the rate of de novo L1 retrotransposition in early human development

Research

One of the aims of our team is to characterise the fundamental mutational mechanisms that underpin genetic variation. My project aims to estimate the effect of age and sex on mutation rates by identifying de novo mutations in children that are not found in either parent, ascribing these mutations to the paternal or maternal germline, and investigating whether the numbers of paternal or maternal germline mutations change with increasing parental age. The outcome will help us to control for potential genetic or environmental influences on mutation rate and maximise our power to detect the influence of age on mutation rate.
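The two steps described above can be sketched in a few lines: flag sites where the child carries a variant that neither parent has, then ask how the per-child count of such mutations changes with parental age. This is a minimal sketch under stated assumptions and not the team's pipeline: the genotype record layout (`gt` as alternate-allele count, `dp` as read depth), the depth threshold of 10 and the function names are all hypothetical, and assigning each mutation to the paternal or maternal germline is not shown.

```python
def is_de_novo_candidate(child, father, mother, min_depth=10):
    """True if the child is heterozygous and both parents are homozygous reference.

    Each argument is a dict with 'gt' (alt-allele count) and 'dp' (read depth).
    The depth cut-off is an arbitrary illustrative value.
    """
    well_covered = all(s["dp"] >= min_depth for s in (child, father, mother))
    return well_covered and child["gt"] == 1 and father["gt"] == 0 and mother["gt"] == 0

def least_squares_slope(ages, counts):
    """Ordinary least-squares slope of mutation count on parental age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_count = sum(counts) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (c - mean_count) for a, c in zip(ages, counts))
    return sxy / sxx
```

The slope is read as additional mutations per year of parental age. In practice candidates would need further filtering, since sequencing and genotyping errors far outnumber true de novo events.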

References

Department of Genetics, University of Leicester, University Road, Leicester, UK.

Long INterspersed Element-1 (LINE-1 or L1) retrotransposons are the only autonomously active transposable elements in the human genome. The average human genome contains ∼80-100 active L1s, but only a subset of these L1s are highly active or 'hot'. Human L1s are closely related in sequence, making it difficult to decipher progenitor/offspring relationships using traditional phylogenetic methods. However, L1 mRNAs can sometimes bypass their own polyadenylation signal and instead utilize fortuitous polyadenylation signals in 3' flanking genomic DNA. Retrotransposition of the resultant mRNAs then results in lineage specific sequence "tags" (i.e., 3' transductions) that mark the descendants of active L1 progenitors. Here, we developed a method (Transduction-Specific Amplification Typing of L1 Active Subfamilies or TS-ATLAS) that exploits L1 3' transductions to identify active L1 lineages in a genome-wide context. TS-ATLAS enabled the characterization of a putative active progenitor of one L1 lineage that includes the disease causing L1 insertion L1RP, and the identification of new retrotransposition events within two other "hot" L1 lineages. Intriguingly, the analysis of the newly discovered transduction lineage members suggests that L1 polyadenylation, even within a lineage, is highly stochastic. Thus, TS-ATLAS provides a new tool to explore the dynamics of L1 lineage evolution and retrotransposon biology.

Department of Genetics, University of Leicester, University Road, Leicester LE17RH, UK.

Intracisternal A-type particle (IAP) elements are high copy number long terminal repeat (LTR) rodent retrotransposons. Some IAP elements can transpose, and are responsible for ~12% of spontaneous mouse mutations. Inbred mouse strains show variation in genomic IAP distribution, contributing to inter-strain genetic variability. Additionally IAP elements can influence the transcriptional regulation of neighbouring genes through their strong LTR promoter, effecting phenotypic variation. This genetic and phenotypic variability can translate into experimental variability between mouse strains. For example, it has been demonstrated that strain-specific genetic/epigenetic factors can interact to yield variable responses to drugs. Therefore, in experimental contexts it is essential to unequivocally identify mouse strains. Recently it was estimated that any two inbred strains share only ~40% of their IAP insertions. Of the remaining 60%, some insertions will be strain specific, fixed during inbreeding. These fixed insertions can be exploited as genetic markers to identify inbred strains, if they can be identified simply and efficiently. Here, we report the development of a PCR-based system allowing direct acquisition of strain-specific IAP insertions. In a pilot study, we identified 21 IAP loci, genotyped IAP insertions at 9 loci, and discovered two strain-specific insertions that could reliably identify these strains.

A novel L1 retrotransposon marker for HeLa cell line identification.

Rahbari R, Sheahan T, Modes V, Collier P, Macfarlane C and Badge RM

Department of Genetics, University of Leicester, Leicester, UK.

The HeLa cell line is the oldest, most widely distributed, permanent human cell line. As a nearly ubiquitous inhabitant of laboratories using tissue culture techniques, its aggressive growth characteristics make it a problematic contaminant that can overgrow less robust cell lines. Consequently, HeLa contamination is common in both the research laboratory and cell line repository contexts, and its detection is hampered by the lack of a rapid, sensitive and robust assay. Here we report the development of a HeLa-specific DNA diagnostic test: a single duplex detection PCR assay targeting an L1 retrotransposon insertion. All HeLa clones from a geographically diverse panel were positive by this assay, and the particular L1 insertion we identified appears to be unique to the HeLa cell line. The assay can detect very low levels of HeLa contamination (<1%), and can be performed on un-purified cell pellets, allowing rapid routine screening.

Jawahar Swaminathan

I have a PhD in Macromolecular Crystallography from the National Institute of Immunology, New Delhi, India. Following post-doctoral research at the University of Bath on an MRC Funded Project on Inflammation, I moved to the Protein Data Bank in Europe (http://pdbe.org), European Bioinformatics Institute as Head of PDBe Depositions, Annotation and Outreach. I have also developed Biobar - a bioinformatics search toolbar for Firefox.

Three-dimensional protein structures determined by me can be found at http://tinyurl.com/jawaharpdb

Research

As Project Manager for DECIPHER - a clinical database of chromosomal imbalance and phenotype in humans, I oversee the day-to-day management of the database and liaise closely with consortium members who deposit and access confidential patient data.
I am also responsible for the strategic planning, implementation and delivery of new developments in DECIPHER.

References

DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation.

The DECIPHER database (https://decipher.sanger.ac.uk/) is an accessible online repository of genetic variation with associated phenotypes that facilitates the identification and interpretation of pathogenic genetic variation in patients with rare disorders. Contributing to DECIPHER is an international consortium of >200 academic clinical centres of genetic medicine and ≥1600 clinical geneticists and diagnostic laboratory scientists. Information integrated from a variety of bioinformatics resources, coupled with visualization tools, provides a comprehensive set of tools to identify other patients with similar genotype-phenotype characteristics and highlights potentially pathogenic genes. In a significant development, we have extended DECIPHER from a database of just copy-number variants to allow upload, annotation and analysis of sequence variants such as single nucleotide variants (SNVs) and InDels. Other notable developments in DECIPHER include a purpose-built, customizable and interactive genome browser to aid combined visualization and interpretation of sequence and copy-number variation against informative datasets of pathogenic and population variation. We have also introduced several new features to our deposition and analysis interface. This article provides an update to the DECIPHER database, an earlier instance of which has been described elsewhere [Swaminathan et al. (2012) DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum. Mol. Genet., 21, R37-R44].

Patients with developmental disorders often harbour sub-microscopic deletions or duplications that lead to a disruption of normal gene expression or perturbation in the copy number of dosage-sensitive genes. Clinical interpretation for such patients in isolation is hindered by the rarity and novelty of such disorders. The DECIPHER project (https://decipher.sanger.ac.uk) was established in 2004 as an accessible online repository of genomic and associated phenotypic data with the primary goal of aiding the clinical interpretation of rare copy-number variants (CNVs). DECIPHER integrates information from a variety of bioinformatics resources and uses visualization tools to identify potential disease genes within a CNV. A two-tier access system permits clinicians and clinical scientists to maintain confidential linked anonymous records of phenotypes and CNVs for their patients that, with informed consent, can subsequently be shared with the wider clinical genetics and research communities. Advances in next-generation sequencing technologies are making it practical and affordable to sequence the whole exome/genome of patients who display features suggestive of a genetic disorder. This approach enables the identification of smaller intragenic mutations including single-nucleotide variants that are not accessible even with high-resolution genomic array analysis. This article briefly summarizes the current status and achievements of the DECIPHER project and looks ahead to the opportunities and challenges of jointly analysing structural and sequence variation in the human genome.

The Protein Data Bank in Europe (PDBe; pdbe.org) is a partner in the Worldwide PDB organization (wwPDB; wwpdb.org) and as such actively involved in managing the single global archive of biomacromolecular structure data, the PDB. In addition, PDBe develops tools, services and resources to make structure-related data more accessible to the biomedical community. Here we describe recently developed, extended or improved services, including an animated structure-presentation widget (PDBportfolio), a widget to graphically display the coverage of any UniProt sequence in the PDB (UniPDB), chemistry- and taxonomy-based PDB-archive browsers (PDBeXplore), and a tool for interactive visualization of NMR structures, corresponding experimental data as well as validation and analysis results (Vivaldi).

Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom.

The eosinophil major basic protein (EMBP), a constituent of the eosinophil secondary granule, is implicated in cytotoxicity and mediation of allergic disorders such as asthma. It is a member of the C-type lectin family, but lacks a Ca(2+)- and carbohydrate-binding site as seen in other members of this family. Here, we report the crystal structure of EMBP in complex with a heparin disaccharide and in the absence of Ca(2+), the first such report of any C-lectin with this sugar. We also provide direct evidence of binding of EMBP to heparin and heparin disaccharide by surface plasmon resonance. We propose that the sugars recognized by EMBP are likely to be proteoglycans such as heparin, leading to new interpretations for EMBP function.

Roles of individual enzyme-substrate interactions by alpha-1,3-galactosyltransferase in catalysis and specificity.

The retaining glycosyltransferase, alpha-1,3-galactosyltransferase (alpha3GT), is mutationally inactivated in humans, leading to the presence of circulating antibodies against its product, the alpha-Gal epitope. alpha3GT catalyzes galactose transfer from UDP-Gal to beta-linked galactosides, such as lactose, and in the absence of an acceptor substrate, to water at a lower rate. We have used site-directed mutagenesis to investigate the roles in catalysis and specificity of residues in alpha3GT that form H-bonds as well as other interactions with substrates. Mutation of the conserved Glu(317) to Gln weakens lactose binding and reduces the k(cat) for galactosyltransfer to lactose and water by 2400 and 120, respectively. The structure is not perturbed by this substitution, but the orientation of the bound lactose molecule is changed. The magnitude of these changes does not support a previous proposal that Glu(317) is the catalytic nucleophile in a double displacement mechanism and suggests it acts in acceptor substrate binding and in stabilizing a cationic transition state for cleavage of the bond between UDP and C1 of the galactose. Cleavage of this bond is also linked to a conformational change in the C-terminal region of alpha3GT that is coupled with UDP binding. Mutagenesis indicates that His(280), which is projected to interact with the 2-OH of the galactose moiety of UDP-Gal, is a key residue in the stringent donor substrate specificity through its role in stabilizing the bound UDP-Gal in a suitable conformation for catalysis. Mutation of Gln(247), which forms multiple interactions with acceptor substrates, to Glu reduces the catalytic rate of galactose transfer to lactose but not to water. This mutation is predicted to perturb the orientation or environment of the bound acceptor substrate. The results highlight the importance of H-bonds between enzyme and substrates in this glycosyltransferase, in arranging substrates in appropriate conformations and orientation for efficient catalysis. These factors are manifested in increases in catalytic rate rather than substrate affinity.

Crystal structures of oligomeric forms of the IP-10/CXCL10 chemokine.

Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom.

We have determined the structure of wild-type IP-10 from three crystal forms. The crystals provide eight separate models of the IP-10 chain, all differing substantially from a monomeric IP-10 variant examined previously by NMR spectroscopy. In each crystal form, IP-10 chains form conventional beta sheet dimers, which, in turn, form a distinct tetrameric assembly. The M form tetramer is reminiscent of platelet factor 4, whereas the T and H forms feature a novel twelve-stranded beta sheet. Analytical ultracentrifugation indicates that, in free solution, IP-10 exists in a monomer-dimer equilibrium with a dissociation constant of 9 microM. We propose that the tetrameric structures may represent species promoted by the binding of glycosaminoglycans. The binding sites for several IP-10-neutralizing mAbs have also been mapped.
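
As an illustration of what a 9 microM monomer-dimer dissociation constant implies, the following minimal Python sketch estimates how a given total protein concentration partitions between monomer and dimer. It assumes a simple two-state equilibrium (2 M <-> D) with Kd defined as [M]^2/[D] and the total concentration expressed in monomer units; the abstract does not state which convention was used, and the function name is hypothetical.

```python
import math


def monomer_dimer_split(total_uM, kd_uM=9.0):
    """Return (monomer, dimer) concentrations in microM for 2 M <-> D.

    Assumptions (not stated in the abstract):
      Kd = [M]^2 / [D]
      total = [M] + 2 [D], i.e. total is counted in monomer units.
    Substituting [D] = [M]^2 / Kd gives 2[M]^2/Kd + [M] - total = 0,
    whose positive root is used below.
    """
    monomer = kd_uM / 4.0 * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0)
    dimer = monomer ** 2 / kd_uM
    return monomer, dimer


if __name__ == "__main__":
    for total in (1.0, 9.0, 100.0):
        m, d = monomer_dimer_split(total)
        print(f"total={total:6.1f} uM  monomer={m:6.2f} uM  dimer={d:6.2f} uM")
```

Under these assumptions, when the total concentration equals Kd, half of the chains are monomeric and half are in dimers; at lower concentrations the monomer dominates.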

Human eosinophil-derived neurotoxin (EDN) is a small, basic protein that belongs to the ribonuclease A superfamily. EDN displays antiviral activity and causes the neurotoxic Gordon phenomenon when injected into rabbits. Although EDN and ribonuclease A have appreciable structural similarity and a conserved catalytic triad, their peripheral substrate-binding sites are not conserved. The crystal structure of recombinant EDN (rEDN) has been determined at 0.98 A resolution from data collected at a low temperature (100 K). We have refined the crystallographic model of the structure using anisotropic displacement parameters to a conventional R-factor of 0.116. This represents the highest resolution structure of rEDN determined to date and is only the second ribonuclease structure to be determined at a resolution greater than 1.0 A. The structure provides a detailed picture of the conformational freedom at the various subsites of rEDN, and the water structure accounts for more than 50% of the total solvent content of the unit cell. This information will be crucial for the design of tight-binding inhibitors to restrain the ribonucleolytic activity of rEDN.

Structure of UDP complex of UDP-galactose:beta-galactoside-alpha-1,3-galactosyltransferase at 1.53-A resolution reveals a conformational change in the catalytically important C terminus.

Boix E, Swaminathan GJ, Zhang Y, Natesh R, Brew K and Acharya KR

Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom.

UDP-galactose:beta-galactosyl alpha-1,3-galactosyltransferase (alpha3GT) catalyzes the transfer of galactose from UDP-alpha-D-galactose into an alpha-1,3 linkage with beta-galactosyl groups in glycoconjugates. The enzyme is expressed in many mammalian species but is absent from humans, apes, and Old World monkeys as a result of the mutational inactivation of the gene; in humans, a large fraction of natural antibodies are directed against its product, the alpha-galactose epitope. alpha3GT is a member of a family of metal-dependent retaining glycosyltransferases including the histo-blood group A and B synthases. A crystal structure of the catalytic domain of alpha3GT was recently reported (Gastinel, L. N., Bignon, C., Misra, A. K., Hindsgaul, O., Shaper, J. H., and Joziasse, D. H. (2001) EMBO J. 20, 638-649). However, because of the limited resolution (2.3 A) and high mobility of the atoms (as indicated by high B-factors) this structure (form I) does not provide a clear depiction of the catalytic site of the enzyme. Here we report a new, highly ordered structure for the catalytic domain of alpha3GT at 1.53-A resolution (form II). This provides a more accurate picture of the details of the catalytic site that includes a bound UDP molecule and a Mn(2+) cofactor. Significantly, in the new structure, the C-terminal segment (residues 358-368) adopts a very different, highly structured conformation and appears to form part of the active site. The properties of an Arg-365 to Lys mutant indicate that this region is important for catalysis, possibly reflecting its role in a donor substrate-induced conformational change.

The crystal structure of an antibacterial protein of immune origin (TSWAB), purified from tasar silkworm (Antheraea mylitta) larvae after induction by Escherichia coli infection, has been determined. This is the first insect lysozyme structure and represents induced lysozymes of innate immunity. The core structure of TSWAB is similar to c-type lysozymes and alpha-lactalbumins. However, TSWAB shows significant differences with respect to the other two proteins in the exposed loop regions. The catalytic residues in TSWAB are conserved with respect to the chicken lysozyme, indicating a common mechanism of action. However, differences in the noncatalytic residues in the substrate binding groove imply subtle differences in the specificity and the level of activity. Thus, conformational differences between TSWAB and chicken lysozyme exist, whereas functional mechanisms appear to be similar. On the other hand, alpha-lactalbumins and c-type lysozymes exhibit drastically different functions with conserved molecular conformation. It is evident that a common molecular scaffold is exploited in the three enzymes for apparently different physiological roles. It can be inferred on the basis of the structure-function comparison of these three proteins having common phylogenetic origin that the conformational changes in a protein are minimal during rapid evolution as compared with those in the normal course of evolution.

Crystal structure of the eosinophil major basic protein at 1.8 A. An atypical lectin with a paradigm shift in specificity.

Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom.

The eosinophil major basic protein (EMBP) is the predominant constituent of the crystalline core of the eosinophil primary granule. EMBP is directly implicated in epithelial cell damage, exfoliation, and bronchospasm in allergic diseases such as asthma. Here we report the crystal structure of EMBP at 1.8 A resolution, and show that it is similar to that of members of the C-type lectin superfamily with which it shares minimal amino acid sequence identity (approximately 15-28%). However, this protein lacks a Ca(2+)/carbohydrate-binding site. Our analysis suggests that EMBP specifically binds heparin. Based on our results, we propose a possible new function for this protein, which is likely to have implications for EMBP function.

Art Wuster


Art has a PhD in Computational Biology from the MRC Laboratory of Molecular Biology in Cambridge and a BSc in Biology from Sussex University. Before joining the Sanger Institute, Art worked at an international firm of management consultants.

Department of Cellular and Molecular Pharmacology, QB3 Institute, University of California, San Francisco, CA 94158, USA.

We present a cross-species chemogenomic screening platform using libraries of haploid deletion mutants from two yeast species, Saccharomyces cerevisiae and Schizosaccharomyces pombe. We screened a set of compounds of known and unknown mode of action (MoA) and derived quantitative drug scores (or D-scores), identifying mutants that are either sensitive or resistant to particular compounds. We found that compound-functional module relationships are more conserved than individual compound-gene interactions between these two species. Furthermore, we observed that combining data from both species allows for more accurate prediction of MoA. Finally, using this platform, we identified a novel small molecule that acts as a DNA damaging agent and demonstrate that its MoA is conserved in human cells.

Spial: analysis of subtype-specific features in multiple sequence alignments of proteins.

Motivation: Spial (Specificity in alignments) is a tool for the comparative analysis of two alignments of evolutionarily related sequences that differ in their function, such as two receptor subtypes. It highlights functionally important residues that are either specific to one of the two alignments or conserved across both alignments. It permits visualization of this information in three complementary ways: by colour-coding alignment positions, by sequence logos and optionally by colour-coding the residues of a protein structure provided by the user. This can aid in the detection of residues that are involved in the subtype-specific interaction with a ligand, other proteins or nucleic acids. Spial may also be used to detect residues that may be post-translationally modified in one of the two sets of sequences.

Availability: http://www.mrc-lmb.cam.ac.uk/genomes/spial/; supplementary information is available at http://www.mrc-lmb.cam.ac.uk/genomes/spial/help.html.

Transcriptional control of the quorum sensing response in yeast.

Quorum sensing is a process of intercellular communication. It allows individual cells to assess population density and to co-ordinate behaviour by secreting and sensing communication molecules. In the yeast Saccharomyces cerevisiae, the communication molecules are the aromatic alcohols tryptophol and phenylethanol, and quorum sensing regulates the transition between the solitary yeast form and the filamentous form. Though it is known that addition of these communication molecules to yeast cultures causes large changes in gene expression, how these changes are orchestrated and whether this system is conserved in related fungal species is still unknown. In this work, by employing an integrated computational approach that makes use of large-scale genomics datasets, such as ChIP-chip and expression analysis upon deletion and over-expression of transcription factors, we predict CAT8 and MIG1 as key transcriptional regulators that control the differential expression of the genes affected by aromatic alcohol communication. In addition, through a comparative genomic analysis involving 31 fungal species, we show that the S. cerevisiae quorum sensing system is a recent evolutionary innovation and that the genes which are differentially expressed upon treatment with these molecules are distributed across the genome in a highly non-random manner. The identified transcription factors will aid in further unravelling the molecular mechanisms of S. cerevisiae quorum sensing and may facilitate the engineering of regulatory circuits for applications such as the expression of heterologous proteins via aromatic alcohols.

Although several studies have provided important insights into the general principles of biological networks, the link between network organization and the genome-scale dynamics of the underlying entities (genes, mRNAs, and proteins) and its role in systems behavior remain unclear. Here we show that transcription factor (TF) dynamics and regulatory network organization are tightly linked. By classifying TFs in the yeast regulatory network into three hierarchical layers (top, core, and bottom) and integrating diverse genome-scale datasets, we find that the TFs have static and dynamic properties that are similar within a layer and different across layers. At the protein level, the top-layer TFs are relatively abundant, long-lived, and noisy compared with the core- and bottom-layer TFs. Although variability in expression of top-layer TFs might confer a selective advantage, as this permits at least some members in a clonal cell population to initiate a response to changing conditions, tight regulation of the core- and bottom-layer TFs may minimize noise propagation and ensure fidelity in regulation. We propose that the interplay between network organization and TF dynamics could permit differential utilization of the same underlying network by distinct members of a clonal cell population.

Chemogenomics and biotechnology.

A robust knowledge of the interactions between small molecules and specific proteins aids the development of new biotechnological tools and the identification of new drug targets, and can lead to specific biological insights. Such knowledge can be obtained through chemogenomic screens. In these screens, each small molecule from a chemical library is applied to each cell type from a library of cells, and the resulting phenotypes are recorded. Chemogenomic screens have recently become very common and will continue to generate large amounts of data. The interpretation of this data will occupy biologists and chemists alike for some time to come. This review discusses methods for the acquisition and interpretation of chemogenomic data, in addition to possible applications of chemogenomics in biotechnology.

Conservation and evolutionary dynamics of the agr cell-to-cell communication system across firmicutes.

We present evidence that the agr cell-to-cell communication system is present across firmicutes, including the human pathogen Clostridium perfringens. Although we find that the agr system is evolutionarily conserved and that the general functions which it regulates are similar in different species, the individual regulated genes are not the same. This suggests that the regulatory network controlled by agr is dynamic and evolves rapidly.

Margriet van Kogelenberg

In 2004 Margriet obtained a Bachelor's degree in Molecular Biology from the Institute of Life Sciences and Chemistry, Utrecht, the Netherlands, followed in 2006 by a Master's degree in Oncology from the VU University Amsterdam, the Netherlands. Margriet then undertook a PhD in clinical genetics under the supervision of Professor Stephen Robertson at the University of Otago, New Zealand, which she completed in 2010.

Research

In 2011 Margriet started as a Postdoctoral Fellow at the Wellcome Trust Sanger Institute, where she works on the Deciphering Developmental Disorders (DDD) project. This study aims to delineate the genetic architecture of undiagnosed severe developmental disorders in children. It employs both array and sequencing technologies to improve the detection rate of rare causative mutations and to gain a holistic view of the genomic architecture of developmental disorders. Margriet focuses primarily on exome sequence analysis.

References

A novel Xp22.11 deletion causing a syndrome of craniosynostosis and periventricular nodular heterotopia.

Department of Paediatrics and Child Health, Dunedin School of Medicine, Otago University, Dunedin, New Zealand.

We report on a follow-up evaluation of a male with a phenotype including craniosynostosis, periventricular nodular heterotopia, and neurodevelopmental delay. He was initially assigned a clinical diagnosis of Fontaine-Farriaux syndrome (FFS) as an infant although now, with improved delineation of this entity, it is evident that this diagnosis is not applicable to this individual. Array comparative genomic hybridization has demonstrated a 300 kb interstitial deletion on Xp22.11 affecting all or part of three annotated genes, ZFX, PDK3, and PCYT1B in this subject. The deletion was inherited from the phenotypically normal mother who also exhibited markedly skewed X-inactivation. These findings implicate hemizygosity for one or all three of these genes as the cause of this phenotype.

Periventricular heterotopia in common microdeletion syndromes.

Department of Paediatrics and Child Health, Dunedin School of Medicine, Otago University, Dunedin, New Zealand.

Periventricular heterotopia (PH) is a brain malformation characterised by heterotopic nodules of neurons lining the walls of the cerebral ventricles. Mutations in FLNA account for 20-24% of instances but a majority have no identifiable genetic aetiology. Often the co-occurrence of PH with a chromosomal anomaly is used to infer a new locus for a Mendelian form of PH. This study reports four PH patients with three different microdeletion syndromes, each characterised by high-resolution genomic microarray. In three patients the deletions at 1p36 and 22q11 are conventional in size, whilst a fourth child had a deletion at 7q11.23 that was larger in extent than is typically seen in Williams syndrome. Although some instances of PH associated with chromosomal deletions could be attributed to the unmasking of a recessive allele or be indicative of more prevalent subclinical migrational anomalies, the rarity of PH in these three microdeletion syndromes and the description of other non-recurrent chromosomal defects do suggest that PH may be a manifestation of multiple different forms of chromosomal imbalance. In many, but possibly not all, instances the co-occurrence of PH with a chromosomal deletion is not necessarily indicative of uncharacterised underlying monogenic loci for this particular neuronal migrational anomaly.

Departments of Paediatrics, Dunedin School of Medicine, Otago University, Dunedin 9054, New Zealand.

Abnormalities in WNT signaling are implicated in a broad range of developmental anomalies and also in tumorigenesis. Here we demonstrate that germline mutations in WTX (FAM123B), a gene that encodes a repressor of canonical WNT signaling, cause an X-linked sclerosing bone dysplasia, osteopathia striata congenita with cranial sclerosis (OSCS; MIM300373). This condition is typically characterized by increased bone density and craniofacial malformations in females and lethality in males. The mouse homolog of WTX is expressed in the fetal skeleton, and alternative splicing implicates plasma membrane localization of WTX as a factor associated with survival in males with OSCS. WTX has also been shown to be somatically inactivated in 11-29% of cases of Wilms tumor. Despite being germline for such mutations, individuals with OSCS are not predisposed to tumor development. The observed phenotypic discordance dependent upon whether a mutation is germline or occurs somatically suggests the existence of temporal or spatial constraints on the action of WTX during tumorigenesis.