International Award for Human Genome Project

Reporter and Curator: Dr. Sudipta Saha, Ph.D.

The Thai royal family awarded its annual prizes in Bangkok, Thailand, in late January 2018 in recognition of advances in public health and medicine – through the Prince Mahidol Award Foundation under the Royal Patronage. This foundation was established in 1992 to honor the late Prince Mahidol of Songkla, the Royal Father of His Majesty King Bhumibol Adulyadej of Thailand and the Royal Grandfather of the present King. Prince Mahidol is celebrated worldwide as the father of modern medicine and public health in Thailand.

The Human Genome Project has been awarded the 2017 Prince Mahidol Award for revolutionary advances in the field of medicine. The Human Genome Project was completed in 2003. It was an international, collaborative research program aimed at the complete mapping and sequencing of the human genome. Its final goal was to provide researchers with fundamental information about the human genome and powerful tools for understanding the genetic factors in human disease, paving the way for new strategies for disease diagnosis, treatment and prevention.

The resulting human genome sequence has provided a foundation on which researchers and clinicians now tackle increasingly complex problems, transforming the study of human biology and disease. It is particularly satisfying that the sequence has given researchers the ability to begin using genomics to improve approaches for diagnosing and treating human disease, thereby beginning the era of genomic medicine.

The National Human Genome Research Institute (NHGRI) is devoted to advancing health through genome research. The institute led the National Institutes of Health’s (NIH’s) contribution to the Human Genome Project, which was successfully completed in 2003 ahead of schedule and under budget. NIH, the national medical research agency of the United States, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases.

Building on the foundation laid by the sequencing of the human genome, NHGRI’s work now encompasses a broad range of research aimed at expanding understanding of human biology and improving human health. In addition, a critical part of NHGRI’s mission continues to be the study of the ethical, legal and social implications of genome research.

NHGRI’s Green Sees ‘Tragic’ Sequestration Impact on NHGRI Programs

NEW YORK (GenomeWeb News) – The funding squeeze from the sequestration of the US federal budget, now more than half-a-year old, has already had a sizable impact at the National Human Genome Research Institute, leading to cuts to ongoing programs, scaling back of new ones, and the deferring of efforts that have not yet launched.

The five percent cut in funding this year at NHGRI has led not only to trimmed-down renewal grants and fewer, smaller awards broadly, but also has chopped the budget for some of the institute’s important programs, according to NHGRI Director Eric Green.

The programs that have had their funding reduced, or in one case delayed, include the ENCODE (Encyclopedia of DNA Elements) program, projects focused on using genome sequencing in newborns and in clinical medicine, and other initiatives, Green said in his Director’s Report to the National Advisory Council on Human Genomics Research this week.

In addition, many renewal grants have been trimmed, and there are “numerous examples of detrimental cuts” to the institute’s intramural research program, said Green. These cuts to large and small NHGRI programs come at a pivotal time for genomics, he noted, as the products of such research are beginning to translate into clinical possibilities.

“It is tragic. [That] is the word I would use,” Green told GenomeWeb Daily News this week.

“[The field of genomics] is just so exciting. There are so many opportunities,” he said. “This is precisely the time that we should be pushing the accelerator hard, and we just cannot do it because we don’t have enough fuel in our fuel tank.

“It’s frustrating. I think the opportunities now are just spectacular,” said Green. “It’s tragic because it is just so obvious that we could do some remarkable things in genomics and we are not being able to do it.”

ENCODE, a decade-old flagship project at NIH that aims to identify all of the functional elements in the human genome, had its budget reduced by 16 percent.

The Genomic Sequencing and Newborn Screening Disorders program was cut by half, which left the program to fund fewer research projects than planned and its research consortium to go forward without the benefit of a data coordinating center. This new initiative, an effort to support pioneering studies on how sequencing might be used in the care of newborns and in neonatal care that was created jointly with the Eunice Kennedy Shriver National Institute of Child Health and Human Development, had its budget cut from $10 million to $5 million.

The Genomic Medicine Pilot Demonstration Projects program had its budget cut by 20 percent, and NHGRI’s Bioinformatics Resources and Analysis Research Portfolio had $5 million sliced out of its budget. The new Genomics of Gene Regulation (GGR) request for applications was bumped out of this funding year entirely, and has been delayed until 2014, according to Green.

Because the sequestration plan was concocted and agreed to well in advance of its arrival earlier this year, Green told GWDN that the institute did have some time to try to react to the sequestration and mitigate the pain from the cuts, spreading them around fairly and evenly while maintaining priorities. He said leadership at the institute tried to prepare for the possibility of sequestration by being conservative in its planning.

Programs that were already ongoing, like ENCODE, were likely to take priority over those that were not yet launched, like GGR, in part because the infrastructure is already in place for ongoing projects and because it is easier to plan for how they operate and generate outputs, like data.

“With ENCODE you know for every million dollars you invest you get so much back,” said Green. “With a program like newborn sequencing … we don’t totally know what it’s going to look like or play out like. We won’t know what we are missing because we won’t be able to launch it to the scale that we wanted to launch it originally.”

Green said some of the projects being cut or delayed were created under NHGRI’s strategic plan, a program it laid out in 2011 that involves restructuring of the institute’s divisions and some shifting in its research portfolio to include more efforts in applying genomics to medicine and healthcare.

“Some of these RFAs that we delayed really represent key elements that we started to anticipate two years ago,” said Green. “We knew we wanted to do more in sequencing, we knew we wanted to do some pilot projects in genomic medicine. We knew we wanted to continue to accelerate efforts in understanding how the genome works … ENCODE, GGR, and so forth. It just had to be slowed down,” he said.

Anastasia Wise, program director for the Genomic Sequencing and Newborn Screening Disorders program, told GWDN that the program was supposed to be much larger than the $5 million in awards unveiled last week, which funded a consortium of four research projects.

Wise said NHGRI and NICHD were each initially planning to provide double the amount of funding they were actually awarded, which is now expected to be a total of $25 million over five years, although that total could be subject to the availability of funding.

“There were definitely more scientifically meritorious applications than we were able to fund,” she said. “Even the four awards that we made ended up being cut an additional five percent because of the sequestration.”

She said the program “wanted to be able to make more awards, and we wanted to be able to fund a coordinating center to be able to bring the network together and help provide some harmonization of data and coordination of logistics between the different members of the consortium,” but it was unable to fund that part of the effort.

Although the fractured fiscal culture in Washington engenders caution at NHGRI as the agency looks forward, Green sees many scientific opportunities right now, as genomics begins to hit the clinic.

“Some people are saying we are not even going fast enough,” he said. “Lots of people have been discussing what the world is going to look like when somebody gets their genome sequenced in the newborn period, and [they] think about what the implications of that are for the patient for the rest of their lives. We want to start studying this,” he said.

“And we are starting to … but we’re not starting as aggressively as we wanted to,” Green said. “I mean, we took a big hit this year.”

Matt Jones is a staff reporter for GenomeWeb Daily News. He covers public policy, legislation, and funding issues that affect researchers in the genomics field, as well as the operations of research institutes.

Centers of Excellence in Genomic Sciences (CEGS): NHGRI to Fund New Center (CEGS) on the Brain: Mental Disorders and the Nervous System

NEW YORK (GenomeWeb News) – The National Human Genome Research Institute plans to fund new Centers of Excellence in Genomic Sciences, or CEGS, to create interdisciplinary teams that pursue innovative genome-based approaches to address biomedical problems and to understand the basis of biological systems.

NHGRI, with support from the National Institute of Mental Health, expects to provide up to $2 million per year for each of the new CEGS it funds, and plans to make up to four new awards each year.

Although these CEGS may pursue a wide range of research objectives, NIMH will support the program because it wants to fund research using novel genomic approaches that can accelerate the understanding of the genetic basis of mental disorders and the nervous system, NHGRI said on Friday.

The CEGS program was created to use the new knowledge and technologies that resulted from the Human Genome Project and subsequent genomics research to develop new tools, methods, and concepts that apply to human biology and disease.
CEGS grantees are expected to be innovative, to focus on a critical issue in genomic science, to use multiple investigators working under one leader, to work toward a specific outcome, and to tackle challenging aspects of problems that may have impeded previous research efforts.

Further, they are supposed to bolster the pool of professional scientists and engineers who are trained in genomics through offering educational programs, and they are expected to address the shortage of scientists from underrepresented minority communities by developing recruiting programs that encourage minority community members to become independent genomics investigators.

The technologies and methods the CEGS investigators develop should be applicable to a wide range of cell types and organisms, and they should be scalable and expandable so they may apply to other model systems, according to NHGRI’s funding opportunity announcement.

Center for In Toto Genomic Analysis of Vertebrate Development

This Center of Excellence in Genomic Science (CEGS) assembles a multidisciplinary group of investigators to develop innovative technologies with the goal of imaging and mutating every developmentally important vertebrate gene. Novel “in toto imaging” tools make it possible to use a systems-based approach for analysis of gene function in developing vertebrate embryos in real time and space. These tools can digitize in vivo data in a systematic, high-throughput, and quantitative fashion. Combining in toto imaging with novel gene traps permits a means to rapidly screen for developmentally relevant expression patterns, followed by the ability to immediately mutagenize genes of interest. Initially, key technologies will be developed and tested in the zebrafish embryo due to its transparency and the ability to obtain rapid feedback. Once validated, these techniques will be applied to an amniote, the avian embryo, due to several advantages including accessibility and similarity to human embryogenesis. Finally, to monitor alterations in gene expression in normal and mutant embryos, we will develop new techniques for in situ hybridization that permit simultaneous analysis of multiple marker genes in a sensitive and potentially quantitative manner. Our goal is to combine real time analysis of gene expression on a genome-wide scale coupled with the ability to mutate genes of interest and examine global alterations in gene expression as a result of gene loss. Much of the value will come from the development of new and broadly applicable technologies. In contrast to a typical technology development grant, however, there will be experimental fruit emerging from at least two vertebrate systems (zebrafish and avian). 
The following aims will be pursued: Specific Aim 1: Real-time “in toto” image analysis of reporter gene expression; Specific Aim 2: Comprehensive spatiotemporal analysis of gene function of the developing vertebrate embryo using the FlipTrap approach for gene trapping; Specific Aim 3: Design of quantitative, multiplexed ‘hybridization chain reaction’ (HCR) amplifiers for in vivo imaging with active background suppression; Specific Aim 4: Data analysis and integration of data sets to produce a “digital” fish and a “digital” bird. The technologies and the resulting atlases will be made broadly available via electronic publication.

Causal Transcriptional Consequences of Human Genetic Variation

P50 HG005550
George M. Church
Harvard University, Cambridge, Mass.

The Center for Transcriptional Consequences of Human Genetic Variation (CTCHGV) will develop innovative and powerful genetic engineering methods and use them to identify genetic variations that causally control gene transcription levels. Genome Wide Association Studies (GWAS) find many variations associated with disease and other phenotypes, but the variations that may actually cause these conditions are hard to identify because nearby variations in the same haplotype blocks consistently co-occur with them in human populations, so that specifically causative ones cannot be distinguished. About 95% of GWAS variations are not in gene coding regions, and many of these presumably associate with altered gene expression levels. CTCHGV will identify the variations that directly control gene expression by engineering precise combinations of changes to gene regulatory regions that break down the haplotype blocks, allowing each variation’s effect on gene expression to be discerned independently of the others. To perform this analysis, CTCHGV will extract ~100 kbp gene regulatory regions from human cell samples, create precise variations in them in E. coli, and re-introduce the altered regions back into human cells, using zinc finger nucleases (ZFNs) to efficiently induce recombination. CTCHGV will target 1000 genes for this analysis (Aim 1), and will use human induced pluripotent stem (iPS) cells to study the effects of variations in diverse human cell types (Aim 2). To explore the effects of variations in complex human tissues, CTCHGV will develop methods of measuring gene expression at transcriptome-wide levels in many single cells, including in situ in structured tissues (Aim 3).
Finally, CTCHGV will develop novel advanced technologies that integrate DNA sequencing and synthesis to construct thousands of large DNA constructs from oligonucleotides, that enable very precise targeting and highly efficient performance of ZFNs, and that enable cells to be sorted on the basis of morphology as well as fluorescence and labeling (Aim 4). CTCHGV will also develop direct oligo-mediated engineering of human cells, and create “marked allele” iPS that will enable easy ascertainment of complete exon distributions for many pairs of gene alleles in many cell types.
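
The haplotype-block problem described in this abstract can be illustrated with a small toy simulation. The sketch below is not the Center's method; it assumes a hypothetical gene with two regulatory variants, A and B, where only A actually changes expression (the function `expression` and all numbers are invented for illustration). In the "natural" sample the two variants always co-occur, so they look equally associated with expression. In the "engineered" sample all four combinations exist, so each variant's effect can be read off separately.

```python
from itertools import product
from statistics import mean


def expression(var_a: int, var_b: int) -> float:
    """Hypothetical ground truth: only variant A changes expression."""
    return 10.0 + 5.0 * var_a + 0.0 * var_b


def marginal_effect(samples, index):
    """Mean expression with the variant present minus mean with it absent."""
    present = [expression(*s) for s in samples if s[index] == 1]
    absent = [expression(*s) for s in samples if s[index] == 0]
    return mean(present) - mean(absent)


# Natural population: A and B sit in one haplotype block and always co-occur.
natural = [(0, 0), (1, 1)] * 50

# Engineered cells: every combination of A and B is constructed on purpose.
engineered = list(product([0, 1], repeat=2)) * 25

natural_effects = (marginal_effect(natural, 0), marginal_effect(natural, 1))
engineered_effects = (marginal_effect(engineered, 0), marginal_effect(engineered, 1))

print("natural    (A, B):", natural_effects)
print("engineered (A, B):", engineered_effects)
```

In the natural sample both variants show the same apparent effect, which is why association alone cannot single out the causal one. Once the combinations are decoupled, the non-causal variant's apparent effect drops to zero in this toy model.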

Center for the Epigenetics of Common Human Disease

Epigenetics, the study of non-DNA sequence-related heredity, is at the epicenter of modern medicine because it can help to explain the relationship between an individual’s genetic background, the environment, aging, and disease. The Center for the Epigenetics of Common Human Disease was created in 2004 to begin to develop the interface between epigenetics and epidemiologic-based phenotype studies, recognizing that epigenetics requires new ways of thinking about disease. We created a highly interdisciplinary group of faculty and trainees, including molecular biologists, biostatisticians, epidemiologists, and clinical investigators. We developed novel approaches to genome-wide DNA methylation (DNAm) analysis, allele-specific expression, and new statistical epigenetic tools. Using these tools, we discovered that most variable DNAm is in neither CpG islands nor promoters, but in what we term “CpG island shores,” regions of lower CpG density up to several kb from islands, and we have found altered DNAm in these regions in cancer, depression and autism. In the renewal period, we will develop the novel field of epigenetic epidemiology, the relationship between epigenetic variation, genetic variation, environment and phenotype. We will continue to pioneer genome-wide epigenetic technology that is cost effective for large scale analysis of population-based samples, applying our knowledge from the current period to second-generation sequencing for epigenetic measurement, including DNAm and allele-specific methylation. We will continue to pioneer new statistical approaches for quantitative and binary DNAm assessment in populations, including an Epigenetic Barcode. We will develop Foundational Epigenetic Epidemiology, examining: time-dependence, heritability and environmental relationship of epigenetic marks; heritability in MZ and DZ twins; and develop an epigenetic transmission disequilibrium test. 
We will then pioneer Etiologic Epigenetic Epidemiology, by integrating novel genome-wide methylation scans (GWMs) with existing Genome-Wide Association Study (GWAS) and epidemiologic phenotype data, a design we term Genome-Wide Integrated Susceptibility (GWIS), focusing on bipolar disorder, aging, and autism as paradigms for epigenetic studies of family-based samples, longitudinal analyses, and parent-of-origin effects, respectively. This work will be critical to realizing the full value of previous genetic and phenotypic studies, by developing and applying molecular and statistical tools necessary to integrate DNA sequence with epigenetic and environmental causes of disease.

Genomic Basis of Vertebrate Diversity

P50 HG002568
David M. Kingsley
Stanford University, Stanford, Calif.

The long-term goal of this project is to understand the genomic mechanisms that generate phenotypic diversity in vertebrates. Rapid progress in genomics has provided nearly complete sequences for several organisms. Comparative analysis suggests many fundamental pathways and gene networks are conserved between organisms. And yet, the morphology, physiology, and behavior of different species are obviously and profoundly different. What are the mechanisms that generate these key differences? Are unique traits controlled by few or many genetic changes? What kinds of changes? Are there particular genes and mechanisms that are used repeatedly when organisms adapt to new environments? Can better understanding of these mechanisms help explain dramatic differences in disease susceptibility that also exist between groups? The Stanford CEGS will use an innovative combination of approaches in fish, mice, and humans to identify the molecular basis of major phenotypic change in natural populations of vertebrates. Specific aims include: 1) cross stickleback fish and develop a genome wide map of the chromosomes, genes, and mutations that control a broad range of new morphological, physiological, and behavioral traits in natural environments; 2) test which population genetic measures provide the most reliable “signatures of selection” surrounding genes that are known to have served as the basis of parallel adaptive change in many different natural populations around the world; 3) assemble the stickleback proto Y chromosome and test whether either sex or autosomal rearrangements play an important role in generating phenotypic diversity, or are enriched in genomic regions that control phenotypic change; 4) test whether particular genes and mechanisms are used repeatedly to control phenotypic change in many different vertebrates. 
Preliminary data suggest that mechanisms identified as the basis of adaptive change in natural fish populations may be broadly predictive of adaptive mechanisms across a surprisingly large range of animals, including humans. Genetic regions hypothesized to be under selection in humans will be compared to genetic regions under selection in fish. Regions predicted to play an important role in natural human variation and disease susceptibility will be modeled in mice, generating new model systems for confirming functional variants predicted from human population genetics and comparative genomics.

Microscale Life Sciences Center

P50 HG002360
Deirdre R. Meldrum
Arizona State University, Tempe

Increasingly, it is becoming apparent that understanding, predicting, and diagnosing disease states is confounded by the inherent heterogeneity of in situ cell populations. This variation in cell fate can be dramatic, for instance, one cell living while an adjacent cell dies. Thus, in order to understand fundamental pathways involved in disease states, it is necessary to link preexisting cell state to cell fate in the disease process at the individual cell level.

The Microscale Life Sciences Center (MLSC) at the University of Washington is focused on solving this problem, by developing cutting-edge microscale technology for high throughput genomic-level and multi-parameter single-cell analysis, and applying that technology to fundamental problems of biology and health. Our vision is to address pathways to disease states directly at the individual cell level, at increasing levels of complexity that progressively move to an in vivo understanding of disease. We propose to apply MLSC technological innovations to questions that focus on the balance between cell proliferation and cell death. The top three killers in the United States, cancer, heart disease and stroke, all involve an imbalance in this cellular decision-making process. Because of intrinsic cellular heterogeneity in the live/die decision, this fundamental cellular biology problem is an example of one for which analysis of individual cells is essential for developing the link between genomics, cell function, and disease. The specific systems to be studied are proinflammatory cell death (pyroptosis) in a mouse macrophage model, and neoplastic progression in the Barrett’s Esophagus (BE) precancerous model. In each case, diagnostic signatures for specific cell states will be determined by measuring both physiological (cell cycle, ploidy, respiration rate, membrane potential) and genomic (gene expression profiles by single-cell proteomics, qRT-PCR and transcriptomics; LOH by LATE-PCR) parameters. These will then be correlated with cell fate via the same sets of measurements after a challenge is administered, for instance, a cell death stimulus for pyroptosis or a predisposing risk factor challenge (acid reflux) for BE. Ultimately, time series will be taken to map out the pathways that underlie the live/die decision.

Finally, this information will be used as a platform to define cell-cell interactions at the single-cell level, to move information on disease pathways towards greater in vivo relevance. New technology will be developed and integrated into the existing MLSC Living Cell Analysis cassette system to support these ambitious biological goals including 1) automated systems for cell placement, off-chip device interconnects, and high throughput data analysis with user friendly interfaces; 2) new optical and electronic sensors based on a new detection platform, new dyes and nanowires; and 3) new micromodules for single-cell qRT-PCR, LATE-PCR for LOH including single-cell pyrosequencing, on-chip single-cell proteomics, and single-cell transcriptomics using barcoded nanobeads.

Wisconsin Center of Excellence in Genomics Science

P50 HG004952
Michael Olivier
Medical College of Wisconsin, Milwaukee

The successful completion of the human genome and model organism sequences has ushered in a new era in biological research, with attention now focused on understanding the way in which genome sequence information is expressed and controlled. The focus of this proposed Wisconsin Center of Excellence in Genomics Science is to facilitate understanding of the complex and integrated regulatory mechanisms affecting gene transcription by developing novel technology for the comprehensive characterization and quantitative analysis of proteins interacting with DNA. This new technology will help provide for a genome-wide functional interpretation of the underlying mechanisms by which gene transcriptional regulation is altered during biological processes, development, disease, and in response to physiological, pharmacological, or environmental stressors. The development of chromatin immunoprecipitation approaches has allowed identification of the specific DNA sequences bound by proteins of interest. We propose to reverse this strategy and develop an entirely novel technology that will use oligonucleotide capture to pull down DNA sequences of interest, and mass spectrometry to identify and characterize the proteins and protein complexes bound and associated with particular DNA regions. This new approach will create an invaluable tool for deciphering the critical control processes regulating an essential biological function. The proposed interdisciplinary and multi-institutional Center of Excellence in Genomics Science combines specific expertise at the Medical College of Wisconsin, the University of Wisconsin Madison, and Marquette University. 
Technological developments in four specific areas will be pursued to develop this new approach: (1) cross-linking of proteins to DNA and fragmentation of chromatin; (2) capture of the protein-DNA complexes in a DNA sequence-specific manner; (3) mass spectrometry analysis to identify and quantify bound proteins; and (4) informatics to develop tools enabling the global analysis of the relationship between changes in protein-DNA interactions and gene expression. The Center will use carefully selected biological systems to develop and test the technology in an integrated genome-wide analysis platform that includes efficient data management and analysis tools. As part of the Center mission, we will combine our technology development efforts with an interdisciplinary training program for students and fellows designed to train qualified scientists experienced in cutting-edge genomics technology. Data, technology, and software will be widely disseminated by multiple mechanisms including licensing and commercialization activities.

Collaborating Institutions: University of Wisconsin-Madison, Marquette University

CISGen

P50 MH090338
Fernando Pardo-Manuel de Villena
University of North Carolina, Chapel Hill

In this application, we propose a highly ambitious yet realistically attainable goal: to align existing expertise at UNC-Chapel Hill into a CEGS called CISGen. The overarching purpose of CISGen is to develop as a resource and to exploit the utility of the murine Collaborative Cross (CC) mouse model of the heterogeneous human population to delineate genetic and environmental determinants of complex phenotypes drawn from psychiatry, which are among the most intractable set of problems in all of biomedicine. Psychiatric disorders present a paradox – the associated morbidity, mortality, and costs are enormous and yet, despite over a century of scientific study, there are few hard facts about the etiology of the core diseases. Although our GWAS meta-analyses are in progress, early results suggest that strong and replicable findings may be elusive. Therefore, our proposal provides a complementary approach to the study of fundamental psychiatric phenotypes.

We propose a particularly challenging definition of success – we will identify high probability etiological models (which can be realistically complex) and then prove the predictive capacity of these models by generating novel strains of mice predicted to be at very high risk for the phenotype. Once validated, these high confidence models can then be tested in subsequent human studies – we do not propose human extension studies in CISGen but this is achievable for the investigators and their colleagues. Data collected in CISGen would be a valuable resource to the wider scientific community and could be applied to a large set of biological problems and these data can rapidly add to the knowledge base for any new genomewide association study (GWAS) finding. Delivery of sophisticated and user-friendly databases is a key component of CISGen.

Accomplishing this overarching goal requires an exceptional diversity of scientific expertise – psychiatry, human genetics, mouse behavior, mouse genetics, statistical genetics, computational biology, and systems biology. Experts in these disciplines are deeply involved in CISGen and are committed to the projects described herein. Successful integration of these diverse fields is non-trivial; however, all scientists on this application have had extensive interactions over the past five years, already know how to work together, and have a working knowledge of their colleagues’ expertise. UNC-Chapel Hill has an intense commitment to interdisciplinary genomics research and provides a fertile backdrop for 21st century projects like CISGen.

Collaborating Institutions: The Jackson Laboratory, North Carolina State University, University of Texas at Arlington

Center for Cell Circuits

P50 HG006193
Aviv Regev
The Broad Institute, Inc., Cambridge, Mass.

Systematic reconstruction of genetic and molecular circuits in mammalian cells remains a significant, large-scale and unsolved challenge in genomics. The urgency to address it is underscored by the sizeable number of GWAS-derived disease genes whose functions remain largely obscure, limiting our progress towards biological understanding and therapeutic intervention. Recent advances in probing and manipulating cellular circuits on a genomic scale open the way for the development of a systematic method for circuit reconstruction. Here, we propose a Center for Cell Circuits to develop the reagents, technologies, algorithms, protocols and strategies needed to reconstruct molecular circuits. Our preliminary studies chart an initial path towards a universal strategy, which we will fully implement by developing a broad and integrated experimental and computational toolkit. We will develop methods for comprehensive profiling, genetic perturbations and mesoscale monitoring of diverse circuit layers (Aim 1). In parallel, we will develop a computational framework to analyze profiles, derive provisional models, use them to determine targets for perturbation and monitoring, and evaluate, refine and validate circuits based on those experiments (Aim 2). We will develop, test and refine this strategy in the context of two distinct and complementary mammalian circuits. First, we will produce an integrated, multi-layer circuit of the transcriptional response to pathogens in dendritic cells (Aim 3) as an example of an acute environmental response. Second, we will reconstruct the circuit of chromatin factors and non-coding RNAs that control chromatin organization and gene expression in mouse embryonic stem cells (Aim 4) as an example of the circuitry underlying stable cell states.
These detailed datasets and models will reveal general principles of circuit organization, provide a resource for scientists in these two important fields, and allow computational biologists to test and develop algorithms. We will broadly disseminate our tools and methods to the community, enabling researchers to dissect any cell circuit of interest at unprecedented detail. Our work will open the way for reconstructing cellular circuits in human disease and individuals, to improve the accuracy of both diagnosis and treatment.

Analysis of Human Genome Using Integrated Technologies

P50 HG002357
Michael P. Snyder
Yale University, New Haven, Conn.

We propose to establish a center to build genomic DNA arrays and develop novel technologies that will use these arrays for the large-scale functional analysis of the human genome. Fragments of nonrepetitive DNA, 0.3-1.4 kb in length, from each of chromosomes 22, 21, 20, 19, 7, 17, and perhaps the X chromosome will be prepared by PCR and attached to microscope slides. The arrays will be used to develop technologies for the large-scale mapping of 1) transcribed sequences, 2) binding sites of chromosomal proteins, 3) origins of replication, and 4) genetic mutation and variation. A web-accessible database will be constructed to house the information generated in this study; data from other studies will also be integrated into the database. The arrays and technologies will be made available to both Yale University and the larger scientific community. They will be integrated into our training programs for postdoctoral fellows, graduate students and undergraduates at Yale. We expect these procedures to be applicable to the analysis of the entire human genome and the genomes of many other organisms.

Genomic Analysis of the Genotype-Phenotype Map

Our Center, which started in 2003, focused on implications of haplotype structure in the human genome. Since that time, there have been extraordinary advances in genomics: Genome-wide association studies using single nucleotide polymorphisms and copy number variants are now commonplace, and we are rapidly moving towards whole-genome sequence data for large samples of individuals. Our Center has undergone similar dramatic changes. While the underlying theme remains the same — making sense of genetic variation — our focus is now explicitly on how we can use the heterogeneous data produced by modern genomics technologies to achieve such an understanding. The overall goal of our proposal is to develop an intellectual framework, together with computational and statistical analysis tools, for illuminating the path from genotype to phenotype, and for predicting the latter from the former. We will address three broad questions related to this problem: 1) How do we infer mechanisms by which genetic variation leads to changes in phenotype? 2) How do we improve the design, understanding and interpretation of association studies by exploiting prior information? 3) How do we identify general principles about the genotype-phenotype map? We will approach these questions through a series of interrelated projects that combine computational and experimental methods, explored in Arabidopsis, Drosophila and human, and involve a wide range of researchers including molecular biologists, population geneticists, genetic epidemiologists, statisticians, computer scientists, and mathematicians.

Genomic Analysis of Network Perturbations in Human Disease

P50 HG004233
Marc Vidal
Dana-Farber Cancer Institute, Boston

Genetic differences between individuals can greatly influence their susceptibility to disease. The information originating from the Human Genome Project (HGP), including the genome sequence and its annotation, together with projects such as the HapMap and the Human Cancer Genome Project (HCGP), has greatly accelerated our ability to find genetic variants and associate genes with a wide range of human diseases. Despite these advances, linking individual genes and their variations to disease remains a daunting challenge. Even where a causal variant has been identified, the biological insight that must precede a strategy for therapeutic intervention has generally been slow in coming. The primary reason for this is that the phenotypic effects of functional sequence variants are mediated by a dynamic network of gene products and metabolites, which exhibit emergent properties that cannot be understood one gene at a time. Our central hypothesis is that both human genetic variations and pathogens such as viruses influence local and global properties of networks to induce “disease states.” Therefore, we propose a general approach to understanding cellular networks based on environmental and genetic perturbations of network structure and readout of the effects using interactome mapping, proteomic analysis, and transcriptional profiling. We have chosen a defined model system with a variety of disease outcomes: viral infection. We will explore the concept that one must understand changes in complex cellular networks to fully understand the link between genotype, environment, and phenotype.
We will integrate observations from network-level perturbations caused by particular viruses together with genome-wide human variation datasets for related human diseases with the goal of developing general principles for data integration and network prediction, instantiation of these in open-source software tools, and development of testable hypotheses that can be used to assess the value of our methods. Our plans to achieve these goals are summarized in the following specific aims: 1. Profile all viral-host protein-protein interactions for a group of viruses with related biological properties. 2. Profile the perturbations that viral proteins induce on the transcriptome of their host cells. 3. Combine the resulting interaction and perturbation data to derive cellular network-based models. 4. Use the developed models to interpret genome-wide genetic variations observed in human disease. 5. Integrate the bioinformatics resources developed by the various CCSG members within a Bioinformatics Core for data management and dissemination. 6. Building on existing education and outreach programs, develop a genomic and network-centered educational program, with particular emphasis on providing access for underrepresented minorities to internships, workshops and scientific meetings.

Recently, studies of structural abnormalities of chromosomes (mosaicism) were conducted by two consortia, one led by scientists at the National Cancer Institute (NCI) and one by Gene Environment Association Studies (GENEVA). The latter was sponsored by the National Human Genome Research Institute (NHGRI). These studies found that mosaicism can be detected in a small fraction of people without a prior history of cancer. Mosaicism results from a DNA alteration that is present in some of the body’s cells but not in others, so a person with mosaicism has a mixture of normal and mutated cells. “These two studies provide large population-based evidence that genetic mosaicism increases with age and could be a risk factor for cancer, which may mean that detection of genetic mosaicism could be an early marker for detecting cancer, or perhaps other chronic diseases,” said Stephen Chanock, M.D., co-author and chief, Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, NCI.

Scientists began observing an unexpected frequency of structural abnormalities in chromosomes during quality control checks of data from genome-wide association studies (GWAS) conducted in the GENEVA consortium and similar programs at NCI. These studies involve comparing hundreds of thousands of common differences across individual patients’ DNA to see if any of those variants are associated with a known trait, such as cancer. At first, these abnormalities were thought to be errors or outcomes of laboratory procedures. But they were found consistently at a low frequency, so the scientists wondered with what frequency these structural abnormalities occurred in the general population.
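The per-variant comparison at the heart of a GWAS can be illustrated with a minimal sketch. It assumes a simple allelic test: for each variant, allele counts in cases and controls form a 2x2 table, and a Pearson chi-square statistic with one degree of freedom measures association. The function name, variant labels and counts below are hypothetical and are not taken from the GENEVA or NCI data; real analyses also adjust for covariates and correct for testing hundreds of thousands of variants.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and case status were independent.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts (case_alt, case_ref, ctrl_alt, ctrl_ref).
variants = {
    "variant_A": (300, 700, 200, 800),  # allele enriched in cases
    "variant_B": (250, 750, 250, 750),  # identical frequency in both groups
}
results = {name: allelic_chi_square(*counts) for name, counts in variants.items()}

for name, (chi2, p_value) in results.items():
    print(f"{name}: chi2={chi2:.2f}, p={p_value:.3g}")
```

In this toy input, variant_A shows a strong association and variant_B shows none; a genome-wide study repeats the same test across every genotyped variant.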

The NCI-led study observed that genetic mosaic abnormalities were more frequent in individuals with solid tumors (0.97 percent vs. 0.74 percent in cancer-free individuals). The NCI study also observed mosaic chromosomal abnormalities in slightly less than 1 percent of the study participants, but noted that the frequency of detectable genetic mosaicism increased with age. This was consistent with GENEVA results that found genetic mosaicism increased in those over the age of 50.
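As a rough reading of the two frequencies quoted above, the short sketch below computes their ratio and difference. It uses only the percentages reported in the text; the variable names are illustrative, and the crude ratio is not the adjusted risk estimate a published analysis would report.

```python
# Frequencies reported in the text (percent of individuals with detectable mosaicism).
freq_solid_tumor = 0.97
freq_cancer_free = 0.74

# Crude comparison of the two groups; no adjustment for age or other factors.
relative_frequency = freq_solid_tumor / freq_cancer_free
absolute_difference = freq_solid_tumor - freq_cancer_free

print(f"ratio: {relative_frequency:.2f}")
print(f"difference: {absolute_difference:.2f} percentage points")
```

On these figures, mosaicism was detected roughly 1.3 times as often in individuals with solid tumors, a difference of about 0.23 percentage points.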

In both studies, scientists observed an increase in the detection of genetic mosaicism in patients with hematological cancers (leukemia, lymphoma and myeloma), for which DNA was collected at least one year prior to diagnosis, compared to cancer-free individuals. Results from the NCI study showed that risk of leukemia was also substantially higher among people with these chromosomal alterations while the GENEVA study showed that the risk of acquiring a hematological cancer diagnosis was 10 times higher for people who had mosaic chromosomal abnormalities. The results of both studies suggest that mosaicism, observed in older people, may be an asymptomatic condition — not often causing overt illness — that may predispose them to hematological cancer. However, GENEVA and NCI scientists stress that the event numbers analyzed are small, and additional studies are needed across a broader diversity of populations to establish the clinical significance of these findings.

NIH scientists say these findings will have important implications for the design and analysis of molecular studies of cancer, as well as ongoing studies looking at the characterization of cancer genomes, such as NIH’s The Cancer Genome Atlas and the International Cancer Genome Consortium.

NIH scientists recommended that additional analyses be conducted in groups of currently healthy people so that investigators may follow them over time for health outcomes.

The results of the studies were published online May 6, 2012, in Nature Genetics.