In a three part series:
Part IIA. CRACKING THE CODE OF HUMAN LIFE: Milestones along the Way
Part IIB. CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics
Part IIC. CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomic Analysis and Disease

Part III will conclude with Ubiquitin, it’s Role in Signaling and Regulatory Control.Part I reviewed the huge expansion of the biological research enterprise after the Second World War. It concentrated on the

discovery of cellular structures,

metabolic function, and

creation of a new science of Molecular Biology.

Part II follows the race to delineation of the Human Genome, discovery methods and fundamental genomic patterns that are ancient in both animal and plant speciation. But it explores both the complexity and the systems view of the architecture that underlies and understanding of the genome.

These articles review a web-like connectivity between inter-connected scientific discoveries, as significant findings have led to novel hypotheses and many expectations over the last 75 years. This largely post WWII revolution has driven our understanding of biological and medical processes at an exponential pace owing to successive discoveries of

chemical structure,

the basic building blocks of DNA and proteins,

nucleotide and protein-protein interactions,

protein folding, allostericity,

genomic structure,

DNA replication,

nuclear polyribosome interaction, and

metabolic control.

In addition, the emergence of methods for

copying,

removal,

insertion,

improvements in structural analysis

developments in applied mathematics that have transformed the research framework.

Part IIA:

CRACKING THE CODE OF HUMAN LIFE:

Milestones along the Way

A NOVA interview with Francis Collins (NHGRI) (FC), J. Craig Venter (CELERA)(JCV), and Eric Lander (EL).
RK: For the past ten years, scientists all over the world have been painstakingly trying to read the tiny instructions buried inside our DNA. And now, finally, the “Human Genome” has been decoded.
EL: The genome is a storybook that’s been edited for a couple billion years.
The following will address the odd similarity of genes between man and yeast

EL: In the nucleus of your cell the DNA molecule resides that is about 10 angstroms wide curled up, but the amount of curling is limited by the negative charges that repel one another, but there are folds upon folds. If the DNA is stretched the length of the DNA would be thousands of feet.
EL: We have known for 2000 years that your kids look a lot like you. Well it’s because you must pass them instructions that give them the eyes, the hair color, and the nose shape they have. RK: Cracking the code of those minuscule differences in DNA that influence health and illness is what the Human Genome Project is all about. Since 1990, scientists all over the world have been involved in the effort to read all three billion As, Ts, Gs, and Cs of human DNA. It took 10 years to find the one genetic mistake that causes cystic fibrosis. Another 10 years to find the gene for Huntington’s disease. Fifteen years to find one of the genes that increase the risk for breast cancer. One letter at a time, painfully slowly… And then came the revolution. In the last ten years the entire process has been computerized. The computations can do a thousand every second and that has made all the difference. EL: This is basically a parts list with a lot of parts. If you take an airplane, a Boeing 777, I think it has like 100,000 parts. If I gave you a parts list for the Boeing 777 in one sense you’d know 100,000 components, screws and wires and rudders and things like that. But you wouldn’t know how to put it together, or why it flies. We now have a parts list, and that’s not enough to understand why it flies.

The Human Genome (Photo credit: dullhunk)

A Quest For Clarity

Tracy Vence is a senior editor of Genome Technology
Tracy Vence @GenomeTechMag
Projects supported by the US National Institutes of Health will have produced 68,000 total human genomes — around 18,000 of those whole human genomes — through the end of this year, National Human Genome Research Institute estimates indicate. And in his book, The Creative Destruction of Medicine, the Scripps Research Institute’s Eric Topol projects that 1 million human genomes will have been sequenced by 2013 and 5 million by 2014.Daniel MacArthur, a group leader in Massachusetts General Hospital’s Analytic and Translational Genetics Unit estimates that “From a capacity perspective … millions of genomes are not that far off. If you look at the rate that we’re scaling, we can certainly achieve that.” The prospect of so many genomes has brought clinical interpretation into focus. But there is an important distinction to be made between the interpretation of an apparently healthy person’s genome and that of an individual who is already affected by a disease.
In an April Science Translational Medicine paper, Johns Hopkins University School of Medicine‘s Nicholas Roberts and his colleagues reported that personal genome sequences for healthy monozygotic twin pairs are not predictive of significant risk for 24 different diseases in those individuals. The researchers concluded that whole-genome sequencing was not likely to be clinically useful. Ambiguities have clouded even the most targeted interpretation efforts.

Technological challenges,

meager sample sizes,

a need for increased,

fail-safe automation and most important

a lack of community-wide standards for the task.

have hampered researchers’ attempts to reliably interpret the clinical significance of genomic variation.

How signals from the cell surface affect transcription of genes in the nucleus.

James Darnell, Jr., MD, Astor Professor, Rockefeller
After graduation from Washington University School of Medicine he worked with Francois Jacob at the Pasteur Institute in Paris and served as Vice President for Academic Affairs at Rockefeller in 1990-91. He is the coauthor with S.E. Luria of General Virology and the founding author with Harvey Lodish and David Baltimore of Molecular Cell Biology, now in its sixth edition. His book RNA, Life’s Indispensable Molecule was published in July 2011 by Cold Spring Harbor Laboratory Press. A member of the National Academy of Sciences since 1973, recipient of numerous awards, including the 2003 National Medal of Science, the 2002 Albert Lasker Award.
Using interferon as a model cytokine, the Darnell group discovered that cell transcription was quickly changed by binding of cytokines to the cell surface. The bound interferon led to the tyrosine phosphorylation of latent cytoplasmic proteins now called STATs (signal transducers and activators of transcription) that dimerize by

reciprocal phosphotyrosine-SH2 interchange.

accumulate in the nucleus,

bind DNA and drive transcription.

This pathway has proved to be of wide importance with seven STATs now known in mammals that take part in a wide variety of developmental and homeostatic events in all multicellular animals. Crystallographic analysis defined functional domains in the STATs, and current attention is focused on two areas:

how the STATs complete their cycle of activation and inactivation, which requires regulated tyrosine dephosphorylation; and how

persistent activation of STAT3 that occurs in a high proportion of many human cancers contributes to blocking apoptosis in cancer cells.

Current efforts are devoted to inhibiting STAT3 with modified peptides that can enter cells.

Cell cycle regulation and the cellular response to genotoxic stress

Stephen J Elledge, PhD, Gregor Mendel Professor of Genetics and Medicine, Investigator, Howard Hughes Medical Institute, Harvard Medical School
As a postdoctoral fellow at Stanford working on eukaryotic homologous recombination, he serendipitously found a family of genes known as ribonucleotide reductases. He subsequently showed that

these genes are activated by DNA damage and

could serve as tools to help scientists dissect the signaling pathways

through which cells sense and respond to DNA damage and replication stress.

At Baylor College of Medicine he made a second major breakthrough with the discovery of the cyclin-dependent kinase 2 gene (Cdk2), which

controls the G1-to-S cell cycle transition,

an entry checkpoint for the cell proliferation cycle and

a critical regulatory step in tumorigenesis.

From there, using a novel “two-hybrid” cloning method he developed, Elledge and Wade Harper, PhD, proceeded to

isolate several members of the Cdk2-inhibitory family.

Their discoveries included the p21 and p57 genes, mutations in the latter (responsible for Beckwith-Wiedemann syndrome), characterized by somatic overgrowth and increased cancer risk. Elledge is also recognized for his work in understanding

proteome remodeling through ubiquitin-mediated proteolysis.

they identified F-box proteins that regulate protein degradation in the cell by

binding to specific target protein sequences and then

marking them with ubiquitin for destruction by the cell’s proteasome machinery.

This breakthrough resulted in

the elucidation of the cullin ubiquitin ligase family,

which controls regulated protein stability in eukaryotes.

Elledge’s recent research has focused on the cellular mechanisms underlying DNA damage detection and cancer using genetic technologies. In collaboration with Cold Spring Harbor Laboratory researcher Gregory Hannon, PhD, Elledge has generated complete human and mouse short hairpin RNA (shRNA) libraries for genome-wide loss-of-function studies. Their efforts have led to

the identification of a number of tumor suppressor proteins

genes upon which cancer cells uniquely depend for survival.

This work led to the development of the “non-oncogene addiction” concept. This is noted as follows:

Playing the dual roles of inventor and investigator, Elledge developed original techniques to define

what drives the cell cycle and

how cells respond to DNA damage.

By using these tools, he and his colleagues have identified multiple genes involved in cell-cycle regulation.

Elledge’s work has earned him many awards, including a 2001 Paul Marks Prize for Cancer Research and a 2003 election to the National Academy of Sciences. In his Inaugural Article (1), published in this issue of PNAS, Elledge and his colleagues describe the function of Fbw7, a protein involved in controlling cell proliferation (see below). Elledge studied the error-prone DNA repair mechanism in E-Coli (Escherichia coli) called SOSmutagenesis for his PhD thesis at MIT. His work identified and described

the regulation of a group of enzymes now known as error-prone polymerases,

the first members of which were the umuCD genes in E. coli.

It was then that he developed a new cloning tool. Elledge invented a technique that allowed him to approach future cloning problems of this type with great rapidity. With the new technique, “you could make large libraries in lambda that behave like plasmids. We called them `phasmid’ vectors, like plasmid and phage together”. The phasmid cloning method was an early cornerstone for molecular biology research.

Elledge began working on homologous recombination in postdoctoral fellowship at Stanford University, an important niche in the field of eukaryotic genetics. Working with the yeast genome, Elledge searched for rec A, a gene that allows DNA to recombine homologously. Although he never located rec A, he discovered a family of genes known as ribonucleotide reductases (RNRs), which are involved in DNA production. Rec A and RNRs share the same last 4 amino acids, which caused an antibody crossreaction in one of Elledge’s experiments. Initially disappointed with the false positives in his hunt for rec A, Elledge was later delighted with his luck. He found that

RNRs are turned on by DNA damage, and

these genes are regulated by the cell cycle.

Prior to leaving Stanford, Elledge attended a talk at the University of California, San Francisco, by Paul Nurse, a leader in cell-cycle research who would later win the 2001 Nobel Prize in medicine. Nurse described his success in isolating the homolog of a key human cell-cycle kinase gene, Cdc2, by using a mutant strain of yeast (8). Although Nurse’s methods were primitive, Elledge was struck by the message he carried: that

cell-cycle regulation was functionally conserved, and

many human genes could be isolated by looking for complimentary genes in yeast.

Elledge then took advantage of his past successes in building phasmid vectors to build a versatile human cDNA library that could be expressed in yeast. After setting up a laboratory at Baylor, he introduced this library into yeast, screening for complimentary cell-cycle genes. He quickly identified the same Cdc2 gene isolated by Nurse. However, Elledge also discovered a related gene known as Cdk2. Elledge subsequently found that

Cdk2 controlled the G1 to S cell-cycle transition, a step that often goes awry in cancer. These results were published in the EMBO Journal in 1991.

He then continued to use

RNRs to perform genetic screens to

identify genes involved in sensing and responding to DNA damage.

He subsequently worked out the

signal transduction pathways in both yeast and humans that recognize damaged DNA and replication problems.

These “checkpoint” pathways are central to the

prevention of genomic instability and a key to understanding tumorigenesis.

This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected on April 29, 2003.

The mammalian F-box protein Fbw7 and its Caenorhabditis elegans counterpart Sel-10 have been implicated in

the ubiquitin-mediated turnover of cyclin E

as well as the Notch Lin-12 family of transcriptional activators. Both unregulated

Notch and cyclin E

promote tumorigenesis, and

inactivate mutations in human

Fbw7 studies suggest that it may be a tumor suppressor. To generate an in vivo system to assess the consequences of such unregulated signaling, we generated mice deficient for Fbw7. Fbw7-null mice die around 10.5 days post coitus because of a combination of deficiencies in hematopoietic and vascular development and heart chamber mutations. The absence of Fbw7 results in elevated levels of cyclin E, concurrent with inappropriate DNA replication in placental giant trophoblast cells. Moreover, the levels of both Notch 1 and Notch 4 intracellular domains were elevated, leading to stimulation of downstream transcriptional pathways involving Hes1, Herp1, and Herp2. These data suggest essential functions for Fbw7 in controlling cyclin E and Notch signaling pathways in the mouse.

Gene Switches

Nipam Patel is a professor in the Departments of Molecular and Cell Biology and Integrative Biology at UC Berkeley and runs a research laboratory that studies the role, during embryonic development, of homeotic genes (the genetic switches described in this feature). “Ghost in Your Genes” focuses on epigenetic “switches” that turn genes “on” or “off.” But not all switches are epigenetic; some are genetic. That is, other genes within the chromosome turn genes on or off. In an animal’s embryonic stage, these gene switches play a predominant role in laying out the animal’s basic body plan and perform other early functions;

the epigenome begins to take over during the later stages of embryogenesis.

Beginning as a fertilized single egg that egg becomes many different kinds of cells. Altogether, multicellular organisms like humans have thousands of differentiated cells. Each is optimized for use in the brain, the liver, the skin, and so on. Remarkably, the DNA inside all these cells is exactly the same. What makes the cells differ from one another is that different genes in that DNA are either turned on or off in each type of cell.

Take a typical cell, such as a red blood cell. Each gene within that cell has a coding region that encodes the information used to make a particular protein. (Hemoglobin shuttles oxygen to the tissues and carbon dioxide back out to the lungs—or gills, if you’re a fish.) But another region of the gene, called “regulatory DNA,” determines whether and when the gene will be expressed, or turned on, in a particular kind of cell. This precise transcribing of genes is handled by proteins known as transcription factors, which bind to the regulatory DNA, thereby generating instructions for the coding region.

One important class of transcription factors is encoded by the so called homeotic, or Hox, genes. Found in all animals, Hox genes act to “regionalize” the body along the embryo’s anterior-to-posterior (head-to-tail) axis. In a fruit fly, for example, Hox genes lay out the various main body segments—the head, thorax, and abdomen. Amazingly, all animals, from fruit flies to mice to people, rely on the same basic Hox-gene complex. Using different-colored antibody stains, we can see exactly where and to what degree Hox genes are expressed. Each Hox gene is expressed in a specific region along the anterior-to-posterior axis of the embryo.

A fly’s body has three main divisions: head, thorax, and abdomen. We’ll focus on the thorax, which itself has three main segments. In a normal adult fly, the second thoracic segment features a pair of wings, while the third thoracic segment has a pair of small, balloon-shaped structures called halteres. A modified second wing, the haltere serves as a flight stabilizer. In order for the pair of wings and the pair of halteres (as well as all other parts of the fly) to develop properly, the fly’s suite of

Hox genes must be expressed in a precise way and at precise times.

During development, the fly’s two wings grow from a structure in the larva known as the wing imaginal disk. (An imago is an insect in its final, adult state.) The haltere grows from the larval haltere imaginal disk. Remember the Ubx Hox gene? Using staining again, we can detect the gene product of Ubx. This reveals that

the Ubx gene is naturally “off” in the wing disk—

and is “on” in the haltere disk.

Now you’ll see what happens when the Ubx gene—just one of a large number of Hox genes—is turned off in the haltere disk. What if a genetic mutation caused the Ubx gene to be turned off, during the larval stage, in the third thoracic segment, the segment that normally produces the haltere? Instead of a pair of halteres, the fly has a second set of wings. With the switch of that single Hox gene, Ubx, from on to off, the third thoracic segment becomes an additional second thoracic segment and the pair of halteres became a second pair of wings. This illustrates the remarkable ability of transcription factors like Ubx to control patterning as well as cell type during development.

ENCODE

A. Data Suggests “Gene” Redefinition

As part of a huge collaborative effort called ENCODE (Encyclopedia of DNA Elements), a research team led by Cold Spring Harbor Laboratory (CSHL) Professor Thomas Gingeras, PhD, publishes a genome-wide analysis of RNA messages, called transcripts, produced within human cells.
Their analysis—one component of a massive release of research results by ENCODE teams from 32 institutes in 5 countries, with 30 papers appearing in 3 different high-level scientific journals—shows that three-quarters of the genome is capable of being transcribed. This indicates that nearly all of our genome is dynamic and active. It stands in marked contrast to consensus views prior to ENCODE’s comprehensive research efforts, which suggested that

only the small protein-encoding fraction of the genome was transcribed.

The vast amount of data generated with advanced technologies by Gingeras’ group and others in the ENCODE project changes the prevailing understanding of what defines a gene. The current outstanding question concerns

the nature and range of those functions. It is thought that these

“non-coding” RNA transcripts act something like components of a giant, complex switchboard, controlling a network of many events in the cell by

regulating the processes of

replication,

transcription

and translation

– that is, the copying of DNA and the making of proteins is based on information carried by messenger RNAs. With the understanding that so much of our DNA can be transcribed into RNA comes the realization that there is much less space between what we previously thought of as genes, Gingeras points out.

The full ENCODE Consortium data sets can be freely accessed through

the ENCODE project portal as well as at the University of California at Santa Cruz genome browser,

the National Center for Biotechnology Information, and

the European Bioinformatics Institute.

Topic threads that run through several different papers can be explored via the ENCODE microsite page at http://Nature.com/encode. Date: September 5, 2012 Source: Cold Spring Harbor Laboratory

1000 Genomes Project Team Reports on Variation Patterns

(from Phase I Data) October 31, 2012 GenomeWeb

In a study appearing online today in Nature, members of the 1000 Genomes Project Consortium presented an integrated haplotype map representing the genomic variation present in more than 1,000 individuals from 14 human populations. Using data on 1,092 individuals tested by

low-coverage whole-genome sequencing,

deep exome sequencing, and/or

dense genotyping,

the team looked at the nature and extent of the rare and common variation present in the genomes of individuals within these populations. In addition to population-specific differences in common variant profiles, for example, the researchers found distinct rare variant patterns within populations from different parts of the world — information that is expected to be important in interpreting future disease studies. They also encountered a surprising number of the variants that are expected to impact gene function, such as

non-synonymous changes,

loss-of-function variants, and, in some cases,

potentially damaging mutations.

ENCODE was designed to pick up where the Human Genome Project left off.
Although that massive effort revealed the blue­print of human biology, it quickly became clear that the instruction manual for reading the blueprint was sketchy at best. Researchers could identify in its 3 billion letters many of the regions that code for proteins, but they make up little more than 1% of the genome, contained in around 20,000 genes. ENCODE, which started in 2003, is a massive data-collection effort designed to catalogue the

‘functional’ DNA sequences,

learn when and in which cells they are active and

trace their effects on how the genome is

packaged,

regulated and

read.

After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. That phase came to a close with the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including

more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression —

and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes (see page 57)1. But the job is far from done.

Overall, the Encode data define regulatory switches that are scattered all over the three billion nucleotides of the genome. In fact, the data suggests,

the regions that lie between gene-coding sequences contain a wealth of previously unrecognized functional elements,Including

nonprotein-coding RNA transcribed sequences,

transcription factor binding sites,

chromatin structural elements, and

DNA methylation sites.

The combined results suggest that 95% of the genome lies within 8 kb of a DNA-protein interaction, and 99% lies within 1.7 kb of at least one of the biochemical events, the researchers say. Importantly, given the complex three-dimensional nature of DNA, it’s also apparent that

a regulatory element for one gene may be located quite some ‘linear’ distance from the gene itself.

“The information processing and the intelligence of the genome reside in the regulatory elements,” explains Jim Kent, director of the University of California, Santa Cruz Genome Browser project and head of the Encode Data Coordination Center. “With this project, we probably went from understanding less than 5% to now around 75% of them.”
The ENCODE results also identified SNPs within regulatory regions that are associated with a range of diseases, providing new insights into the roles that

noncoding DNA plays in disease development.

“As much as nine out of 10 times, disease-linked genetic variants are not in protein-coding regions,” comments Mike Pazin, Encode program director at the National Human Genome Research Institute. “Far from being junk DNA, this regulatory DNA clearly makes important contributions to human disease.”

Other Related Articles on this Open Access Online Scientific Journal, include the following:

5 Responses

Many thanks for ths comprehensive review of the milestones along the way of Cracking the Human Code of Life. It is very creatively laid out and very thoroughly examined for the significance each milestone has contributed to the discovery process. It reads like fine poetry, one can reread it and each time discover something new hidden in this text.

Thank you,for Part IIA.

We, the Readers, are impatiently waiting for Part IIB, Part IIC and, yes, Part III.

ONLY, then I will go back to reread Part I and look for the American Chemical Heritage documentary on the pre double helix discovery milieu in the UK.